An Enhanced Ensemble Classifier for Hate and Offensive Content Identification

Rajalakshmi R, Yashwant Reddy B
Journal of E-Technology, 11(2), 2020
https://doi.org/10.6025/jet/2020/11/2/70-76
http://www.dline.info/jet/fulltext/v11n2/jetv11n2_5.pdf

Recent advancements in Internet technologies have brought tremendous change to social media. Hate speech is an attack directed at a group of people based on their religion, gender, colour, etc. Offensive content on social media poses a threat to democracy. As such hate speech and offensive content on the web increase day by day, manually monitoring or controlling such hate crimes is a highly challenging task. Most existing methodologies focus on English-language tweets, and only limited work has been reported for Hindi and German posts. The importance of feature selection methods has also not been explored much for this problem.

In this research work, an enhanced ensemble classifier approach is proposed to identify hate and offensive content posted in Hindi or German. In the proposed approach, a chi-square based feature selection method is combined with a Random Forest classifier to classify the tweets. This work was submitted to the Hate and Offensive Content Identification (HASOC) task at FIRE 2019. Experiments conducted on the released HASOC dataset show that accuracies of 81% and 64% were achieved on German and Hindi tweets, respectively.
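The chi-square feature selection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary term presence, a two-class setting, and the classic 2x2 contingency form of the chi-square statistic. The function names `chi_square_scores` and `select_top_k`, the toy documents, and the English placeholder tokens are all hypothetical and stand in for tokenized Hindi or German tweets.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive_label):
    """Score each term by the 2x2 chi-square statistic (term presence vs. class).

    docs: list of token lists; labels: one class label per document.
    Hypothetical helper for illustration only.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive_label)
    n_neg = n - n_pos

    # Count, per class, how many documents contain each term at least once.
    with_term_pos = Counter()
    with_term_neg = Counter()
    for tokens, y in zip(docs, labels):
        target = with_term_pos if y == positive_label else with_term_neg
        target.update(set(tokens))

    scores = {}
    for term in set(with_term_pos) | set(with_term_neg):
        a = with_term_pos[term]   # positive docs containing the term
        b = with_term_neg[term]   # negative docs containing the term
        c = n_pos - a             # positive docs without the term
        d = n_neg - b             # negative docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy data (assumed, not from the HASOC dataset): 1 = offensive, 0 = not offensive.
docs = [
    ["you", "are", "awful"],
    ["awful", "people"],
    ["nice", "day"],
    ["you", "are", "nice"],
]
labels = [1, 1, 0, 0]

scores = chi_square_scores(docs, labels, positive_label=1)
selected = select_top_k(scores, k=2)
print(selected)
```

Terms that appear equally often in both classes, such as "you" in the toy data, score zero and are discarded, while terms concentrated in one class are kept. In a full pipeline the retained terms would form the feature vectors passed to a Random Forest classifier; scikit-learn offers `SelectKBest` with `chi2` and `RandomForestClassifier` for this purpose, although its `chi2` is computed from feature values rather than from the presence counts used here.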