
Print ISSN: 0976-3503
Online ISSN: 0976-2930


Journal of E-Technology

An Enhanced Ensemble Classifier for Hate and Offensive Content Identification
Rajalakshmi R, Yashwant Reddy B
School of Computing Science and Engineering, Vellore Institute of Technology, Chennai, India
Abstract: Recent advances in Internet technologies have brought tremendous change to social media. Hate speech is an attack directed at a group of people on the basis of their religion, gender, colour, etc. Offensive content on social media poses a threat to democracy. As this kind of hate speech and offensive content on the web increases day by day, manually monitoring or controlling it is a highly challenging task. Most existing methodologies focus on English-language tweets, and only limited work has been reported for Hindi and German posts. The importance of feature selection methods has also not been explored much for this problem. In this research work, an enhanced ensemble classifier approach is proposed to identify hate and offensive content posted in Hindi or German. In the proposed approach, a Chi-square based feature selection method is combined with a Random Forest classifier to classify the tweets. This work was submitted to the Hate and Offensive Content Identification (HASOC) task at FIRE 2019. Experiments conducted on the released HASOC dataset show that accuracies of 81% and 64% were achieved on German and Hindi tweets, respectively.
Keywords: Hate Speech Identification, Ensemble Classifier, Chi Square Feature Selection, German, Hindi, Social Media
DOI: https://doi.org/10.6025/jet/2020/11/2/70-76
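
The Chi-square feature selection step named in the abstract can be illustrated with a minimal sketch. It uses the textbook 2x2 contingency form of the statistic, which is an assumption here: the abstract does not say which formulation, tokenizer, or number of selected features the authors used. The function names, the whitespace tokenizer, the toy English posts, and the "HOF"/"NOT" labels are all hypothetical placeholders, and the Random Forest classification step is omitted.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive_label):
    """Score each term with the 2x2 chi-square statistic against one class.

    Assumption: documents are tokenized by lowercasing and splitting on
    whitespace; the abstract does not describe the actual preprocessing.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive_label)
    n_neg = n - n_pos

    # Document frequency of each term inside and outside the positive class.
    df_pos, df_neg = Counter(), Counter()
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())
        (df_pos if y == positive_label else df_neg).update(terms)

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive docs containing the term
        b = df_neg[term]   # other docs containing the term
        c = n_pos - a      # positive docs without the term
        d = n_neg - b      # other docs without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the term or class carries no contrast.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Invented toy data, not taken from the HASOC dataset.
toy_docs = [
    "you are badword",
    "badword people leave",
    "nice day today",
    "you are nice",
]
toy_labels = ["HOF", "HOF", "NOT", "NOT"]

toy_scores = chi_square_scores(toy_docs, toy_labels, positive_label="HOF")
print(select_top_k(toy_scores, k=2))
```

Terms that occur equally often in both classes, such as "you" in the toy data, score zero and are dropped first. In a full pipeline, the retained terms would form the feature vectors passed to the classifier.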

