Volume 11 Number 2 May 2020


Visualization-based Machine Learning Model for Relational Databases

Dihia Lanasri, Carlos Ordonez, Ladjel Bellatreche, Selma Khouri

https://doi.org/10.6025/jet/2020/11/2/49-53

Abstract Transforming several relational tables into a data set to be used as input to a Machine Learning (ML) model is a complex task since the data scientist has to derive many intermediate tables, queries and views in a disorganized manner. This process creates many SQL queries to facilitate the exploration task of the data scientist. Because the provenance of the...


A Pre-trained BERT Model for Arabic Author Profiling

Chiyu Zhang, Muhammad Abdul-Mageed

https://doi.org/10.6025/jet/2020/11/2/54-59

Abstract We report our models for detecting age, language variety, and gender from social media data in the context of the Arabic author profiling and deception detection shared task (APDA) [32]. We build simple models based on pre-trained bidirectional encoders from transformers (BERT). We first fine-tune the pre-trained BERT model on each of the three datasets with the data released for the shared task. Then we augment shared...


Author Profiling in Arabic Tweets: An Approach based on Multi-Classification with Word and Character Features

Yutong Sun, Hui Ning, Kaisheng Chen, Leilei Kong, Yunpeng Yang, Jiexi Wang, Haoliang Qi

https://doi.org/10.6025/jet/2020/11/2/60-63

Abstract This paper focuses on the author profiling task published in FIRE 2019 (Forum for Information Retrieval Evaluation), which includes automatic identification of the age, gender, and language variety of Arabic tweets. We treat the author profiling task as a multi-class classification problem. We use word- and character-based TF-IDF features and train a logistic regression classifier to predict the labels. In the...


An Ensemble Learning-based Model for Classification of Insincere Question

Zhongyuan Han, Jiaming Gao, Huilin Sun, Ruifeng Liu, Chengzhe Huang, Leilei Kong, Haoliang Qi

https://doi.org/10.6025/jet/2020/11/2/64-69

Abstract This paper describes our method for the Classification of Insincere Question (CIQ) task in FIRE 2019. In this evaluation, we use an ensemble learning method to combine multiple classification models, including logistic regression, support vector machine, Naive Bayes, decision tree, K-Nearest Neighbor, and Random Forest. The results show that our classifier achieves a 67.32% accuracy rate (ranked first) on the test dataset....


An Enhanced Ensemble Classifier for Hate and Offensive Content Identification

Rajalakshmi R, Yashwant Reddy B

https://doi.org/10.6025/jet/2020/11/2/70-76

Abstract Recent advancements in Internet technologies have brought tremendous change to social media. Hate speech is an attack directed towards a group of people based on their religion, gender, colour, etc. Offensive content in social media poses a threat to democracy. As this kind of hate speech and offensive content on the web increases day by day, manually...