Journal of Digital Information Management


Vol. 19, Issue No. 3, 2021

Efficient Measuring of Readability to Improve Documents Accessibility for Arabic Language Learners
Sadik Bessou, Ghozlane Chenni
Department of Computer Science, Faculty of Sciences, University of Ferhat Abbas Sétif 1, Algeria
Abstract: This paper presents a supervised machine learning approach to building a classifier that identifies text complexity, so that Arabic language learners can be given texts suited to their level. The classifier discriminates between the different levels of difficulty in reading and understanding a text. Several models were trained on a large corpus mined from online Arabic websites and manually annotated. The models use both Count and TF-IDF representations and apply five machine learning algorithms: Multinomial Naïve Bayes, Bernoulli Naïve Bayes, Logistic Regression, Support Vector Machine (SVM), and Random Forest, using unigram and bigram features. Following common practice, level identification is formulated as a classification task in order to estimate text complexity. Experimental results showed that n-gram features can be indicative of the reading level of a text and can substantially improve performance, and that SVM and Multinomial Naïve Bayes are the most accurate in predicting the complexity level. The best results were achieved using TF-IDF vectors built from a combination of word-based unigrams and bigrams, with an overall accuracy of 87.14% over four classes of complexity.
Keywords: Text Readability, Lexical Complexity, Machine Learning, Natural Language Processing, Arabic
DOI: https://doi.org/10.6025/jdim/2021/19/3/75-82
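
The pipeline outlined in the abstract (word-based unigram and bigram features, TF-IDF weighting, and a Multinomial Naïve Bayes classifier) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the Laplace smoothing constant, the two level labels, and the toy Latin-script training texts are all assumptions chosen for illustration. The paper itself uses an annotated Arabic corpus and four complexity classes.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return word-based n-grams (unigrams, then bigrams) of a text."""
    tokens = text.split()  # assumption: plain whitespace tokenization
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats


def fit_idf(docs):
    """Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter(f for d in docs for f in set(ngrams(d)))
    return {f: math.log((1 + n_docs) / (1 + c)) + 1 for f, c in df.items()}


def tfidf(text, idf):
    """L2-normalised sparse TF-IDF vector; n-grams unseen in training are dropped."""
    tf = Counter(f for f in ngrams(text) if f in idf)
    vec = {f: c * idf[f] for f, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {f: v / norm for f, v in vec.items()}


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace smoothing over sparse feature dicts."""

    def fit(self, vectors, labels, alpha=1.0):
        self.classes = sorted(set(labels))
        self.vocab = {f for v in vectors for f in v}
        totals = {c: defaultdict(float) for c in self.classes}
        for v, y in zip(vectors, labels):
            for f, w in v.items():
                totals[y][f] += w  # accumulate feature weight per class
        self.log_prior = {
            c: math.log(labels.count(c) / len(labels)) for c in self.classes
        }
        self.log_like = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + alpha * len(self.vocab)
            self.log_like[c] = {
                f: math.log((totals[c][f] + alpha) / denom) for f in self.vocab
            }
        return self

    def predict(self, vector):
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_like[c][f] for f, w in vector.items() if f in self.vocab
            )
        return max(self.classes, key=score)


# Hypothetical toy data standing in for an annotated corpus.
train_texts = [
    "the cat sat",
    "the dog ran",
    "economic policy shapes monetary inflation",
    "monetary policy affects economic growth",
]
train_labels = ["easy", "easy", "hard", "hard"]  # hypothetical level names

idf = fit_idf(train_texts)
model = MultinomialNB().fit([tfidf(t, idf) for t in train_texts], train_labels)

print(model.predict(tfidf("the cat ran", idf)))
print(model.predict(tfidf("economic growth policy", idf)))
```

Swapping `tfidf` for raw n-gram counts gives the Count representation mentioned in the abstract. A practical system would also need Arabic-aware preprocessing, which this sketch omits.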