Journal of Information Technology Review

DLINE Journals portal

Print ISSN: 0976-898X
Online ISSN: 0976-8998

Automatic Indexation of Large Text and Datasets

Mohamed Salim El Bazzi, Abdelatif Ennaji, Driss Mammass
IRF-SIC Laboratory, Ibn Zohr University, Agadir, Morocco; LITIS Laboratory, University of Rouen, France

Abstract: When a text corpus is very large, managing the collection effectively depends on good indexing, and when the texts form complex datasets, classification and indexing become challenging. In this work, we propose an efficient automatic indexing system for large datasets and test its effectiveness on a large collection of real texts. For the evaluation, we apply KNN and SVM classifiers. The proposed solution easily outperforms traditional indexing based on TF-IDF. Although the evaluation was carried out on Arabic-language texts, the approach can be applied to any language.
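For orientation, the TF-IDF baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' indexing system or their experimental setup: the smoothed IDF formula, the cosine-similarity KNN vote, the function names, and the tiny English toy corpus are all hypothetical choices made for the example, and the SVM classifier is omitted for brevity.

```python
import math
from collections import Counter


def fit_tfidf(docs):
    """Build sparse TF-IDF vectors (dict term -> weight) for tokenized documents."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed IDF variant: log(N / df) + 1, so terms present in every document keep a nonzero weight.
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    return [vectorize(doc, idf) for doc in docs], idf


def vectorize(doc, idf):
    """Weight a tokenized document with a fitted IDF table; unseen terms are ignored."""
    tf = Counter(doc)
    return {t: (c / len(doc)) * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(query, train_vectors, train_labels, k=3):
    """Majority vote among the k training documents most similar to the query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus (already tokenized); a real evaluation would use a large labelled collection.
train_docs = [
    ["football", "match", "goal", "team"],
    ["team", "player", "score", "goal"],
    ["market", "stock", "price", "bank"],
    ["bank", "loan", "interest", "price"],
]
train_labels = ["sport", "sport", "economy", "economy"]

train_vectors, idf = fit_tfidf(train_docs)
query = vectorize(["goal", "team", "player"], idf)
print(knn_predict(query, train_vectors, train_labels, k=3))
```

An indexing method such as the one proposed in the paper would replace `fit_tfidf` and `vectorize` with its own document representation, while the classifier stage stays the same, which is what makes a classifier-based comparison against TF-IDF possible.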

Keywords: ConIText, Text Mining, Indexation, Context, Data Analysis, Classification

DOI: https://doi.org/10.6025/jitr/2020/11/3/83-93
