Print ISSN: 0976-898X
Online ISSN: 0976-8998


Journal of Information Technology Review
 

Automatic Indexation of Large Text and Datasets
Mohamed Salim El Bazzi, Abdelatif Ennaji, Driss Mammass
IRF-SIC Laboratory, Ibn Zohr University, Agadir, Morocco; LITIS Laboratory, University of Rouen, France
Abstract: Managing a very large text corpus effectively requires good indexing, which is difficult to achieve at scale. Classification and indexing become especially challenging when the datasets are complex. In this work, we propose an efficient automatic indexing system for large datasets and test its effectiveness on a large collection of real texts. For the evaluation, we apply KNN and SVM classifiers. The proposed solution outperforms traditional indexing based on TF-IDF. Although the evaluation was carried out on Arabic-language texts, the approach can be applied to any language.
Keywords: ConIText, Text Mining, Indexation, Context, Data Analysis, Classification
DOI:https://doi.org/10.6025/jitr/2020/11/3/83-93
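
The abstract compares the proposed system against a traditional TF-IDF index evaluated with KNN and SVM classifiers. As a rough illustration of that baseline only (not the authors' system, and not ConIText), the following minimal sketch builds TF-IDF vectors and classifies a query with k-nearest neighbours under cosine similarity. The toy English corpus, the labels, the smoothed IDF formula, and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption); real Arabic text would
    # need proper normalization and stemming.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency over the training documents.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(doc, idf):
    # Sparse, L2-normalized TF-IDF vector; unseen terms are ignored.
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Dot product of two L2-normalized sparse vectors.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(query, train_vecs, train_labels, idf, k=3):
    # Majority vote among the k training documents most similar to the query.
    q = tfidf_vector(query, idf)
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(q, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, for illustration only.
train_docs = [
    "football match goal team",
    "team wins the football cup",
    "goal scored in the final match",
    "stock market shares fall",
    "bank raises interest rates",
    "market investors buy shares",
]
train_labels = ["sport", "sport", "sport", "economy", "economy", "economy"]

idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(d, idf) for d in train_docs]

print(knn_predict("the team scored a goal", train_vecs, train_labels, idf))
print(knn_predict("investors buy bank shares", train_vecs, train_labels, idf))
```

An SVM baseline would consume the same TF-IDF vectors; only the classifier step changes.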



 

Copyright © 2011 dline.info