
Print ISSN: 0976-416X
Online ISSN: 0976-4178


International Journal of Computational Linguistics Research
 

 

Analysis of Classification Algorithms for using in Vertical Retrieval Systems
Nemanja Popovic, Suzana Stojkovic
University of Niš, Aleksandra Medvedeva 14, 18000 Niš
Abstract: Classification is one of the most thoroughly studied and most widely used machine learning problems. Many classification algorithms have been developed over the last few decades, so whenever a problem calls for classification, the most suitable algorithm has to be selected. This paper analyses which classification algorithms can be used in a vertical retrieval system for both document and query classification. We compared SVM, Multinomial Naïve Bayes, Bernoulli Naïve Bayes and Random Forest. The experiments presented in the paper show that, for the classification of long documents, SVM and Multinomial Naïve Bayes achieve similar precision (SVM is slightly better), but Multinomial Naïve Bayes correctly classified 93.14% of queries, while SVM correctly classified only 22.55%.
Keywords: Information Retrieval, Vertical Retrieval, Text Classification, Naïve Bayes Classifier, SVM Classifier, Random Forest
DOI:https://doi.org/10.6025/jcl/2021/12/1/1-8
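
The abstract reports that Multinomial Naïve Bayes handles both long documents and short queries. As a rough illustration only, and not the authors' implementation, the following is a minimal sketch of a multinomial Naïve Bayes classifier trained on documents and then applied to a document and to a two-word query. The class name, the whitespace tokenisation, the add-one smoothing and the toy "sport"/"finance" corpus are all assumptions made for this example; none of them come from the paper.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        # Number of training documents per class (used for the prior).
        self.class_counts = Counter(labels)
        # Term frequencies accumulated per class.
        self.term_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.term_counts[label].update(text.lower().split())
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of per-term log likelihoods.
            score = math.log(n_docs / total_docs)
            for term in text.lower().split():
                if term in self.vocab:  # terms unseen in training are ignored
                    score += math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus with two verticals (assumption, not the paper's data).
train_docs = [
    "the team won the football match after a late goal",
    "the striker scored twice and the coach praised the team",
    "the bank raised interest rates and the stock market fell",
    "investors sold shares as the market reacted to inflation",
]
train_labels = ["sport", "sport", "finance", "finance"]

model = MultinomialNB().fit(train_docs, train_labels)

# Classify a longer document and a short query with the same model.
doc_prediction = model.predict("the coach said the team played a great match")
query_prediction = model.predict("stock market")
print(doc_prediction, query_prediction)
```

Because the score is a sum of per-term log probabilities, the same model can score a two-word query and a full document; whether this explains the query results reported in the abstract is not something this sketch can confirm.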



 

Copyright © 2011 dline.info