International Journal of Computational Linguistics Research

Print ISSN: 0976-416X
Online ISSN: 0976-4178

Analysis of Classification Algorithms for using in Vertical Retrieval Systems

Nemanja Popovic, Suzana Stojkovic
University of Niš, Aleksandra Medvedeva 14, 1800 Niš

Abstract: Classification is the most frequently solved and most widely used machine learning problem. Over the last few decades, many classification algorithms have been developed, so whenever a problem calls for classification, the most suitable algorithm has to be selected. This paper analyses which classification algorithms can be used in a vertical retrieval system for both document and query classification. We compared SVM, Multinomial Naïve Bayes, Bernoulli Naïve Bayes and Random Forest. The experiments presented in the paper show that, when classifying long documents, SVM and Multinomial Naïve Bayes achieve similar precision (SVM is slightly better), but Multinomial Naïve Bayes correctly classified 93.14% of queries, while SVM correctly classified only 22.55%.

Keywords: Information Retrieval, Vertical Retrieval, Text Classification, Naïve Bayes Classifier, SVM Classifier, Random Forest
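
The setting described in the abstract, a classifier trained on longer documents and then applied to short queries, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the labels and the function names `train_multinomial_nb` and `classify` are hypothetical, and the code shows only a plain multinomial Naïve Bayes with Laplace smoothing, assuming whitespace tokenization.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs):
    """Train on (text, label) pairs; return (log_priors, log_likelihoods, vocab)."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = sum(class_doc_counts.values())
    log_priors = {c: math.log(n / total_docs) for c, n in class_doc_counts.items()}

    log_likelihoods = {}
    for c in class_doc_counts:
        # Laplace (add-one) smoothing over the shared vocabulary.
        denom = sum(word_counts[c].values()) + len(vocab)
        log_likelihoods[c] = {
            w: math.log((word_counts[c][w] + 1) / denom) for w in vocab
        }
    return log_priors, log_likelihoods, vocab


def classify(query, model):
    """Return the class with the highest log-posterior for a (short) query."""
    log_priors, log_likelihoods, vocab = model
    # Tokens never seen in training are ignored.
    tokens = [t for t in query.lower().split() if t in vocab]
    scores = {
        c: log_priors[c] + sum(log_likelihoods[c][t] for t in tokens)
        for c in log_priors
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus standing in for the long training documents.
TOY_DOCS = [
    ("the team won the football match after a late goal", "sport"),
    ("the striker scored a goal in the final match", "sport"),
    ("the bank raised interest rates as the market fell", "finance"),
    ("investors sold shares after the market report", "finance"),
]

model = train_multinomial_nb(TOY_DOCS)
label_sport = classify("football goal", model)
label_finance = classify("market shares", model)
print(label_sport, label_finance)
```

Because the score is a sum over only the terms present in the query, a two-word query is scored in the same way as a long document. The sketch says nothing about the accuracy figures reported in the paper.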

DOI:https://doi.org/10.6025/jcl/2021/12/1/1-8


