Journal of Digital Information Management


Vol. 19, No. 2, 2021

Incorporating Quality Measurement into Scientific Document Retrieval
Nedra Ibrahim, Anja Habacha Chaibi, Henda Ben Ghézala
RIADI Laboratory, ENSI, Tunisia
Abstract: One of the challenges facing today's researchers is finding high-quality information that meets their needs. In scientific research, information quality is essential for improving institutional quality and for validating research. The main purpose of this paper is to propose a scientometric annotation approach that improves retrieval-system performance and meets researchers' needs. We discuss how scientometrics can be used in document annotation to improve information quality. One possible solution is to automate and facilitate the selection of high-quality scientific documents by enriching the document annotation process with scientometric criteria. Our approach outperformed the BM25 retrieval model, with the best performance obtained by integrating the document citation count with the journal or conference ranking. The best improvement rates were 34.21% in F-measure, 52.22% in nDCG, 27.45% in MAP, and 83.33% in P(k). An important implication of this finding is the existence of a correlation between the quality of a research paper and its relevance.
Keywords: Scientometric Retrieval, Scientometric Annotation, Scientific Quality, Qualitative Evaluation, Scientometric Indicator
DOI: https://doi.org/10.6025/jdim/2021/19/2/47-58