Semantic Similarity, Phrase Analysis, and Expert Evaluation of Human versus LLM-Generated Abstracts

Author: Pit Pichappan

Journal: Journal of Digital Information Management

Year: 2026


Abstract

This study examines abstracts of scientific papers and how well large language models generate them. Abstracts are central to information use: they are the first source researchers consult when deciding whether a paper merits further reading. The study analyses abstracts from the December 2025 issues of Antioxidants and PLOS
Computational Biology, written either by the papers' authors, by ChatGPT, or by Qwen. Evaluation combined
semantic similarity (the Jaccard index), phrase occurrence frequency, and expert scoring, covering quality,
detectability, and the implications for scientific writing.
The results showed that the AI-generated abstracts were more similar to one another than to the human-written
abstracts. The mean Jaccard index between the two AI systems was roughly 0.66 to 0.68, but lower between
AI-generated and author-written texts, indicating that both models write in a similar style regardless of topic.
Domain-specific terms appeared in both human and AI abstracts, but their frequency and exact choice
differed between ChatGPT and Qwen. Expert scoring, based on clarity, structure, scientific soundness, and
originality or relevance, favoured the AI abstracts: Qwen averaged 9.29 and ChatGPT 9.02, while the
human-authored abstracts averaged 7.75. A one-way ANOVA indicated that the source of the abstract (human
versus AI) accounted for about 79 per cent of the variation in the scores. These findings suggest that AI can
generate more polished and comprehensive summaries than humans. Ethical problems remain, however, such as
the risk that AI fabricates references, spreads misinformation, or subverts peer review. The analysis estimates
that 10 to 14 per cent of recent biomedical abstracts show signs of AI assistance. This points to the need for
better detection methods, clearer rules, and greater emphasis on the underlying research ideas rather than the
polish of the writing.
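The two quantitative measures named above can be sketched briefly. The following is a minimal illustration, not the paper's actual pipeline: a Jaccard index over word-token sets, and the share of score variance explained by group membership (eta squared from a one-way ANOVA decomposition). The texts and expert scores in the example are hypothetical.

```python
# Illustrative sketches of the Jaccard index and ANOVA variance explained.
import re

def tokens(text: str) -> set[str]:
    """Lowercased word tokens as a set (one simple tokenization choice)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Jaccard index: |A intersect B| / |A union B| over the token sets."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if (ta or tb) else 1.0

def eta_squared(groups: list[list[float]]) -> float:
    """Between-group sum of squares divided by total sum of squares."""
    all_scores = [x for g in groups for x in g]
    grand = sum(all_scores) / len(all_scores)
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand) ** 2 for g in groups)
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    return ss_between / ss_total

# Hypothetical expert scores for human-, ChatGPT-, and Qwen-written abstracts.
human, chatgpt, qwen = [7.5, 8.0], [9.0, 9.1], [9.2, 9.4]
print(round(jaccard("antioxidant activity in liver cells",
                    "antioxidant response in liver cells"), 2))
print(round(eta_squared([human, chatgpt, qwen]), 2))
```

An eta squared near 0.79, as reported in the abstract, would mean that knowing whether an abstract was human- or AI-written accounts for most of the spread in expert scores.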


Full Content

<p>[1] Tullu, M. S., Karande, S. (2017). Writing a model research paper: A roadmap. J Postgrad Med, 63(3), 143-146.</p>
<p>[2] Babl, F. E., Babl, M. P. (2023). Generative artificial intelligence: Can ChatGPT write a quality abstract? Emergency Medicine Australasia, 35, 809–811.</p>
<p>[3] Altmäe, S., et al. (2023). Artificial intelligence in scientific writing: a friend or a foe? Reproductive BioMedicine Online, 47(1), 3-9.</p>
<p>[4] Manley, A. E., Perry, R., Moran, P., Dawson, S., Biddle, L., Savović, J. (2025). Effect of medical school initiatives on help-seeking for mental health problems among medical students: a systematic review and meta-analysis. BMJ Open, 16(2), e111351.</p>
<p>[5] Gao, C. A., Howard, F. M., Markov, N. S., Dyer, E. C., Ramesh, S., Luo, Y., Pearson, A. T. (2023). Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers. npj Digital Medicine, 6, 75.</p>
<p>[6] Kresoja, K.-P., Schöber, A. R., Lüscher, T., Thevathasan, T., Lurz, P., Papoutsis, K., Baldus, S., Blankenberg, S., Hinkel, R., Thiele, H. (2025). Performance of artificial intelligence-generated vs human-authored abstracts in a real-world setting. European Heart Journal, ehaf654.</p>
<p>[7] Pan, E. T., Florian-Rodriguez, M. (2024). Human vs machine: identifying ChatGPT-generated abstracts in Gynecology and Urogynecology. American Journal of Obstetrics and Gynecology, 231(2), 276.e1-276.e10.</p>
<p>[8] Elek, A., Yildiz, H. S., Akca, B., Oren, N. C., Gundogdu, B. (2025). Evaluating the Efficacy of Perplexity Scores in Distinguishing AI-Generated and Human-Written Abstracts. Academic Radiology, 32(4), 1785-1790.</p>
<p>[9] Elkhatat, A. M., Elsaid, K., Almeer, S. (2023). Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text. International Journal for Educational Integrity, 19(1), 7.</p>
<p>[10] Weber-Wulff, D., et al. (2023). Testing of detection tools for AI-generated text. International Journal for Educational Integrity.</p>
<p>[11] Voss, E. Can AI Write an Abstract for Me: A Genre-Based Comparison of Published and AI-Generated Research Abstracts. Available at SSRN: https://ssrn.com/abstract=5371798 or http://dx.doi.org/10.2139/ssrn.5371798.</p>
<p>[12] Kumar, V., Bharti, A., Verma, D., Bhatnagar, V. (2024). Deep dive into language traits of AI-generated abstracts. In: Proceedings of the 7th Joint International Conference on Data Science &amp; Management of Data (11th ACM IKDD CODS and 29th COMAD), p. 237–241, Bangalore, India. Association for Computing Machinery, New York, NY, USA.</p>
<p>[13] Kobak, D., González-Márquez, R., Horvát, E.-Á., Lause, J. (2025). Delving into LLM-assisted writing in biomedical publications through excess vocabulary. Preprint at arXiv: https://doi.org/10.48550/arXiv.2406.07016.</p>
<p>[14] Mallapaty, S. (2025). Signs of AI-generated text were found in 14% of biomedical abstracts last year. Nature, 2 July. doi: 10.1038/d41586-025-02097-6.</p>
<p>[15] Kobak, D., González-Márquez, R., Horvát, E.-Á., Lause, J. (2025). Delving into LLM-assisted writing in biomedical publications through excess vocabulary. Science Advances, 11(27), 2 July 2025. arXiv:2406.07016v5.</p>
<p>[16] Geng, M., Trotta, R. (2025). Human-LLM Coevolution: Evidence from Academic Writing. arXiv:2502.09606v2 [cs.CL], 17 Feb.</p>
<p>[17] Arnold, P. (2025). Scientists who use AI tools are publishing more papers than ever before. Phys.org. In: <a href="https://phys.org/news/2025-12-scientists-ai-tools-publishing-papers.html">https://phys.org/news/2025-12-scientists-ai-tools-publishing-papers.html</a>.</p>
<p>[18] Cheng, S. L., Tsai, S. J., Bai, Y. M., Ko, C. H., Hsu, C. W., Yang, F. C., Tsai, C. K., Tu, Y. K., Yang, S. N., Tseng, P. T., Hsu, T. W., Liang, C. S., Su, K. P. (2023). Comparisons of Quality, Correctness, and Similarity Between ChatGPT-Generated and Human-Written Abstracts for Basic Research: Cross-Sectional Study. J Med Internet Res, 25, e51229.</p>
<p>[19] Kocak, B., Onur, M. R., Park, S. H., Baltzer, P., Dietzel, M. (2025). Ensuring peer review integrity in the era of large language models: A critical stocktaking of challenges, red flags, and recommendations. European Journal of Radiology Artificial Intelligence, 2, 100018.</p>
<p>[20] Gharami, K., Sarkar, S. K., Liu, Y., Showkat, S. (2025). MoniChatGPT: Excellent Paper Accept It. Editor: Imposter Found Review Rejected. arXiv:2512.20405v2.</p>
<p>[21] Maloyan, N., Ashinov, B., Namiot, D. (2025). Investigating the vulnerability of LLM-as-a-judge architectures to prompt-injection attacks. arXiv preprint arXiv:2505.13348.</p>
<p>[22] Gibney, E. (2025). Scientists hide messages in papers to game AI peer review. Nature, 643(8073), 887–888.</p>
<p>[23] Lin, Z. (2025). Hidden prompts in manuscripts exploit AI-assisted peer review. arXiv preprint arXiv:2507.06185.</p>
<p>[24] Akkureddy, S. (April 2024). Is It Ethical to Use AI-Generated Abstracts Without Altering It? In: https://paperpal.com/blog/researcher/isitethicaltouseaigeneratedabstractswithoutalteringit.</p>
<p>[25] Ufnalska, S., Hartley, J. (2009). How can we evaluate the quality of abstracts? European Science Editing, 35(3), 69-72.</p>
<p>[26] Tcherni-Buzzeo, M., Pyrczak, F. (2024). Evaluating Abstracts. In: Evaluating Research in Academic Journals (p. 49-64). Routledge.</p>
<p>[27] Pickens, J., Croft, W. B. (2000, April). An exploratory analysis of phrases in text retrieval. In: RIAO, p. 1179-1195.</p>
<p>[28] Bedathur, S., Berberich, K., Dittrich, J., Mamoulis, N., Weikum, G. (2010). Interesting-phrase mining for ad-hoc text analytics. Proceedings of the VLDB Endowment, 3(1-2), 1348-1357.</p>