Semantic Similarity, Phrase Analysis, and Expert Evaluation of Human versus LLM-Generated Abstracts

  • Pit Pichappan, Digital Information Research Labs, Chennai, Tamil Nadu, India

Abstract

This study examines how abstracts of scientific papers compare when written by human authors and by large language models. Abstracts are central to information use: they are the first source researchers consult when deciding whether a paper merits further reading. The study analyses abstracts from the December 2025 issues of Antioxidants and PLOS Computational Biology, each written by the original authors, by ChatGPT, or by Qwen. Evaluation combined three measures: semantic similarity (Jaccard index), phrase-occurrence frequency, and expert scoring, covering quality, detectability, and the implications for scientific writing. The results showed that the AI-generated abstracts were more similar to one another than to the human-authored abstracts: the mean Jaccard index between the two models' outputs was about 0.66 to 0.68, but lower when either was compared with the author-written versions, indicating that both models write in a shared style regardless of topic. Domain-specific terms appeared in both human and AI abstracts, but their frequency and exact selection differed between ChatGPT and Qwen. Expert scoring assigned higher grades to the AI abstracts on clarity, structure, scientific soundness, and originality or relevance: Qwen received a mean of 9.29, ChatGPT 9.02, while the human-authored abstracts averaged 7.75. A one-way ANOVA indicated that authorship (human versus AI) accounted for about 79 per cent of the variation in the scores. These findings suggest that AI can generate clearer and more comprehensive summaries than human authors. Ethical concerns nevertheless remain, including the risk that AI fabricates references, spreads misinformation, or manipulates peer review. The analysis estimates that 10 to 14 per cent of recent biomedical abstracts show signs of AI assistance, underscoring the need for better detection methods, clearer rules, and a renewed focus on the underlying research ideas rather than the polish of the writing.
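The two quantitative measures named in the abstract, the Jaccard index and the ANOVA effect size, can both be sketched in a few lines. This is a minimal illustration, not the author's actual pipeline: the whitespace tokenization and the eta-squared formulation (SS_between / SS_total, the usual "per cent of variation explained" from a one-way ANOVA) are assumptions.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard index between the word sets of two texts:
    |A ∩ B| / |A ∪ B| on lowercased whitespace tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not (sa or sb):
        return 0.0
    return len(sa & sb) / len(sa | sb)

def eta_squared(groups: list[list[float]]) -> float:
    """Proportion of total score variance explained by group
    membership (SS_between / SS_total) for a one-way design."""
    scores = [x for g in groups for x in g]
    grand = sum(scores) / len(scores)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_between = sum(
        len(g) * ((sum(g) / len(g)) - grand) ** 2 for g in groups
    )
    return ss_between / ss_total

# Example: two abstracts sharing 2 of 4 distinct words -> 0.5
print(jaccard("oxidative stress markers", "stress markers rose"))
```

Values near 0.66–0.68, as reported between the two models, mean roughly two thirds of the distinct words overlap; an eta squared near 0.79 means authorship alone explains most of the spread in expert scores.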

Published
2026-03-13
How to Cite
PICHAPPAN, Pit. Semantic Similarity, Phrase Analysis, and Expert Evaluation of Human versus LLM-Generated Abstracts. Journal of Digital Information Management (JDIM), [S.l.], p. 40-61, mar. 2026. ISSN 0972-7272. Available at: <https://www.dline.info/ojs/index.php/jdim/article/view/570>. Date accessed: 21 apr. 2026.
Section
Research