Can AI Replace Human Peer Reviewers? A Comparative Analysis of AI-Generated and Human Expert Reviews
Abstract
This work analyses the performance of AI-generated peer reviews compared with human expert reviews across 62 manuscripts in the Real-Time Intelligent Systems track of the Springer Lecture Notes in Networks and Systems. Using four large language models (ChatGPT 3.5, Perplexity AI, Qwen-3 Max, and DeepSeek), the study evaluated 141 AI and human reviews against 12 quality criteria, scored by five domain experts. Results show that human reviews scored higher overall (mean = 3.98 vs. 3.15 for AI) with greater consistency and depth, particularly in methodological critique, literature contextualisation, and review confidence. AI reviews were more generic and less specific and struggled with scholarly nuance, though they excelled at summarisation and formatting checks. Although the difference only approached statistical significance (p = 0.08), the effect size (Cohen's d = 0.56) indicated a moderate practical gap. The study concludes that AI cannot replace human reviewers but may ethically augment the process in hybrid models under human oversight.
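As a point of reference for the reported effect size, the sketch below shows how Cohen's d for two independent groups is typically computed from the pooled standard deviation. The function name and the example score lists are illustrative only; they are not the study's data or code.

```python
import math

def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sample variances (Bessel-corrected).
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    # Pooled standard deviation across the two groups.
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd
```

By convention, d around 0.5 is read as a medium effect, which is why the study interprets d = 0.56 as a moderate practical gap between human and AI review quality even though p = 0.08 falls short of the usual 0.05 threshold.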