Medical Doctor and Information Technologies

Assessing the quality of large generative models for basic healthcare applications

https://doi.org/10.25881/18110193_2025_3_64

Abstract

Large generative models (LGMs) have significant potential for healthcare and medical science. Although the number of publications is growing exponentially, the studies themselves lack quality and breakthrough findings. Published research calls for standardized approaches to ensure the safe and effective integration of LGMs into clinical practice. The Moscow healthcare system is currently testing LGMs as tools for supporting medical decision-making, which has required the development of specialized methods and techniques for assessing LGM quality. This paper presents two such methods. Both are based on an analysis of the literature (over 200 sources), the results of comprehensive testing of 204 LGMs, and hands-on experience in assessing model quality on a sample of more than 12,000 cases. Designed for two main LGM application scenarios, the methods incorporate a dedicated approach to building test samples, tailored and validated questionnaires, testing methodologies, and unified requirements for the composition and structure of quality assessment outputs.

About the Authors

R. V. Reshetnikov
Moscow Center for Diagnostics and Telemedicine
Russian Federation

PhD

Moscow



I. A. Tyrov
Department of Healthcare of Moscow
Russian Federation

Moscow



Yu. A. Vasilev
Moscow Center for Diagnostics and Telemedicine
Russian Federation

PhD

Moscow



Yu. F. Shumskaya
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



A. V. Vladzymyrskyy
Moscow Center for Diagnostics and Telemedicine
Russian Federation

DSc

Moscow



D. A. Akhmedzyanova
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



K. Yu. Bezhenova
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



M. D. Varyukhina
Moscow Center for Diagnostics and Telemedicine
Russian Federation

PhD

Moscow



M. V. Sokolova
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



I. A. Blokhin
Moscow Center for Diagnostics and Telemedicine
Russian Federation

PhD

Moscow



D. A. Voytenko
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



O. I. Mynko
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



M. R. Kodenko
Moscow Center for Diagnostics and Telemedicine; Bauman Moscow State Technical University
Russian Federation

PhD

Moscow



O. V. Omelyanskaya
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



Review

For citations:


Reshetnikov R.V., Tyrov I.A., Vasilev Yu.A., Shumskaya Yu.F., Vladzymyrskyy A.V., Akhmedzyanova D.A., Bezhenova K.Yu., Varyukhina M.D., Sokolova M.V., Blokhin I.A., Voytenko D.A., Mynko O.I., Kodenko M.R., Omelyanskaya O.V. Assessing the quality of large generative models for basic healthcare applications. Medical Doctor and Information Technologies. 2025;(3):64-75. (In Russ.) https://doi.org/10.25881/18110193_2025_3_64


This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1811-0193 (Print)
ISSN 2413-5208 (Online)