Large generative models for radiology report interpretation: assessing feasibility and patient safety
https://doi.org/10.25881/18110193_2025_4_72
Abstract
With the digitalization of healthcare, patients have gained access to their own medical records. However, the poor readability of medical texts often prevents them from interpreting these records correctly. Large generative models (LGMs) have the potential to become a tool for adapting medical texts, but their use is currently fraught with risks. The aim of the study was to evaluate the safety of using LGMs to interpret radiology reports for patients. Seven models produced simplified interpretations of eight computed tomography reports supplied as input. The generated interpretations were submitted for evaluation to physicians and to respondents without medical training. The resulting scores were analyzed to assess the feasibility and safety of implementing this technology.
All models generated text that met the main quality criteria. However, consistent ethical and safety violations were observed. A comparative analysis failed to identify a model that was superior across all criteria. The study also identified criteria for which assessments by physicians and respondents without medical training differed significantly.
It was demonstrated that, although large generative models are formally successful at simplified interpretation of medical reports, their direct application in clinical practice without a control system is extremely unsafe. The main problem is distortion of the original information: the addition of recommendations, diagnoses, and prognoses absent from the source report, which contravenes patient communication standards. It was shown that, despite the technology's potential in this field, safe implementation requires the prior development of a quality control system for large generative models, a questionnaire that accounts for the competencies of both experts and non-experts, and clear threshold criteria. This work represents the first step toward creating such a system.
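As an illustration of the score analysis described above, the following is a minimal sketch in Python. The article does not specify the exact statistical procedure, so the choice of a Mann-Whitney U test with a rank-biserial effect size, as well as all variable names and the sample scores, are assumptions made for illustration only; per-criterion scores from physicians and lay respondents are assumed to be on a common ordinal scale.

    # Minimal sketch (assumed procedure): compare physician vs. lay ratings
    # for a single evaluation criterion using hypothetical 1-5 scores.
    import numpy as np
    from scipy.stats import mannwhitneyu

    physician_scores = np.array([4, 3, 5, 4, 2, 4, 3, 5])  # hypothetical data
    lay_scores       = np.array([5, 5, 4, 5, 4, 5, 4, 5])  # hypothetical data

    # Two-sided Mann-Whitney U test for a difference between the rater groups.
    u_stat, p_value = mannwhitneyu(physician_scores, lay_scores,
                                   alternative="two-sided")

    # Rank-biserial correlation as an effect size, to judge whether a
    # statistically detected difference is also practically meaningful.
    n1, n2 = len(physician_scores), len(lay_scores)
    rank_biserial = 1 - 2 * u_stat / (n1 * n2)

    print(f"U = {u_stat:.1f}, p = {p_value:.3f}, "
          f"rank-biserial r = {rank_biserial:.2f}")

Repeating such a comparison for each quality, ethics, and safety criterion would highlight the criteria on which expert and non-expert assessments diverge, which is the kind of divergence the study reports.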
About the Authors
Yu. A. Vasilev
Russian Federation
DSc
Moscow
I. A. Tyrov
Russian Federation
Moscow
K. M. Arzamasov
Russian Federation
DSc
Moscow
A. V. Vladzymyrskyy
Russian Federation
DSc
Moscow
O. V. Omelyanskaya
Russian Federation
Moscow
A. P. Pamova
Russian Federation
Moscow
L. N. Arzamasova
Russian Federation
Moscow
E. A. Krylova
Russian Federation
Moscow
I. A. Raznitsyna
Russian Federation
PhD
Moscow
E. A. Petrov
Russian Federation
PhD
Moscow
E. V. Astapenko
Russian Federation
Moscow
D. A. Rumyantsev
Russian Federation
Moscow
I. A. Sharafetdinov
Russian Federation
Moscow
For citations:
Vasilev Yu.A., Tyrov I.A., Arzamasov K.M., Vladzymyrskyy A.V., Omelyanskaya O.V., Pamova A.P., Arzamasova L.N., Krylova E.A., Raznitsyna I.A., Petrov E.A., Astapenko E.V., Rumyantsev D.A., Sharafetdinov I.A. Large generative models for radiology report interpretation: assessing feasibility and patient safety. Medical Doctor and Information Technologies. 2025;(4):72-85. (In Russ.) https://doi.org/10.25881/18110193_2025_4_72