Medical Doctor and Information Technologies

Large generative models for radiology report interpretation: assessing feasibility and patient safety

https://doi.org/10.25881/18110193_2025_4_72

Abstract

With the digitalization of healthcare, patients have gained access to their own medical records. However, the poor clarity of medical texts often prevents them from interpreting these records correctly. Large generative models (LGMs) could become a tool for adapting medical texts for patients, but their use is currently fraught with risks. The aim of the study was to evaluate the safety of using LGMs to interpret radiology reports for patients. Seven models performed simplified text interpretation using eight computed tomography reports as input. The generated interpretations were submitted for evaluation to physicians and to respondents without medical training. The resulting scores were analyzed to determine the safety and feasibility of implementing this technology.
All models generated text that met the main quality criteria. However, consistent ethical and safety violations were observed. A comparative analysis failed to identify a model that was superior across all criteria. The study also identified criteria for which assessments by physicians and respondents without medical training differed significantly.
It was demonstrated that, although large generative models are formally successful at simplified interpretation of medical reports, their direct application in clinical practice without a control system is extremely unsafe. The main problem is distortion of the original information: the inclusion of additional recommendations, diagnoses, and prognoses, which contravenes patient communication standards. Despite the technology's potential within the field, safe implementation requires the preliminary development of a quality control system for large generative models, a questionnaire that accounts for the competencies of both experts and non-experts, and clear threshold criteria. This work represents the first step toward creating such a system.

About the Authors

Yu. A. Vasilev
Moscow Center for Diagnostics and Telemedicine
Russian Federation

DSc

Moscow



I. A. Tyrov
Moscow Healthcare Department
Russian Federation

Moscow



K. M. Arzamasov
Moscow Center for Diagnostics and Telemedicine
Russian Federation

DSc

Moscow



A. V. Vladzymyrskyy
Moscow Center for Diagnostics and Telemedicine
Russian Federation

DSc

Moscow



O. V. Omelyanskaya
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



A. P. Pamova
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



L. N. Arzamasova
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



E. A. Krylova
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



I. A. Raznitsyna
Moscow Center for Diagnostics and Telemedicine
Russian Federation

PhD

Moscow



E. A. Petrov
Moscow Center for Diagnostics and Telemedicine
Russian Federation

PhD

Moscow



E. V. Astapenko
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



D. A. Rumyantsev
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



I. A. Sharafetdinov
Moscow Center for Diagnostics and Telemedicine
Russian Federation

Moscow



References

1. Park J, Oh K, Han K, Lee YH. Patient-centered radiology reports with generative artificial intelligence: adding value to radiology reporting. Sci Rep. 2024; 14: 13218. doi: 10.1038/s41598-024-63824-z.

2. Starcevic V, Berle D. Cyberchondria: towards a better understanding of excessive health-related Internet use. Expert Rev Neurother. 2013; 13: 205-13. doi: 10.1586/ern.12.162.

3. Luo A, Qin L, Yuan Y, et al. The Effect of Online Health Information Seeking on Physician-Patient Relationships: Systematic Review. J Med Internet Res. 2022; 24: e23354. doi: 10.2196/23354.

4. Vasilev YuA, Reshetnikov RV, Nanova OG, et al. Application of Large Language Models in Radiological Diagnostics: A Scoping Review. Digital Diagnostics. 2025; 6(2): 268–85. (In Russ.) doi: 10.17816/DD678373.

5. Hager P, Jungmann F, Holland R, et al. Evaluation and mitigation of the limitations of large language models in clinical decision-making. Nat Med. 2024; 30: 2613-22. doi: 10.1038/s41591-024-03097-1.

6. Aydin S, Karabacak M, Vlachos V, Margetis K. Large language models in patient education: a scoping review of applications in medicine. Front Med. 2024; 11: 1477898. doi: 10.3389/fmed.2024.1477898.

7. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, N.J: L. Erlbaum Associates; 1988.

8. Hedges LV. A random effects model for effect sizes. Psychol Bull. 1983; 93: 388-95. doi: 10.1037/0033-2909.93.2.388.

9. Doshi R, Amin KS, Khosla P, et al. Quantitative Evaluation of Large Language Models to Streamline Radiology Report Impressions: A Multimodal Retrospective Analysis. Radiology. 2024; 310: e231593. doi: 10.1148/radiol.231593.

10. Rahsepar AA. Large Language Models for Enhancing Radiology Report Impressions: Improve Readability While Decreasing Burnout. Radiology. 2024; 310: e240498. doi: 10.1148/radiol.240498.

11. Van Der Mee FAM, Ottenheijm RPG, Gentry EGS, et al. The impact of different radiology report formats on patient information processing: a systematic review. Eur Radiol. 2024; 35: 2644-57. doi: 10.1007/s00330-024-11165-w.

12. Steitz BD, Turer RW, Salmi L, et al. Repeated Access to Patient Portal While Awaiting Test Results and Patient-Initiated Messaging. JAMA Netw Open. 2025; 8: e254019. doi: 10.1001/jamanetworkopen.2025.4019.

13. Anyidoho PA, Verschraegen CF, Markham MJ, et al. Impact of the Immediate Release of Clinical Information Rules on Health Care Delivery to Patients With Cancer. JCO Oncol Pract. 2023; 19: e706-13. doi: 10.1200/OP.22.00712.

14. Lee H-S, Kim S, Kim S, et al. Readability versus accuracy in LLM-transformed radiology reports: stakeholder preferences across reading grade levels. Radiol Med (Torino). 2025. doi: 10.1007/s11547-025-02098-5.

15. Park J, Oh K, Han K, Lee YH. Patient-centered radiology reports with generative artificial intelligence: adding value to radiology reporting. Sci Rep. 2024; 14: 13218. doi: 10.1038/s41598-024-63824-z.

16. Ethics and Governance of Artificial Intelligence for Health: Large Multi-Modal Models. WHO Guidance. 1st ed. Geneva: World Health Organization; 2024.

17. Artificial Intelligence in Radiology: Per Aspera Ad Astra. Ed by Vasilev YA, Vladzymyrskyy AV. Moscow: Izdatelskie resheniya; 2025. (In Russ.)


For citations:


Vasilev Yu.A., Tyrov I.A., Arzamasov K.M., Vladzymyrskyy A.V., Omelyanskaya O.V., Pamova A.P., Arzamasova L.N., Krylova E.A., Raznitsyna I.A., Petrov E.A., Astapenko E.V., Rumyantsev D.A., Sharafetdinov I.A. Large generative models for radiology report interpretation: assessing feasibility and patient safety. Medical Doctor and Information Technologies. 2025;(4):72-85. (In Russ.) https://doi.org/10.25881/18110193_2025_4_72



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1811-0193 (Print)
ISSN 2413-5208 (Online)