Large language models in ophthalmology: a scoping review on their utility for clinicians, researchers, patients, and educators

Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023;29:1930–40. https://doi.org/10.1038/s41591-023-02448-8.
Cascella M, Semeraro F, Montomoli J, Bellini V, Piazza O, Bignami E. The breakthrough of large language models release for medical applications: 1-year timeline and perspectives. J Med Syst. 2024;48:22. https://doi.org/10.1007/s10916-024-02045-3.
Eysenbach G. The role of ChatGPT, generative language models, and artificial intelligence in medical education: a conversation with ChatGPT and a call for papers. JMIR Med Educ. 2023;9:e46885. https://doi.org/10.2196/46885.
Zandi R, Fahey JD, Drakopoulos M, Bryan JM, Dong S, Bryar PJ, et al. Exploring diagnostic precision and triage proficiency: a comparative study of GPT-4 and Bard in addressing common ophthalmic complaints. Bioengineering. 2024;11:120. https://doi.org/10.3390/bioengineering11020120.
Lyons RJ, Arepalli SR, Fromal O, Choi JD, Jain N. Artificial intelligence chatbot performance in triage of ophthalmic conditions. Can J Ophthalmol. 2024;59:e301–e308. https://doi.org/10.1016/j.jcjo.2023.07.016.
Singh S, Djalilian A, Ali MJ. ChatGPT and ophthalmology: exploring its potential with discharge summaries and operative notes. Semin Ophthalmol. 2023;38:503–7. https://doi.org/10.1080/08820538.2023.2209166.
Gopalakrishnan N, Joshi A, Chhablani J, Yadav NK, Reddy NG, Rani PK, et al. Recommendations for initial diabetic retinopathy screening of diabetic patients using large language model-based artificial intelligence in real-life case scenarios. Int J Retin Vitreous. 2024;10:11. https://doi.org/10.1186/s40942-024-00533-9.
Choudhary A, Gopalakrishnan N, Joshi A, Balakrishnan D, Chhablani J, Yadav NK, et al. Recommendations for diabetic macular edema management by retina specialists and large language model-based artificial intelligence platforms. Int J Retin Vitreous. 2024;10:22. https://doi.org/10.1186/s40942-024-00544-6.
Liu X, Wu J, Shao A, Shen W, Ye P, Wang Y, et al. Uncovering language disparity of ChatGPT on retinal vascular disease classification: cross-sectional study. J Med Internet Res. 2024;26:e51926. https://doi.org/10.2196/51926.
Mohammadi SS, Nguyen QD. A user-friendly approach for the diagnosis of diabetic retinopathy using ChatGPT and automated machine learning. Ophthalmol Sci. 2024;4:100495. https://doi.org/10.1016/j.xops.2024.100495.
Chen X, Zhang W, Xu P, Zhao Z, Zheng Y, Shi D, et al. FFA-GPT: an automated pipeline for fundus fluorescein angiography interpretation and question-answer. NPJ Digit Med. 2024;7:111. https://doi.org/10.1038/s41746-024-01101-z.
Chen X, Zhang W, Zhao Z, Xu P, Zheng Y, Shi D, et al. ICGA-GPT: report generation and question answering for indocyanine green angiography images. Br J Ophthalmol. 2024;108:1450–6. https://doi.org/10.1136/bjo-2023-324446.
Lin Z, Zhang D, Shi D, Xu R, Tao Q, Wu L, et al. Contrastive pre-training and linear interaction attention-based transformer for universal medical reports generation. J Biomed Inf. 2023;138:104281. https://doi.org/10.1016/j.jbi.2023.104281.
Chen X, Xu P, Li Y, Zhang W, Song F, He M, et al. ChatFFA: an ophthalmic chat system for unified vision-language understanding and question answering for fundus fluorescein angiography. iScience. 2024;27:110021. https://doi.org/10.1016/j.isci.2024.110021.
Carlà MM, Gambini G, Baldascino A, Giannuzzi F, Boselli F, Crincoli E, et al. Exploring AI-chatbots’ capability to suggest surgical planning in ophthalmology: ChatGPT versus Google Gemini analysis of retinal detachment cases. Br J Ophthalmol. 2024;108:1457–69. https://doi.org/10.1136/bjo-2023-325143.
Huang X, Raja H, Madadi Y, Delsoz M, Poursoroush A, Kahook MY, et al. Predicting glaucoma before onset using a large language model chatbot. Am J Ophthalmol. 2024;266:289–99. https://doi.org/10.1016/j.ajo.2024.05.022.
Kass MA, Heuer DK, Higginbotham EJ, Johnson CA, Keltner JL, Miller JP, et al. The ocular hypertension treatment study: a randomized trial determines that topical ocular hypotensive medication delays or prevents the onset of primary open-angle glaucoma. Arch Ophthalmol. 2002;120:701–13. https://doi.org/10.1001/archopht.120.6.701.
Carlà MM, Gambini G, Baldascino A, Boselli F, Giannuzzi F, Margollicci F, et al. Large language models as assistance for glaucoma surgical cases: a ChatGPT vs. Google Gemini comparison. Graefes Arch Clin Exp Ophthalmol. 2024;262:2945–59. https://doi.org/10.1007/s00417-024-06470-5.
Rojas-Carabali W, Sen A, Agarwal A, Tan G, Cheung CY, Rousselot A, et al. Chatbots vs. human experts: evaluating diagnostic performance of chatbots in uveitis and the perspectives on AI adoption in ophthalmology. Ocul Immunol Inflamm. 2024;32:1591–8. https://doi.org/10.1080/09273948.2023.2266730.
Ćirković A, Katz T. Exploring the potential of ChatGPT-4 in predicting refractive surgery categorizations: comparative study. JMIR Form Res. 2023;7:e51798. https://doi.org/10.2196/51798.
Ali MJ. ChatGPT and lacrimal drainage disorders: performance and scope of improvement. Ophthalmic Plast Reconstr Surg. 2023;39:221–5. https://doi.org/10.1097/IOP.0000000000002418.
Tailor PD, Dalvin LA, Chen JJ, Iezzi R, Olsen TW, Scruggs BA, et al. A comparative study of responses to retina questions from either experts, expert-edited large language models, or expert-edited large language models alone. Ophthalmol Sci. 2024;4:100485. https://doi.org/10.1016/j.xops.2024.100485.
Pushpanathan K, Lim ZW, Er Yew SM, Chen DZ, Hui’En Lin HA, Lin Goh JH, et al. Popular large language model chatbots’ accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries. iScience. 2023;26:108163. https://doi.org/10.1016/j.isci.2023.108163.
Tailor PD, Xu TT, Fortes BH, Iezzi R, Olsen TW, Starr MR, et al. Appropriateness of ophthalmology recommendations from an online chat-based artificial intelligence model. Mayo Clin Proc Digit Health. 2024;2:119–28. https://doi.org/10.1016/j.mcpdig.2024.01.003.
Barclay KS, You JY, Coleman MJ, Mathews PM, Ray VL, Riaz KM, et al. Quality and agreement with scientific consensus of ChatGPT information regarding corneal transplantation and Fuchs dystrophy. Cornea. 2024;43:746–50. https://doi.org/10.1097/ICO.0000000000003439.
Kianian R, Sun D, Crowell EL, Tsui E. The use of large language models to generate education materials about uveitis. Ophthalmol Retin. 2024;8:195–201. https://doi.org/10.1016/j.oret.2023.09.008.
Dihan Q, Chauhan MZ, Eleiwa TK, Hassan AK, Sallam AB, Khouri AS, et al. Using large language models to generate educational materials on childhood glaucoma. Am J Ophthalmol. 2024;265:28–38. https://doi.org/10.1016/j.ajo.2024.04.004.
Ferro Desideri L, Roth J, Zinkernagel M, Anguita R. Application and accuracy of artificial intelligence-derived large language models in patients with age related macular degeneration. Int J Retin Vitreous. 2023;9:71. https://doi.org/10.1186/s40942-023-00511-7.
Wu G, Zhao W, Wong A, Lee DA. Patients with floaters: answers from virtual assistants and large language models. Digit Health. 2024;10:20552076241229933. https://doi.org/10.1177/20552076241229933.
Milad D, Antaki F, Milad J, Farah A, Khairy T, Mikhail D, et al. Assessing the medical reasoning skills of GPT-4 in complex ophthalmology cases. Br J Ophthalmol. 2024;108:1398–405. https://doi.org/10.1136/bjo-2023-325053.
Antaki F, Milad D, Chia MA, Giguère CÉ, Touma S, El-Khoury J, et al. Capabilities of GPT-4 in ophthalmology: an analysis of model entropy and progress towards human-level medical question answering. Br J Ophthalmol. 2024;108:1371–8. https://doi.org/10.1136/bjo-2023-324438.
Antaki F, Touma S, Milad D, El-Khoury J, Duval R. Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. Ophthalmol Sci. 2023;3:100324. https://doi.org/10.1016/j.xops.2023.100324.
Botross M, Mohammadi SO, Montgomery K, Crawford C. Performance of Google’s artificial intelligence chatbot “Bard” (now “Gemini”) on ophthalmology board exam practice questions. Cureus. 2024;16:e57348. https://doi.org/10.7759/cureus.57348.
Haddad F, Saade JS. Performance of ChatGPT on ophthalmology-related questions across various examination levels: observational study. JMIR Med Educ. 2024;10:e50842. https://doi.org/10.2196/50842.
Fowler T, Pullen S, Birkett L. Performance of ChatGPT and Bard on the official part 1 FRCOphth practice questions. Br J Ophthalmol. 2024;108:1379–83. https://doi.org/10.1136/bjo-2023-324091.
Thirunavukarasu AJ, Mahmood S, Malem A, Foster WP, Sanghera R, Hassan R, et al. Large language models approach expert-level clinical knowledge and reasoning in ophthalmology: a head-to-head cross-sectional study. PLOS Digit Health. 2024;3:e0000341. https://doi.org/10.1371/journal.pdig.0000341.
Panthier C, Gatinel D. Success of ChatGPT, an AI language model, in taking the French language version of the European Board of Ophthalmology examination: a novel approach to medical knowledge assessment. J Fr Ophtalmol. 2023;46:706–11. https://doi.org/10.1016/j.jfo.2023.05.006.
Sakai D, Maeda T, Ozaki A, Kanda GN, Kurimoto Y, Takahashi M. Performance of ChatGPT in board examinations for specialists in the Japanese Ophthalmology Society. Cureus. 2023;15:e49903. https://doi.org/10.7759/cureus.49903.
Cai LZ, Shaheen A, Jin A, Fukui R, Yi JS, Yannuzzi N, et al. Performance of generative large language models on ophthalmology board-style questions. Am J Ophthalmol. 2023;254:141–9. https://doi.org/10.1016/j.ajo.2023.05.024.
Jiao C, Edupuganti NR, Patel PA, Bui T, Sheth V. Evaluating the artificial intelligence performance growth in ophthalmic knowledge. Cureus. 2023;15:e45700. https://doi.org/10.7759/cureus.45700.
Moshirfar M, Altaf AW, Stoakes IM, Tuttle JJ, Hoopes PC. Artificial intelligence in ophthalmology: a comparative analysis of GPT-3.5, GPT-4, and human expertise in answering StatPearls questions. Cureus. 2023;15:e40822. https://doi.org/10.7759/cureus.40822.
Lim ZW, Pushpanathan K, Yew SME, Lai Y, Sun CH, Lam JSH, et al. Benchmarking large language models’ performances for myopia care: a comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard. EBioMedicine. 2023;95:104770. https://doi.org/10.1016/j.ebiom.2023.104770.
Marshall RF, Mallem K, Xu H, Thorne J, Burkholder B, Chaon B, et al. Investigating the accuracy and completeness of an artificial intelligence large language model about uveitis: an evaluation of ChatGPT. Ocul Immunol Inflamm. 2024;32:2052–5. https://doi.org/10.1080/09273948.2024.2317417.
Delsoz M, Raja H, Madadi Y, Tang AA, Wirostko BM, Kahook MY, et al. The use of ChatGPT to assist in diagnosing glaucoma based on clinical case reports. Ophthalmol Ther. 2023;12:3121–32. https://doi.org/10.1007/s40123-023-00805-x.
Taloni A, Borselli M, Scarsi V, Rossi C, Coco G, Scorcia V, et al. Comparative performance of humans versus GPT-4.0 and GPT-3.5 in the self-assessment program of American Academy of Ophthalmology. Sci Rep. 2023;13:18562. https://doi.org/10.1038/s41598-023-45837-2.
Singer MB, Fu JJ, Chow J, Teng CC. Development and evaluation of Aeyeconsult: a novel ophthalmology chatbot leveraging verified textbook knowledge and GPT-4. J Surg Educ. 2024;81:438–43. https://doi.org/10.1016/j.jsurg.2023.11.019.
Raja H, Munawar A, Mylonas N, Delsoz M, Madadi Y, Elahi M, et al. Automated category and trend analysis of scientific articles on ophthalmology using large language models: development and usability study. JMIR Form Res. 2024;8:e52462. https://doi.org/10.2196/52462.
Dupps WJ Jr. Artificial intelligence and academic publishing. J Cataract Refract Surg. 2023;49:655–6. https://doi.org/10.1097/j.jcrs.0000000000001223.
Van Gelder RN. The pros and cons of artificial intelligence authorship in ophthalmology. Ophthalmology. 2023;130:670–1. https://doi.org/10.1016/j.ophtha.2023.05.018.
Bressler NM. What artificial intelligence chatbots mean for editors, authors, and readers of peer-reviewed ophthalmic literature. JAMA Ophthalmol. 2023;141:514–5. https://doi.org/10.1001/jamaophthalmol.2023.1370.
Apellis Pharmaceuticals. FDA approves Syfovre (pegcetacoplan) injection, the first and only in its class. 2023. Available at: Accessed August 18, 2024.
EyesOnEyeCare. FDA approves IVERIC bio’s IZERVAY (avacincaptad pegol intravitreal solution) for geographic atrophy. 2023. Available at: Accessed August 18, 2024.
Volpe NJ, Mirza RG. Chatbots, artificial intelligence, and the future of scientific reporting. JAMA Ophthalmol. 2023;141:824–5. https://doi.org/10.1001/jamaophthalmol.2023.3344.
Wei J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, et al. Chain-of-thought prompting elicits reasoning in large language models. arXiv. 2022. https://arxiv.org/abs/2201.11903.
Anisuzzaman DM, Malins JG, Friedman PA, Attia ZI. Fine-tuning large language models for specialized use cases. Mayo Clin Proc Digit Health. 2024;3:100184. https://doi.org/10.1016/j.mcpdig.2024.11.005.
Ouyang L, Wu J, Jiang X, Almeida D, Wainwright CL, Mishkin P, et al. Training language models to follow instructions with human feedback. In: Proceedings of the Neural Information Processing Systems (NeurIPS) 2022; 2022. https://doi.org/10.48550/arXiv.2203.02155.
Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020:9459–74. https://doi.org/10.5555/3495724.3496517.
Nguyen Q, Nguyen DA, Dang K, Liu S, Nguyen K, Wang SY, et al. Advancing question-answering in ophthalmology with retrieval-augmented generation (RAG): Benchmarking open-source and proprietary large language models. J-GLOBAL. 2024. Available from: https://jglobal.jst.go.jp/en/detail?JGLOBAL_ID=202402211872512470.
Chen JS, Reddy AJ, Al-Sharif E, Shoji MK, Kalaw FGP, Eslani M, et al. Analysis of ChatGPT responses to ophthalmic cases: can ChatGPT think like an ophthalmologist. Ophthalmol Sci. 2024;5:100600. https://doi.org/10.1016/j.xops.2024.100600.
Ullah E, Parwani A, Baig MM, Singh R. Challenges and barriers of using large language models (LLM) such as ChatGPT for diagnostic medicine with a focus on digital pathology – a recent scoping review. Diagn Pathol. 2024;19:43. https://doi.org/10.1186/s13000-024-01464-7.
Celi LA, Cellini J, Charpignon ML, Dee EC, Dernoncourt F, Eber R, et al. Sources of bias in artificial intelligence that perpetuate healthcare disparities-a global review. PLOS Digit Health. 2022;1:e0000022. https://doi.org/10.1371/journal.pdig.0000022.
Dychiao RGK, Alberto IRI, Artiaga JCM, Salongcay RP, Celi LA. Large language model integration in Philippine ophthalmology: early challenges and steps forward. Lancet Digit Health. 2024;6:e308. https://doi.org/10.1016/S2589-7500(24)00064-5.
Restrepo D, Wu C, Tang Z, Shuai Z, Phan TNM, Ding J-E, et al. Multi-OphthaLingua: a multilingual benchmark for assessing and debiasing LLM ophthalmological QA in LMICs. AAAI. 2025;39:28321–30.
Tom E, Keane PA, Blazes M, Pasquale LR, Chiang MF, Lee AY, et al. Protecting data privacy in the age of AI-enabled ophthalmology. Transl Vis Sci Technol. 2020;9:36. https://doi.org/10.1167/tvst.9.2.36.
Kalaw FGP, Baxter SL. Ethical considerations for large language models in ophthalmology. Curr Opin Ophthalmol. 2024;35:438–46. https://doi.org/10.1097/ICU.0000000000001083.
Bernstein IA, Zhang YV, Govil D, Majid I, Chang RT, Sun Y, et al. Comparison of ophthalmologist and large language model chatbot responses to online patient eye care questions. JAMA Netw Open. 2023;6:e2330320. https://doi.org/10.1001/jamanetworkopen.2023.30320.
Cohen SA, Brant A, Fisher AC, Pershing S, Do D, Pan C. Dr. Google vs. Dr. ChatGPT: exploring the use of artificial intelligence in ophthalmology by comparing the accuracy, safety, and readability of responses to frequently asked patient questions regarding cataracts and cataract surgery. Semin Ophthalmol. 2024;39:472–9. https://doi.org/10.1080/08820538.2024.2326058.
Wilhelm TI, Roos J, Kaczmarczyk R. Large language models for therapy recommendations across 3 clinical specialties: comparative study. J Med Internet Res. 2023;25:e49324. https://doi.org/10.2196/49324.
Xue X, Zhang D, Sun C, Shi Y, Wang R, Tan T, et al. Xiaoqing: A Q&A model for glaucoma based on LLMs. Comput Biol Med. 2024;174:108399. https://doi.org/10.1016/j.compbiomed.2024.108399.
Biswas S, Logan NS, Davies LN, Sheppard AL, Wolffsohn JS. Assessing the utility of ChatGPT as an artificial intelligence-based large language model for information to answer questions on myopia. Ophthalmic Physiol Opt. 2023;43:1562–70. https://doi.org/10.1111/opo.13207.