Evaluation of the accuracy and safety of machine translation of patient-specific discharge instructions: a comparative analysis

Authors: Marianna Kong, Alicia Fernandez, Jaskaran Bains, Ana Milisavljevic, Katherine C. Brooks, Akash Shanmugam, Leslie Avilez, Junhong Li, Vladyslav Honcharov, Andersen Yang, Elaine C. Khoong

This study evaluated the accuracy and potential for harm of ChatGPT-4 and Google Translate when translating emergency department discharge instructions from English into Spanish, Chinese, and Russian. Both tools demonstrated high sentence-level accuracy for Spanish and Chinese (≥90%), but performed less well for Russian. ChatGPT-4 was generally more accurate than Google Translate, especially for Chinese and Russian. Despite some inaccuracies in full instruction sets, the potential for clinical harm was low (≤6%). These findings suggest that machine translation tools, particularly ChatGPT-4, may help address language barriers in low-risk clinical contexts, though caution and professional oversight remain essential.