The Educational Potential of ChatGPT: Assessing ChatGPT Responses to Common Patient Hip Arthroscopy Questions

Yasir AlShehri, MD, CANADA; Mark Owen McConkey, MD, FRCSC, CANADA; Parth Lodhia, MD, FRCSC, CANADA

Department of Orthopaedics, Faculty of Medicine, The University of British Columbia, Vancouver, BC, CANADA


2025 Congress   ePoster Presentation

Summary: ChatGPT can provide satisfactory but occasionally inaccurate answers to common patient hip arthroscopy questions. It has the potential to be a useful tool for patients in the future.


Introduction

ChatGPT (Generative Pre-trained Transformer) is a web-based artificial intelligence chatbot that can generate content with a natural conversational flow. It has gained immense popularity since its release, and there has been recent interest in its potential role in patient education. The purpose of this study was to assess the ability of ChatGPT to answer common patient questions regarding hip arthroscopy, and to analyze the accuracy and appropriateness of its responses.

Methods

Ten questions were selected from well-known patient education websites, and ChatGPT (version 3.5) responses to these questions were graded by two fellowship-trained hip preservation surgeons. Responses were analyzed, compared to the current literature, and graded from A to D (A being the highest, and D being the lowest) on a grading scale based on the accuracy and completeness of the response. If the grading differed between the two surgeons, a consensus was reached. Inter-rater agreement was calculated. The readability of responses was also assessed using the Flesch-Kincaid Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL).
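For reference, the two readability metrics named above follow the standard Flesch formulas, which depend only on words per sentence and syllables per word. A minimal sketch in Python (the syllable counter here is a rough vowel-group heuristic, not the dictionary-based counting used by published readability tools, so scores will differ slightly from validated implementations):

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, drop a trailing silent 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) using the standard Flesch-Kincaid formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # mean words per sentence
    spw = syllables / len(words)        # mean syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

Lower FRES (and higher FKGL) indicates harder text; a mean FRES in the 20s, as reported below, corresponds to college-graduate-level reading difficulty.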

Results

Responses received the following consensus grades: A (50%, n=5), B (30%, n=3), C (10%, n=1), D (10%, n=1) (Table 2). Inter-rater agreement based on initial individual grading was 30%. The mean FRES was 28.2 (SD ± 9.2), ranging from 11.7 to 42.5, corresponding to a college graduate reading level. The mean FKGL was 14.4 (SD ± 1.8), ranging from 12.1 to 18, indicating a college student reading level.

Conclusion

ChatGPT can answer common patient questions regarding hip arthroscopy with satisfactory accuracy, as graded by two high-volume hip arthroscopists; however, incorrect information was identified in more than one instance. Caution must be observed when using ChatGPT for patient education related to hip arthroscopy.