2 Minute Medicine

GPT-4 performing with superior scores on medical oncology examination questions

by Simon Pan and Alex Chan
December 17, 2024
in Oncology, Preclinical
Reading Time: 3 mins read

1. The large language model ChatGPT-4 answered 85.0% of examination-style multiple-choice questions on medical oncology correctly, a performance superior to all other large language models evaluated and comparable with that of medical oncology trainees.

2. Approximately 80% of incorrect answers were rated by clinicians as having a medium to high risk of causing moderate to severe harm if acted upon in clinical practice.

Evidence Rating Level: 2 (Good)

Study Rundown: Large language models (LLMs) may prove useful across a range of healthcare settings. In oncology, potential applications range from assistance with administrative tasks to support for clinical decision-making. This cross-sectional study therefore sought to evaluate the medical oncology knowledge of ChatGPT-3.5 (proprietary LLM 1), ChatGPT-4 (proprietary LLM 2), and various open-source LLMs. Proprietary LLM 1 and proprietary LLM 2 were evaluated on their performance across 147 medical oncology examination questions from ASCO’s Oncology Self-Assessment Series, ESMO’s Examination Trial Questions, and unseen original questions. Proprietary LLM 2 achieved the highest performance among all LLMs, answering 85.0% of questions correctly. However, roughly 64% of its incorrect answers were considered to have a medium likelihood of causing patient harm, and roughly 18% a high likelihood, if acted upon in practice. Taken together, approximately 82% of incorrect answers had a medium or high likelihood of causing moderate or severe harm. Overall, this study found that LLMs can perform well on examination-style multiple-choice medical oncology questions, while raising safety concerns about the possible consequences of incorrect answers. As such, LLMs in medical oncology may be best applied in low-risk settings or under intensive human supervision, with guidelines in place to ensure their safe use in clinical practice.

Click to read the study in JAMA Network Open

Relevant Reading: Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models

In-Depth [cross-sectional study]: In recent years, the potential utility of LLMs in healthcare settings has been an important topic of investigation. LLMs have already been shown to be capable of passing the United States Medical Licensing Examination while demonstrating remarkable knowledge recall and reasoning abilities. However, their performance on examinations varies widely across medical subspecialties, and their performance on medical oncology examinations was not previously known. This cross-sectional study therefore sought to investigate the medical oncology knowledge of LLMs and their performance on examination-style multiple-choice medical oncology questions. Proprietary LLM 1 and proprietary LLM 2 were assessed on 52 questions from ASCO, 75 questions from ESMO, and 20 original questions. Proprietary LLM 2 achieved the highest accuracy of all LLMs assessed at 85.0% (95% CI = 78.2% to 90.4%; P < 0.001 vs random answering), with similar performance across each of the question sets (80.8%, 95% CI = 67.5% to 90.4%, P < 0.001; 88.0%, 95% CI = 78.4% to 94.4%, P < 0.001; 85.0%, 95% CI = 62.1% to 96.8%, P < 0.001 for ASCO, ESMO, and original questions, respectively). Proprietary LLM 1 achieved an accuracy of 60.5% (95% CI = 50.0% to 66.4%; P < 0.001 vs random answering). Incorrect answers by proprietary LLM 2 were more common when questions involved knowledge from recent publications (Wilcoxon test P = 0.02), with 63.6% of incorrect answers being due to incorrect knowledge recall. Among incorrect answers by proprietary LLM 2, the likelihood of causing patient harm by applying the error in practice was considered medium in 63.6% of incorrect answers (95% CI = 43.0% to 85.4%) and high in 18.2% of incorrect answers (95% CI = 5.2% to 40.3%). The extent of possible harm was considered moderate in 63.6% of incorrect answers (95% CI = 43.0% to 85.4%) and likely to be severe or lead to death in 18.2% of incorrect answers (95% CI = 5.2% to 40.3%).
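For readers who want to see how a 95% confidence interval around an accuracy figure like those above can be obtained, the following is a minimal sketch. It assumes an exact binomial (Clopper-Pearson) interval; the report does not state which method the study used, so this is an illustration and not the authors' analysis. The function names are hypothetical, and the count of 17 correct answers is inferred from the reported 85.0% of 20 original questions.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo=0.0, hi=1.0, iters=100):
    """Find the root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes in n trials.

    Assumption: the Clopper-Pearson method; the article does not name
    the method behind its reported intervals.
    """
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else bisect_root(
        lambda p: alpha / 2 - binom_cdf(k, n, p)
    )
    return lower, upper


# 17 of 20 original questions correct (inferred from the reported 85.0%).
low, high = clopper_pearson(17, 20)
print(f"{17 / 20:.1%} (95% CI = {low:.1%} to {high:.1%})")
```

Under this assumption, 17 of 20 gives an interval of roughly 62.1% to 96.8%, in line with the figure reported for the original questions. The width of that interval, compared with the one for all 147 questions, shows how much less certain an estimate from a small question set is.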


©2024 2 Minute Medicine, Inc. All rights reserved. No works may be reproduced without expressed written consent from 2 Minute Medicine, Inc. Inquire about licensing here. No article should be construed as medical advice and is not intended as such by the authors or by 2 Minute Medicine, Inc. 

Tags: artificial intelligence, chatGPT, llm, oncology, openai
2 Minute Medicine® is an award winning, physician-run, expert medical media company. Our content is curated, written and edited by practicing health professionals who have clinical and scientific expertise in their field of reporting. Our editorial management team is comprised of highly-trained MD physicians. Join numerous brands, companies, and hospitals who trust our licensed content.