2 Minute Medicine

GPT-4 performing with superior scores on medical oncology examination questions

by Simon Pan and Alex Chan
December 17, 2024
in Oncology, Preclinical
Reading Time: 3 mins read

1. The large language model, ChatGPT-4, answered 85.0% of examination-style multiple-choice questions on medical oncology correctly, a performance superior to all other large language models and comparable with medical oncology trainees.

2. Approximately 82% of incorrect answers were rated by clinicians as having a medium to high risk of causing moderate to severe harm if acted upon in clinical practice.

Evidence Rating Level: 2 (Good)

Study Rundown: Large language models (LLMs) may have extraordinary utility across various healthcare settings. In oncology, for example, potential applications range from assistance with administrative tasks to clinical decision-making. This cross-sectional study therefore sought to evaluate the medical oncology knowledge of ChatGPT-3.5 (proprietary LLM 1), ChatGPT-4 (proprietary LLM 2), and various open-source LLMs. Proprietary LLM 1 and proprietary LLM 2 were evaluated on 147 medical oncology examination questions from ASCO’s Oncology Self-Assessment Series, ESMO’s Examination Trial Questions, and unseen original questions. Proprietary LLM 2 achieved the highest performance among all LLMs, answering 85.0% of questions correctly. However, roughly 64% of its incorrect answers were considered to have a medium likelihood of causing patient harm, and roughly 18% were considered to have a high likelihood of causing patient harm if acted upon in practice. In total, approximately 82% of incorrect answers had a medium or high likelihood of causing moderate or severe harm. Overall, this study found that LLMs can perform well on examination-style multiple-choice medical oncology questions, but it also raised safety concerns about the possible consequences of incorrect decision-making. As such, LLMs in medical oncology may be best applied in low-risk settings or under intensive human supervision, with guidelines in place to ensure their safe use in clinical practice.
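The percentages above can be tied back to question counts with simple arithmetic. The sketch below is illustrative only: the report gives percentages, so the counts (125 correct, 22 incorrect, 14 medium-likelihood, 4 high-likelihood) are assumptions obtained by rounding, not figures quoted from the study.

```python
# Reported figures from the summary above.
TOTAL_QUESTIONS = 147
ACCURACY = 0.850          # proprietary LLM 2
SHARE_MEDIUM = 0.636      # incorrect answers with medium likelihood of harm
SHARE_HIGH = 0.182        # incorrect answers with high likelihood of harm


def count_from_share(share: float, total: int) -> int:
    """Recover a whole-number count from a rounded percentage (assumption)."""
    return round(share * total)


correct = count_from_share(ACCURACY, TOTAL_QUESTIONS)
incorrect = TOTAL_QUESTIONS - correct

medium = count_from_share(SHARE_MEDIUM, incorrect)
high = count_from_share(SHARE_HIGH, incorrect)

# Share of incorrect answers rated medium or high likelihood of harm.
combined_share = (medium + high) / incorrect

print(f"correct: {correct}, incorrect: {incorrect}")
print(f"medium: {medium}, high: {high}")
print(f"medium or high: {combined_share:.1%}")
```

Under these assumed counts, 18 of 22 incorrect answers fall in the medium or high category, which is where the figure of approximately 82% comes from.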

Click to read the study in JAMA Network Open

Relevant Reading: Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models


In-Depth [cross-sectional study]: In recent years, the potential utility of LLMs in healthcare settings has been an important topic for investigation. They have already been shown to be capable of passing the United States Medical Licensing Examination while demonstrating remarkable knowledge recall and reasoning abilities. However, the performance of LLMs on examinations across different medical subspecialties is highly varied, and their performance on medical oncology examinations is not yet known. This cross-sectional study therefore sought to investigate the medical oncology knowledge of LLMs and their performance on examination-style multiple-choice medical oncology questions. Proprietary LLM 1 and proprietary LLM 2 were assessed on 52 questions from ASCO, 75 questions from ESMO, and 20 original questions. Proprietary LLM 2 achieved the highest accuracy of all LLMs assessed at 85.0% (95% CI = 78.2% to 90.4%; P < 0.001 vs random answering), with similar performance across each of the question sets (80.8%, 95% CI = 67.5% to 90.4%, P < 0.001; 88.0%, 95% CI = 78.4% to 94.4%, P < 0.001; 85.0%, 95% CI = 62.1% to 96.8%, P < 0.001 for ASCO, ESMO, and original questions, respectively). Proprietary LLM 1 achieved an accuracy of 60.5% (95% CI = 50.0% to 66.4%; P < 0.001 vs random answering). Incorrect answers by proprietary LLM 2 were more common when questions involved knowledge from recent publications (Wilcoxon test P = 0.02), with 63.6% of incorrect answers being due to incorrect knowledge recall. Among incorrect answers by proprietary LLM 2, the likelihood of causing patient harm by applying the error in practice was considered medium in 63.6% of incorrect answers (95% CI = 43.0% to 85.4%) and high in 18.2% of incorrect answers (95% CI = 5.2% to 40.3%). The extent of possible harm was considered to be moderate in 63.6% of incorrect answers (95% CI = 43.0% to 85.4%) and likely to cause severe harm or lead to death in 18.2% of incorrect answers (95% CI = 5.2% to 40.3%).
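For readers who want to see how a confidence interval of this kind can be computed, here is a minimal sketch of an exact (Clopper-Pearson) binomial interval using only the Python standard library. Two assumptions are made: the summary does not state which interval method the study used, and the counts (17 of 20 original questions, 125 of 147 overall) are inferred from the reported percentages rather than quoted.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided confidence interval for k successes in n trials."""

    def solve(f, target: float) -> float:
        # f is decreasing in p on [0, 1]; bisect for the p where f(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2, i.e. CDF(k - 1) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Assumed counts, inferred from the reported percentages.
for label, k, n in [("original questions", 17, 20), ("all questions", 125, 147)]:
    low, high = clopper_pearson(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

For 17 of 20, this method gives roughly 62.1% to 96.8%, in line with the interval reported for the original question set, which suggests but does not confirm that an exact method was used. The wide interval for that set reflects its small size of 20 questions.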

Image: PD

©2024 2 Minute Medicine, Inc. All rights reserved. No works may be reproduced without expressed written consent from 2 Minute Medicine, Inc. Inquire about licensing here. No article should be construed as medical advice and is not intended as such by the authors or by 2 Minute Medicine, Inc. 

Tags: artificial intelligence, chatGPT, llm, oncology, openai
2 Minute Medicine® is an award winning, physician-run, expert medical media company. Our content is curated, written and edited by practicing health professionals who have clinical and scientific expertise in their field of reporting. Our editorial management team is comprised of highly-trained MD physicians. Join numerous brands, companies, and hospitals who trust our licensed content.

© 2021 2 Minute Medicine, Inc. - Physician-written medical news.
