Prompt Engineering for Eastern Cooperative Oncology Group Status Extraction: Comparing Large Language Model Techniques

Meenakshi Dubey; Kok Joon Chong; Yuba Raj Pun; Melissa Ooi; Iain Bee Huat Tan; David Shao Peng Tan; Kee Yuan Ngiam; Hwee Lin Wee

doi:10.1200/CCI-25-00226

PURPOSE

Eastern Cooperative Oncology Group (ECOG) performance status is critical for cancer patient management, yet it is often documented only in unstructured clinical notes. This study compares several approaches to extract ECOG status from oncology notes, focusing on advanced prompting techniques for large language models (LLMs).

METHODS

We evaluated four ECOG extraction approaches on unstructured clinical notes from patients with non–small cell lung cancer, multiple myeloma, or ovarian cancer (2017-2021). The approaches were a rule-based natural language processing algorithm, simple LLM prompting, and two advanced prompts (chain-of-thought [CoT] and double filtering) using a domain-tuned LLM (LLAMAv3.2). Performance was measured on a binary outcome (any ECOG documented v none) and a three-class outcome (ECOG 0-1 v ≥2 v none), and via an adapted QUEST questionnaire for human evaluation.
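The abstract does not describe the rule-based algorithm or the label encoding in detail, so the following is only a minimal illustrative sketch of how a rule-based extractor and the two outcome definitions might look. The regular expression, the function names (`extract_ecog`, `binary_label`, `three_class_label`), and the label strings are all assumptions made for illustration, not the study's implementation.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not the study's rules): the token "ECOG",
# an optional qualifier such as "PS" or "performance status", an optional
# separator, then a single grade from 0 to 5.
ECOG_PATTERN = re.compile(
    r"\bECOG(?:\s+(?:PS|performance\s+status|score|status))?"
    r"\s*(?:[:=]|\bis\b|\bof\b)?\s*([0-5])\b",
    re.IGNORECASE,
)


def extract_ecog(note: str) -> Optional[int]:
    """Return the first ECOG grade found in the note, or None if none is found."""
    match = ECOG_PATTERN.search(note)
    return int(match.group(1)) if match else None


def binary_label(score: Optional[int]) -> str:
    """Binary outcome: any ECOG documented v none."""
    return "documented" if score is not None else "none"


def three_class_label(score: Optional[int]) -> str:
    """Three-class outcome: ECOG 0-1 v >=2 v none."""
    if score is None:
        return "none"
    return "0-1" if score <= 1 else ">=2"


if __name__ == "__main__":
    for text in [
        "ECOG PS 1, tolerating treatment well.",
        "ECOG performance status: 2",
        "Patient ambulatory, no complaints.",
    ]:
        score = extract_ecog(text)
        print(text, "|", binary_label(score), "|", three_class_label(score))
```

A pattern this simple would miss implicit descriptions of functional status, which is one plausible reason to compare it against LLM prompting.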

RESULTS

Both CoT and the double filtering technique (DFT) achieved 94% accuracy, outperforming the rule-based method (91%) and simple prompting (86%). DFT had the highest specificity (0.91) and positive predictive value (PPV; 0.93), whereas CoT attained the highest sensitivity (0.98). In the QUEST evaluation, DFT and CoT scored higher on output quality, reasoning, bias reduction, and user satisfaction than the simple prompt. DFT received the top satisfaction rating. In the three-class analysis, DFT and CoT again performed best (accuracy 0.91 v 0.87) and DFT was most sensitive for ECOG ≥2 cases. Estimates for ECOG ≥2 remained imprecise because of the small sample (n = 20). All methods sometimes hallucinated ECOG status.
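For readers less familiar with the reported metrics, the sketch below shows the standard definitions of accuracy, sensitivity, specificity, and PPV for the binary outcome (any ECOG documented v none). The function name `binary_metrics` and the toy labels are assumptions for illustration only; they are not data from the study.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Compute standard binary metrics; True means 'ECOG documented'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # documented notes correctly found
        "specificity": ratio(tn, tn + fp),  # undocumented notes correctly left empty
        "ppv": ratio(tp, tp + fp),          # extracted statuses that are correct
    }


if __name__ == "__main__":
    # Toy labels for illustration only (not study data).
    truth = [True, True, True, False, False]
    preds = [True, True, False, True, False]
    print(binary_metrics(truth, preds))
```

Under these definitions, a method that reports an ECOG status for a note that documents none produces a false positive, which lowers both specificity and PPV.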

CONCLUSION

Advanced LLM prompting improved ECOG extraction over basic methods. DFT and CoT each showed specific strengths (DFT had higher PPV and user satisfaction; CoT achieved higher sensitivity). These approaches appear to be generalizable across cancer types. Key implementation considerations include computational cost and human oversight. Overall, advanced prompting can standardize ECOG documentation, accelerate patient cohort identification, and inform personalized treatment planning.
