Investigator

Stefano Patarnello

Agostino Gemelli University Polyclinic

Collaborators (5)
Andrea Rosati, Anna Fagotti, Livia Lilli, Marco Petrillo, Massimo Criscione
Institutions (3)
Agostino Gemelli Univ…, Università Cattolica D…, Università degli Stud…

Papers

Natural language processing as consultation service platform or clinical decision support system in gynecologic oncology: a systematic review.

Natural language processing is emerging as a key application of artificial intelligence in oncology. This systematic review aims to evaluate the performance and methodological frameworks of natural language processing systems in gynecologic oncology. We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses 2020 guidelines. MEDLINE, EMBASE, and Web of Science were searched for studies published between January 2015 and February 2025. Outcomes were synthesized across 3 research questions: the accuracy of natural language processing systems used as consultation service platforms; the accuracy of natural language processing systems used as clinical decision support systems; and the benchmarking methodologies applied, including their associated methodological outcomes. Consultation service platforms deliver general medical information, whereas clinical decision support systems provide recommendations that are integrated into the patient's clinical workflow. This review analyzed 12 retrospective studies. Consultation service platforms were less accurate than clinicians (60% vs 86.7%) and rated lower in response quality (2.96/5 vs 4.2/5) but outperformed guideline-based answers (1.54/2 vs 1.38/2). In cervical cancer, ChatGPT surpassed experts (7.0 vs 6.1). ChatGPT-4 showed a concordance of 70% with the National Comprehensive Cancer Network and 60% with the European Society of Gynaecological Oncology guidelines in clinical decision support tasks, with an overall recommendation accuracy of 75%. IBM Watson achieved a 72.8% concordance with guidelines. Prompting was applied from 100% to 37.5% across studies. Qualitative benchmarking varied across studies: 83.3% used clinical guidelines and 37.5% of consultation service platform studies used expert answers. Four- or 5-point scales and binary scoring were used to assess consultation service platforms and clinical decision support systems, respectively.
Clinicians remain superior in complex reasoning, but natural language processing systems demonstrate robust performance in guideline-driven tasks, with advantages in speed, readability, and reproducibility. However, performance declined in nuanced scenarios and among under-represented patient sub-groups. Large language models currently play a supportive rather than substitutive role in gynecologic oncology.

7 Works
1 Paper
5 Collaborators