AI-driven patient support: Evaluating the effectiveness of ChatGPT-4 in addressing queries about ovarian cancer compared with healthcare professionals in gynecologic oncology

Q: How does OCRA curate its research paper database?

OCRA indexes peer-reviewed papers from PubMed and CrossRef, enriched with MeSH terms, author profiles, and citation metrics. Papers are categorized by cancer type, research method, and clinical relevance.

Q: Can I search papers by MeSH term or author?

Yes. Use the search bar to filter papers by MeSH term, author name, journal, keyword, or cancer type. Each paper includes linked MeSH terms and author profiles for further exploration.

Q: How current is the paper database?

The database is updated regularly through automated ingestion from PubMed and CrossRef. Most papers appear within days of their PubMed indexing date.

doi:10.1007/s00520-025-09389-7

Artificial intelligence (AI) chatbots, such as ChatGPT-4, allow a user to ask questions on an interactive level. This study evaluated the correctness and completeness of responses to questions about ovarian cancer from a GPT-4 chatbot, LilyBot, compared with responses from healthcare professionals in gynecologic cancer care. Fifteen categories of questions about ovarian cancer were collected from an online patient Chatgroup forum. Ten healthcare professionals in gynecologic oncology generated 150 questions and responses relative to these topics. Responses from LilyBot and the healthcare professionals were scored for correctness and completeness by eight independent healthcare professionals with similar backgrounds blinded to the identity of the responders. Differences between groups were analyzed with Mann-Whitney U and Kruskal-Wallis tests, followed by Tukey's post hoc comparisons. Mean scores for overall performance for all 150 questions were significantly higher for LilyBot compared with the healthcare professionals for correctness (5.31 ± 0.98 vs. 5.07 ± 1.00, p = 0.017; range = 1-6) and completeness (2.66 ± 0.55 vs. 2.36 ± 0.55, p < 0.001; range = 1-3). LilyBot had significantly higher scores for immunotherapy compared with the healthcare professionals for correctness (6.00 ± 0.00 vs. 4.70 ± 0.48, p = 0.020) and completeness (3.00 ± 0.00 vs. 2.00 ± 0.00, p < 0.010); and gene therapy for completeness (3.00 ± 0.00 vs. 2.20 ± 0.42, p = 0.023). The significantly better performance by LilyBot compared with healthcare professionals highlights the potential of ChatGPT-4-based dialogue systems to provide patients with clinical information about ovarian cancer.

AI-driven patient support: Evaluating the effectiveness of ChatGPT-4 in addressing queries about ovarian cancer compared with healthcare professionals in gynecologic oncology

Links

TL;DR

Authors

Funding