Application and validation of AI-driven methods to explore patient experiences of pre-cervical cancer

doi:10.1016/j.ejogrb.2026.114953

We sought to apply novel natural language processing (NLP) tools to explore patient experiences of pre-cervical cancer on social media and to validate the performance of these tools.

All posts and comments were extracted from the forum r/PreCervicalCancer on the social media platform Reddit. Using BERTopic, posts were clustered by semantic similarity into topics, which were then manually reviewed. Topic headings were derived using a large language model (LLM) and compared to manually curated headings. Clustering outliers were reassigned in parallel by BERTopic, by an LLM and by manual methods, and the results were compared. Post and comment sentiment was quantitatively analysed using VADER. Post upvote scores and comment counts were analysed to measure community engagement.

4592 posts were extracted from r/PreCervicalCancer. Posts clustered into 10 different topics using BERTopic with 88.0% accuracy. 80.0% of topic headings generated by GPT-4o mini were deemed appropriate. Reassignment of clustering outliers by BERTopic and GPT-4o mini was limited (52.8% and 41.1% accuracy, respectively). Key clinical findings reflect several common concerns among patients, particularly regarding the lasting physical and psychological impact of procedures like LEEP, result anxiety, and challenges in healthcare navigation. Comments had less negative sentiment than posts (Cohen's d = 0.46), suggesting support from the community.

In this cross-sectional study, we validated NLP tools to analyse content, sentiment and reactions to 4592 posts on pre-cervical cancer. Our findings suggest that, with minimal human oversight, automated methods can accurately conduct large-scale analyses of similar clinical content, unlocking new insights into patient experiences using non-traditional data sources.
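The effect size quoted for the comparison of comment and post sentiment (Cohen's d) can be illustrated with a minimal sketch. The abstract does not say which variant of Cohen's d was used, so the pooled-standard-deviation form below is an assumption, and the function name `cohens_d` and the example scores are hypothetical and not taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled sample SD.

    Assumption: the pooled-variance form; the abstract does not state
    which variant the authors used.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical sentiment scores in [-1, 1]; NOT data from the study.
comment_scores = [0.3, 0.1, 0.5, -0.1, 0.4]
post_scores = [0.0, -0.2, 0.3, -0.4, 0.1]

d = cohens_d(comment_scores, post_scores)
print(f"Cohen's d = {d:.2f}")
```

With comments passed as the first group, a positive value means comments scored higher (less negative) than posts on average. By the usual rule of thumb, a value near 0.5 such as the reported 0.46 is read as a medium effect.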
