We sought to apply novel natural language processing (NLP) tools to explore patient experiences of pre-cervical cancer on social media and to validate the performance of these tools. All posts and comments were extracted from the forum r/PreCervicalCancer on the social media platform Reddit. Using BERTopic, posts were clustered by semantic similarity into topics, which were then manually reviewed. Topic headings were derived using a large language model (LLM) and compared to manually curated headings. Clustering outliers were reassigned in parallel by BERTopic, by an LLM and manually, and the results were compared. Post and comment sentiment was quantitatively analysed using VADER. Post upvote scores and comment counts were analysed to measure community engagement. In total, 4592 posts were extracted from r/PreCervicalCancer. BERTopic clustered posts into 10 topics with 88.0% accuracy. Of the topic headings generated by GPT-4o mini, 80.0% were deemed appropriate. Reassignment of clustering outliers by BERTopic and GPT-4o mini was of limited accuracy (52.8% and 41.1%, respectively). Key clinical findings reflect several common concerns among patients, particularly the lasting physical and psychological impact of procedures such as LEEP, anxiety about results, and challenges in navigating healthcare. Comments had less negative sentiment than posts (Cohen's d = 0.46), suggesting supportive responses. In this cross-sectional study, we validated NLP tools to analyse the content and sentiment of, and reactions to, 4592 posts on pre-cervical cancer. Our findings suggest that, with minimal human oversight, automated methods can accurately conduct large-scale analyses of similar clinical content, unlocking new insights into patient experiences using non-traditional data sources.
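
The effect size reported for the post versus comment sentiment comparison can be illustrated with a minimal sketch. It assumes the pooled-standard-deviation form of Cohen's d, since the text does not state which variant was used. The function name `cohens_d` and the sentiment scores below are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled sample
    standard deviation (an assumed variant, not confirmed by the text)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two values")
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical sentiment scores in [-1, 1]; illustrative only.
post_scores = [-0.6, -0.2, 0.1, -0.4, 0.0]
comment_scores = [-0.1, 0.3, 0.2, -0.2, 0.4]

# A positive value means comments score higher (less negative) than posts.
d = cohens_d(comment_scores, post_scores)
print(f"Cohen's d (comments vs posts): {d:.2f}")
```

By the usual convention, values near 0.2, 0.5 and 0.8 are read as small, medium and large effects, so a d of 0.46 would sit just below the medium threshold.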