Journal

Statistical Methods in Medical Research

Papers (5)

Sample size calculation for mixture cure model with restricted mean survival time as a primary endpoint

It is not uncommon for a substantial proportion of patients to be cured (or survive long-term) in clinical trials with time-to-event endpoints, such as the endometrial cancer trial. When designing such a trial, a mixture cure model should be used so that the cure fraction is fully taken into account. Previously, sample size calculations for mixture cure models were based on a proportional hazards assumption for the latency distributions of the two groups, and the log-rank test was used to derive the sample size formulas. In real studies, the latency distributions of the two groups often do not satisfy the proportional hazards assumption. This article derives a sample size formula for a mixture cure model with restricted mean survival time as the primary endpoint and evaluates it in simulation and example studies. The restricted mean survival time test does not rely on the proportional hazards assumption, and the resulting treatment effect can be expressed as the number of years (or months) of survival time gained or lost, which makes it convenient for communication between physicians and patients. The simulation results showed that the sample sizes estimated by the restricted mean survival time test for the mixture cure model were accurate regardless of whether the proportional hazards assumption was satisfied, and that in most scenarios in which the assumption was violated they were smaller than the sample sizes estimated by the log-rank test.
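To make the endpoint concrete, the following is a minimal sketch of restricted mean survival time (RMST) under a mixture cure model. It is not the paper's sample size formula. It assumes, purely for illustration, an exponential latency distribution, so that the population survival function is S(t) = pi + (1 - pi) * exp(-rate * t), where pi is the cure fraction. The function names and all parameter values are hypothetical.

```python
import math


def cure_survival(t, cure_fraction, rate):
    """Population survival S(t) for a mixture cure model.

    Assumption for illustration only: uncured patients have an
    exponential latency distribution with hazard `rate`.
    """
    return cure_fraction + (1.0 - cure_fraction) * math.exp(-rate * t)


def rmst_closed_form(tau, cure_fraction, rate):
    """RMST up to tau: the integral of S(t) from 0 to tau, in closed form."""
    uncured_part = (1.0 - math.exp(-rate * tau)) / rate
    return cure_fraction * tau + (1.0 - cure_fraction) * uncured_part


def rmst_numeric(tau, cure_fraction, rate, steps=10000):
    """RMST up to tau by the trapezoidal rule, as a check on the closed form."""
    h = tau / steps
    total = 0.5 * (cure_survival(0.0, cure_fraction, rate)
                   + cure_survival(tau, cure_fraction, rate))
    for i in range(1, steps):
        total += cure_survival(i * h, cure_fraction, rate)
    return total * h


if __name__ == "__main__":
    tau = 5.0  # hypothetical truncation time, in years
    control = rmst_closed_form(tau, cure_fraction=0.30, rate=0.5)
    treatment = rmst_closed_form(tau, cure_fraction=0.45, rate=0.5)
    # The RMST difference is read directly as years of survival gained within tau.
    print(f"RMST control:   {control:.3f} years")
    print(f"RMST treatment: {treatment:.3f} years")
    print(f"Difference:     {treatment - control:.3f} years")
```

The difference between the two RMST values is on the time scale itself, which is the interpretability advantage the abstract describes.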

Dynamic prediction by landmarking with data from cohort subsampling designs

Longitudinal data are often available in cohort studies and clinical settings, such as covariates collected at cohort follow-up visits or prescriptions captured in electronic health records. Such longitudinal information, if correlated with the health event of interest, may be incorporated to dynamically predict the probability of a health event with better precision. Landmarking is a popular approach to dynamic prediction. There are well-established methods for landmarking using full cohort data, but collecting data on all cohort members may not be feasible when resources are limited. Instead, one may select a subset of the cohort using subsampling designs and collect data only on this subset. In this work, we present conditional likelihood and inverse-probability weighted methods for landmarking using data from cohort subsampling designs, and discuss considerations for choosing a particular method. Simulations are conducted to evaluate the applicability of the methods and their predictive performance in different scenarios. Results show that our methods have similar predictive performance to the full cohort analysis while using only small fractions of the full cohort data. We use real nested case-control data from the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial to illustrate the methods.

A competing risks model with binary time varying covariates for estimation of breast cancer risks in BRCA1 families

Mammographic screening and prophylactic surgery such as risk-reducing salpingo-oophorectomy can potentially reduce breast cancer risks among mutation carriers of BRCA families. The evaluation of these interventions is usually complicated by the fact that their effects on breast cancer may change over time and by the presence of competing risks. We introduce a correlated competing risks model to model breast and ovarian cancer risks within BRCA1 families that accounts for time-varying covariates. Different parametric forms for the effects of time-varying covariates are proposed for more flexibility, and a correlated gamma frailty model is specified to account for the correlated competing events. We also introduce a new ascertainment correction approach that accounts for the selection of families through probands affected with either breast or ovarian cancer, or unaffected. Our simulation studies demonstrate the good performance of our proposed approach in terms of bias and precision of the estimators of model parameters and cause-specific penetrances over different levels of familial correlations. We applied our new approach to 498 BRCA1 mutation carrier families recruited through the Breast Cancer Family Registry. Our results demonstrate the importance of the functional form of the time-varying covariate effect when assessing the role of risk-reducing salpingo-oophorectomy on breast cancer. In particular, under the best fitting time-varying covariate model, the overall effect of risk-reducing salpingo-oophorectomy on breast cancer risk was statistically significant in women with a BRCA1 mutation.

Hierarchical continuous-time inhomogeneous hidden Markov model for cancer screening with extensive followup data

Continuous-time hidden Markov models are an attractive approach for disease modeling because they are explainable and capable of handling the irregularly sampled, skewed and sparse data arising from real-world medical practice, in particular screening data with extensive followup. Most applications in this context consider time-homogeneous models due to their relative computational simplicity. However, the time-homogeneity assumption is too strong to accurately model the natural history of many diseases, including cancer. Moreover, cancer risk across the population is not homogeneous either, since exposure to disease risk factors can vary considerably between individuals. This is important when analyzing longitudinal datasets and different birth cohorts. We model the heterogeneity of disease progression and regression using piece-wise constant intensity functions and model the heterogeneity of risks in the population using a latent mixture structure. Different submodels under the mixture structure employ the same types of Markov states, reflecting disease progression and allowing both clinical interpretation and model parsimony. We also consider flexible observational models dealing with model over-dispersion in real data. An efficient, scalable Expectation-Maximization algorithm for inference is proposed, with a theoretically guaranteed convergence property. We demonstrate our method’s superior performance compared to other state-of-the-art methods using synthetic data and a real-world cervical cancer screening dataset from the Cancer Registry of Norway. Moreover, we present two model-based risk stratification methods that identify the risk levels of individuals.
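As a rough illustration of what a piece-wise constant intensity function implies, the sketch below computes the probability of remaining in a state when the exit intensity is constant within age intervals. This is a deliberately simplified, single-transition setting, not the hierarchical hidden Markov model of the paper. The function names, breakpoints and rates are hypothetical.

```python
import math


def cumulative_intensity(t, breakpoints, rates):
    """Integrate a piecewise-constant intensity from 0 to t.

    breakpoints: increasing interior cut points, e.g. [30, 50].
    rates: one intensity per interval, so len(rates) == len(breakpoints) + 1.
    """
    if len(rates) != len(breakpoints) + 1:
        raise ValueError("need exactly one rate per interval")
    total = 0.0
    left = 0.0
    # Append infinity so the last interval is open-ended.
    for cut, rate in zip(list(breakpoints) + [math.inf], rates):
        right = min(t, cut)
        if right > left:
            total += rate * (right - left)
        left = cut
        if t <= cut:
            break
    return total


def probability_of_staying(t, breakpoints, rates):
    """P(no transition out of the state by time t) = exp(-cumulative intensity)."""
    return math.exp(-cumulative_intensity(t, breakpoints, rates))


if __name__ == "__main__":
    cuts = [30.0, 50.0]          # hypothetical age breakpoints
    rates = [0.01, 0.03, 0.02]   # hypothetical intensities per interval
    for age in (20.0, 40.0, 60.0):
        print(age, round(probability_of_staying(age, cuts, rates), 4))
```

A time-homogeneous model corresponds to the special case of no breakpoints and a single rate, which is why it cannot capture age-dependent progression.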

Issues and solutions in biomarker evaluation when subclasses are involved under binary classification

In practice, it is common to evaluate biomarkers in binary classification settings (e.g. non-cancer vs. cancer) where one or both main classes involve multiple subclasses. For example, the non-cancer class might consist of healthy subjects and benign cases, while the cancer class might consist of subjects at early and late stages. The standard practice is pooling within each main class, i.e. all non-cancer subclasses are pooled together to create a control group, and all cancer subclasses are pooled together to create a case group. Based on the pooled data, the area under the ROC curve (AUC) and other characteristics are estimated under binary classification for the purpose of biomarker evaluation. Despite the popularity of this pooling strategy in practice, its validity and implications for biomarker evaluation have never been carefully inspected. This paper aims to demonstrate that the pooling strategy can be seriously misleading in biomarker evaluation. Furthermore, we present a new diagnostic framework as well as new accuracy measures appropriate for biomarker evaluation under such settings. In the end, an ovarian cancer data set is analyzed.
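The pooling issue can be illustrated with a small sketch that compares the pooled AUC with subclass-level AUCs. It uses the standard Mann-Whitney form of the empirical AUC, not the new accuracy measures proposed in the paper. The biomarker values below are made up for illustration, and the function name is hypothetical.

```python
def empirical_auc(controls, cases):
    """Empirical AUC: the proportion of (control, case) pairs in which the
    case has the higher biomarker value, counting ties as one half."""
    if not controls or not cases:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for x in controls:
        for y in cases:
            if y > x:
                score += 1.0
            elif y == x:
                score += 0.5
    return score / (len(controls) * len(cases))


if __name__ == "__main__":
    # Made-up biomarker values for four subclasses.
    healthy = [1, 2, 3, 4]
    benign = [5, 6, 7, 8]
    early = [5.5, 6.5, 4.5, 7.5]
    late = [9, 10, 11, 12]

    pooled = empirical_auc(healthy + benign, early + late)
    hard_pair = empirical_auc(benign, early)
    easy_pair = empirical_auc(healthy, late)

    print(f"pooled non-cancer vs cancer: {pooled:.3f}")
    print(f"benign vs early stage:       {hard_pair:.3f}")
    print(f"healthy vs late stage:       {easy_pair:.3f}")
```

With these made-up values the pooled AUC is about 0.84, while the benign versus early-stage AUC is 0.375, so the pooled figure hides the fact that the marker does not separate the two subclasses that are clinically hardest to distinguish.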

Publisher

SAGE Publications

ISSN

0962-2802