Research Projects

Data Integration for Time-to-Event Outcomes

Despite significant reductions in cancer mortality in the past three decades, racial disparities in cancer mortality persist and can be attributed to underlying patterns of inequity like barriers to access, socioeconomic status, hospital status, and insurance status.

High-quality data are available from national cancer surveillance registries and can potentially be used to study disparities in cancer outcomes, but no single source is complete. The National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) registry lacks important potential confounders such as hospital type, insurance status, and comorbidities. The National Cancer Database (NCDB) provides those variables but does not record cause of death, making it impossible to study cancer-specific mortality with NCDB alone. Together, SEER and NCDB provide a wealth of information on cancer incidence and mortality in the U.S.

Combining their strengths can provide a more holistic view of racial disparities in cancer mortality. To make efficient use of existing cancer surveillance databases through data integration, there is a significant need to develop methods specifically for time-to-event outcomes.
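The complementary structure of the two sources can be illustrated with a minimal sketch. The column names below are hypothetical placeholders, not actual SEER or NCDB variable names, and the stacking function only shows which blocks are missing by design; it is not one of the integration methods themselves.

```python
# Hypothetical column names; real SEER and NCDB variable names differ.
SHARED = ["age", "stage", "race", "survival_months", "died"]
SEER_ONLY = ["cause_of_death"]
NCDB_ONLY = ["hospital_type", "insurance_status", "comorbidity_score"]


def stack(seer_rows, ncdb_rows):
    """Stack both sources on a common schema; None marks values missing by design."""
    columns = ["source"] + SHARED + SEER_ONLY + NCDB_ONLY
    stacked = []
    for source, rows in (("SEER", seer_rows), ("NCDB", ncdb_rows)):
        for row in rows:
            # row.get returns None for columns the source does not collect
            record = {col: row.get(col) for col in columns}
            record["source"] = source
            stacked.append(record)
    return stacked


stacked = stack(
    [{"age": 60, "survival_months": 14, "died": 1, "cause_of_death": "cancer"}],
    [{"age": 70, "survival_months": 30, "died": 0, "hospital_type": "academic"}],
)
```

In the stacked records, the SEER row has no hospital type and the NCDB row has no cause of death, which is the missing-data pattern a data integration method has to handle.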

In this setting, I work on several statistical methods:

  1. A doubly robust regression method to adjust for unmeasured confounders only available in a second data source.

  2. A multiple imputation (MI) approach for data integration with a time-to-event outcome.

  3. A method for integrating more than two datasets when no single dataset contains all the variables and the outcome of interest.

Figure 1. Motivating data structure for data integration methods development.

Surrogate Paradox Risk

Clinical trials often collect surrogate endpoints in place of the true endpoint of interest. Surrogate endpoints are helpful because they usually occur more frequently, which reduces the required study sample size and duration.

When using surrogate endpoints, the most important assumption is that the treatment effect on the surrogate accurately predicts the treatment effect on the true endpoint. There are settings in which this assumption is violated even though the treatment is positively correlated with the surrogate and the surrogate is positively correlated with the true endpoint, a phenomenon labeled the “surrogate paradox”.
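A small numerical illustration may help. The counts below are entirely hypothetical (10 patients per arm, binary surrogate S and binary true endpoint T, with 1 as the favorable value) and are chosen only to show that the paradox is arithmetically possible.

```python
from fractions import Fraction

# Hypothetical counts of (surrogate S, true endpoint T) in each arm.
control = {(0, 0): 3, (0, 1): 3, (1, 0): 0, (1, 1): 4}
treated = {(0, 0): 4, (0, 1): 0, (1, 0): 2, (1, 1): 4}


def summarize(counts):
    """Return mean of S, mean of T, and the covariance of S and T within an arm."""
    n = sum(counts.values())
    mean_s = Fraction(sum(c for (s, _), c in counts.items() if s == 1), n)
    mean_t = Fraction(sum(c for (_, t), c in counts.items() if t == 1), n)
    mean_st = Fraction(counts[(1, 1)], n)
    return mean_s, mean_t, mean_st - mean_s * mean_t


s0, t0, cov0 = summarize(control)
s1, t1, cov1 = summarize(treated)

effect_on_surrogate = s1 - s0  # 3/5 - 2/5 = 1/5: treatment improves the surrogate
effect_on_true = t1 - t0       # 2/5 - 7/10 = -3/10: treatment worsens the true endpoint
# cov0 = 3/25 and cov1 = 4/25: S and T are positively associated in both arms
```

Here the treatment raises the surrogate, the surrogate and true endpoint are positively associated within each arm, and yet the treatment effect on the true endpoint is negative.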

When data are available from multiple clinical trials in which both the true and surrogate endpoints have been measured, the quality of a surrogate can be assessed by the degree of correlation between trial-level treatment effects on the two outcomes under the causal association framework. However, high correlation still does not preclude the possibility of the surrogate paradox, meaning that a surrogate that has been deemed high quality by existing measures can still provide incorrect conclusions about the treatment effect in a new study.
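As a rough sketch of this point, the snippet below computes a plain Pearson correlation between trial-level effects. The five pairs of effect estimates are made up for illustration, and the calculation ignores estimation error in the effects, which a real trial-level analysis would need to account for.

```python
from math import sqrt

# Hypothetical trial-level treatment effects, one entry per trial.
surrogate_effects = [0.1, 0.5, 1.0, 1.5, 2.0]
true_effects = [-0.2, 0.1, 0.5, 0.9, 1.3]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


trial_level_r = pearson(surrogate_effects, true_effects)

# Trials where the surrogate effect is positive but the true effect is negative.
discordant_trials = [
    i
    for i, (s, t) in enumerate(zip(surrogate_effects, true_effects))
    if s > 0 and t < 0
]
```

The correlation in this toy example is very close to 1, yet the first trial still shows a positive effect on the surrogate alongside a negative effect on the true endpoint.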

I develop methods for identifying the risk of surrogate paradox in subpopulations when data on multiple trials are available. Incorporating covariate information can provide valuable insights into the mechanism of the surrogate paradox and identify groups that are particularly vulnerable.

Health Disparity Applications

In collaboration with epidemiologists and neurologists from the Brain Attack Surveillance in Corpus Christi (BASIC) study, I worked on multiple projects comparing post-stroke health outcomes between Mexican Americans (MAs), African Americans (AAs), and Non-Hispanic Whites (NHWs).

Figure 2. Observed probabilities of transition between various tobacco use states from four waves of the PATH study incorporating survey weights for a nationally representative population.

In collaboration with the Center for the Assessment of Tobacco Regulations at the University of Michigan, I have studied patterns of poly-tobacco product use using multi-state Markov models. I analyzed the probability of transition between single, dual, and poly tobacco use states in data from the Population Assessment of Tobacco and Health (PATH) study, a nationally representative study launched by the NIH and FDA to collect data on tobacco use behavior in order to inform future tobacco control policies. To analyze these data, we incorporated complex survey weights into the multi-state Markov modeling framework to estimate transition hazard rates between use categories, and we evaluated whether transition rates differ by sex, race, education, income, or age.
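The descriptive step behind observed transition probabilities can be sketched as follows. This is a simplified illustration, not the weighted multi-state Markov model itself: it computes survey-weighted proportions of wave-to-wave transitions, and the records and weights are hypothetical rather than PATH data.

```python
from collections import defaultdict


def weighted_transition_probs(records):
    """Survey-weighted proportion of each (origin, destination) transition.

    records: iterable of (origin_state, destination_state, survey_weight).
    """
    origin_totals = defaultdict(float)
    pair_totals = defaultdict(float)
    for origin, destination, weight in records:
        origin_totals[origin] += weight
        pair_totals[(origin, destination)] += weight
    # Divide each weighted pair total by the weighted total leaving that origin.
    return {
        pair: total / origin_totals[pair[0]]
        for pair, total in pair_totals.items()
    }


# Hypothetical person-level transitions between two consecutive waves.
records = [
    ("single", "single", 2.0),
    ("single", "dual", 1.0),
    ("single", "single", 1.0),
    ("dual", "single", 1.5),
    ("dual", "dual", 0.5),
    ("dual", "poly", 2.0),
    ("poly", "poly", 3.0),
    ("poly", "dual", 1.0),
]
probs = weighted_transition_probs(records)
```

Each row of the resulting transition table sums to one. Estimating transition hazard rates and covariate effects requires fitting the Markov model, which this sketch does not attempt.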

Figure 3. Tobacco product transitions of interest in the EXHALE cohort study.

In the EXHALE study cohort, I used multi-state Markov models to analyze the short-term dynamics of tobacco product transitions and to evaluate differences in transition rates by demographic characteristics, background, dependence level, and biomarker measures. I also evaluated whether the length of time spent using a tobacco product combination affected the rate of transitioning to a different product combination.