Causal inference in observational studies


Dr. Bo Lu, College of Public Health, Biostatistics
Rank at time of award: Assistant Professor
Dr. Xinyi Xu, Department of Statistics
Rank at time of award: Assistant Professor


Observational studies are widely used in the social sciences, epidemiology and some clinical research because of their convenience and relatively low cost. Unlike in randomized experiments, researchers in observational studies cannot assign study subjects to treatment or control groups using a random mechanism, which makes it very difficult to establish a causal relationship between the treatment and the observed outcomes. Appropriate statistical methods for causal inference in observational studies are therefore in high demand. Both econometricians and statisticians have explored this methodological challenge for many years. Heckman proposed the "difference-in-difference" method in the 1970s; Rubin and Rosenbaum have advocated the propensity score approach since the early 1980s and published a groundbreaking paper in 1983. A propensity score is defined as the probability of receiving treatment given the observed covariates, such as age, sex and income. By matching or stratifying on the propensity score, researchers can use the observed covariates to reduce the selection bias that arises because subjects self-select into the treatment or control group, which leads to more valid causal inference on the treatment effect. Recently, the propensity score approach has gained increasing attention and popularity in sociology (e.g. Smith, 1997), economics (e.g. Dehejia and Wahba, 1998; Smith and Todd, 2003) and political science (e.g. Imai, 2004).
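As a rough illustration of the definition above, the following sketch estimates propensity scores with a logistic regression fitted by plain gradient ascent. It is a minimal example under stated assumptions, not the method used in this project: the data are simulated, the covariates (a standardized age and a 0/1 sex indicator) and their true coefficients are invented for the example, and the function names fit_propensity and propensity_scores are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear_predictor(beta, row):
    """Intercept plus the weighted sum of the covariates in one row."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def fit_propensity(X, t, lr=0.5, n_iter=500):
    """Fit P(T = 1 | X) by batch gradient ascent on the logistic log-likelihood.

    X is a list of covariate rows, t a list of 0/1 treatment indicators.
    Returns the coefficients with the intercept first.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, ti in zip(X, t):
            resid = ti - sigmoid(linear_predictor(beta, row))
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Estimated probability of treatment for every subject."""
    return [sigmoid(linear_predictor(beta, row)) for row in X]

# Simulated (hypothetical) data: older subjects are more likely to be treated.
random.seed(0)
n = 400
X = [[random.gauss(0, 1), float(random.random() < 0.5)] for _ in range(n)]
t = [1 if random.random() < sigmoid(0.8 * age - 0.5 * sex) else 0
     for age, sex in X]

beta = fit_propensity(X, t)
scores = propensity_scores(X, beta)
```

Subjects with similar values in scores can then be matched or grouped into strata, so that treated and control subjects are compared only when their observed covariates make treatment about equally likely.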
For cross-sectional data, a rich set of statistical methods based on propensity score matching has been developed to control for selection bias, such as pair matching, 1:k matching, variable matching, full matching and matching with several control groups. Moreover, Rosenbaum (2002) proposed a sensitivity analysis to investigate the impact of potential deviations from the ignorable treatment assignment mechanism (i.e. treatment assignment that depends only on the observed covariates). However, matching methods for longitudinal studies are conspicuously scarce. One reason is that matching techniques for cross-sectional studies cannot be easily extended to the longitudinal setting, since subjects may receive multiple treatments over time and their continuation in the study could depend on post-treatment variables or intermediate outcomes. We aim to develop general statistical methodologies that can potentially be applied to a variety of surveys and studies.
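To make the simplest of the cross-sectional designs above concrete, the sketch below performs pair matching on the propensity score with a greedy nearest-neighbor rule and a caliper. This is an illustrative simplification, not the optimal matching algorithms studied in this project; the function names, the caliper value, the scores and the outcomes are all hypothetical.

```python
def greedy_pair_match(scores, treated, caliper=0.1):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    scores  : estimated propensity score for each subject
    treated : 0/1 treatment indicator for each subject
    caliper : largest score difference allowed within a pair
    Returns a list of (treated_index, control_index) pairs.
    """
    available = [i for i, t in enumerate(treated) if t == 0]
    pairs = []
    for i, t in enumerate(treated):
        if t != 1 or not available:
            continue
        # Closest still-unmatched control on the propensity score.
        j = min(available, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs

def matched_pair_effect(pairs, outcomes):
    """Mean treated-minus-control outcome difference over the matched pairs."""
    diffs = [outcomes[i] - outcomes[j] for i, j in pairs]
    return sum(diffs) / len(diffs)

# Hypothetical toy data: two treated subjects and four potential controls.
scores = [0.80, 0.30, 0.75, 0.35, 0.10, 0.60]
treated = [1, 1, 0, 0, 0, 0]
outcomes = [5.0, 3.0, 4.0, 2.5, 1.0, 3.5]

pairs = greedy_pair_match(scores, treated)
effect = matched_pair_effect(pairs, outcomes)
print(pairs)   # [(0, 2), (1, 3)]
print(effect)  # 0.75
```

Greedy matching depends on the order in which treated subjects are processed and can produce a larger total within-pair distance than an optimal matching, which is one reason optimal algorithms are preferred in practice.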

Publications resulting from this seed grant

2012. Bo Lu, Zhenchao Qian, Anna Cunningham and Chih-Lin Li. Estimating the effect of premarital cohabitation on timing of marital disruption: Using propensity score matching in event history analysis. Sociological Methods & Research 41:440.

2011. Lu B., Greevy R., Xu X. and Beck C. Optimal nonbipartite matching and its statistical applications. The American Statistician 65(1): 21-30. PMCID: PMC3501247