# Propensity Score Modeling

Many users of SII data will be interested in drawing generalized causal inferences about the potential treatment effects of CSR programs. Given the selection of high-poverty schools and the fact that schools are not randomly assigned to CSR designs, it becomes necessary to adjust for possible selection bias into the "treatment." (See Table 1 for a comparison of the SII sample to the ECLS representative sample.) Adjusting for such bias strengthens empirical arguments that effects can be attributed to program designs. To adjust for possible selection bias in the SII data, we used the method of propensity score stratification proposed by Rosenbaum and Rubin (1983). The objective of this method is to compare observed units with a similar probability of being selected into the treated group. Once units are matched, and assuming there are no additional confounding covariates, the resulting groupings are deemed to have "strongly ignorable treatment assignment." This means that the causal difference in means can be calculated by subtracting the mean of the comparison group from the mean of the treated group within matched samples (in our case, within strata).

The logic of propensity score stratification is as follows. Each unit (in this instance, a school), whether treated or not, has two potential outcomes: Y1 (if treated) and Y0 (if control). The causal effect of the treatment (the selected CSR program) is the difference between Y1 and Y0 for each unit. Since a unit belongs to either the control group or the treated group, it is impossible to observe both Y1 and Y0 for a given unit. However, we can estimate the average causal effect of a treatment in a population under the assumption that treatment assignment is independent of the potential outcomes. In that case, the mean of the treated cases minus the mean of the untreated cases provides an unbiased estimate of *E*(Y1 − Y0), the population average causal effect.
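The potential-outcomes bookkeeping above can be illustrated with a small numeric sketch. All values below are hypothetical, not SII data; the point is only that the treated-minus-control difference in observed means estimates *E*(Y1 − Y0) when assignment is independent of the potential outcomes.

```python
# Illustrative potential-outcomes bookkeeping (hypothetical values, not SII data).
# Each school has two potential outcomes: y1 if treated, y0 if control.
schools = [
    {"y1": 12.0, "y0": 10.0},
    {"y1": 9.0,  "y0": 8.0},
    {"y1": 15.0, "y0": 11.0},
    {"y1": 10.0, "y0": 9.0},
]

# The population average causal effect E(Y1 - Y0), which is never directly
# observable because each school reveals only one of its two outcomes:
ace = sum(s["y1"] - s["y0"] for s in schools) / len(schools)
print(ace)  # 2.0

# If assignment is independent of the potential outcomes, the difference in
# observed means is unbiased for the average causal effect: any single
# assignment gives one estimate, and the estimates average out to E(Y1 - Y0)
# over all equally likely assignments.
def diff_in_means(treated_idx):
    treated = [schools[i]["y1"] for i in treated_idx]
    control = [schools[i]["y0"] for i in range(len(schools)) if i not in treated_idx]
    return sum(treated) / len(treated) - sum(control) / len(control)

print(diff_in_means({0, 1}))  # estimate under one assignment
print(diff_in_means({2, 3}))  # estimate under the complementary assignment
```

For these four hypothetical schools, the two complementary assignments shown give estimates of 0.5 and 3.5, which average back to the true effect of 2.0.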

Propensity score modeling proposes that, in the absence of random assignment, it is possible to identify subsets of units (e.g., schools) that have the same distribution on all observed covariates but differ in treatment assignment (e.g., SFA, ASP, AC). For such a subset, treatment assignment is effectively random if no unobserved covariates predict treatment assignment. This is exactly what propensity score matching and propensity score stratification are designed to accomplish: statistically equating subsets of units, in this case schools, on all observed covariates. Thus, we can estimate the average causal effect of CSR participation by pooling estimates of the within-stratum causal effects under the assumption of "strongly ignorable treatment assignment," in which unobserved covariates are unrelated to treatment given the observed set of covariates.
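Pooling within-stratum effects can be sketched as follows. The strata, school outcomes, and the choice of stratum-size weights here are all hypothetical; they only illustrate the arithmetic of combining within-stratum treated-versus-comparison differences into one overall estimate.

```python
# Hypothetical strata of schools matched on the propensity score. Within a
# stratum, treated and comparison schools are assumed comparable, so the
# within-stratum difference in mean outcomes estimates the local effect.
strata = {
    "s1": {"treated": [52.0, 55.0],        "comparison": [50.0, 51.0]},
    "s2": {"treated": [47.0],              "comparison": [44.0, 45.0, 46.0]},
    "s3": {"treated": [60.0, 58.0, 59.0],  "comparison": [57.0]},
}

def stratum_effect(s):
    t, c = s["treated"], s["comparison"]
    return sum(t) / len(t) - sum(c) / len(c)

# Pool the within-stratum effects, weighting by the number of schools in each
# stratum (one common choice of weights).
n_total = sum(len(s["treated"]) + len(s["comparison"]) for s in strata.values())
pooled = sum(
    stratum_effect(s) * (len(s["treated"]) + len(s["comparison"])) / n_total
    for s in strata.values()
)
print(round(pooled, 3))
```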

In SII work, we apply a multi-step process in developing a propensity model. This example is based on an analysis of student achievement attributable to CSR program affiliation. First, an exhaustive list of observed pre-treatment and exogenous characteristics of schools that could theoretically confound the treatment was identified. The strength of any causal argument under "strongly ignorable treatment assignment" depends on the assumption that the observed covariates are more likely to confound treatment than any unobserved covariates. These 34 covariates and the differences in means between CSR programs are displayed in Table 2. This process revealed instances of large variance between CSR programs on six of these 34 covariates. It was therefore necessary to include an additional set of covariates computed by squaring the following six variables: (1) number of students in the school, (2) percent of black students, (3) percent of students born to a mother who was a teen at first birth, (4) percent of students from households where parents ran out of food in the last 12 months, (5) percent of students from households where parents did not have resources to buy children's clothing in the last 12 months, and (6) percent of students identified by a caregiver as getting into fights. With this adjustment, 40 covariates were used to create the propensity score and to demonstrate balance across strata once schools were matched. The SII Propensity Score Stratification data file is available here.
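The squared-term augmentation can be sketched as a simple transformation of a school-level record. The covariate names below are hypothetical shorthands for the six variables listed above, not the actual SII variable names.

```python
# Sketch of augmenting a school's covariate record with squared terms.
# Names are hypothetical shorthands for the six variables described in the text.
SQUARED_TERMS = [
    "n_students", "pct_black", "pct_teen_mother",
    "pct_ran_out_of_food", "pct_no_clothing_money", "pct_fights",
]

def add_squared_terms(school):
    """Return a copy of the record with a squared version of each listed term."""
    out = dict(school)
    for name in SQUARED_TERMS:
        out[name + "_sq"] = school[name] ** 2
    return out

school = {
    "n_students": 420, "pct_black": 0.55, "pct_teen_mother": 0.20,
    "pct_ran_out_of_food": 0.10, "pct_no_clothing_money": 0.15,
    "pct_fights": 0.08,
}
augmented = add_squared_terms(school)
print(len(school), "->", len(augmented))  # 6 -> 12
```

Applied to the full covariate list, the same step takes the 34 original covariates to the 40 used in the propensity model.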

Penalized Maximum Likelihood Estimation (PMLE) was used to create the propensity scores. An ordinary logistic regression was first run with all 40 covariates entered as predictors. Using the *Design* library in the statistical program R (Alzola and Harrell, 2006), the model's degree of over-optimism was assessed and a suggested penalty factor was calculated by the program. This penalty factor was applied in a subsequent regression to obtain the propensity score.1
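The *Design* library's over-optimism assessment and penalty selection are not reproduced here. As a minimal sketch of the underlying idea, the following Python code (rather than the R workflow the analysis actually used) fits a ridge-penalized logistic regression on hypothetical standardized school covariates with a fixed penalty, then converts fitted values to propensity scores. The data, penalty, and optimizer settings are all illustrative assumptions.

```python
import math

# Toy ridge-penalized logistic regression for a propensity score.
# Two hypothetical standardized covariates per school; z = 1 marks CSR schools.
X = [[0.2, 1.0], [0.4, 0.8], [0.9, 0.3], [0.7, 0.1], [0.1, 0.9], [0.8, 0.2]]
z = [0, 0, 1, 1, 0, 1]

def fit_penalized_logit(X, z, penalty=1.0, lr=0.5, steps=2000):
    """Batch gradient descent on the L2-penalized logistic log-loss.
    The intercept is left unpenalized, as is conventional."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        grad_w = [penalty * wj / n for wj in w]   # penalty term of the gradient
        grad_b = 0.0
        for xi, zi in zip(X, z):
            pred = 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            err = (pred - zi) / n
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

w, b = fit_penalized_logit(X, z)
# Each score is the estimated probability that a school is a CSR school.
scores = [
    1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))
    for x in X
]
print([round(s, 3) for s in scores])
```

Shrinking the coefficients toward zero is what keeps the fitted probabilities from being over-optimistic when the number of covariates approaches the number of schools.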

Once the propensity score was obtained, schools were matched using the *Optmatch* package in the statistical program R (Hansen, 2006). *Optmatch* matches two populations of schools using the Mahalanobis distance to find the smallest distances between matched sets of treated and comparison schools given one or more criterion variables. The package also allows users to place restrictions on the number of treated or comparison schools per matched set, as well as on the maximum distance allowed for a school to be included in a matched set. Matches were conducted so that each treatment school was matched with at least one comparison school, and no schools were excluded from the analysis.
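*Optmatch* performs optimal matching over all schools at once; as a much-simplified illustration of the distance-based idea, the sketch below (in Python, with hypothetical scores) runs a greedy nearest-neighbor pass on a single criterion, the propensity score. With one criterion, the Mahalanobis distance reduces to the absolute difference scaled by the standard deviation.

```python
import statistics

# Hypothetical propensity scores for treated and comparison schools.
treated = {"T1": 0.80, "T2": 0.55, "T3": 0.30}
comparison = {"C1": 0.75, "C2": 0.50, "C3": 0.35, "C4": 0.28}

# One-dimensional Mahalanobis distance: |difference| / standard deviation.
sd = statistics.pstdev(list(treated.values()) + list(comparison.values()))

def match_greedy(treated, comparison):
    """Pair each treated school with its nearest unused comparison school.
    (Optmatch instead solves for the globally optimal matched sets.)"""
    pool = dict(comparison)
    matches = {}
    for t, score in sorted(treated.items()):
        best = min(pool, key=lambda c: abs(score - pool[c]) / sd)
        matches[t] = best
        del pool[best]  # each comparison school is used at most once here
    return matches

print(match_greedy(treated, comparison))
```

A greedy pass can leave later treated schools with poor matches; optimal full matching avoids this, which is one reason the analysis relied on *Optmatch* rather than nearest-neighbor pairing.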

One advantage of matching is the ability to stratify schools on covariates that are not exogenous to the treatment and therefore could not be included in the logistic regression for the propensity score. For example, one variable that appeared important for comparing schools was the degree of student stability in a school. Schools with a higher proportion of mobile students may be categorically different from schools with lower rates of student mobility. While these rates are likely highly correlated with other characteristics of disadvantaged schools already included in the propensity model, it may be important to adjust for school stability separately, since it may have a unique influence on achievement growth in schools. The *Optmatch* package allowed for matches adjusting for both the propensity score and student stability aggregated to the school level.

With these steps completed, we then checked whether each of our matching procedures produced balance across all of the covariates. Balance was checked between treated and comparison schools on all covariates across the numerous matches produced by *Optmatch*. Interested readers may choose to review the balance matches from this propensity stratification stage. While satisfied that these matches provided relative balance across the 40 covariates for each of our comparisons, the procedure generated a large number of strata (matched sets) that would be cumbersome to add to our statistical models as dummy variables. Therefore, we combined matched sets to create a reduced number of strata. A re-evaluation was then necessary to determine whether balance was maintained across these reduced strata. Using the logit transformation of the propensity score, balance was demonstrated across strata (see Tables 3, 5, and 7). Further, t-tests between treated and comparison schools on all covariates within strata showed that in no case were more than 5% of the within-stratum comparisons statistically significant, consistent with what would be expected under random assignment (see Tables 4, 6, and 8).
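A balance check of this kind can be sketched with a standardized mean difference within a stratum. The stratum, covariate names, and values below are hypothetical; the SII checks themselves used t-tests and the logit of the propensity score across all 40 covariates.

```python
import statistics

# Hypothetical within-stratum covariate values for treated and comparison schools.
stratum = {
    "treated":    {"pct_poverty": [0.81, 0.78, 0.84], "n_students": [410, 390, 425]},
    "comparison": {"pct_poverty": [0.80, 0.79, 0.82], "n_students": [400, 415, 395]},
}

def std_mean_diff(t_vals, c_vals):
    """Difference in means scaled by the pooled standard deviation;
    values near zero indicate balance on that covariate."""
    pooled_sd = statistics.pstdev(t_vals + c_vals)
    return (statistics.mean(t_vals) - statistics.mean(c_vals)) / pooled_sd

for cov in stratum["treated"]:
    d = std_mean_diff(stratum["treated"][cov], stratum["comparison"][cov])
    print(cov, round(d, 3))
```

Repeating such a check for every covariate in every stratum, and flagging any stratum where more than the chance-expected share of comparisons is significant, is the logic behind the re-evaluation described above.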

A final step in the process of propensity score stratification involves entering the dummy-coded stratum variables into the regression models analyzing the outcomes of the study. In other words, the dummy variables representing propensity strata are entered into the school-level hierarchical linear regression models, leaving one out as the reference. Such an analysis was conducted by Correnti (2009): an HLM model using all created strata was executed first, followed by a final analysis using a reduced number of strata created by combining matched sets. The latter models were preferred because they preserved balance and provided more parsimonious hierarchical regression models without changing statistical inference.
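The dummy-coding step can be sketched directly. The school and stratum labels below are hypothetical; the point is that one stratum is left out as the reference, so its schools receive all zeros on the stratum dummies.

```python
# Sketch of dummy-coding propensity strata for a school-level regression,
# leaving one stratum out as the reference category (labels are hypothetical).
schools = [("schA", "s1"), ("schB", "s2"), ("schC", "s3"), ("schD", "s1")]
strata = sorted({s for _, s in schools})
reference = strata[0]  # stratum left out of the model

def dummy_row(stratum):
    """One indicator per non-reference stratum."""
    return [1 if stratum == s else 0 for s in strata if s != reference]

design = {school: dummy_row(s) for school, s in schools}
print(design)
# Schools in the reference stratum (s1) get all zeros; these stratum dummies
# then enter the school-level model alongside the treatment indicator.
```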

1 These methods have advantages over standard stepwise regression. First, PMLE allows for the estimation of multiple covariates when the degrees of freedom in the sample are small, as in our case, where the number of covariates approached the number of schools in our sample, especially since we examined each set of CSR schools against the set of comparison schools alone. By applying a differential penalty to covariates less influential on the outcome, fewer degrees of freedom were used. Second, PMLE allows for model reduction by shrinking each covariate differently, in proportion to its relationship with the outcome (Moons, Donders, Steyerberg, & Harrell, 2004). This is useful for the creation of a propensity score for several reasons: (1) highly predictive variables are shrunk less than others and thus make a larger contribution to the propensity score; (2) all variables are included in the analysis, even those that begin with relative balance between treatment and control conditions, so the researcher does not have to choose which variables to omit; and (3) it reduces the chance that multicollinearity plays a role in the calculation of the propensity score, as highly correlated variables are penalized. As an added benefit, this reduces the number of iterations in which the researcher must fit a logistic regression model, check for balance, and re-run the model.

**References**

Alzola, C., & Harrell, F. (2006). An Introduction to S and the Hmisc and Design Libraries. Retrieved February 2009 from http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RS/sintro.pdf

Correnti, R. (2009). Examining CSR Program Effects on Student Achievement: Causal Explanation Through Examination of Implementation Rates and Student Mobility. Paper prepared for 2nd annual conference of the Society for Research on Educational Effectiveness, Washington, DC, March, 2009.

Hansen, B. (2006). Appraising covariate balance after assignment to treatment by groups. Technical Report #436, Statistics Department: University of Michigan.

Moons, K.G., Donders, A.R., Steyerberg, E.W., & Harrell, F.E. (2004). Penalized maximum likelihood estimation to directly adjust diagnostic and prognostic prediction models for overoptimism: a clinical example. *Journal of Clinical Epidemiology*, 57, 1262-1270.


** Mean Diff = Mean difference in logit of the propensity score for ASP schools minus All Other schools


** Mean Diff = Mean difference in logit of the propensity score for AC schools minus All Other schools


** Mean Diff = Mean difference in logit of the propensity score for SFA schools minus All Other schools
