In fall 1973, the University of California, Berkeley’s graduate division admitted 44% of male applicants and 35% of female applicants. School administrators were concerned about the potential for bias (and lawsuits!) and asked statistics professor Peter Bickel to examine the data more carefully.
We have a subset of the admissions data for 6 departments.
load("~/Documents/GitHub/STA440/2020/decks/data/UCBadmit.RData")
d=UCBadmit
library(tidyverse)
d <-
d%>%
mutate(male=ifelse(applicant.gender=="male",1,0),
dept_id = rep(1:6, each = 2))
d$successrate=d$admit/d$applications
sum(d$admit[d$male==1])/sum(d$applications[d$male==1])
## [1] 0.4451877
## [1] 0.3035422
We see in this subset of departments that roughly 45% of male applicants were admitted, while only 30% of female applicants were admitted.
Because admissions decisions for graduate school are made on a departmental level (not at the school level), it makes sense to examine results of applications by department.
## dept applicant.gender admit reject dept_id
## 1 A male 512 313 1
## 2 A female 89 19 1
## 3 B male 353 207 2
## 4 B female 17 8 2
## 5 C male 120 205 3
## 6 C female 202 391 3
## 7 D male 138 279 4
## 8 D female 131 244 4
## 9 E male 53 138 5
## 10 E female 94 299 5
## 11 F male 22 351 6
## 12 F female 24 317 6
Hmm, what’s going on here?
Following McElreath’s analysis in Statistical Rethinking, we start fitting a simple logistic regression model and examine diagnostic measures.
The model for department \(i\) and gender \(j\) with \(n_{admit,ij}\) of \(n_{ij}\) applicants admitted is given as:
\(n_{admit,ij} \sim \text{Binomial}(n_{ij},p_{ij})~~~\) \(\text{logit}(p_{ij})=\alpha+\beta\text{male}_{j}\)
##
## Call:
## glm(formula = cbind(admit, reject) ~ 1 + male, family = binomial,
## data = d)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -16.7915 -4.7613 -0.4365 5.1025 11.2022
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.83049 0.05077 -16.357 <2e-16 ***
## male 0.61035 0.06389 9.553 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 877.06 on 11 degrees of freedom
## Residual deviance: 783.61 on 10 degrees of freedom
## AIC: 856.55
##
## Number of Fisher Scoring iterations: 4
Here it appears male applicants have \(e^{0.61}=1.8\) (95% CI (1.6, 2.1)) times the odds of admission as female applicants.
How do our model’s predictions align with observed probabilities?
male=c(1,0)
predadm1=c(exp(adm1$coefficients[1]+adm1$coefficients[2])/(1+exp(adm1$coefficients[1]+adm1$coefficients[2])),exp(adm1$coefficients[1])/(1+exp(adm1$coefficients[1]))) #this is so you see how to get the prediction, not the most efficient code
# more efficient code comes later :)
predprob=cbind(male,predadm1)
d1=merge(d,predprob)
d1[,c(1,2,8,9)]
## male dept successrate predadm1
## 1 0 A 0.82407407 0.3035422
## 2 0 B 0.68000000 0.3035422
## 3 0 C 0.34064081 0.3035422
## 4 0 D 0.34933333 0.3035422
## 5 0 E 0.23918575 0.3035422
## 6 0 F 0.07038123 0.3035422
## 7 1 A 0.62060606 0.4451877
## 8 1 B 0.63035714 0.4451877
## 9 1 C 0.36923077 0.4451877
## 10 1 D 0.33093525 0.4451877
## 11 1 E 0.27748691 0.4451877
## 12 1 F 0.05898123 0.4451877
Eew, this model is way off the mark! Certainly there are some large departmental effects, so let’s fit a more flexible model.
\(n_{admit,ij} \sim \text{Binomial}(n_{ij},p_{ij})~~~\) \(\text{logit}(p_{ij})=\alpha+\beta\text{male}_{j}+\gamma_1I(\text{dept}_i=B)+\gamma_2I(\text{dept}_i=C)\) \(+\gamma_3I(\text{dept}_i=D)+\gamma_4I(\text{dept}_i=E)+\gamma_5I(\text{dept}_i=F)\)
##
## Call:
## glm(formula = cbind(admit, reject) ~ 1 + male + dept, family = binomial,
## data = d)
##
## Deviance Residuals:
## 1 2 3 4 5 6 7 8
## -1.2487 3.7189 -0.0560 0.2706 1.2533 -0.9243 0.0826 -0.0858
## 9 10 11 12
## 1.2205 -0.8509 -0.2076 0.2052
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.68192 0.09911 6.880 5.97e-12 ***
## male -0.09987 0.08085 -1.235 0.217
## deptB -0.04340 0.10984 -0.395 0.693
## deptC -1.26260 0.10663 -11.841 < 2e-16 ***
## deptD -1.29461 0.10582 -12.234 < 2e-16 ***
## deptE -1.73931 0.12611 -13.792 < 2e-16 ***
## deptF -3.30648 0.16998 -19.452 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 877.056 on 11 degrees of freedom
## Residual deviance: 20.204 on 5 degrees of freedom
## AIC: 103.14
##
## Number of Fisher Scoring iterations: 4
Note the gender term no longer reaches statistical significance
predlogit=predict(adm2,data=d)
predprob=exp(predlogit)/(1+exp(predlogit))
d2=cbind(d,predprob)
d2[,c(1,2,8,9)]
## dept applicant.gender successrate predprob
## 1 A male 0.62060606 0.64153930
## 2 A female 0.82407407 0.66416742
## 3 B male 0.63035714 0.63149912
## 4 B female 0.68000000 0.65441963
## 5 C male 0.36923077 0.33613931
## 6 C female 0.34064081 0.35877694
## 7 D male 0.33093525 0.32903451
## 8 D female 0.34933333 0.35144696
## 9 E male 0.27748691 0.23916654
## 10 E female 0.23918575 0.25780964
## 11 F male 0.05898123 0.06154717
## 12 F female 0.07038123 0.06757450
This model provides a much closer fit to the data.
## dept applicant.gender successrate
## 1 A male 0.62060606
## 2 A female 0.82407407
## 3 B male 0.63035714
## 4 B female 0.68000000
## 5 C male 0.36923077
## 6 C female 0.34064081
## 7 D male 0.33093525
## 8 D female 0.34933333
## 9 E male 0.27748691
## 10 E female 0.23918575
## 11 F male 0.05898123
## 12 F female 0.07038123
In the raw data, women had higher acceptance probabilities than men in 4 of the 6 departments. However, the departments to which they applied in higher numbers were the departments that had lower overall acceptance rates.
What happened is that women were more likely to apply to departments like English, which have financial trouble supporting grad students, and they were less likely to apply to STEM departments, which had more plentiful funding for graduate students. The men, on the other hand, were much more likely to apply to the STEM departments that had higher acceptance rates.