Radon, a naturally-occurring radioactive gas, is a carcinogen known to cause lung cancer in high concentrations. Several thousand lung cancer deaths in the US per year are attributable to radon exposure.
Radon levels in US homes vary greatly, and some homes have dangerously high radon levels. In order to identify areas of the US with high radon exposures, the US EPA conducted a study of radon levels in a random sample of more than 80,000 homes.
Click here to check highest recorded radon levels across the US.
(This example is taken from the excellent book by Gelman and Hill.)
We wish to estimate the distribution of radon levels in houses i within 85 counties j in Minnesota. The outcome yij is the natural log of measured radon levels.
One estimate would be the average of all radon levels in Minnesota (same estimate for all counties), ¯y⋅⋅, but this ignores variation across counties, and some counties may have higher radon levels naturally than others (radon is more commonly found in soils with granite rock, as opposed to some other soil types).
Another estimate would be just to average the radon level in each county, ¯yj, which can over-fit the data within county (for example, Lac Qui Parle County, which has the highest observed radon level of all the 85 MN counties, has radon measures from only 2 homes). This is similar to using an ANOVA model with a fixed effect for county.
Note we get pretty good (low variance) estimates in counties where more samples were taken, while our estimates are not great in counties where just a few samples were obtained.
The figure contrasts two extreme approaches to obtaining estimates of group means μj in this type of setting. A common procedure might be the following. Fit the ANOVA model yij=μ+αj+εij, where εij∼N(0,σ2), testing the significance of the groups using an overall F test (here, an 84 degree of freedom test, as we’re testing all 85 means are the same).
With either case, we will be using sub-optimal estimates for some counties, and the above method is fairly extreme (all or nothing one way or the other!).
An improvement might be using the estimate ¯yj for counties with sufficient sample size and the estimate ¯y for counties where the variability is too high.
Important question: how do we define “sufficient” and “too high”?
Random effects ANOVA is a special case of a hierarchical or multilevel linear model that provides a nice framework for borrowing information across groups when needed to stabilize estimates. We can specify such a model as yij=μ+αj+εij, where εijiid∼N(0,σ2) and αjiid∼N(0,τ2). The model on αj allows us to borrow information in order to obtain better group-specific estimates when needed; because αj is now viewed as random, the model can be called a random effects model.
This particular model is sometimes called a random intercept model because each group has its own intercept, μj=μ+αj, that follows a Gaussian distribution.
The multilevel estimates in the previous slide represent a compromise between the two extremes. In this simple setting (with no predictors), the multilevel estimate for county j can be approximated as a weighted average of the mean of all observations in the county, weighting both the unpooled estimate ¯yj and the mean over all counties ¯y.
How does random effects ANOVA borrow information?
The multilevel estimate
ˆμj≈njσ2¯yj+1τ2¯ynjσ2+1τ2,
where
nj is the number of homes measured in county j
σ2 is the within-county variance in the log radon measurements
τ2 is the variance across the average log radon levels of different counties
The weighted average reflects the relative amount of information available on each individual county, compared to that available across all counties.
Averages from counties with smaller sample sizes are less precise, so the weighting shrinks the multilevel estimates closer to the overall state average. For example, if nj=0, the multilevel estimate is just ¯y.
Averages from counties with larger sample sizes are more precise, and the multilevel estimates are closer to the county averages. As nj→∞, the multilevel estimate is just the county average ¯yj.
In intermediate cases, the multilevel estimate is in between the extremes.
In practice, we carry out all estimation together (estimate variances along with the mean parameters), but we won’t worry about this yet.
These estimates are often called shrinkage estimates, as they “shrink” the no pooling estimates back towards the complete pooling mean, to an extent determined by the information in the data.
Next we formalize the model and consider some of its features and implications.
This model is a special case of a random intercept model in which covariates are categorical. Note some consequences of this model.
yij=μ+αj+εij, where εijiid∼N(0,σ2) ⊥ αjiid∼N(0,τ2)
E[yij]=E[μ+αj+εij]=μ+0+0=μ Var[yij]=E[(yij−E(yij))2]=E[(μ+αj+εij−μ)2]=E[(αj+εij)2]=E[α2j+2αjεij+ε2ij]=τ2+0+σ2=σ2+τ2
For two observations in different groups j and j’, Cov(yij,yi′j′)=E[(yij−E(yij))(yi′j′−E(yi′j′))]=E(yijyi′j′)−μ2−μ2+μ2=E(yij)E(yi′j′)−μ2=μ2−μ2=0
For two observations in the same group j, Cov(yij,yi′j)=E[(yij−E(yij))(yi′j−E(yi′j))]=E(yijyi′j)−μ2−μ2+μ2=E[(μ+αj+εij)(μ+αj+εi′j)]=E[μ2+μαj+μεi′j+αjμ+α2j+αjεi′j+ εijμ+εijαj+εijεi′j]=μ2+0+0+0+τ2+0+0+0+0−μ2=τ2
The correlation between two observations in the same group is
Corr(yij,yi′j)=Cov(yij,yi′j)√(Var(yij))√(Var(yi′j))=τ2σ2+τ2
This motivates the use of random effects ANOVA to handle cases in which we expect subgroups of observations to be correlated (e.g., repeated measures or family studies).
It is often convenient to stack our observations into a long vector YN×1 organized by groups. For simplicity in exposition, assume nj=n and the total sample size N=nJ. Then assuming Y follows a multivariate normal distribution (which follows from our specification previously), we can express Cov(Y)=σ2IN×N+τ2(Jn0⋯00Jn⋯0⋮⋮⋮⋮00⋯Jn)=IJ⊗V, where V=σ2In+τ2Jn and Jn is an n×n matrix of 1’s.
Cov(Y)=σ2IN×N+τ2(Jn0⋯00Jn⋯0⋮⋮⋮⋮00⋯Jn)= (σ2+τ2τ2⋯τ20⋯0⋯⋯0τ2σ2+τ2⋯τ20⋯0⋯⋯0⋮⋱τ2τ2⋯σ2+τ20⋯0⋯⋯000⋯0⋱⋯⋯0⋮⋱⋱⋮⋱⋮⋱0⋯⋯00⋯σ2+τ2τ2⋯τ20⋯⋯00⋯τ2σ2+τ2⋯τ20⋯⋯00⋯τ2τ2⋯σ2+τ2)
The Kronecker product is a convenient way to express patterned covariance matrices (among other things). For matrices Am×n and Bp×q, the Kronecker product A⊗B=[a11B⋯a1nB⋮⋱⋮am1B⋯amnB].
Using a Kronecker product, we can succinctly express the block diagonal covariance matrix of all our observations when we have equal numbers of observations in each group.
We can also think of this formulation in the framework of the general linear mixed effects model, where y=Xβ+Zb+ε. In the random effects ANOVA case,
The covariance matrix Σ is then given by Σ=Var(y)=Var(Xβ+Zb+ε)=Var(Xβ)+Var(Zb)+Var(ε)=ZVar(b)Z′+Var(ε)=ZGZ′+R=τ2ZZ′+σ2I
Assuming our N outcomes follow the multivariate Gaussian distribution, our likelihood is given by 1(2π)N2|Σ|12exp(−12(y−Xβ)′Σ−1(y−Xβ)),
and we often work with the log-likelihood, given by
ℓ(y,β,Σ)=−12{Nlog(2π)+log|Σ|+(y−Xβ)′Σ−1(y−Xβ)}∝log|Σ|+(y−Xβ)′Σ−1(y−Xβ),
which we then minimize (as I took the negative) in order to find the MLE.
Consider a one-sample setting in which we wish to estimate the sample mean μ and variance σ2 using the model yi=μ+εi, i=1,…,n, εi∼N(0,σ2).
Then our log-likelihood is proportional to nlogσ2+∑(yi−μ)2σ2, and to find the MLE’s of μ and σ2, when you take derivatives and solve for zero, you obtain ˆμ=¯y and ˆσ2=∑(yi−¯y)2n.
Typically we don’t use the MLE to estimate σ2 because of its well-known small-sample bias, instead using the unbiased estimate s2=∑(yi−¯y)2n−1=nn−1ˆσ2.
One important predictor of radon levels is the floor on which the measurement is taken: basement (xi=0) or first floor (xi=1). Radon comes from underground and can enter more easily when the house is built into the ground; in addition, basements tend to have higher levels than ground floors.
First, we examine the complete-pooling regression, yij=α+βxij+εij, and the no-pooling regression yij=αj+βxij+εij, where αj is the mean log radon level from basement measures of homes (indexed by i) in county j.
The following plot shows the dashed lines ˆy=ˆα+ˆβx for eight selected counties from the complete pooling model, and the solid lines ˆy=ˆαj+ˆβx from no pooling model.
The estimates of β (the association between floor of home and radon level) differ slightly for the two regressions, with ˆβ=−0.61 for the pooling model, and ˆβ=−0.72 for the no-pooling model. As we might expect, we tend to have higher radon levels in the basement (p<0.0001).
Neither analysis is perfect. The complete-pooling analysis ignores variation in radon levels between counties, which is undesirable because our goal is to identify counties with high-radon homes – we can’t pool away the main research question! The no-pooling analysis is also problematic – for example the Lac Qui Parle County line is estimated based on just two data points.
We will start with a simple multilevel model, yij=γ0+αj+βxij+εij, where now αj∼N(0,τ2) and εij∼N(0,σ2). We fit this model using the lmer() function in the lme4 package.
This model can also be parameterized yij∼N(αj+βxij,σ2),
αj∼N(γ0,τ2)
Data: srrs2.dat
srrs2 <- read.table ("srrs2.dat", header=T, sep=",")
mn <- srrs2$state=="MN"
radon <- srrs2$activity[mn]
log.radon <- log (ifelse (radon==0, .1, radon))
floor <- srrs2$floor[mn] # 0 for basement, 1 for first floor, 9 for missing (no missing in minnesota)
n <- length(radon)
y <- log.radon
x <- floor
county.name <- as.vector(srrs2$county[mn])
uniq <- unique(county.name)
J <- length(uniq)
county <- rep (NA, J)
for (i in 1:J){
county[county.name==uniq[i]] <- i
}
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6293 -0.5383 0.0342 0.5603 2.5486
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.32674 0.02972 44.640 <2e-16 ***
## x -0.61339 0.07284 -8.421 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8226 on 917 degrees of freedom
## Multiple R-squared: 0.07178, Adjusted R-squared: 0.07077
## F-statistic: 70.91 on 1 and 917 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = y ~ x + factor(county) - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.14595 -0.45405 0.00065 0.45376 2.65987
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## x -0.72054 0.07352 -9.800 < 2e-16 ***
## factor(county)1 0.84054 0.37866 2.220 0.026701 *
## factor(county)2 0.87482 0.10498 8.333 3.23e-16 ***
## factor(county)3 1.52870 0.43946 3.479 0.000530 ***
## factor(county)4 1.55272 0.28897 5.373 1.00e-07 ***
## factor(county)5 1.43257 0.37866 3.783 0.000166 ***
## factor(county)6 1.51301 0.43672 3.464 0.000558 ***
## factor(county)7 2.01216 0.20243 9.940 < 2e-16 ***
## factor(county)8 1.98958 0.37999 5.236 2.08e-07 ***
## factor(county)9 1.00304 0.23931 4.191 3.07e-05 ***
## factor(county)10 1.56391 0.31099 5.029 6.04e-07 ***
## factor(county)11 1.40113 0.33828 4.142 3.80e-05 ***
## factor(county)12 1.73025 0.37821 4.575 5.49e-06 ***
## factor(county)13 1.03872 0.30881 3.364 0.000804 ***
## factor(county)14 1.98838 0.20325 9.783 < 2e-16 ***
## factor(county)15 1.33797 0.37999 3.521 0.000453 ***
## factor(county)16 0.66486 0.53487 1.243 0.214204
## factor(county)17 1.27480 0.38221 3.335 0.000890 ***
## factor(county)18 1.12155 0.21913 5.118 3.83e-07 ***
## factor(county)19 1.33831 0.09541 14.026 < 2e-16 ***
## factor(county)20 1.80032 0.43672 4.122 4.13e-05 ***
## factor(county)21 1.73399 0.25227 6.873 1.23e-11 ***
## factor(county)22 0.63679 0.30905 2.060 0.039663 *
## factor(county)23 1.39999 0.53613 2.611 0.009183 **
## factor(county)24 2.10162 0.25267 8.318 3.64e-16 ***
## factor(county)25 1.95072 0.20243 9.636 < 2e-16 ***
## factor(county)26 1.36058 0.07422 18.332 < 2e-16 ***
## factor(county)27 1.77336 0.30978 5.725 1.45e-08 ***
## factor(county)28 1.24159 0.34115 3.639 0.000290 ***
## factor(county)29 1.05600 0.43672 2.418 0.015818 *
## factor(county)30 0.92576 0.22807 4.059 5.39e-05 ***
## factor(county)31 2.02057 0.33828 5.973 3.45e-09 ***
## factor(county)32 1.23629 0.37821 3.269 0.001124 **
## factor(county)33 2.06187 0.37821 5.452 6.58e-08 ***
## factor(county)34 1.59044 0.43946 3.619 0.000314 ***
## factor(county)35 0.81920 0.28897 2.835 0.004695 **
## factor(county)36 2.95897 0.53613 5.519 4.55e-08 ***
## factor(county)37 0.40209 0.25227 1.594 0.111345
## factor(county)38 1.86772 0.37999 4.915 1.07e-06 ***
## factor(county)39 1.74807 0.33860 5.163 3.05e-07 ***
## factor(county)40 2.31580 0.37866 6.116 1.48e-09 ***
## factor(county)41 1.96715 0.26759 7.351 4.69e-13 ***
## factor(county)42 1.36098 0.75642 1.799 0.072343 .
## factor(county)43 1.60224 0.25543 6.273 5.69e-10 ***
## factor(county)44 1.04099 0.28609 3.639 0.000291 ***
## factor(county)45 1.29541 0.21101 6.139 1.28e-09 ***
## factor(county)46 1.21461 0.33828 3.591 0.000349 ***
## factor(county)47 0.88393 0.53613 1.649 0.099583 .
## factor(county)48 1.14812 0.25227 4.551 6.13e-06 ***
## factor(county)49 1.70211 0.21010 8.102 1.93e-15 ***
## factor(county)50 2.49321 0.75642 3.296 0.001022 **
## factor(county)51 2.16504 0.37821 5.724 1.45e-08 ***
## factor(county)52 1.92769 0.43672 4.414 1.15e-05 ***
## factor(county)53 1.25080 0.43741 2.860 0.004348 **
## factor(county)54 1.30676 0.15802 8.270 5.28e-16 ***
## factor(county)55 1.61799 0.26885 6.018 2.64e-09 ***
## factor(county)56 1.10110 0.43946 2.506 0.012415 *
## factor(county)57 0.76218 0.30905 2.466 0.013855 *
## factor(county)58 1.86092 0.37866 4.915 1.07e-06 ***
## factor(county)59 1.72178 0.37999 4.531 6.73e-06 ***
## factor(county)60 1.27939 0.53487 2.392 0.016979 *
## factor(county)61 1.15873 0.13389 8.654 < 2e-16 ***
## factor(county)62 1.98301 0.33860 5.856 6.80e-09 ***
## factor(county)63 1.67070 0.43741 3.820 0.000144 ***
## factor(county)64 1.84784 0.22817 8.099 1.97e-15 ***
## factor(county)65 1.29912 0.53487 2.429 0.015357 *
## factor(county)66 1.66574 0.20648 8.067 2.50e-15 ***
## factor(county)67 1.80312 0.21101 8.545 < 2e-16 ***
## factor(county)68 1.09002 0.26743 4.076 5.02e-05 ***
## factor(county)69 1.24245 0.37821 3.285 0.001062 **
## factor(county)70 0.86763 0.07096 12.227 < 2e-16 ***
## factor(county)71 1.49184 0.15174 9.832 < 2e-16 ***
## factor(county)72 1.57990 0.23920 6.605 7.08e-11 ***
## factor(county)73 1.79176 0.53487 3.350 0.000845 ***
## factor(county)74 0.98704 0.37821 2.610 0.009223 **
## factor(county)75 1.72372 0.43741 3.941 8.80e-05 ***
## factor(county)76 2.00844 0.37866 5.304 1.45e-07 ***
## factor(county)77 1.82168 0.28609 6.367 3.17e-10 ***
## factor(county)78 1.28569 0.33956 3.786 0.000164 ***
## factor(county)79 0.61488 0.37866 1.624 0.104785
## factor(county)80 1.32952 0.11181 11.890 < 2e-16 ***
## factor(county)81 2.70953 0.43946 6.166 1.09e-09 ***
## factor(county)82 2.23001 0.75642 2.948 0.003286 **
## factor(county)83 1.62292 0.21048 7.711 3.57e-14 ***
## factor(county)84 1.64535 0.20987 7.840 1.38e-14 ***
## factor(county)85 1.18652 0.53487 2.218 0.026801 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7564 on 833 degrees of freedom
## Multiple R-squared: 0.7671, Adjusted R-squared: 0.7431
## F-statistic: 31.91 on 86 and 833 DF, p-value: < 2.2e-16
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: y ~ x + (1 | county)
##
## AIC BIC logLik deviance df.resid
## 2171.7 2190.9 -1081.8 2163.7 915
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.4071 -0.6164 0.0056 0.6398 3.4288
##
## Random effects:
## Groups Name Variance Std.Dev.
## county (Intercept) 0.1053 0.3245
## Residual 0.5703 0.7552
## Number of obs: 919, groups: county, 85
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.46116 0.05124 28.516
## x -0.69264 0.07036 -9.844
##
## Correlation of Fixed Effects:
## (Intr)
## x -0.290
## $county
## (Intercept) x
## 1 1.1945745 -0.6926365
## 2 0.9286777 -0.6926365
## 3 1.4786009 -0.6926365
## 4 1.5037889 -0.6926365
## 5 1.4460519 -0.6926365
## 6 1.4796397 -0.6926365
## 7 1.8555701 -0.6926365
## 8 1.6796893 -0.6926365
## 9 1.1621916 -0.6926365
## 10 1.5078244 -0.6926365
## 11 1.4323468 -0.6926365
## 12 1.5754595 -0.6926365
## 13 1.2391507 -0.6926365
## 14 1.8355525 -0.6926365
## 15 1.4029057 -0.6926365
## 16 1.2464311 -0.6926365
## 17 1.3731083 -0.6926365
## 18 1.2223643 -0.6926365
## 19 1.3464032 -0.6926365
## 20 1.5820437 -0.6926365
## 21 1.6295442 -0.6926365
## 22 1.0254760 -0.6926365
## 23 1.4409018 -0.6926365
## 24 1.8571106 -0.6926365
## 25 1.8112724 -0.6926365
## 26 1.3627354 -0.6926365
## 27 1.6203441 -0.6926365
## 28 1.3477310 -0.6926365
## 29 1.3167485 -0.6926365
## 30 1.1024167 -0.6926365
## 31 1.7296714 -0.6926365
## 32 1.3656408 -0.6926365
## 33 1.7163254 -0.6926365
## 34 1.5006074 -0.6926365
## 35 1.0902638 -0.6926365
## 36 1.8612907 -0.6926365
## 37 0.7980765 -0.6926365
## 38 1.6279261 -0.6926365
## 39 1.5961932 -0.6926365
## 40 1.8212249 -0.6926365
## 41 1.7607856 -0.6926365
## 42 1.4455459 -0.6926365
## 43 1.5395531 -0.6926365
## 44 1.2220391 -0.6926365
## 45 1.3381032 -0.6926365
## 46 1.3428173 -0.6926365
## 47 1.3017421 -0.6926365
## 48 1.2638020 -0.6926365
## 49 1.6282095 -0.6926365
## 50 1.6219929 -0.6926365
## 51 1.7601468 -0.6926365
## 52 1.6274442 -0.6926365
## 53 1.3828666 -0.6926365
## 54 1.3332417 -0.6926365
## 55 1.5484354 -0.6926365
## 56 1.3261923 -0.6926365
## 57 1.0913754 -0.6926365
## 58 1.6280045 -0.6926365
## 59 1.5659375 -0.6926365
## 60 1.4121426 -0.6926365
## 61 1.2002725 -0.6926365
## 62 1.7089656 -0.6926365
## 63 1.5325292 -0.6926365
## 64 1.7185497 -0.6926365
## 65 1.4174628 -0.6926365
## 66 1.5971714 -0.6926365
## 67 1.6964805 -0.6926365
## 68 1.2398624 -0.6926365
## 69 1.3682586 -0.6926365
## 70 0.8904344 -0.6926365
## 71 1.4827093 -0.6926365
## 72 1.5381798 -0.6926365
## 73 1.5503073 -0.6926365
## 74 1.2597664 -0.6926365
## 75 1.5514276 -0.6926365
## 76 1.6906647 -0.6926365
## 77 1.6621565 -0.6926365
## 78 1.3715768 -0.6926365
## 79 1.0987212 -0.6926365
## 80 1.3406745 -0.6926365
## 81 1.8994853 -0.6926365
## 82 1.5809771 -0.6926365
## 83 1.5707987 -0.6926365
## 84 1.5896624 -0.6926365
## 85 1.3871005 -0.6926365
##
## attr(,"class")
## [1] "coef.mer"
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ x + (1 | county)
##
## REML criterion at convergence: 2171.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.3989 -0.6155 0.0029 0.6405 3.4281
##
## Random effects:
## Groups Name Variance Std.Dev.
## county (Intercept) 0.1077 0.3282
## Residual 0.5709 0.7556
## Number of obs: 919, groups: county, 85
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.46160 0.05158 28.339
## x -0.69299 0.07043 -9.839
##
## Correlation of Fixed Effects:
## (Intr)
## x -0.288
The ratio of county-level variance to total variance is an estimate of the correlation of within-county measures. Here ˆρ=.1077.1077+.5709=0.16.
## $county
## (Intercept) x
## 1 1.1915004 -0.6929937
## 2 0.9276468 -0.6929937
## 3 1.4792143 -0.6929937
## 4 1.5045012 -0.6929937
## 5 1.4461503 -0.6929937
## 6 1.4801817 -0.6929937
## 7 1.8581255 -0.6929937
## 8 1.6827736 -0.6929937
## 9 1.1600747 -0.6929937
## 10 1.5086099 -0.6929937
## 11 1.4322449 -0.6929937
## 12 1.5771520 -0.6929937
## 13 1.2370518 -0.6929937
## 14 1.8380232 -0.6929937
## 15 1.4024982 -0.6929937
## 16 1.2432992 -0.6929937
## 17 1.3723633 -0.6929937
## 18 1.2209415 -0.6929937
## 19 1.3462611 -0.6929937
## 20 1.5840332 -0.6929937
## 21 1.6311136 -0.6929937
## 22 1.0211903 -0.6929937
## 23 1.4409443 -0.6929937
## 24 1.8605721 -0.6929937
## 25 1.8135585 -0.6929937
## 26 1.3626875 -0.6929937
## 27 1.6222663 -0.6929937
## 28 1.3467692 -0.6929937
## 29 1.3149879 -0.6929937
## 30 1.0999775 -0.6929937
## 31 1.7329562 -0.6929937
## 32 1.3646863 -0.6929937
## 33 1.7197950 -0.6929937
## 34 1.5015319 -0.6929937
## 35 1.0870316 -0.6929937
## 36 1.8680899 -0.6929937
## 37 0.7928242 -0.6929937
## 38 1.6303574 -0.6929937
## 39 1.5979922 -0.6929937
## 40 1.8260564 -0.6929937
## 41 1.7636308 -0.6929937
## 42 1.4456250 -0.6929937
## 43 1.5404841 -0.6929937
## 44 1.2199767 -0.6929937
## 45 1.3375197 -0.6929937
## 46 1.3416955 -0.6929937
## 47 1.2995481 -0.6929937
## 48 1.2623708 -0.6929937
## 49 1.6294468 -0.6929937
## 50 1.6253580 -0.6929937
## 51 1.7641693 -0.6929937
## 52 1.6300755 -0.6929937
## 53 1.3820836 -0.6929937
## 54 1.3328317 -0.6929937
## 55 1.5494611 -0.6929937
## 56 1.3246513 -0.6929937
## 57 1.0877739 -0.6929937
## 58 1.6303984 -0.6929937
## 59 1.5675867 -0.6929937
## 60 1.4116740 -0.6929937
## 61 1.1995431 -0.6929937
## 62 1.7120492 -0.6929937
## 63 1.5338618 -0.6929937
## 64 1.7205672 -0.6929937
## 65 1.4170797 -0.6929937
## 66 1.5982671 -0.6929937
## 67 1.6981946 -0.6929937
## 68 1.2380854 -0.6929937
## 69 1.3673371 -0.6929937
## 70 0.8899487 -0.6929937
## 71 1.4829168 -0.6929937
## 72 1.5389227 -0.6929937
## 73 1.5520593 -0.6929937
## 74 1.2574763 -0.6929937
## 75 1.5530274 -0.6929937
## 76 1.6938490 -0.6929937
## 77 1.6642922 -0.6929937
## 78 1.3708520 -0.6929937
## 79 1.0944378 -0.6929937
## 80 1.3404792 -0.6929937
## 81 1.9060482 -0.6929937
## 82 1.5835784 -0.6929937
## 83 1.5716875 -0.6929937
## 84 1.5906331 -0.6929937
## 85 1.3862294 -0.6929937
##
## attr(,"class")
## [1] "coef.mer"
The first column gives estimates of αj, and the second column gives the estimate of β (does not vary over county j according to the model we specified).
Note that there is strong pooling in counties with few data points (multilevel line is closer to the dashed line, as in Lac Qui Parle county), and weak pooling in counties with more data points (multilevel line is on top of no pooling line in St. Louis county).
Radon comes from uranium that has been in the ground since the earth’s formation. A measure of average uranium content of the soil was available at the county level (but not for each house). Recall our current model to account for variability across counties due to uranium levels in the soil: yij=αj+βxi+εij, where αj∼N(γ0,τ2) ⊥ εij∼N(0,σ2). To include the county-level uranium measure, uj, as a predictor, we just modify the model so that αj∼N(γ0+γ1uj,τ2).
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ x + u.full + (1 | county)
##
## REML criterion at convergence: 2134.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.9673 -0.6117 0.0274 0.6555 3.3848
##
## Random effects:
## Groups Name Variance Std.Dev.
## county (Intercept) 0.02446 0.1564
## Residual 0.57523 0.7584
## Number of obs: 919, groups: county, 85
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.46576 0.03794 38.633
## x -0.66824 0.06880 -9.713
## u.full 0.72027 0.09176 7.849
##
## Correlation of Fixed Effects:
## (Intr) x
## x -0.357
## u.full 0.145 -0.009
## $county
## (Intercept) x u.full
## 1 1.445120 -0.6682448 0.7202676
## 2 1.477009 -0.6682448 0.7202676
## 3 1.478185 -0.6682448 0.7202676
## 4 1.576891 -0.6682448 0.7202676
## 5 1.473999 -0.6682448 0.7202676
## 6 1.439566 -0.6682448 0.7202676
## 7 1.593872 -0.6682448 0.7202676
## 8 1.509045 -0.6682448 0.7202676
## 9 1.397556 -0.6682448 0.7202676
## 10 1.466362 -0.6682448 0.7202676
## 11 1.531245 -0.6682448 0.7202676
## 12 1.475555 -0.6682448 0.7202676
## 13 1.486617 -0.6682448 0.7202676
## 14 1.562814 -0.6682448 0.7202676
## 15 1.449652 -0.6682448 0.7202676
## 16 1.431496 -0.6682448 0.7202676
## 17 1.396746 -0.6682448 0.7202676
## 18 1.499187 -0.6682448 0.7202676
## 19 1.383197 -0.6682448 0.7202676
## 20 1.482112 -0.6682448 0.7202676
## 21 1.507354 -0.6682448 0.7202676
## 22 1.252279 -0.6682448 0.7202676
## 23 1.435133 -0.6682448 0.7202676
## 24 1.593848 -0.6682448 0.7202676
## 25 1.591105 -0.6682448 0.7202676
## 26 1.432151 -0.6682448 0.7202676
## 27 1.451023 -0.6682448 0.7202676
## 28 1.471547 -0.6682448 0.7202676
## 29 1.480673 -0.6682448 0.7202676
## 30 1.445935 -0.6682448 0.7202676
## 31 1.524017 -0.6682448 0.7202676
## 32 1.437994 -0.6682448 0.7202676
## 33 1.540931 -0.6682448 0.7202676
## 34 1.476560 -0.6682448 0.7202676
## 35 1.456290 -0.6682448 0.7202676
## 36 1.563196 -0.6682448 0.7202676
## 37 1.307618 -0.6682448 0.7202676
## 38 1.591774 -0.6682448 0.7202676
## 39 1.488871 -0.6682448 0.7202676
## 40 1.540850 -0.6682448 0.7202676
## 41 1.519206 -0.6682448 0.7202676
## 42 1.457093 -0.6682448 0.7202676
## 43 1.492752 -0.6682448 0.7202676
## 44 1.339205 -0.6682448 0.7202676
## 45 1.363380 -0.6682448 0.7202676
## 46 1.418700 -0.6682448 0.7202676
## 47 1.429966 -0.6682448 0.7202676
## 48 1.394820 -0.6682448 0.7202676
## 49 1.480131 -0.6682448 0.7202676
## 50 1.495954 -0.6682448 0.7202676
## 51 1.541446 -0.6682448 0.7202676
## 52 1.484985 -0.6682448 0.7202676
## 53 1.417856 -0.6682448 0.7202676
## 54 1.297171 -0.6682448 0.7202676
## 55 1.536860 -0.6682448 0.7202676
## 56 1.426593 -0.6682448 0.7202676
## 57 1.344875 -0.6682448 0.7202676
## 58 1.471195 -0.6682448 0.7202676
## 59 1.471316 -0.6682448 0.7202676
## 60 1.435281 -0.6682448 0.7202676
## 61 1.459521 -0.6682448 0.7202676
## 62 1.508361 -0.6682448 0.7202676
## 63 1.455963 -0.6682448 0.7202676
## 64 1.541677 -0.6682448 0.7202676
## 65 1.422890 -0.6682448 0.7202676
## 66 1.586234 -0.6682448 0.7202676
## 67 1.563953 -0.6682448 0.7202676
## 68 1.495340 -0.6682448 0.7202676
## 69 1.408462 -0.6682448 0.7202676
## 70 1.246716 -0.6682448 0.7202676
## 71 1.431690 -0.6682448 0.7202676
## 72 1.441835 -0.6682448 0.7202676
## 73 1.464737 -0.6682448 0.7202676
## 74 1.363079 -0.6682448 0.7202676
## 75 1.496793 -0.6682448 0.7202676
## 76 1.490651 -0.6682448 0.7202676
## 77 1.520897 -0.6682448 0.7202676
## 78 1.515393 -0.6682448 0.7202676
## 79 1.317927 -0.6682448 0.7202676
## 80 1.442140 -0.6682448 0.7202676
## 81 1.587608 -0.6682448 0.7202676
## 82 1.490002 -0.6682448 0.7202676
## 83 1.398639 -0.6682448 0.7202676
## 84 1.551352 -0.6682448 0.7202676
## 85 1.423816 -0.6682448 0.7202676
##
## attr(,"class")
## [1] "coef.mer"
We could also allow the effect of home floor to vary by group:
yij=αj+βjxij+εij, where εij∼N(0,σ2) and
(αjβj)∼N((γ0+γ1ujγ2),(τ21ρτ1τ2ρτ1τ2τ22))
This model is called a random intercepts and slopes model.
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ x + u.full + (1 + x | county)
## Control: lmerControl(optimizer = "Nelder_Mead")
##
## REML criterion at convergence: 2128.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.1161 -0.6127 0.0288 0.6379 3.5101
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## county (Intercept) 0.01668 0.1291
## x 0.12753 0.3571 0.21
## Residual 0.56006 0.7484
## Number of obs: 919, groups: county, 85
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.46265 0.03565 41.026
## x -0.64238 0.08737 -7.353
## u.full 0.76802 0.08888 8.641
##
## Correlation of Fixed Effects:
## (Intr) x
## x -0.260
## u.full 0.181 -0.045
Note the uncertainty around the random effect of floor is large; we may want to eliminate this random term from the model to simplify things.
## $county
## (Intercept) x u.full
## 1 1.454839 -0.5557849 0.7680152
## 2 1.498250 -0.8329306 0.7680152
## 3 1.471204 -0.6081378 0.7680152
## 4 1.541061 -0.4891592 0.7680152
## 5 1.471605 -0.5764021 0.7680152
## 6 1.442406 -0.6541232 0.7680152
## 7 1.561590 -0.2744483 0.7680152
## 8 1.491389 -0.6037511 0.7680152
## 9 1.422819 -0.4472495 0.7680152
## 10 1.461792 -0.7409657 0.7680152
## 11 1.515219 -0.6119111 0.7680152
## 12 1.468764 -0.6388426 0.7680152
## 13 1.483998 -0.6300106 0.7680152
## 14 1.532249 -0.6930081 0.7680152
## 15 1.450372 -0.6565622 0.7680152
## 16 1.439612 -0.6557430 0.7680152
## 17 1.413297 -0.9328837 0.7680152
## 18 1.492989 -0.4044518 0.7680152
## 19 1.393825 -0.8101057 0.7680152
## 20 1.473727 -0.6359653 0.7680152
## 21 1.494676 -0.5669882 0.7680152
## 22 1.299576 -0.8311211 0.7680152
## 23 1.437894 -0.6895495 0.7680152
## 24 1.555657 -0.5532118 0.7680152
## 25 1.560314 -0.2880760 0.7680152
## 26 1.444008 -0.7840421 0.7680152
## 27 1.447600 -0.6525284 0.7680152
## 28 1.466966 -0.5665901 0.7680152
## 29 1.476661 -0.6342642 0.7680152
## 30 1.455885 -0.6463087 0.7680152
## 31 1.504201 -0.6182987 0.7680152
## 32 1.442926 -0.6538220 0.7680152
## 33 1.517462 -0.6106102 0.7680152
## 34 1.468485 -0.6766566 0.7680152
## 35 1.455596 -0.4768488 0.7680152
## 36 1.540259 -0.3751583 0.7680152
## 37 1.351477 -0.6802207 0.7680152
## 38 1.558721 -0.3400229 0.7680152
## 39 1.481342 -0.5536103 0.7680152
## 40 1.514288 -0.6107998 0.7680152
## 41 1.504239 -0.4855500 0.7680152
## 42 1.456392 -0.6460150 0.7680152
## 43 1.500071 -1.0192175 0.7680152
## 44 1.356740 -0.9672180 0.7680152
## 45 1.388691 -0.8886304 0.7680152
## 46 1.428126 -0.6624021 0.7680152
## 47 1.426758 -0.8947845 0.7680152
## 48 1.413136 -0.5945437 0.7680152
## 49 1.466444 -0.8973583 0.7680152
## 50 1.483597 -0.6302432 0.7680152
## 51 1.517137 -0.6107992 0.7680152
## 52 1.475255 -0.6350796 0.7680152
## 53 1.423073 -0.7531899 0.7680152
## 54 1.325222 -1.0189740 0.7680152
## 55 1.513798 -0.4626758 0.7680152
## 56 1.430771 -0.8613309 0.7680152
## 57 1.372062 -0.7609505 0.7680152
## 58 1.465894 -0.5930129 0.7680152
## 59 1.462209 -0.7632360 0.7680152
## 60 1.440217 -0.6553925 0.7680152
## 61 1.465805 -0.3377132 0.7680152
## 62 1.505305 -0.2849207 0.7680152
## 63 1.458960 -0.5281158 0.7680152
## 64 1.517429 -0.6427512 0.7680152
## 65 1.430668 -0.6609281 0.7680152
## 66 1.533864 -0.4550602 0.7680152
## 67 1.525190 -0.2880584 0.7680152
## 68 1.491948 -0.6254020 0.7680152
## 69 1.419827 -0.6672131 0.7680152
## 70 1.278893 -0.6583891 0.7680152
## 71 1.436184 -0.8115634 0.7680152
## 72 1.442011 -0.6543522 0.7680152
## 73 1.460830 -0.6434421 0.7680152
## 74 1.386199 -0.6867082 0.7680152
## 75 1.492218 -0.4534797 0.7680152
## 76 1.476514 -0.6628855 0.7680152
## 77 1.495764 -0.7827571 0.7680152
## 78 1.501583 -0.6084108 0.7680152
## 79 1.347579 -0.8712484 0.7680152
## 80 1.452347 -0.8427810 0.7680152
## 81 1.551396 -0.3087664 0.7680152
## 82 1.479602 -0.6325591 0.7680152
## 83 1.408640 -1.1654550 0.7680152
## 84 1.530259 -0.6076644 0.7680152
## 85 1.431796 -0.6602741 0.7680152
##
## attr(,"class")
## [1] "coef.mer"
Multilevel models can provide a good fit to the data that is a compromise between complete and no pooling across groups. They are often used to handle correlated data due to clustering, such as neighborhood or family effects.