The goals of this case study are to predict (1) the outcome of the presidential election, (2) whether the US Senate remains under Republican control, (3) the electoral college vote, (4) the outcomes of all NC congressional elections (the 13 seats NC holds in the US House of Representatives), and (5) the outcome of NC's US Senate election, with a characterization of the uncertainty in each prediction.
We will present predictions and the corresponding uncertainty quantification weekly. In addition, prizes will be awarded in several categories (e.g., most creative useful data source).
[Matrix Formulation of the Linear Model]
[Introduction to Multilevel Modeling]
[Some Light Technical Details About Multilevel Models]
[More Details on the Radon Example]
[Deep Dive into R Code]
Groups: you can select your own group, or I can place you in one. There are no restrictions on who can be in your group, but each group must have at least 3 and no more than 5 members. Groups will be allocated on a first-come, first-served basis. I reserve the right to add members to smaller groups if needed (any groups I construct from students without team-member preferences will aim to have 4-5 students).
Interim report: who votes in NC? Using the NC voter files, identify who votes in NC so that these data can be used in conjunction with surveys and other data to predict the outcomes of NC congressional elections. Present your results in a short (under 5 minutes) video presentation to upload. The report has an 8-page limit.
Weekly updates on predictions, starting 10/16: provide point estimates of the probability that President Trump is re-elected and of the probability that the US Senate remains under Republican control. Please also provide point and interval estimates of the predicted two-party vote share for the 13 NC congressional elections and for the Tillis vs. Cunningham Senate race. Have one group member submit these in Quizzes/Tests.
Final report (12-page limit), due at 8am on Election Day. Items for the final report:
Presentation after the election: discuss what went right and wrong with your modeling and assumptions in light of the election outcomes (e.g., winners, who voted).
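For the weekly updates, here is a minimal Python sketch of turning model output into the requested summaries. It assumes your model produces simulation (e.g., posterior predictive) draws of a candidate's two-party vote share for a race. The function names `two_party_share` and `summarize_draws` are hypothetical, the point estimate is taken to be the mean of the draws, and the normally distributed draws in the `__main__` block are placeholders, not a forecast.

```python
import random
import statistics


def two_party_share(dem_votes, rep_votes):
    """Democratic share of the two-party vote: D / (D + R)."""
    total = dem_votes + rep_votes
    if total <= 0:
        raise ValueError("need a positive two-party vote total")
    return dem_votes / total


def summarize_draws(draws, level=0.95):
    """Summarize simulation draws of a candidate's two-party vote share.

    Returns (point estimate, (lower, upper) interval, probability of winning):
    the point estimate is the mean of the draws, the interval bounds are
    empirical quantiles at (1 - level) / 2 and 1 - (1 - level) / 2, and a
    draw counts as a win when the share exceeds 0.5.
    """
    if not draws:
        raise ValueError("need at least one draw")
    ordered = sorted(draws)
    n = len(ordered)
    alpha = (1 - level) / 2
    # Empirical quantiles taken at the nearest index into the sorted draws
    # (no interpolation between neighbouring draws).
    lower = ordered[round(alpha * (n - 1))]
    upper = ordered[round((1 - alpha) * (n - 1))]
    # Fraction of simulated outcomes in which the candidate wins.
    p_win = sum(d > 0.5 for d in ordered) / n
    return statistics.mean(ordered), (lower, upper), p_win


if __name__ == "__main__":
    # Placeholder draws for illustration only; real draws would come from
    # your fitted model, not from a normal distribution with made-up values.
    rng = random.Random(2020)
    draws = [rng.gauss(0.52, 0.02) for _ in range(4000)]
    point, (lo, hi), p_win = summarize_draws(draws)
    print(f"point={point:.3f}, 95% interval=({lo:.3f}, {hi:.3f}), P(win)={p_win:.2f}")
```

The same fraction-of-draws idea gives other event probabilities: if each draw is a full simulated election, the estimated probability of an event (such as a party keeping the Senate) is the share of draws in which it occurs.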
In addition to the usual grade, a best prediction winner will be chosen, with all due honor and glory, based on the following algorithm developed by students in STA 340 (decision analysis), shown here for the outcome of the 13 NC congressional races. We’ll include the NC senate race in this algorithm as well.
For a predicted vote share \(p\), truth \(\theta\), and confidence interval \(CI\) with bounds \(p_{\min}, p_{\max}\):
\[ \begin{aligned} L(p, \theta) &= 100\cdot|p-\theta| + I\{\theta\notin CI\}\cdot 200\min\{|p-p_{\min}|,|p-p_{\max}|\} + 10\cdot|p_{\max}-p_{\min}| \\ &\quad + I\{0.5\notin CI\}\cdot\big(10\cdot I\{\text{wrong}\} - 3\cdot I\{\text{right}\}\big) \\ S &= -\sum_{i=1}^{13} L(p_i, \theta_i) \end{aligned} \]
The error of the point estimate, \(|p-\theta|\), is penalized linearly; when the truth falls outside the \(CI\), an additional linear penalty with a larger coefficient (200 rather than 100) applies. There is also a penalty for wide confidence intervals, but at only 1/10th (or less) the per-unit cost of missing the point estimate.
The term on the second line of the loss function comes into play only when a confidence interval does not include 0.5, that is, when the team is confident enough to call the race for one side or the other. Being confident and wrong incurs an additional loss of 10; being confident and right earns a reward (a loss of -3) that is smaller in magnitude than the penalty for being wrong. The rationale is that some races should be easy to call, so being unambiguously wrong with the confidence interval should hurt more than being right helps.
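A minimal Python sketch that transcribes the loss \(L\) and score \(S\) term by term, e.g., for checking a submission before Election Day. The function names `race_loss` and `total_score` are hypothetical, and two details the formula leaves open are assumptions here: the CI endpoints count as inside the interval, and a call is "right" when \(\theta\) lands on the same side of 0.5 as the interval.

```python
def race_loss(p, theta, ci):
    """Loss L(p, theta) for one race, following the formula term by term.

    p: predicted two-party vote share (point estimate)
    theta: true two-party vote share
    ci: (p_min, p_max), the bounds of the submitted interval
    """
    p_min, p_max = ci
    if p_min > p_max:
        raise ValueError("CI lower bound exceeds upper bound")

    def in_ci(x):
        # Assumption: the interval is closed, so its endpoints count as inside.
        return p_min <= x <= p_max

    loss = 100 * abs(p - theta)  # error of the point estimate
    if not in_ci(theta):
        # As written, this term uses the distance from p to the nearer bound.
        loss += 200 * min(abs(p - p_min), abs(p - p_max))
    loss += 10 * abs(p_max - p_min)  # penalty for interval width
    if not in_ci(0.5):
        # The whole CI lies on one side of 0.5, so the race has been "called".
        # Assumption: the call is right when theta is on the same side of 0.5
        # (a theta of exactly 0.5 counts as a loss for the candidate).
        called_win = p_min > 0.5
        actual_win = theta > 0.5
        loss += -3 if called_win == actual_win else 10
    return loss


def total_score(predictions, truths):
    """S = -(sum of per-race losses); higher scores are better.

    predictions: list of (p, (p_min, p_max)) tuples, one per race
    truths: list of true vote shares, in the same order
    """
    if len(predictions) != len(truths):
        raise ValueError("need one truth per prediction")
    return -sum(race_loss(p, theta, ci) for (p, ci), theta in zip(predictions, truths))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise each term of the loss.
    races = [(0.55, (0.51, 0.59)), (0.55, (0.51, 0.59))]
    truths = [0.52, 0.45]
    print([round(race_loss(p, t, ci), 3) for (p, ci), t in zip(races, truths)])
    print(round(total_score(races, truths), 3))
```

On these hypothetical numbers, the first race (truth inside the CI, correct call) has a loss of about 0.8 and the second (truth outside the CI, wrong call) about 28.8, for a score of about -29.6. Keeping the loss per race and negating only the total mirrors the definition of \(S\), and `total_score` accepts any number of races, such as the 13 congressional races plus the Senate race.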