About me

Chris Donnay is currently the lab manager for the Data and Democracy Lab at Cornell University. In this role, he has gained extensive experience translating real-world problems into technical solutions for non-profits and advocacy groups. He has a knack for communicating high-level mathematics to a wide range of audiences, with a focus on clarity and actionable insights. He earned his PhD in mathematics from The Ohio State University, where he studied data science, probability, and statistical modeling, with applications to redistricting and voting. Chris also holds a Master of Science in Education from the University of Pennsylvania, where he was a high school math and computer science teaching fellow.

CDonnay

Portfolio

Below is a smattering of Chris' past research projects, including data science and modeling work for democracy non-profits and advocacy groups, applications and Python libraries for civic good, and more traditional mathematical research.

VoteKit

VoteKit is a one-stop shop for all things election modeling in Python. With an end-to-end pipeline for constructing ballots, running elections, and analyzing outcomes, VoteKit seeks to be a tool for academics, researchers, and practitioners alike.

Chris is the principal developer, manages the other open source contributors, writes documentation and tutorials, and designs, prototypes, evaluates, and implements new features for the package.

Read more about VoteKit in the Journal of Open Source Software.

Skills Used

  • Python
  • Package management: PyPI, Poetry
  • Statistical modeling
  • Documentation: Sphinx, Jupyter, Markdown
  • Open source software: GitHub
VoteKit Analysis
A plot, generated by VoteKit, showing how different ballot generator models perform with respect to proportionality in an STV election.
MI Analysis
A restricted area around Detroit where we focused our efforts on improving partisan metrics. We refer to this as a "partial scramble": we fix the rest of the state and only perturb this area.

The Voting Rights Act in Michigan

In the federal court case Agee v. Benson, Michigan's state legislative maps were struck down for violating the Voting Rights Act. When it came time to draw new maps in 2024, the advocacy group Voters Not Politicians asked us to help them model ways to balance different metrics of partisanship with the Voting Rights Act.

Chris built a model of Michigan redistricting plans, collecting data from public and private sources, to produce maps that balanced these partisan metrics against Voting Rights Act compliance. The results of our study influenced the new legislative maps adopted as the remedy in the case.

While the non-technical summary for Voters Not Politicians is not publicly available, you can read the technical report, which is included in Chris' thesis.
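The "partial scramble" described in the Detroit map caption can be illustrated with a toy Markov chain. The sketch below is a generic single-unit "flip" walk on a small grid, not the model used in the Michigan study: the grid, the `movable` region, the district size bounds, and the names `grid_graph`, `flip_step`, and `partial_scramble` are all hypothetical. Units outside `movable` are frozen, so only the chosen region is perturbed while every district stays contiguous and within its size bounds.

```python
import random
from collections import deque


def grid_graph(rows, cols):
    """Rook adjacency for a rows x cols grid of hypothetical precincts."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            adj[(r, c)] = [
                (r + dr, c + dc)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
    return adj


def is_connected(nodes, adj):
    """Breadth-first search restricted to `nodes`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes


def flip_step(assignment, adj, movable, min_size, max_size, rng):
    """Propose moving one movable unit into a neighboring district.

    Rejects the move if the losing district would become disconnected or if
    either district would leave the [min_size, max_size] range.
    """
    unit = rng.choice(movable)
    old = assignment[unit]
    targets = sorted({assignment[v] for v in adj[unit]} - {old})
    if not targets:
        return False  # interior unit: no neighboring district to join
    new = rng.choice(targets)
    old_rest = [u for u, d in assignment.items() if d == old and u != unit]
    new_size = sum(1 for d in assignment.values() if d == new) + 1
    if len(old_rest) < min_size or new_size > max_size:
        return False
    if not is_connected(old_rest, adj):
        return False
    assignment[unit] = new  # gaining district stays connected: unit touches it
    return True


def partial_scramble(assignment, adj, movable, steps, min_size, max_size, seed=0):
    """Run the flip walk, perturbing only units listed in `movable`."""
    rng = random.Random(seed)
    movable = sorted(movable)
    plan = dict(assignment)
    accepted = 0
    for _ in range(steps):
        accepted += flip_step(plan, adj, movable, min_size, max_size, rng)
    return plan, accepted


# Toy example: a 4 x 4 grid split into four row districts; only the top two rows may move.
adj = grid_graph(4, 4)
start = {(r, c): r for r in range(4) for c in range(4)}
region = [(r, c) for r in range(2) for c in range(4)]
plan, accepted = partial_scramble(start, adj, region, steps=2000, min_size=3, max_size=5)
```

A real study would add population balance, Voting Rights Act and partisan scoring, and an acceptance rule tuned for optimization or sampling; this sketch only shows how freezing most of the map confines the walk to one area.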

Skills Used

  • Geospatial data: Census, electoral
  • Markov chains: optimization, sampling
  • Technical solutions for political advocacy
  • Technical communication with non-technical audiences
  • Statistical modeling

STV City Council Election in Portland, OR (2024)

In November 2024, Portland, OR held its first city council election under a single transferable vote (STV) system, electing 12 council members across 4 districts (3 per district). There was concern about how STV might affect the ability of communities of color to participate in the process.

Chris provided Python support to a post-mortem study of the election, including processing of ballots with pandas, analysis of the election using VoteKit, developing visualizations with matplotlib, and explaining the code in a Jupyter notebook. The results showed that dominant media narratives about ballot errors by people of color were misguided, and that STV actually helped increase representation of communities of color.
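For readers new to STV, here is a minimal sketch of how a multi-winner count proceeds, assuming a Droop quota and a simplified fractional surplus transfer in which every ballot currently counting for a newly elected candidate is reweighted. This is not VoteKit's API and not Portland's exact tabulation rules; the function name `stv`, the ballot format, and the tie-breaking are hypothetical choices made for illustration.

```python
from fractions import Fraction


def stv(ballots, seats):
    """Simplified single transferable vote count.

    ballots: list of rankings (tuples of candidate names, most preferred first).
    Returns the elected candidates in the order they were elected.
    """
    quota = len(ballots) // (seats + 1) + 1  # Droop quota
    hopeful = {c for ranking in ballots for c in ranking}
    weighted = [(tuple(r), Fraction(1)) for r in ballots]  # exact arithmetic
    elected = []

    def top_choice(ranking):
        # Highest-ranked candidate who is still in the running, if any.
        return next((c for c in ranking if c in hopeful), None)

    while len(elected) < seats:
        if len(elected) + len(hopeful) <= seats:
            elected.extend(sorted(hopeful))  # remaining candidates fill the seats
            break
        tallies = {c: Fraction(0) for c in hopeful}
        for ranking, weight in weighted:
            choice = top_choice(ranking)
            if choice is not None:
                tallies[choice] += weight
        reached = [c for c in hopeful if tallies[c] >= quota]
        if reached:
            # Elect the largest tally (ties broken by name, an arbitrary choice).
            winner = max(reached, key=lambda c: (tallies[c], c))
            factor = (tallies[winner] - quota) / tallies[winner]
            # Ballots counting for the winner pass on only the surplus fraction.
            weighted = [
                (r, w * factor) if top_choice(r) == winner else (r, w)
                for r, w in weighted
            ]
            hopeful.remove(winner)
            elected.append(winner)
        else:
            # Nobody reached quota: eliminate the weakest candidate and recount.
            loser = min(hopeful, key=lambda c: (tallies[c], c))
            hopeful.remove(loser)
    return elected


ballots = [("A", "B")] * 5 + [("B", "C")] * 2 + [("C",)] * 3 + [("D", "C")]
print(stv(ballots, seats=2))  # ['A', 'C']
```

In this toy count the quota is 4: A is elected with a surplus of 1 that flows to B, D is then eliminated, and D's ballot transfers to C, lifting C to the quota.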

Skills Used

  • Data cleaning: pandas
  • Data visualization: matplotlib
  • Statistical analysis: VoteKit
  • Python
  • Non-technical communication
Analyzing candidate similarity
Measuring how similar candidates are in an STV election. Greener cells mark candidate pairs (i, j) that show a kind of "mutual boosting": if candidate j appears on a ballot, candidate i is more likely to appear on that ballot as well. This reveals a slate of candidates (Avalos, Routh, Dunphy, and Ender) who all boost one another.
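The "mutual boosting" idea in the caption can be made concrete with conditional probabilities. The sketch below assumes one possible definition, boost(i, j) = P(i on ballot | j on ballot) - P(i on ballot), so a positive value means j's presence makes i more likely to appear; this formula, the function name `mutual_boost`, and the toy ballots are hypothetical and may differ from the measure behind the figure.

```python
from itertools import permutations


def mutual_boost(ballots):
    """Return {(i, j): P(i on ballot | j on ballot) - P(i on ballot)}.

    ballots: iterable of rankings; only which candidates appear matters here.
    """
    sets = [set(b) for b in ballots]
    candidates = sorted(set().union(*sets))
    total = len(sets)
    p = {c: sum(c in s for s in sets) / total for c in candidates}
    boost = {}
    for i, j in permutations(candidates, 2):
        with_j = [s for s in sets if j in s]  # non-empty: j appears somewhere
        p_i_given_j = sum(i in s for s in with_j) / len(with_j)
        boost[(i, j)] = p_i_given_j - p[i]
    return boost


ballots = [("A", "B"), ("A", "B", "C"), ("C",), ("C", "D")]
boost = mutual_boost(ballots)
print(boost[("A", "B")])  # 0.5: B's presence raises A's rate from 0.5 to 1.0
print(boost[("C", "A")])  # -0.25: ballots with A list C less often than average
```

A pair boosts mutually when both boost[(i, j)] and boost[(j, i)] are positive, as A and B do here.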
GBBS Analysis
Freddy? Is that you?

Predicting Bake Off Winners

As part of the Erdős Institute Data Science Bootcamp, Chris and his team trained supervised learning models in scikit-learn (regression, random forest, k-nearest neighbors, and Naive Bayes) to predict winners and uncover key drivers of success in the Great British Baking Show. Ultimately, they found that the most important predictor of winning was performance in the technical challenges (receiving a handshake from Paul Hollywood turned out to be far less predictive than hoped!).

At the end of the bootcamp, they presented their results to a panel of industry experts, who specifically highlighted the team's clear communication of modeling limitations and results. They were awarded first place.

Skills Used

  • Data collection and cleaning
  • Machine learning: KNN, random forest, Naive Bayes, regression
  • Statistical analysis
  • Technical communication
  • Python: scikit-learn
  • Data visualization: matplotlib

Districtr 2.0

Districtr is a web-based tool for creating districting plans. After many years of faithful service, Districtr 1.0 was retired and replaced with Districtr 2.0. Districtr is used by many localities as their official submission tool for the redistricting process.

Chris was the product manager for the development and public release of Districtr 2.0, leading a remote team of five full-stack developers. Together with his team, Chris ensured that the app was robust and scalable, that communication between the dev team and the PI ran smoothly, and that the app was easy to use and understand.

Skills Used

  • Project management
  • Non-technical communication
  • Full stack development
  • Geospatial data
Districtr 2.0
A map module for drawing Pennsylvania's congressional districts in Districtr 2.0.
Nesting Analysis
This histogram shows how the number of Ohio Senate seats won by Democrats varies with the partisan bias of the underlying House map. Even when the underlying House maps are heavily biased, the distributions of Democratic Senate seats do not separate as widely.

3:1 Nesting Rules in Redistricting

A nesting rule requires each senate district to be composed of adjacent house districts. Ohio and Wisconsin have 3:1 nesting rules (3 house districts per senate district). How does this affect the space of feasible plans? How does it affect a map-maker's ability to gerrymander?
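To make the definition concrete, the sketch below treats each house district as a node in an adjacency graph and counts the ways to group house districts into connected triples, that is, the senate maps a 3:1 nesting rule allows for one fixed house map. It is a brute-force illustration on toy graphs, not the Markov chain algorithm from the paper; the function names and example graphs are hypothetical.

```python
from itertools import combinations


def is_connected(group, adjacency):
    """Depth-first search restricted to the house districts in `group`."""
    group = set(group)
    start = next(iter(group))
    seen, stack = {start}, [start]
    while stack:
        h = stack.pop()
        for nb in adjacency[h]:
            if nb in group and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == group


def count_nestings(adjacency, ratio=3):
    """Count ways to partition house districts into connected groups of `ratio`."""

    def count(remaining):
        if not remaining:
            return 1
        first = min(remaining)  # the group containing `first` is chosen exactly once
        total = 0
        for others in combinations(sorted(remaining - {first}), ratio - 1):
            group = {first, *others}
            if is_connected(group, adjacency):
                total += count(remaining - group)
        return total

    return count(frozenset(adjacency))


# Six house districts in a row: 0-1-2-3-4-5
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
# Six house districts in a 2 x 3 block: 0 1 2 over 3 4 5
block = {0: [1, 3], 1: [0, 2, 4], 2: [1, 5], 3: [0, 4], 4: [1, 3, 5], 5: [2, 4]}
print(count_nestings(path))   # 1: only {0, 1, 2} and {3, 4, 5}
print(count_nestings(block))  # 3: the two rows, or two L-shaped triples
```

Even in these tiny examples, a fixed house map leaves only a handful of valid senate groupings, which hints at how nesting constrains the space of plans.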

Chris implemented a novel algorithm in Python, based on Markov chain methods, for generating plans that satisfy a 3:1 nesting rule. He validated the model with several mixing heuristics and statistical tests. He found that while 3:1 nesting has little impact on the number of seats a party wins compared to unnested plans, it does curtail the impact of gerrymandered house maps and the ability to gerrymander more broadly.

Chris' paper is under revision at the journal Statistics and Public Policy. Read the preprint here.

Skills Used

  • Python
  • Technical communication
  • Markov chains
  • Geospatial data

Asymptotics of Redistricting

While much of Chris' work focuses on applications of statistical models and data science, his PhD is technically in theoretical mathematics, and he has a particular interest in combinatorics. This project was part of his PhD dissertation and is set to appear in the American Mathematical Monthly, a prestigious journal known for its high standards of writing.

How many ways are there to redistrict an n×n grid into n districts, each a connected set of n cells? How does this number grow with n? What can it tell us about the shape of a typical district? This mathematical research explores the combinatorial properties of redistricting and their implications for understanding gerrymandering. We find that the number of maps grows exponentially in n², and that a random districting plan is likely to be highly non-compact.
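For very small grids, the count can be checked by brute force. The sketch below enumerates every partition of an n×n grid into n connected districts of n cells each, assuming edge (rook) adjacency; it is an illustrative exhaustive search with hypothetical function names, not the method from the paper, and it becomes infeasible quickly because the count grows so fast.

```python
def neighbors(cell, n):
    """Edge-adjacent cells of `cell` inside an n x n grid."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < n and 0 <= c + dc < n:
            yield (r + dr, c + dc)


def connected_pieces(start, free, size, n):
    """All connected sets of `size` free cells that contain `start`."""
    frontier = {frozenset([start])}
    for _ in range(size - 1):
        # Every connected set can be grown from `start` one adjacent cell at a time.
        frontier = {
            piece | {nb}
            for piece in frontier
            for cell in piece
            for nb in neighbors(cell, n)
            if nb in free and nb not in piece
        }
    return frontier


def count_plans(n):
    """Number of ways to split an n x n grid into n connected districts of n cells."""

    def count(free):
        if not free:
            return 1
        start = min(free)  # first free cell in row-major order
        return sum(
            count(free - piece) for piece in connected_pieces(start, free, n, n)
        )

    return count(frozenset((r, c) for r in range(n) for c in range(n)))


print([count_plans(n) for n in range(1, 4)])  # [1, 2, 10]
```

Always choosing the district that contains the first uncovered cell ensures each plan is counted exactly once. Exhaustive enumeration like this is only practical for tiny n, which is why asymptotic bounds are needed for larger grids.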

Read the preprint here.

Skills Used

  • Combinatorics
  • Asymptotic analysis
  • Technical communication
Tiling of the 8x8 grid
A completed redistricting plan for the 8x8 grid (equivalent to a tiling!). This completion is part of an algorithm that generates a lower bound on the number of possible redistricting plans for the 8x8 grid.

Say hello!

Design: Tooplate

Chris used generative AI to assist in formatting this website. In the spirit of open source transparency, and to show the power of generative AI, he has shared the chats he had with Cursor here and here.