The secretary of state in the US state of Georgia recently announced that the state would conduct a risk-limiting audit of the presidential election votes. This means they will collect a random sample of ballots, and count them, comparing to the outcome of the original count.
What they care about is the winner, not the exact count. But since the election results were so close (0.2% of almost 5 million votes), the sample size needed to confidently say that the outcome was correctly determined was so large that the entire population of ballots needed to be recounted. By hand, not by machine.
Where does the random sample idea come from? It is actually quite old. In the late nineteenth century, social scientists believed that one could only use complete enumeration, a census, to learn about a population. Anders Kiaer, director of the Norwegian Central Bureau of Statistics, suggested at a meeting of the International Statistical Institute (ISI) in 1895 that one could learn from samples of the population. He introduced the idea of representative samples, in other words samples that looked like the whole population at least in some aspects, such as age and gender. In 1903, at another ISI meeting, the French statistics director Lucien March suggested that one could select the sample at random to avoid the arbitrariness in deciding how to select a representative sample. The idea of sampling as a way of learning about populations was not universally accepted until three decades later, when a 1934 paper by Jerzy Neyman, one of the founders of statistical science, established (albeit still with some controversy) that random samples were better than systematic samples.
Of course, any cook knows that if you are adding spices to a soup, you must stir the soup first in order to know how it tastes after adding the spices. Otherwise you get a spoonful that is either without spice, or has too much spice. The stirring is what makes the sampling random (and the soup tasty).
There are several ways to choose random samples. One starts with determining a sampling frame, a list of all the people in the population. A simple random sample gives every member of the list the same probability of being selected. But one can also divide the list into groups based on, for example, age, and draw individuals at random from within each age group. This is called stratified random sampling. And there are other methods for random sampling.
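The two sampling schemes just described can be sketched in a few lines of Python. This is only an illustration: the sampling frame, the age groups and the function names are all made up for the example, and a real survey would use an actual register of the population.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, rng):
    """Draw n distinct units; every unit has the same chance of selection."""
    return rng.sample(frame, n)


def stratified_random_sample(frame, stratum_of, n_per_stratum, rng):
    """Split the frame into strata, then draw a simple random sample in each."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    return {
        label: rng.sample(members, min(n_per_stratum, len(members)))
        for label, members in strata.items()
    }


def age_group(person):
    """Hypothetical strata: under 40, 40 to 64, 65 and over."""
    _, age = person
    if age < 40:
        return "under 40"
    return "40-64" if age < 65 else "65+"


rng = random.Random(2020)  # fixed seed so the sketch is reproducible
# Made-up sampling frame: (person_id, age) pairs.
frame = [(person_id, rng.randint(18, 90)) for person_id in range(1000)]

srs = simple_random_sample(frame, 50, rng)
stratified = stratified_random_sample(frame, age_group, 20, rng)
```

Drawing the same number from each age group is one choice among several; one could equally well draw from each group in proportion to its size.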
The main difficulty in sampling is to determine the list of the population members. In countries where everyone has a unique identifier or registration number this is relatively straightforward. In countries where not everyone has a number that you can find out it is much harder.
The idea of randomness comes up in other areas of statistics as well. Think about a field experiment to determine which variety of wheat provides the highest yield. One way of doing that is to divide the field into plots and plant the different varieties in different plots. The left part of the figure below shows a possible (systematic) way of arranging four varieties of wheat, named A, B, C and D. But what if there is a fertility gradient in the field, such that the soil on the diagonal from upper left to lower right is more fertile than the rest of the field? Then variety A would be deemed much better than the rest, whether this is true or not. One could of course test the soil before the experiment. But what if varieties B and C affect each other, so that when grown next to each other neither does well? Or any of a myriad other issues arises? There are too many to think about. The solution here is to randomize the design, or in other words to place the four varieties in random order within each row (or at random across the entire design). The picture on the right randomizes varieties to plots within each row, and we see that now the diagonal has several different varieties.
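Randomizing varieties to plots within each row takes only a shuffle per row. The sketch below assumes a field of four rows with four plots each; the function name is invented for the illustration, and the layout it produces depends on the seed, so it will not match the figure.

```python
import random


def randomize_within_rows(varieties, n_rows, rng):
    """Return a layout in which each row holds every variety once, in random order."""
    layout = []
    for _ in range(n_rows):
        row = list(varieties)
        rng.shuffle(row)  # random order of the varieties within this row
        layout.append(row)
    return layout


rng = random.Random(1926)  # fixed seed so the sketch is reproducible
layout = randomize_within_rows(["A", "B", "C", "D"], n_rows=4, rng=rng)
for row in layout:
    print(" ".join(row))
```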
The idea of randomizing varieties or a treatment to plots in the field came from another founder of statistics, Ronald Fisher, in a paper published in 1926. There are lots of ways this randomization can be done. Of course, the design on the left could also be obtained by a random draw — unlikely (happens 3 times in a million draws), but possible.
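The figure of about 3 in a million can be checked with a short calculation, assuming the varieties are randomized independently within each of four rows, as in the right-hand picture. Each row then has 4! = 24 equally likely orders, and any one specific layout is a single outcome among 24 to the fourth power.

```python
import math

n_varieties = 4
n_rows = 4

orders_per_row = math.factorial(n_varieties)  # 4! = 24 possible orders in one row
n_layouts = orders_per_row ** n_rows          # 24**4 equally likely layouts
p_given_layout = 1 / n_layouts                # chance of any one specific layout

print(n_layouts)                       # 331776
print(round(p_given_layout * 1e6, 2))  # 3.01 (per million draws)
```

If one instead randomized over the entire design, the number of possible layouts would be larger and the systematic arrangement correspondingly less likely.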
The Monte Carlo approach to simulation of random events came up in the Manhattan Project in 1946 as a way of trying out ideas without having to actually build a hydrogen bomb. The mathematician Stanislaw Ulam proposed this, and it was implemented on the ENIAC computer by John von Neumann and the team of women programmers. Monte Carlo methods are nowadays used in a great variety of situations, including numerical integration, Bayesian statistical analysis, and optimization.
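To give a flavor of the numerical integration use, here is a minimal sketch of plain Monte Carlo integration: average the function at uniformly drawn random points and scale by the length of the interval. The integrand, the number of draws and the function name are chosen only for illustration.

```python
import random


def monte_carlo_integral(f, a, b, n_draws, rng):
    """Estimate the integral of f over [a, b] from n_draws uniform random points."""
    total = 0.0
    for _ in range(n_draws):
        x = rng.uniform(a, b)
        total += f(x)
    return (b - a) * total / n_draws


rng = random.Random(1946)  # fixed seed so the sketch is reproducible
# The exact value of the integral of x^2 over [0, 1] is 1/3.
estimate = monte_carlo_integral(lambda x: x * x, 0.0, 1.0, 100_000, rng)
print(estimate)
```

The error of such an estimate shrinks like one over the square root of the number of draws, which is slow in one dimension but does not get worse as the dimension grows.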
In the late 1970s I was a graduate student in the Statistics Department at UC Berkeley. My friend Richard Lockhart and I spent a lot of time at a coffee shop, discussing ideas about dissertation topics and the like. One thing we both agreed on was that statistical inference should be done in a Bayesian fashion, where one sets down a distribution for a parameter of interest, collects some data, and updates the distribution according to the data. But for the problems I was most interested in, complicated issues in population dynamics, it was simply not possible to do the updating calculations. A revolution in statistics happened in the 1990s, when these calculations became possible due to some very clever uses of Monte Carlo integration.
During my time as a graduate student, another revolution occurred. Stanford statistics professor Brad Efron was visiting Berkeley and teaching a course about a new idea he had: in order to figure out the behavior of a statistical procedure based on a set of data, just draw a sample, with replacement, from the data set and see what the procedure says about the resampled data set. He called this a bootstrap procedure. In some cases one can determine the exact probability distribution for the bootstrap, but if that does not work, one can simulate the bootstrap by resampling over and over again. It really is like magic, or at least like being pulled up by one’s own bootstraps. In 2019, the invention of the bootstrap was the reason Efron was awarded the International Prize in Statistics, the statistical counterpart to the Nobel Prize.
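The simulated version of the bootstrap is short enough to write out. The sketch below resamples a data set with replacement many times and uses the spread of the recomputed statistic as its standard error. The data are simulated for the illustration, and the function name and the number of resamples are arbitrary choices.

```python
import random
import statistics


def bootstrap_standard_error(data, statistic, n_resamples, rng):
    """Standard deviation of the statistic over resamples drawn with replacement."""
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))  # same size as the data
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


rng = random.Random(1979)  # fixed seed so the sketch is reproducible
data = [rng.gauss(50, 10) for _ in range(200)]  # made-up measurements

se_mean = bootstrap_standard_error(data, statistics.mean, 2000, rng)
se_median = bootstrap_standard_error(data, statistics.median, 2000, rng)
print(se_mean, se_median)
```

For the mean there is a textbook formula to compare with (the sample standard deviation divided by the square root of the sample size); for the median there is no equally simple formula, which is where the bootstrap earns its keep.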
But back to Georgia and the election audit. In close elections, when the difference is small enough, many states either require or allow a request for a recount of the ballots. In Georgia, a candidate can request a recount when the difference is less than half a percent. A recount is obtained by running the ballots through the same machines again. The election audit, on the other hand, selects a simple random sample of ballots, compares them by hand to the output of the voting machine, and continues drawing ballots until it is sufficiently unlikely (say less than 1% chance) that a complete hand count would have overturned the result (declared a different winner). The closer the race is, the fewer ballots need to come out differently to change the winner, and therefore one has to count more ballots. If the process finds a lot of errors it is possible that a complete hand count of some 5 million votes must be completed. The audit had to be done before the secretary of state validated the results of the presidential election in Georgia.
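To see why a close race drives the sample size up, here is a sketch of a simpler relative of the audit described above: a ballot-polling audit for a two-candidate race, which looks only at the sampled ballots rather than comparing each one to the machine record. It is a sequential likelihood ratio test in the style of the published BRAVO method, not the procedure Georgia used. The ballots are made up, all function and parameter names are invented for the example, and ballots are drawn with replacement to keep the arithmetic simple.

```python
import math
import random


def ballot_polling_audit(ballots, reported_winner_share, risk_limit, max_draws, rng):
    """Draw ballots until the reported outcome is confirmed at the risk limit.

    ballots: list of booleans, True if the ballot is for the reported winner.
    Returns the number of ballots drawn, or None if max_draws is reached
    without confirmation (the cue to escalate to a full hand count).
    """
    threshold = math.log(1 / risk_limit)
    log_ratio = 0.0  # log likelihood ratio: reported share versus a tie
    for n_drawn in range(1, max_draws + 1):
        ballot = ballots[rng.randrange(len(ballots))]  # draw with replacement
        share = reported_winner_share if ballot else 1 - reported_winner_share
        log_ratio += math.log(share / 0.5)
        if log_ratio >= threshold:
            return n_drawn
    return None


def approx_expected_sample_size(winner_share, risk_limit=0.01):
    """Wald's approximation to the expected number of draws, assuming the
    reported share is correct."""
    per_ballot = (winner_share * math.log(winner_share / 0.5)
                  + (1 - winner_share) * math.log((1 - winner_share) / 0.5))
    return math.log(1 / risk_limit) / per_ballot


rng = random.Random(2020)  # fixed seed so the sketch is reproducible
ballots = [True] * 6000 + [False] * 4000  # made-up race, 60% for the winner
n_needed = ballot_polling_audit(ballots, 0.6, 0.01, max_draws=10_000, rng=rng)

wide_margin = approx_expected_sample_size(0.55)    # 10 point margin
narrow_margin = approx_expected_sample_size(0.501)  # 0.2 point margin
print(n_needed, round(wide_margin), round(narrow_margin))
```

With a 10 point margin the approximation gives a sample of under a thousand ballots, while with a margin of 0.2 points it runs into the millions, which is the same order as the total number of votes cast in Georgia.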
In the end, it turned out to be a full hand recount of the Georgia vote. The most significant finding was a batch of a few thousand votes that had not been counted, and some unrecorded memory cards, but the outcome was that Biden won Georgia by some 12 000 votes. Once the vote had been certified, the Trump campaign requested a recount. It did not change the outcome very much. The final difference was 11 779 votes in favor of Biden.