The influenza virus spreads one person at a time.
According to the Centers for Disease Control and Prevention, an infected individual coughs, sneezes or even just talks, and airborne droplets land in the mouths and noses of other people up to 6 feet away.
Many flu forecasts are created by feeding these assumptions about how the flu spreads into a computer, a technique known as mechanistic modeling.
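To make the idea of mechanistic modeling concrete, here is a minimal sketch of one textbook approach, the SIR (susceptible, infected, recovered) compartmental model. It is an illustration only, not the method of any forecasting team named in this article; the function name `simulate_sir` and the parameter values are assumptions chosen for the example.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Step a basic SIR model forward one day at a time.

    beta  = assumed daily transmission rate (hypothetical value below)
    gamma = assumed daily recovery rate (hypothetical value below)
    Returns the number of infected people on each day, starting at day 0.
    """
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # infected
    r = 0.0                                   # recovered
    history = [i]
    for _ in range(days):
        # The "rule" built into the model: infections come from contact
        # between susceptible and infected people.
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append(i)
    return history


# Illustrative parameters only; these are not fitted to real flu data.
curve = simulate_sir(population=1_000_000, initial_infected=10,
                     beta=0.4, gamma=0.2, days=200)
peak_day = max(range(len(curve)), key=lambda d: curve[d])
print("simulated peak on day", peak_day)
```

Everything the model predicts follows from the assumed rules and rates, which is why the accuracy of those assumptions matters so much.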
The Delphi research group at Carnegie Mellon University does things a little differently. Rather than giving a computer rules to follow about how the flu spreads, the team just gives it raw data about past flu seasons.
"Instead of creating a simulation in the computer of how people infect each other, we give the computer examples of the epidemic trajectory in the past and ask it to generalize what the epidemic might be like this year," said Roni Rosenfeld, a professor of computer science at CMU and the head of the Delphi group.
The Delphi group, along with teams from other universities and research centers, submits weekly forecasts to the Centers for Disease Control as part of the Epidemic Prediction Initiative during flu season. For the last three seasons, the Delphi systems have yielded the most accurate forecasts as evaluated by the CDC.
Each forecast provides seven predictions: what week the flu will take off (onset week); what week the flu will reach its peak of infections (peak week); how high that peak will be; and what the level of circulation will be like one, two, three and four weeks in the future.
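As a rough illustration of those seven targets, the sketch below reads them off a single predicted weekly trajectory. The function name `forecast_targets`, the sample numbers and the baseline value are all hypothetical, and the onset rule used here (the first of three consecutive weeks at or above a baseline) is an assumption made for the example.

```python
def forecast_targets(trajectory, current_week, baseline, run_length=3):
    """Extract seven forecast targets from a predicted weekly trajectory.

    trajectory   = predicted flu activity level for each week of the season
    current_week = index of the most recent week (0-based); the trajectory
                   must extend at least four weeks past it
    baseline     = assumed threshold that marks the start of the season
    """
    # Onset week: first week that begins `run_length` consecutive weeks
    # at or above the baseline (assumed definition).
    onset_week = None
    for week in range(len(trajectory) - run_length + 1):
        if all(v >= baseline for v in trajectory[week:week + run_length]):
            onset_week = week
            break

    # Peak week and the level reached at the peak.
    peak_week = max(range(len(trajectory)), key=lambda w: trajectory[w])
    targets = {
        "onset_week": onset_week,
        "peak_week": peak_week,
        "peak_level": trajectory[peak_week],
    }

    # Activity level one, two, three and four weeks ahead.
    for ahead in range(1, 5):
        targets[f"{ahead}_week_ahead"] = trajectory[current_week + ahead]
    return targets


# Made-up numbers for illustration; not real surveillance data.
predicted = [1.0, 1.2, 2.5, 1.8, 2.3, 2.9, 4.1, 5.0, 4.2, 3.0, 2.0, 1.5]
targets = forecast_targets(predicted, current_week=3, baseline=2.2)
print(targets)
```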
Rosenfeld explained that his group's non-mechanistic models rely on machine learning, meaning that they are designed for the computer to learn from data, as opposed to being explicitly programmed to reflect the aforementioned assumptions.
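One simple way to picture learning from past seasons, as opposed to programming in rules, is to weight each past season by how closely its early weeks match the current one and then average what happened next. The sketch below does that. It is a toy illustration and not the Delphi group's actual system; the function name `forecast_from_history` and the sample data are hypothetical.

```python
def forecast_from_history(past_seasons, observed):
    """Forecast the rest of a season from similar past seasons.

    past_seasons = list of weekly trajectories from earlier years
    observed     = weekly values seen so far this season
    Returns a weighted average of how the past seasons continued.
    """
    n = len(observed)

    # Weight each past season by the inverse of its squared distance
    # from this season's observed weeks; closer seasons count for more.
    weights = []
    for season in past_seasons:
        squared_distance = sum((a - b) ** 2
                               for a, b in zip(season[:n], observed))
        weights.append(1.0 / (1e-9 + squared_distance))
    total = sum(weights)

    # Average the remaining weeks, up to the shortest past season.
    length = min(len(season) for season in past_seasons)
    return [
        sum(w * season[week] for w, season in zip(weights, past_seasons)) / total
        for week in range(n, length)
    ]


# Made-up trajectories for illustration; not real flu data.
past = [[1, 2, 4, 6, 3, 1], [1, 1, 2, 3, 2, 1]]
forecast = forecast_from_history(past, observed=[1, 2, 4])
print(forecast)
```

No rule about how the flu spreads appears anywhere in the code. The forecast comes entirely from the examples it is given.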
"You would expect that if you assume an explanation you believe to be true, it would help you," he said. "But the question is: to what extent is it true? Reality is always much more complicated."
The Delphi group mostly feeds its forecasting systems flu data provided by the CDC, but that data always reflects the state of flu circulation at some point in the past. The group also incorporates data from the internet, like how often people are tweeting about the flu or searching for it on Google. Rosenfeld said that these signals are strongly correlated with how many people actually have the flu in real time, which is important to consider when making predictions about the future.
"Before you can forecast, you have to 'now' cast," said Rosenfeld.
The CDC uses results from the forecasts to coordinate the timing of information and vaccination campaigns.
For now, forecast submissions to the Epidemic Prediction Initiative are created for 10 geographical regions that span the United States. For instance, Pennsylvania is part of the mid-Atlantic region, which also includes Delaware, Washington D.C., Maryland, Virginia and West Virginia.
"These are useful to CDC because this is the level at which they make their decisions. But to other actors and stakeholders, it would be more useful to have more geographically focused information, such as: when will the flu peak in Pittsburgh?" said Rosenfeld.
Rosenfeld said he wants to narrow future forecasts to the city level and produce them on a daily basis, although he said collecting enough data to make accurate predictions would be a challenge in both cases.
Submissions to this year's Epidemic Prediction Initiative will begin in November.