Raiders of the storm: The data science behind weather prediction

What kind of data and techniques are used to model and predict weather and climate? How do you reduce uncertainty and communicate complexity? Are Harvey and Irma signs of climate change, and is it going to get worse?
Written by George Anadiotis, Contributor


Scott Capps is not your average weatherman. It was only after getting a bachelor's degree in business and having worked in the field for about a decade that he decided to go back to college to pursue his lifelong passion: Atmospheric Science.

As a UCLA postgraduate researcher, Capps gained skills in numerical weather modeling, programming, statistics and high-performance computing, along with experience working with atmospheric datasets.

Capps combines the skills of a data scientist with the background of a meteorologist. He is now running his own business, working with utilities and public sector organizations. In addition to publishing several peer-reviewed articles on climate, Capps' work includes multi-institution projects on public policy guidance and disaster preparedness.

That combination lends additional weight to his insights on the data and techniques used for weather prediction, how forecasts can be used and communicated more effectively, and where data science and climate intersect.


How effective can data science-backed weather prediction be? Image: NOAA

Predictive models and machine learning

Weather models are at the heart of what Capps does. They are used both for forecasting and for recreating historical data. Over the last decade, however, machine learning (ML) has increasingly been applied in atmospheric science.

"Machine learning takes weather data and builds relationships between that data and whatever predictors we are interested in," Capps explains. ML in atmospheric science is in its infancy, but it's seeing exponential growth. In that respect the domain is no different from others, as ML is being adopted pretty much everywhere these days.

The problem is, ML won't necessarily tell you why. As a scientist, Capps is always interested in knowing the whys, not just to satisfy his curiosity but for very pragmatic reasons as well: understanding them would make it possible to improve physically grounded models and to provide clearer explanations. Capps explains:

I want to know why does a model perform the way it does, what are the physical processes that it latches to? If a cloud is not forming where a model said it would, why is that? Is it the soil, the moisture, the wind? If we can understand this, then we can know what atmospheric processes are missing from our models and try and represent them.

Doing that, Capps says, creates a reinforcement loop: ML can help improve physically grounded models, which in turn makes things less challenging for ML models, and combining both approaches yields better results. The challenge that remains, however, is for ML models to become more interpretable.

"Over the last years I have come to realize that in the foreseeable future we are always going to need ML to be able to provide results at the level our clients need. As an atmospheric scientist working with ML experts, I'd say explainable AI is the next step," says Capps.

Satellite imagery and sensor data

Atmospheric science relies on a combination of data, but Capps explains that today the primary source is satellite imagery. That does not mean pretty pictures though.

Satellite imagery comes in different shapes and sizes. Some satellites operate in the black and white spectrum, others in infrared. Some imagery is useful for identifying and measuring clouds, other imagery for measuring convection or winds over the oceans.

And what about sensor data? Is it used as well? It depends. Capps says sensor data is mostly used to ground-truth weather models when making predictions at a local, granular level, provided the equipment is reliable.

Ingesting live data into weather models is another use case, but for Capps this has not proven to be that useful. The reason he cites is that by the time users get to look at forecasts, the benefit of live data ingestion is gone: after three to six hours the weather model equations have taken over.

Capps explains that today most of his clients rely on satellite imagery to validate models, to generate short-term forecasts and to determine whether a forecast is correct. If a model seems to be producing results that agree with incoming data, trust in the model increases.
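To make the idea of checking a model against incoming data concrete, here is a minimal sketch of two common agreement scores. It is only an illustration, not Capps' or his clients' actual workflow: the function names, the choice of metrics and the cloud-cover numbers are all assumptions made up for the example.

```python
import math

def rmse(forecast, observed):
    """Root-mean-square error between paired forecast and observed values."""
    if len(forecast) != len(observed) or not forecast:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))

def mean_bias(forecast, observed):
    """Average signed error; a positive value means the model runs high."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)

# Hypothetical cloud-cover fractions at four locations (made-up numbers):
# what the model forecast vs. what was derived from satellite imagery.
forecast = [0.2, 0.5, 0.8, 0.6]
observed = [0.1, 0.5, 0.9, 0.4]

print(f"RMSE: {rmse(forecast, observed):.3f}")
print(f"Bias: {mean_bias(forecast, observed):+.3f}")
```

A low error that stays low as new imagery arrives is one simple way the growing trust described above could be quantified.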

ML is also used on satellite imagery for pattern matching. If ML recognizes a pattern that has appeared before, what followed that pattern in the past can be used to predict what is going to happen next.
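The pattern-matching idea can be sketched as a nearest-neighbor lookup: find the past pattern most similar to the current one and see what followed it. This is a toy stand-in, assuming each image has already been reduced to a short list of numbers. The `closest_analog` helper, the four-value patterns and the outcome labels are hypothetical, and real systems would use far richer features and models.

```python
import math

def closest_analog(current, history):
    """Return the historical record whose pattern is nearest to `current`.

    `history` is a list of (pattern, what_happened_next) pairs, where every
    pattern is a list of numbers with the same length as `current`.
    """
    return min(history, key=lambda record: math.dist(current, record[0]))

# Made-up archive: each pattern stands in for a flattened satellite image.
history = [
    ([0.1, 0.2, 0.1, 0.0], "clear"),
    ([0.7, 0.8, 0.9, 0.6], "heavy rain"),
    ([0.4, 0.5, 0.4, 0.3], "light showers"),
]

current = [0.6, 0.9, 0.8, 0.7]
pattern, outcome = closest_analog(current, history)
print(f"Closest past pattern {pattern} was followed by: {outcome}")
```

Euclidean distance is used here only because it is the simplest similarity measure; it is an assumption of the sketch, not something the article specifies.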

Predicting Harvey and Irma

But how accurate are these predictions in the end, and how much can we rely on them to prepare for disasters such as hurricanes? Capps is adamant:

Models are approximations. The further ahead in time you go, the more their accuracy degrades. People need to be educated on the fact that these are models making assumptions.

If people knew what kind of assumptions these models make, they would be asking how on earth can these models be even remotely correct.

It blows my mind to think of the assumptions these models make and are still able to forecast the path of a hurricane at the remote location where it will land. It's just crazy.

Case in point, Harvey and Irma. Capps says prediction success can be measured in many ways, but he does think that NOAA did a good job:

For Harvey, they pinpointed the fact that this system will stall, do a U-turn and stay over Texas and dump a tremendous amount of water while there. Harvey blew from a tropical storm to a category 4-5 hurricane within a day.

That was an explosion in energy, they were on top of it and when their confidence was above a threshold they released and communicated the announcement effectively and efficiently.

For Irma, it was determining when and where she might make a turn. She was heading West-Northwest for a while. They were communicating she was going to turn North-Northwest and impact Florida.

It's very difficult to predict exactly where she would make a turn in her trajectory, because it depends on so many factors. I think they did the best they could given the uncertainties in deciphering and interpreting model outputs.

Others may disagree of course, and I'm only saying that from a remote location that was not impacted. But what it comes down to is that Florida needed to be prepared, and that was communicated way ahead of landfall.


How easy is it for the general public to read visualizations? Image: NOAA

The cone of uncertainty

Communicating uncertainty and complexity is a key topic when it comes to complex data analysis. And it gets even more complicated when dealing with an entire population, as people have different histories, perspectives, biases and so on.

By now it should be clear that prediction models come with uncertainty built in. To communicate that, we have the so-called cone of uncertainty.

This visualisation is meant to convey a projected path for hurricanes, based on averaging many model outcomes. Reportedly, however, it was misinterpreted by some people, and some have stepped forward to propose alternative visualisations for such cases.
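As a rough illustration of what averaging many model outcomes can look like, the sketch below averages several forecast tracks point by point and reports how far the tracks disagree at each forecast hour. It is a toy example and not the National Hurricane Center's actual procedure for drawing the cone. The `mean_track` helper, the spread measure and the three tracks are all invented for the example.

```python
from statistics import mean, pstdev

def mean_track(tracks):
    """Average several forecast tracks point by point.

    Each track is a list of (lat, lon) positions at the same forecast hours.
    Returns a list of (mean_lat, mean_lon, spread) tuples, where spread is the
    larger of the latitude and longitude standard deviations across tracks.
    Longitudes are averaged naively, so this assumes tracks far from the
    180-degree meridian.
    """
    result = []
    for positions in zip(*tracks):
        lats = [lat for lat, _ in positions]
        lons = [lon for _, lon in positions]
        spread = max(pstdev(lats), pstdev(lons))
        result.append((mean(lats), mean(lons), spread))
    return result

# Three made-up model tracks at +0h, +24h and +48h.
tracks = [
    [(20.0, -70.0), (22.0, -74.0), (25.0, -78.0)],
    [(20.0, -70.0), (22.5, -75.0), (26.0, -80.0)],
    [(20.0, -70.0), (21.5, -76.0), (24.0, -82.0)],
]

for hour, (lat, lon, spread) in zip((0, 24, 48), mean_track(tracks)):
    print(f"+{hour:02d}h  lat={lat:.1f}  lon={lon:.1f}  spread={spread:.2f}")
```

In this made-up data the spread grows with forecast hour, which mirrors the widening shape of a cone: the further ahead the forecast, the more the individual outcomes diverge.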

"There is no doubt the National Hurricane Center has done a ton of work on the cone of uncertainty, and tested it with various audiences to see what their response would be. I have seen this work in numerous conferences over the years, and I don't know of a better way," says Capps.

He goes on to add that even if someone came up with a better way, there is the risk of miscommunicating that new and better way. Perhaps running the risk of re-educating at a time of emergency would not be wise, but that's only if we assume the public is educated in the first place.

"I think a part of the public would have to be re-educated. The level of knowledge varies -- it's the chaotic beauty of a human audience," says Capps. "But if anything, people need to be educated on the fact that these are models and they make assumptions."

Climate change?

Capps has some experience in education too, as he co-founded a non-profit with the mission of educating students on the science behind climate change. He says that while talking to adults about the subject was frustrating, as many wouldn't listen, children provided a different experience: "Kids were very open to getting the science behind it. We just taught the science, they made the extrapolation themselves: we don't know what the future will be like if we continue down that road."


Temperature is the one signal that can't be misinterpreted. Image: xkcd

Or do we? Since Capps and most of his clients are based in California, a big part of their work has to do with the Santa Ana winds and how they affect the local climate. In this context, Capps has adapted global climate models, downscaling them to local scale and detail and working with data that goes back to the late 1970s.

Looking at projected results for the next 50 to 100 years was sobering:

You can imagine the uncertainty when we talk about models and projections at this scale. But the one signal that really stood out was temperature.

The increase in temperature as you go up in elevation in local mountains was dramatic. This will impact our ecosystem, which will also have to go up in elevation to compensate. Models for temperature have a high skill.

When you move to things like precipitation, models have a harder time, so I can't say anything on whether we're going to dry out or not. It's hard enough to predict precipitation for the next couple of days.

Capps believes climate change is real and linked to human activity. But he points out that while in the past it was not something people talked about, even in the wake of catastrophic events, now it's the first thing people mention, which he considers irresponsible too:

Some time ago it was called global warming, so people interpreted that to mean that the temperature would constantly rise across the globe, which is certainly not the case. Climate change means we're going to have more extremes, which is a better way to characterize it.

Take hurricanes -- you got a tropical storm going to a category 4 hurricane within a day. But that does not necessarily mean that every year it's going to be worse -- we've also seen quiet years in the past decade. People need to understand it's a chaotic intermix of processes.

There's never going to be a straight line up, it's going to be a winding, fluctuating path. But what we've seen so far is that when the conditions are right, Atlantic hurricanes grow explosively. And that needs to be investigated scientifically.
