Today's Earth scientists are spending less time standing in fields collecting soil samples and more time behind a computer screen. Most geoscience data is now collected automatically by sensors and satellites. The big challenge is making sense of all that data so that scientists can get back to what they do best: observing the world, asking questions, conducting experiments, and finding evidence.
Scientists use large, publicly available datasets from government programs such as NASA, NOAA, and USGS (that's the National Aeronautics and Space Administration, the National Oceanic and Atmospheric Administration, and the U.S. Geological Survey, in non-acronym speak). Many Earth scientists also have private data sources, and combining public and private datasets is difficult and time-consuming.
If a scientist wants to look at satellite images to gain a better understanding of climate change, for example, they have to spend hours sifting through data and managing several software programs.
"You want to reduce the time that you're just managing data and get to those real meaty scientific questions," says Dr. Annie Burgess. She is the lab director at the Earth Science Information Partners (ESIP) Lab, which funded a project led by Dr. Ziheng Sun, Principal Investigator, Center for Spatial Information Science and Systems at George Mason University. He developed Geoweaver, a program that solves the big data challenges that earth scientists face.
Geoweaver is a web-based system for deep learning on multiple datasets. It gives geoscientists a way to make sense of public data (such as satellite images from NASA and NOAA) alongside private data (such as field observations), and it helps Earth scientists use machine learning effectively to sift through data so they can understand what's really going on with our planet.
"ESIP Lab Geoweaver is an online application for scientists to manage their research workflows," Sun explains. "It could be installed anywhere and accessed from anywhere. It is a life-saving project for people coding in multiple languages, dealing with multiple facilities and multiple datasets to carry out their science workflows."
Machine learning isn't new, but previous versions were too slow to support the real-time data that Earth scientists need. Today's computational power is much greater, so Sun's program can train on field data in much less time. He says that because the old, slow versions didn't work, geoscientists don't have faith in machine learning. That's why he created a program that combines the newest AI techniques with the programs geoscientists already know and trust.
"Geoweaver will accelerate the adoption of artificial intelligence techniques in science," Sun says. "It allows scientists to combine their legacy programs and datasets with the cutting-edge deep learning algorithms to create AI models which can more accurately and more automatically understand and predict our environment."
His research group is already using the program in the lab every day for traditional geoscience research, such as studying crop yield prediction, agricultural drought, flooding damage assessment, and air quality prediction.
This research is more important now than ever because traditional models for agricultural markers such as crop yield did not factor in rising global temperatures.
Burgess explains, "In a time of change in climate, really understanding something like crop yield, which affects the economy, affects global food supply." She adds, "As the climate is more variable, you can't rely on standard modeling techniques. And so the type of work that Ziheng Sun is doing where you're using machine learning and satellite imagery, it's going to prove more robust output for the future in a time of changing climate."
That's why the ESIP Lab is providing small grants to help scientists like Dr. Sun develop prototypes that combine classic science techniques with the advantages of big data.
Geoweaver was designed with geoscientists in mind, but it can also be used by other scientists or even by people working with data in completely different disciplines. It is meant for anyone who manages multiple servers and datasets for a machine learning workflow.
"It can be used by anybody who deals with servers, deals with multiple servers, multiple end features, and multiple operating systems," Sun says. He is using the program in his lab now, and developing the final version of the open-source software is expected to be available in the next six months.