How edge computing transformed marine biology research at Oregon State University

The Hatfield Marine Science Center used AWS Snowball Edge to revolutionize the collection of oceanic and coastal data.
Written by Macy Bayern, Multiplatform Reporter

With the global Internet of Things (IoT) market expected to reach $724.2 billion by 2023, many organizations will be searching for ways to process their IoT data quickly and derive real business value from it. But the Hatfield Marine Science Center (HMSC) at Oregon State University has already found an answer: the Amazon Web Services (AWS) Snowball Edge.

Edge computing is the collection and analysis of data at the site where the data is generated -- at the 'edge' of the enterprise network. The technology offers a means of analyzing large volumes of data in near real time, allowing organizations to derive value from remote IoT devices.


Amazon's Snowball Edge appliance provides multiple connectivity options to copy up to 100TB in around 19 hours from edge devices.

Image: Amazon

AWS Snowball Edge is a portable device that can transfer such data into AWS for analysis, or process the data locally thanks to its onboard compute power. With the ability to store up to 100TB of data, the Snowball Edge is a durable, portable, secure powerhouse of a storage device. And HMSC used it to its advantage.
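The 100TB-in-roughly-19-hours figure in the caption above implies a particular sustained transfer rate. A quick back-of-the-envelope check (the arithmetic is ours, not an AWS specification):

```python
# Hypothetical helper, not an AWS tool: the sustained throughput needed
# to copy a given number of terabytes in a given number of hours.
TB = 10**12  # bytes in a decimal terabyte

def required_throughput_gbps(terabytes: float, hours: float) -> float:
    """Average transfer rate, in gigabits per second, for a copy job."""
    bits = terabytes * TB * 8
    seconds = hours * 3600
    return bits / seconds / 10**9

print(f"{required_throughput_gbps(100, 19):.1f} Gb/s")  # ≈ 11.7 Gb/s
```

That rate is only achievable because the appliance offers multiple high-speed network ports rather than a single commodity link, which is the point of the "multiple connectivity options" in the caption.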


One of HMSC's major research projects is the study of plankton dynamics in the ocean, which involves using an underwater microscope to capture continuous images of the organisms. The biological oceanographers then identify the organisms in each frame -- which sounds simple, but not when you have 60 to 80 terabytes' worth of data, said Christian Briseno-Avena, a researcher at HMSC who worked directly on integrating the AWS Snowball Edge into the center's systems.

"Our original method for capturing oceanic image data involved many small hard drives, and we had to hand-carry each one to our computing center and load them one at a time. It would take weeks to months before we could analyze the images we collected, so it really slowed down our research. It also cost us tens of thousands of dollars per year," said Bob Cowen, director of HMSC, in an Amazon press release.

"With AWS Snowball Edge, we can now collect 100TB of data with no intermediate steps, and we can also analyze the images immediately using the onboard compute capabilities. This allows us to do deeper analysis, and we can upload all the raw data to the AWS Cloud by simply shipping the AWS Snowball Edge device back. AWS Snowball Edge allows us to access AWS storage and compute capabilities in our coastal explorations where no internet is available and allows us to move petabytes to the AWS Cloud quickly and easily where we can continue to use all the power of the AWS platform."

Briseno-Avena explained that the old system was towed behind the ship to collect data in real time. The data was stored in multiple boxes of 2TB hard drives, so the researchers not only had to haul boxes of these hard drives on and off the ship, but they also had to ensure they had enough hard drives to begin with, he said.

Edge computing was their answer.

Simplifying the process

"Our goal is to always find a robust, cost-effective solution, because we're working off of grants, and we need to make every grant dollar go as far as possible," said Chris Sullivan, assistant director of biocomputing at HMSC through the Center for Genome Research and Biocomputing (CGRB) at Oregon State, who also worked alongside Cowen and Briseno-Avena during the project.

"We really wanted to simplify the process, which helps us normalize the data, ensure that everything gets processed timely, and manage it in a much better way," Sullivan added. The main goal was efficiency, since the original process was anything but efficient.

With such a high volume of data, Sullivan, Briseno-Avena, and Cowen's work was a perfect candidate for AWS Snowball Edge. The appliance allowed them to walk onto the ship with a single 100-terabyte device, and walk off with all of their data on that same device.

While putting so much information on a single device is both convenient and efficient, it also produces a fair amount of risk, said Sullivan. They were placing a week's worth of data on a single device, he explained, so if something went wrong, all of that time, money, and research would be lost.

"We worked with Amazon and had another group testing to make sure the device was resilient, and robust, and all the different things that we needed to ensure that when we put this out there and collected the data on it, it was going to come back," Sullivan said. "We were going to get our data. It was going to come back up to Amazon and was immediately going to start kicking off jobs, and the data was going to get processed."

At what cost?

The major downside to edge computing and cloud processing is expense.

"Every single minute you spend in the cloud costs you dollars. The type of equipment that you use changes the cost dramatically," said Sullivan. "Whether or not you can leverage spot instances changes those costs dramatically." And of course, HMSC couldn't use spot instances for the work they were doing, so costs were already starting to escalate.

Additionally, classification was eating up dollars. "To actually identify and recognize images and do classification of those images into the different plankton categories that we had, that takes a tremendous amount of time," said Sullivan. "And so Amazon's hardware is strictly GPU and PCI bus at this point in time, and that really changed our processing time in that classification container. So every minute you spend up there, you're spending dollars. When we were leveraging Amazon to do an entire ship's worth of stuff, we were probably looking at about 30-40 days worth of work."

Adjusting technique

This is where IBM stepped in, Sullivan explained. IBM introduced the team to technology that places the GPUs directly on the motherboard, attached to the CPU and memory bus. IBM's tech cut their runtime from 35 days to about 10 days.
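The runtime difference matters because, as Sullivan notes, cloud spend scales directly with time. A minimal sketch of that arithmetic, with purely hypothetical instance counts and hourly rates (not actual AWS or Nimbix prices):

```python
# Toy cost model: total spend is just instance-hours times the hourly
# rate, so cutting the classification runtime shrinks the bill in
# direct proportion. All numbers below are illustrative placeholders.
def classification_cost(days: float, instances: int, rate_per_hour: float) -> float:
    """Estimated spend for running `instances` machines for `days`."""
    return days * 24 * instances * rate_per_hour

# e.g. a 35-day run vs. a 10-day run on the same (hypothetical) fleet
slow = classification_cost(35, instances=4, rate_per_hour=3.06)
fast = classification_cost(10, instances=4, rate_per_hour=3.06)
print(f"${slow:,.0f} vs ${fast:,.0f}")
```

Under this model, a 3.5x runtime reduction cuts the bill by the same factor, which is why the hardware change mattered to a grant-funded lab.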

"We went 4x faster on this new IBM interconnect," said Sullivan. "Our algorithm ran just the same, because on these new pieces of hardware, the software just compiles and works."

Because the team at HMSC shifted to IBM's hardware, they needed cloud services that could run it, so they now use the Nimbix cloud. "[Amazon] was our jumping-off point to use the edge computing technology, but because of the continuing cost in that space we have to be cautious, because I have to make sure we answer the scientific question. Science comes first," Sullivan said.

HMSC is still leveraging Amazon in one area: the Snowball Edge. Sullivan pointed out that the AWS Snowball Edge is rentable, so the team rented the appliance and used other hardware that fit their needs.

"I think it's important to understand that the Snowball still represents a pathway for us to land the data. Even if we were to do processing out at sea with the IBM hardware, trying to do real-time, we still need someplace that's going to hold 80 terabytes of data on the fly," said Sullivan.

"So basically we leverage all this technology and all these resources that Chris [Sullivan] allowed us to use to actually make this pipeline work for us -- not only at this time, but in the future as well," said Briseno-Avena. "It's a very reliable and very cost effective model."
