How Salesforce went open source for its Thunder Internet of things cloud

Salesforce's Internet of things cloud is built on four key open source big data platforms. Here's what the back-end of Thunder, the platform behind the IoT Cloud, had to deliver.
Written by Larry Dignan, Contributor

Thunder, the scalable processing engine behind Salesforce's Internet of things (IoT) Cloud, took about a year to complete and is powered by four open source platforms used for big data analytics.

Adam Bosworth, executive vice president at Salesforce, gave us an overview of how Thunder processes the data coming in from multiple devices and end points and then surfaces it in the Salesforce user interface. Before joining Salesforce, Bosworth was founder and CEO of Keas; the exec who built and managed Google Docs and ran Google Health; and a general manager at Microsoft working on projects such as Access, XML and Microsoft Communities. He was also chief software architect at BEA Systems and co-founded Crossgain, which was sold to BEA.

Simply put, Bosworth is used to building big things. "We needed a platform for building engagement applications that are proactive," explained Bosworth in an interview. Bosworth contends that the shift to IoT is just as fundamental as the shift to the Web and mobile. People expect their things to be monitored and to receive useful, relevant information.

On the customer front, connected devices can make service better, more proactive and more engaging. "The dream: Problems will be solved before the customer realizes there's one," said Bosworth. Salesforce's opportunity is obvious. Few companies can deliver automated, proactive and intelligent service in real time. If Salesforce can abstract that layer and build it into its existing services, there's a lot of growth ahead.

Bosworth rattled off a bevy of examples of how IoT is going to revamp customer relationships, but many of them were theoretical. Only one anecdote, about how Amazon handled his wife's damaged birthday present, described something that happens in the real world today. In a nutshell, Bosworth's wife's birthday present was damaged, so Amazon reshipped a new one on the fly and sent a note apologizing. "Amazon just solved the problem. You couldn't do that with humans. The software recognized, looked up my lifetime value of a customer, checked inventory and picked the fastest way to get it to me," said Bosworth. "We need to make it easy for our customers to build that application, because our customers can't build like Amazon."

The guiding principle behind the IoT Cloud is that Salesforce had to do the heavy lifting on the background technology so both developers and business users can get value from the platform. "Business users can customize, extend and modify systems themselves. The ROI is there because business users don't have to go back to the engineering teams," said Bosworth.

To build Thunder, Bosworth's plan was to use proven open source big data technologies and let Salesforce serve as the user interface for the platform. By eliminating that user interface build (the "last mile" delivery issue), Salesforce could roll out Thunder faster. The four open source technologies behind Thunder, plus the platform they run on, are:

  • Spark, a general engine for large-scale data processing that's designed to be faster than Hadoop MapReduce.
  • Storm, an open source distributed real-time computation system to process streams of data. Storm was initially contributed by Twitter.
  • Kafka, an Apache project and a messaging broker that can handle hundreds of megabytes of reads and writes per second. Kafka came out of LinkedIn.
  • Cassandra, a highly scalable open source NoSQL database that's deployed in enterprises such as Apple, Instagram and Netflix, to name a few. Cassandra is known to outperform other NoSQL databases.
  • Heroku, Salesforce's platform-as-a-service.

Bosworth said that IoT data is exploding. Before IoT, an active customer might generate 100 events, but "most companies don't have that many active customers." Today, a device can ping every 5 seconds, or roughly 17,000 times a day. There are 100 to 1,000 times as many interesting events coming in as before.
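The arithmetic behind a 5-second reporting interval is easy to check: a day has 86,400 seconds, so one device reporting every 5 seconds sends 17,280 pings a day. The minimal Python sketch below works that out; the fleet size is a hypothetical figure chosen for illustration, not a number Salesforce gave.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def pings_per_day(interval_seconds: int) -> int:
    """Number of pings one device sends per day at a fixed reporting interval."""
    return SECONDS_PER_DAY // interval_seconds


def daily_events(devices: int, interval_seconds: int) -> int:
    """Total events per day for a fleet of identical devices."""
    return devices * pings_per_day(interval_seconds)


if __name__ == "__main__":
    per_device = pings_per_day(5)
    print(per_device)  # 17280

    # Hypothetical fleet size, for illustration only.
    fleet = 100_000
    print(daily_events(fleet, 5))  # 1728000000

    # Ratio to the ~100 events a pre-IoT active customer might generate.
    print(per_device / 100)  # 172.8
```

Even a modest hypothetical fleet lands in the billions of events per day, and a single device already produces well over 100 times the pre-IoT figure.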

Those events highlight the scale that was required. The IoT Cloud would have to listen to billions of events a day and then distill them into something actionable. The problem: Big data logs aren't actionable by themselves; they are only useful if they are distilled into intelligence quickly enough to update customer profiles. There has to be a real-time profile of what a customer is doing.

If all goes well, the Salesforce platform could distill events, show material changes and provide real-time responses to immediate problems. Based on what was happening, Thunder had to add real-time logic. "Logic is part of the application and coupled with context profile data it's easy to write," said Bosworth.

Thunder had to:

  1. Listen and digest;
  2. Apply logic;
  3. Proactively engage.
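Those three stages can be sketched as a tiny in-memory pipeline. This is a hypothetical Python illustration, not Thunder's code: the `Event` and `Profile` types, the `overheating` rule and its 90.0 threshold are invented for the example, and "engage" simply hands each message to a callback where a real system might open a service case.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    customer_id: str
    kind: str      # e.g. "temperature" (hypothetical event kind)
    value: float


@dataclass
class Profile:
    customer_id: str
    last_values: Dict[str, float] = field(default_factory=dict)  # latest reading per kind


def listen_and_digest(events: List[Event], profiles: Dict[str, Profile]) -> Dict[str, Profile]:
    """Stage 1: fold raw events into per-customer profiles."""
    for e in events:
        profile = profiles.setdefault(e.customer_id, Profile(e.customer_id))
        profile.last_values[e.kind] = e.value
    return profiles


# Stage 2: a rule maps a profile to an optional message.
Rule = Callable[[Profile], Optional[str]]


def overheating(p: Profile) -> Optional[str]:
    """Hypothetical rule: flag readings above an assumed 90.0 threshold."""
    temp = p.last_values.get("temperature")
    if temp is not None and temp > 90.0:
        return f"Device for {p.customer_id} is overheating ({temp})"
    return None


def apply_logic(profiles: Dict[str, Profile], rules: List[Rule]) -> List[str]:
    """Stage 2: evaluate every rule against every profile."""
    messages = []
    for p in profiles.values():
        for rule in rules:
            msg = rule(p)
            if msg is not None:
                messages.append(msg)
    return messages


def engage(messages: List[str], notify: Callable[[str], None]) -> None:
    """Stage 3: act proactively by handing each message to a notifier."""
    for m in messages:
        notify(m)


if __name__ == "__main__":
    profiles = listen_and_digest(
        [Event("acme", "temperature", 72.0),
         Event("acme", "temperature", 95.5),
         Event("globex", "temperature", 60.0)],
        {},
    )
    engage(apply_logic(profiles, [overheating]), print)
```

Keeping the rules as plain functions of the digested profile, rather than of raw events, mirrors Bosworth's point that logic "coupled with context profile data" is easy to write.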

The scaling assumption is that there could be terabytes of data per customer. CRM is shifting from human-driven to software-assisted. "Everything had to scale out and Thunder had to be a massively scalable ESB (enterprise service bus)," said Bosworth.

As far as the architecture goes, Bosworth said Thunder processes information in the following ways.

  • Incoming events from anywhere are dumped into Kafka.
  • Spark takes the profile data from Kafka and puts it into Cassandra for profile updates within minutes.
  • Storm takes data from Kafka and runs the logic that handles real-time events.
  • The technologies all run on Heroku, Salesforce's cloud application platform.
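The key idea in this layout is that Kafka keeps an ordered log that several consumers read independently, each tracking its own position, so a slower batch path and a real-time path can share one stream of events. The toy below imitates that in plain Python so it runs anywhere. It is an assumption-laden sketch, not Thunder's code, and it does not use the real Kafka, Spark, Storm or Cassandra APIs; the class names, device IDs and 90.0 threshold are hypothetical.

```python
from collections import defaultdict


class Log:
    """Append-only event log standing in for a single-partition Kafka topic."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

    def read_from(self, offset):
        """Return records at or after `offset`, plus the next offset to read."""
        batch = self.records[offset:]
        return batch, offset + len(batch)


class BatchProfileUpdater:
    """Plays the Spark role: periodically drains new records into stored profiles."""

    def __init__(self, log, store):
        self.log, self.store, self.offset = log, store, 0

    def run_once(self):
        batch, self.offset = self.log.read_from(self.offset)
        for device_id, reading in batch:
            profile = self.store[device_id]
            profile["count"] += 1
            profile["last"] = reading


class StreamRuleRunner:
    """Plays the Storm role: reacts to each record as soon as it is polled."""

    def __init__(self, log, threshold):
        self.log, self.threshold, self.offset = log, threshold, 0
        self.alerts = []

    def poll(self):
        batch, self.offset = self.log.read_from(self.offset)
        for device_id, reading in batch:
            if reading > self.threshold:
                self.alerts.append(device_id)


log = Log()
store = defaultdict(lambda: {"count": 0, "last": None})  # stands in for a Cassandra table
updater = BatchProfileUpdater(log, store)
stream = StreamRuleRunner(log, threshold=90.0)

for rec in [("pump-1", 70.0), ("pump-2", 95.0), ("pump-1", 72.0)]:
    log.append(rec)
    stream.poll()  # real-time path sees each event immediately

updater.run_once()  # batch path catches up later with its own offset
print(stream.alerts)     # ['pump-2']
print(store["pump-1"])   # {'count': 2, 'last': 72.0}
```

Because each consumer owns its offset, the real-time rule fires on the first hot reading while the profile store is updated later in one pass, which loosely parallels the article's "within minutes" profile updates.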

To make the IoT Cloud useful to business users, Bosworth's team wrote another layer of code on top of Storm. That layer of code allows "mere mortals to write intelligent logic" to handle events, said Bosworth. This logic revolves around addressable events, time periods, profile changes and other items.
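The article does not show what that layer looks like. As a hypothetical illustration only, a rule a business user might configure, combining an event type, a time period and a profile condition, could be stored as data and evaluated by one small function. Every field name, the `tier` attribute and the action string below are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Rule:
    """Hypothetical declarative rule: fire when at least `min_count` events of
    `event_type` arrive within `window` for a customer whose profile tier is `tier`."""
    event_type: str
    min_count: int
    window: timedelta
    tier: str
    action: str


def should_fire(rule, events, profile, now):
    """Return True if the rule's profile, event-type and time-window conditions all hold."""
    if profile.get("tier") != rule.tier:
        return False
    recent = [e for e in events
              if e["type"] == rule.event_type and now - e["at"] <= rule.window]
    return len(recent) >= rule.min_count


rule = Rule(event_type="error", min_count=3, window=timedelta(minutes=10),
            tier="gold", action="open_service_case")
now = datetime(2024, 1, 1, 12, 0)
events = [{"type": "error", "at": now - timedelta(minutes=m)} for m in (1, 4, 8, 30)]

print(should_fire(rule, events, {"tier": "gold"}, now))    # True: 3 errors in the last 10 minutes
print(should_fire(rule, events, {"tier": "silver"}, now))  # False: profile condition fails
```

Representing rules as data rather than code is one way a non-programmer could edit them through a form, which is the kind of self-service Bosworth describes.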

Thunder also had to plug into Salesforce's Heroku developer platform so IoT data can ultimately be added into apps.

In the end, Bosworth said that the Thunder project received a "huge benefit from using Salesforce as the UI." "We didn't have to write logic to get the data to the Service Cloud. We just had to get the right objects inside of the Service Cloud," said Bosworth. "If we had to write the user engagement side the project would have taken another two years."
