Alibaba Blinks: Building an open source, data-driven cloud empire in real-time

Acquiring data Artisans, the vendor leading development of the open source Apache Flink framework for real-time data processing, is the latest move from Alibaba. Where does this fit in Alibaba's strategy to grow its cloud?

Open source business comes and goes, clouds are here to stay. That's one lesson 2018 has offered, and part of the reason why vendors trading in open source/open core software are adjusting their strategy. Last year saw a number of said vendors change their licensing, adding clauses meant to restrict cloud vendors from "strip mining": Taking open source platforms and offering them as managed services.

The reasoning behind this strategy, exemplified by the Commons Clause and much debated in the open source world, is that cloud vendors are getting something for free and then making money out of it. In addition, in many cases this puts cloud vendors in competition with platform vendors, who also offer managed versions of their own software, creating a "frenemy" situation.

In the case of Alibaba and data Artisans, the conflict ended before it began: Alibaba just acquired data Artisans for a total of €90 million. Data Artisans is the vendor leading development of the open source Apache Flink framework for real-time data processing, as it employs a large share of Flink's core committers.

Flink is one of the key players in data streaming frameworks, enabling processing of data in real-time. Such frameworks are becoming increasingly important, and are set to eventually become the de facto entry point for data ingestion and processing. All major cloud vendors at this point either have their own offerings, or offer managed versions of open-source frameworks such as Apache Kafka or Apache Spark, or both.

Alibaba goes cloud, data, and AI

Alibaba is well on its way to becoming a major cloud vendor, too. On a global scale, that is, because it already is one back home in China. Alibaba is often thought of as the Chinese Amazon, but this is only partially true. Alibaba, like Amazon, started out in retail, where it is the dominant player in the Chinese market. Alibaba functions as a platform on which retailers can sell and manage aspects such as logistics.

When it comes to cloud services, however, Alibaba wants to differentiate itself from AWS by offering a value-add proposition instead of trying to play catch-up. The computational infrastructure needed to deliver platform services to clients is also used to offer them domain-specific solutions tailored to their needs. This is in stark contrast to AWS, which offers infrastructure and tools and lets clients build their own applications.

"Convincing clients to go cloud is easy. But we need to convince them to go Alibaba Cloud, and that's where we made a different choice: vertical, vertical, vertical, value, value, value," said Wanli Min, AI and data mining scientist at Alibaba Cloud, when discussing Alibaba's strategy in 2017. Min is a key figure in devising and implementing Alibaba's strategy, which is based on using data and AI to offer value-add services. 

Alibaba's strategy is built around creating an ecosystem, and Min highlighted this when discussing Alibaba's offering compared to specialized domain solutions, focusing on data science: "We can support clients going into uncharted territory. Our Brains can support you, and you will not be fighting by yourself -- you'll have an army of data scientists on your side." 

Brains is the name Alibaba uses for its AI-powered domain-specific solutions, and "an army" is literal in this case: Alibaba has ~50,000 employees, 20,000 of whom are technical. Min is the leader of a cross-functional team of 300 people: 50 data scientists, 200 data engineers, and 50 business experts. Min said they have managed to recruit people from places like Japan, Europe, and the US.

Let's quickly recap Alibaba's moves in terms of global expansion in the last few months: Landing in Europe by inking deals with Spanish and Polish retailers. Developing its Tmall Innovation Center to help sellers develop products. Collaborating with the likes of BMW and Intel on AI. And finally, joining the Open Invention Network patent protection group, the largest patent non-aggression community in history, and acquiring data Artisans.

Alibaba goes open source and real-time

Supporting open source actually makes a lot of sense as a piece of Alibaba's strategy. Open source represents infrastructure for data and AI-driven solutions. The key to making such solutions work is data and expertise, and Alibaba does not seem to be short of either. Alibaba is not in the business of selling managed services either, so why would it not want to invest in open source when it has no reason to compete with it?

This, and the need for scale, can explain Alibaba's special relationship with Apache Flink and data Artisans, leading to the acquisition. Min explained that Alibaba's infrastructure was based on a Lambda architecture, i.e., one that has two lines of data processing, one for batch and one for real-time. Flink enables these to be collapsed into a single line (Kappa architecture), saving resources and enabling faster insights in the process.
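To make the difference between the two architectures concrete, here is a minimal sketch in plain Python. It is not Flink or Blink code, and it does not reflect Alibaba's actual pipelines: the event shape (a product ID and a quantity sold) and all function names are assumptions made purely for illustration. The Lambda version keeps two separately maintained processing lines and merges them at query time, while the Kappa version runs one streaming line over a single log and handles reprocessing by replaying that log.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape for illustration: (product_id, quantity_sold).
Event = Tuple[str, int]


def batch_totals(history: Iterable[Event]) -> Dict[str, int]:
    """Lambda batch line: recompute totals over the stored history."""
    totals: Dict[str, int] = {}
    for product, qty in history:
        totals[product] = totals.get(product, 0) + qty
    return totals


def speed_totals(recent: Iterable[Event]) -> Dict[str, int]:
    """Lambda real-time line: a second implementation that must be kept in sync."""
    totals: Dict[str, int] = {}
    for product, qty in recent:
        totals[product] = totals.get(product, 0) + qty
    return totals


def lambda_query(history: Iterable[Event], recent: Iterable[Event]) -> Dict[str, int]:
    """Lambda serving step: merge the batch view with the real-time view."""
    merged = dict(batch_totals(history))
    for product, qty in speed_totals(recent).items():
        merged[product] = merged.get(product, 0) + qty
    return merged


def kappa_query(log: Iterable[Event]) -> Dict[str, int]:
    """Kappa: one streaming line over one log; replaying the log is the 'batch' job."""
    totals: Dict[str, int] = {}
    for product, qty in log:
        totals[product] = totals.get(product, 0) + qty
    return totals


history = [("shoes", 2), ("tea", 1)]
recent = [("shoes", 3)]
print(lambda_query(history, recent))
print(kappa_query(history + recent))
```

Both calls produce the same totals. The point of the sketch is that the Lambda version needs two processing functions plus a merge step to get there, which is the duplication a single streaming line removes.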

Alibaba has long been involved in Flink, having developed its own extensions, called Blink, to meet its requirements. Alibaba has also been a data Artisans client, as it needed the expertise and support that data Artisans has to offer, as well as its hardened enterprise version, which includes features such as patent-pending technology for serializable transactions.

The Kappa architecture was introduced to flatten and simplify the Lambda architecture, and it relies on modern streaming engines. (Image: Datanami)

At Alibaba's scale, leveraging Flink can translate to substantial savings and competitive advantage. Instead of relying on an external entity for what is strategic software infrastructure, why not bring it in-house?

Alibaba's open-source compatible strategy means this can also work well for data Artisans, without forcing it to change its course. Kostas Tzoumas, data Artisans CEO, has repeatedly emphasized open source as a core principle for the company. Data Artisans has also been reluctant to raise capital, as part of a strategy to maintain control of the company and grow organically.

This deal may mean that data Artisans can have its cake and eat it, too, getting a healthy injection of cash while maintaining control. And Alibaba has committed to contributing Blink to core Flink. We would not be surprised, however, to see data Artisans push a Commons Clause for Flink in the near future as well. Other cloud providers are now direct competition, after all.

Another thing to keep an eye on is how this will affect the evolution of Apache Beam. Beam is the closest thing to a standard in the data streaming world, enabling streaming workloads to be ported among different frameworks. Beam was initiated by Google for its cloud, and gained support from Flink and Samza, but not Spark. With Alibaba now behind Flink, this means de facto support from another major cloud vendor.
