The Data and AI landscape 2021: It's a MAD, MAD, MAD world

Data and AI companies used to be considered tech companies. But the wave of IPOs and the proliferation of unicorns are turning this market into its own sector. Our question: how sustainable is this MAD new world?



While much of the world closed down during the COVID-19 pandemic, the gates opened wide for financing early- and late-stage startups in the data and AI space. A team at FirstMark Capital, led by partner Matt Turck, has issued the latest of its annual encyclopedic data and AI landscapes. At one end of the spectrum, Turck and colleagues wrote, the market is maturing, as evidenced by the breakout IPOs of Snowflake and Confluent, which are reassuring the VC community that revenues and growth are real and that the onrush of funding is not triggering a replay of the dot-com bubble.

And yes, the report states that the promise of AI is now being borne out. We're seeing it crop up not only in the tools that business analysts use, such as BI, but also in databases, where in-database machine learning is becoming a checkbox feature, and in enterprise applications, where Oracle, SAP, Salesforce and others now routinely incorporate machine learning. So we're seeing fewer headlines about driverless cars, and the world is not waiting for a self-driving Uber to pull up to prove that AI is real.

A few months ago, FirstMark went MAD: it introduced the MAD Index of publicly listed machine learning, AI, and data companies. The significance is that there are now enough of them to list on their own, as opposed to being grouped under the more general technology umbrella. And the companies on that list, roughly a dozen at the time, have gone public recently (within the past five years).

Of course, making this all possible is the venture community. Citing data from CB Insights, the report notes that venture funding surged 157% YoY by Q2 of this year, while public financing, whether through IPOs, direct listings, or SPACs, was up over 6x in the first half of 2021 compared to a year earlier. The one indicator that has dipped is acquisitions, in all likelihood because VC-pumped valuations are making companies such as Databricks (at $38 billion) far too expensive to acquire, even for the likes of the Microsofts out there.


The 2021 Machine Learning, AI and Data (MAD) Landscape

Credit: FirstMark Capital

The report begins with an overview of the ecosystem, discusses financing activity, and then spotlights key technology trends in data infrastructure, analytics, and AI. It's too voluminous for us to review line by line, but you can see the complete report here. A thumbnail image of the landscape is shown above. A more legible rendition can be found here, and if you'd like all the gory details, the FirstMark team put together a detailed spreadsheet that you can access here.

We'll stick to some overriding impressions.

As we (and others) have discussed before, we no longer think of Big Data as anything special. When you can fire up a Snowflake data warehouse with petabytes of data and turn on autoscaling, you're harnessing the cloud to perform what formerly required data engineers to set up Hadoop clusters, run ZooKeeper, then manually code those MapReduce (or later Spark) routines. And with the capability to analyze data outside relational tables, such as JSON and Parquet files, and to project graph views, suddenly those 3 V's that supposedly defined big data are now supported by your cloud data warehouse or lakehouse. Describe it as the "Modern Data Stack," or refer to it as the phenomenon that the authors named "The Big Unlock."

We all keep reading Snowflake's numbers and hearing rumors about when Databricks will finally IPO, but Amazon Redshift, Azure Synapse Analytics, and Google BigQuery continue to be among the fastest-growing services for their respective clouds.

And thanks to AutoML, and the emergence of tooling and services covering the entire lifecycle of building and running those models, you can say the same about AI. There's the continued dance of players who are morphing into generalized platforms. ML platforms, from the SageMakers and Vertex AIs to the Dataikus and DataRobots of the world, are broadening themselves into full lifecycle services. You can also say the same about other parts of the data ecosystem. Confluent doesn't merely want to run your Kafka streams, but also your real-time data warehouse.

Nonetheless, the authors cite the usual centrifugal forces, with growing attention to data meshes taking center stage (we'll have more to say about that in an upcoming post). But the move to consolidation has hardly put a brake on venture activity. According to CB Insights, $38 billion flowed into AI startups in the first half of 2021, which was about as much as went in during all of 2020; and in those first six months, there were over 50 rounds exceeding $100 million. The financing community has spread beyond VCs to hedge funds and that lovely class of ventures poetically abbreviated as SPACs. And many of those financings have been instigated, not by hungry startups, but by financiers eager to get in further.

The authors are bullish on the outlook for analytics and AI in general. But the report also cites explosive growth, not in revenues, but in the number of startups in niches like reverse ETL, data quality, data catalogs, data annotation, and MLOps, where investment appears to be ahead of the market's readiness to absorb it.

So, there's a lot of crazy cash out there. We have a few rhetorical questions. Does a company like Databricks really need $3.5 billion in the bank? With the proliferation of venture-funded startups valued at more than $1 billion, has the term "unicorn" grown outdated? Are financiers rushing in because of FOMO -- fear of missing out?

But our main question is: are we headed into another bubble? We had an offline email exchange with lead author Matt Turck on that very issue. His take is that "there are more quality companies than ever." There are repeat founders coming in with strong track records. A good example is Dataiku cofounder Florian Douetteau, whose previous act was the successful exit of the Exalead search engine, which was acquired by Dassault Systemes roughly a decade ago.

The report also cites the tight job market. To some extent, that's old news -- there has been a shortage of data scientists and data engineers ever since we began mouthing the words "Big Data." Like the Java developer shortage during the dot-com era, these are issues that are largely solvable; witness the flood of enrollments in college data science programs. Our concern about talent lies with other parts of the food chain: seasoned managers, executives, sales, and marketing. A number of vendors have told us of the challenges in filling those slots. In most cases, this is not about schools turning out grads with the right degrees. Our sense is that the lack of managerial and go-to-market talent could put some brakes on growth.

Turck also cites solid growth in annual recurring revenues for many of these startups, and notes that, unlike the dot-com era, which was about the promise, the current era is about deployment. We agree. Our take is that the cloud is making a big difference here. In past eras, organizations would have had to put their capital budgets where their mouths were, acquiring, deploying, and maintaining more servers. By contrast, the cloud allows almost instant scale-out without the red tape of capital budgeting.

In the general economy, there are potential storm clouds on the horizon, such as the likelihood that money will get more expensive as the Fed finally starts raising interest rates, not to mention the structural hurdles posed by globally disrupted supply chains. We do think that, for now, we are in peak times for startup financing. We wouldn't be surprised by a spate of IPOs or other exits in the next 12 months, followed by a slowdown in venture and other forms of financing. Some degree of market shakeout for new ventures is likely -- we saw this with the initial spate of Big Data startups during the 2015/16 timeframe. But then again, we also expect success for many of the current crop of data and AI startups, as economic disruptions are the very problems that they are designed to take on.