Data and analytics in 2020: Industry predictions

Here's my annual roundup of predictions for the upcoming year, from experts around the analytics industry.

There's no shortage of experts and opinions in the world of data and analytics. But the end of the year provides a nice forcing function to collect analyses and prognostications, then slice, dice and summarize them. This year, as you might expect, experts in the field had a lot to say about Big Data, AI, the cloud, skills shortages and more. Our harvest of predictions follows, organized by theme.

Beating a dead elephant?

What would a year-end opinion roundup be without bashing Hadoop and Big Data while, at the same time, acknowledging that reports of their death are greatly exaggerated?

Haoyuan Li, founder and CTO at Alluxio, says "There is a lot of talk about Hadoop being dead...but the Hadoop ecosystem has rising stars. Compute frameworks like Spark and Presto extract more value from data and have been adopted into the broader compute ecosystem." Li further expounds that "HDFS [the Hadoop Distributed File System] will die but Hadoop compute will live on and live strong."

Yellowbrick Data's CTO, Brian Bulkowski, has a similar dead-but-alive take on things: "Big Data is well and truly dead, but the data lake looms large." And Todd Wright, Head of Data Management and Data Privacy Solutions at SAS, reminds us that "The promise of big data never came from simply having more data – and from more sources – but by being able to develop analytical models to gain better insights on this data."

AI in the analytics mainstream

Speaking of analytical models, and with the Big Data eulogies out of the way, let's move on to artificial intelligence (AI). But let's fast-forward past the hype-driven clichés and observe that there seems to be a slowly building consensus that AI and machine learning will become a sub-specialty within the broader data analytics landscape, rather than remaining quite so segregated from it. Currently separate teams will merge or coordinate more closely and, to some extent, the skill sets will become more unified as well.

Alluxio's Li captures this under the title "AI & analytics teams will merge into one as the new foundation of the data organization."  He says "AI is the next step to structured data analytics. What used to be statistical models has converged with computer science to become AI and ML. So data, analytics, and AI teams need to collaborate to derive value from the same data they all use."  Eugene Roytburg, Managing Partner at Fractal Analytics, concurs, saying "AI (ML) will be more clearly defined as a part of broader analytics and will have better-defined application areas and value creation. Many companies have been confused about the two..."

Opinions are forming around AI ethics and fairness too. Suraj Amonkar, a VP, also at Fractal Analytics, believes "The AI community will continue to debate and progress on the challenge of governance, privacy, safety and ethics in AI." Amonkar's colleague at Fractal, Chief Design Officer Parameswaran Venkataram, figures that "In the next year, organizations will start using 'Ethics in AI' to drive the way new applications of AI are conceptualized and designed from scratch. Given people's awareness and expectations from issues related to how they use (or are used by) technology, designing for ethical AI will become the norm eventually." Finally, Soudip Roy Chowdhary, CEO at Eugenie.ai, thinks "We would see a substantial rise in research efforts related to building a privacy-aware AI ecosystem and enabling fairness in AI algorithms."

Operations

Our experts also feel that building production processes around AI (and tools for making AI more operational) will become a bigger thing next year. Alluxio's Li explains "Machine learning with models has reached a turning point, with companies of all sizes and at all stages moving towards operationalizing their model training efforts." Todd Wright at SAS puts it this way: "The market will see growth in model management...organizations will need the ability to easily register, modify, track, score, publish, govern and report on analytical models."

Kelsie Pallanck, Senior Content Director at O'Reilly Media, has a similar, if somewhat more skeptical, analysis, saying "ML- and AI-driven apps have arrived, and DevOps practices will play a large role in the development workflow...AIOps, though getting plenty of attention, is still young. Its benefits to IT operations are still more theoretical than actual. But stay tuned." And Peter Bailis, CEO at Sisu Data, asserts that "Data is not the sole domain of the data scientist anymore. Everyone in an organization will start acting more like a data analyst on a daily basis, and we'll see new skills and tools focused on specific use cases emerge."

The year of the cloud, again?

Under the heading "Cloud reigns," Information Builders' SVP of product and engineering, Eric Raab, and its VP of field technical engineering, Kabir Choudry, say "...there are now proven solutions that are purpose-built for cloud-based operation. 2020 will see the floodgates open with organizations moving to the cloud to take advantage of the usability, scalability and flexibility of cloud-native solutions." WANdisco CEO David Richards believes that "In 2020, the thousands of companies created before the cloud will look to join the party, ushering in a much bigger phase of cloud growth. The process will begin by shifting their data to the cloud, laying the foundation for an optimal environment for artificial intelligence and machine learning applications."

Sandeep Dutta, Fractal Analytics' Chief Practice Officer, APAC, states "Many companies focused on creating enterprise data lakes on the cloud that can help get good, reliable quality data sets in place. We are expecting to see this trend accentuate in 2020." Yellowbrick's Bulkowski has a database- and hardware-specific take on the cloud revolution, saying "The most exciting and innovative databases are leveraging hardware innovation to bring the next levels of price and performance. The cloud enables this innovation...you'll be running your databases on more and more specialized hardware, but you'll never even know it."

Can you say "multi-hybrid"?

Our prognosticators see increasing sophistication around hybrid and multi-cloud approaches, too. In fact, there's nothing but consensus. For example, Yellowbrick's Bulkowski opines "A best-of-breed architecture envisions building blocks within the technical stack, then selects not from a single cloud vendor, but from the variety of service providers. Assumptions that a given cloud provider has the lowest or best prices, or that the cost of networking between clouds is prohibitive, becomes less and less true."

Neo4j's CEO, Emil Eifrem, looks at it this way: "Existing and succeeding as an ISV [independent software vendor] in 2020 will mean having a mastery of both multi cloud and on-prem offerings. Open source ISVs with a cloud offering understand how to bolster cloud deployments for those that need customization. AWS will never be multi cloud, and they are highly unlikely to be on-prem."

Rajiv Mirani, CTO, Cloud Platform at Nutanix, says "Enterprises will continue to invest in hybrid cloud and look for greater interoperability between private and public clouds for all workloads, including legacy as well as cloud native. Customers will begin to look to vendors who can offer a software to run any workload on any location without the burden of rearchitecting or refactoring applications."

K8s a go-go

With all this talk about cloud, we'd be remiss to skip over Kubernetes (K8s), the open source container orchestration technology that seems to have taken over the entire tech scene this year. So let's note that K8s has been making particular inroads in the analytics world. Of K8s, Pallanck at O'Reilly says "...it's big and getting bigger. The pace of enterprise adoption of this leading container orchestration solution will only gain momentum in 2020."  And Haoyuan Li at Alluxio says "In 2020, we'll see a shift to AI and analytic workloads becoming more mainstream in Kubernetes land."

Stephen Fabel, Director of Product at Canonical (the company behind Linux distribution Ubuntu), shares the enthusiasm, saying "Kubernetes has become an integral part of modern cloud infrastructure and serves as a gateway to building and experimenting with new technology...We think this trend will continue at strength in 2020." But he cautions that "we may also see some companies questioning whether Kubernetes is really the correct tool for their purposes. While the technology can provide tremendous value, in some cases it can be complex to manage and requires specialist skills." On the skills issue, O'Reilly's Pallanck would agree, explaining that "A 2019 national job search on LinkedIn turned up 16,744 open positions for Kubernetes-related roles."

Mad skills

What can the industry do about the analytics skills shortage overall? Hugh Owen, Executive Vice President, Worldwide Education at MicroStrategy, thinks it's about training the technologists you've already got. Owen asserts "Enterprise organizations will need to focus their attention not just on recruiting efforts for top analytics talent, but also on education, reskilling, and upskilling for current employees as the need for data-driven decision making increases – and the shortage of talent grows."

Skills shortages show up everywhere, especially in AI. John LaRocca, Managing Director for Europe/NA Operations at Fractal Analytics, comments that "The demand for AI solutions will continue to outpace the availability of AI talent, and businesses will adapt by enabling more applications to be developed by non-AI professionals, resulting in the socialization of the process." 

In that same vein, noted industry expert Marcus Borba, at Borba Consulting, remarks, in a report from MicroStrategy, that "the demand for development in machine learning has increased exponentially. This rapid growth of machine learning solutions has created a demand for ready-to-use machine learning models that can be used easily and without expert knowledge." 

Venkat Venkataramani, CEO at Rockset, sees an even simpler solution to the skills shortage: that vendors and customers should conform to skill sets that are already pervasive, like good ol' SQL: "We will see enterprises making a huge push towards standardizing around SQL for their entire data management stack. Data management solutions – whether it's streaming platforms, online operational systems, or offline batch analytics – will all converge to SQL as a standard interface for developers and data scientists alike."

Governance, baby

We can't do all these great things with data if we don't pay attention to data privacy, protection and governance. It's all coming to a head. 

SAS' Wright notes that "The increasing amount of privacy/protection laws seen throughout the world have prompted organizations to develop data governance programs that include data privacy by default." Sisu's Bailis says "Beyond 2020, governance comes back to the forefront. As platforms for analysis and diagnosis expand, derived facts from data will be shared more seamlessly within a business, as data governance tools will help ensure the confidentiality, proper use, and integrity of data improve to the point they fade into the background again."

Alteryx's Chief Data and Analytics Officer, Alan Jacobson, puts it this way: "Whilst people don't enjoy following rules, they do enjoy being given a framework within which they can succeed and thrive. Good governance will increasingly be seen as an enabler for achieving corporate objectives using efficient and effective best practices that enable the workforce."

Next year in Dataville

With this year's brain trust edifying us on Hadoop and Big Data; AI; the cloud; Kubernetes; the tech skills shortage; and the growing focus on data governance, we now have a good framework to take on all the data, analytics and AI news next year. To me, there's huge overlap among these subjects. Big Data analytics got us in position to make AI real. The cloud and Kubernetes, meanwhile, have made it easier to deploy the necessary technology and work with it, to gain experience with it and address the skills shortage through upskilling. And as all of this unfolds, our need to focus on data governance is greater than ever.

Look for these lobes of data technology to converge further next year, and maybe even to drive some corporate consolidation as well. As industry events take place, George Anadiotis, Tony Baer and I will do our best to keep you apprised, help you understand it all and point out the bigger trends, as we see them.