The latter part of the 2010s was all about AI, and the 2020s will not be any different. We will see AI widening its reach and impacting every conceivable field. Having already seen the AI hype rise, however, we must also be prepared for a backlash. And it is important to be aware of what "AI" actually means.
In essence, what we call AI today is an umbrella term for various pattern matching techniques. Machine learning and its various subdomains, such as deep learning, essentially boil down to pattern matching. We saw several breakthroughs in the 2010s, but the seeds for most techniques and algorithms were planted decades ago, and the techniques themselves remain essentially the same.
Still, we have seen the performance of AI systems in many domains go from worse than human to catching up with and surpassing humans. How is that possible? The answer is twofold: data and compute.
The digitization of nearly all aspects of human activity has led to an explosion in the volume of data being generated. Algorithms now have much more data to work with, and that alone means they can perform much better. In parallel, progress was made in domains such as image recognition: adjustments to neural networks, brought about by vibrant communities, have boosted the accuracy of the algorithms. ImageNet is a good example of this.
The problem with the AI frenzy is that the divide between the haves and the have-nots is widening, and not just because of the resources and expertise the big players have. It's a self-reinforcing loop of sorts: designing and producing data-driven products means these products not only have an edge, but also bring in more data as they operate.
Data, however, is just one part of the AI equation. The other part is hardware. Without the tremendous progress in hardware the 2010s have seen, AI would not be possible. Access to the compute power needed to process the massive amounts of data machine learning requires used to be a privilege reserved for the select few.
The big innovator, and winner, in 2010s AI hardware was NVIDIA. The company most people came to know as a maker of GPUs, specialized hardware typically used by gamers for fast graphics rendering, has reinvented itself as an AI superpower. The architecture of GPUs, it turns out, is very well suited to running AI workloads.
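To make that point concrete, here is a minimal sketch, in Python with NumPy, of the kind of operation neural networks spend most of their time on. The shapes and values are illustrative; frameworks such as PyTorch or TensorFlow dispatch the same computation to a GPU, where the many independent output values are computed in parallel.

```python
import numpy as np

# A single neural-network layer is, at its core, a matrix multiplication
# followed by a non-linearity. Every output element can be computed
# independently of the others, which is exactly the kind of work a GPU's
# thousands of cores can do in parallel.

rng = np.random.default_rng(0)
inputs = rng.standard_normal((256, 512))    # a batch of 256 examples, 512 features each
weights = rng.standard_normal((512, 128))   # weights of a hypothetical 512 -> 128 layer

# On a CPU this runs on a handful of cores; on a GPU, the 256 * 128
# output values are computed concurrently.
activations = np.maximum(inputs @ weights, 0.0)  # ReLU(inputs x weights)
print(activations.shape)  # (256, 128)
```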
Up until the beginning of the 2010s, the world was mostly running on relational databases and spreadsheets. To a large extent, it still is. But if the 2010s brought the first traces of dissent in the monoculture of tabular data structures, the 2020s will bring the final nail in the coffin. The NoSQL wave of databases has largely succeeded in getting developers, administrators, CIOs, CTOs, and business people out of their comfort zone, and in instilling a "best tool for the job" mindset.
Polyglot persistence, as the lingo goes for using different data models and data management systems interchangeably depending on the task at hand, is becoming the new normal. After relational, key-value, document, columnar, and time-series databases, the latest link in this evolutionary proliferation of data structures is graph. Graph databases and knowledge graphs have been making waves and being included in hype cycles for the last couple of years.
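As an illustration of what "best tool for the job" means in practice, here is one hypothetical fact expressed under three of the data models mentioned above; the names and values are made up, and each representation lends itself to different kinds of queries.

```python
# The same hypothetical fact -- "Alice placed order 42 for a book" --
# expressed under three different data models.

# Key-value: one opaque value per key; fast lookups, no structure to query.
kv_store = {"order:42": "alice,book"}

# Document: a self-contained, nested record; good for retrieving the
# whole order in one go.
document = {
    "order_id": 42,
    "customer": {"name": "Alice"},
    "items": [{"product": "book", "qty": 1}],
}

# Graph: entities as nodes, relationships as first-class edges; good for
# questions that traverse connections ("who else bought what Alice bought?").
nodes = {"alice": {"label": "Customer"}, "order42": {"label": "Order"}, "book": {"label": "Product"}}
edges = [("alice", "PLACED", "order42"), ("order42", "CONTAINS", "book")]
```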
While it's understandable why many people tend to think of graph as a new technology, the truth is this technology is at least 20 years old. It was largely initiated by none other than Tim Berners-Lee, who is also credited as the inventor of the web, in 2001 with the publication of his Semantic Web manifesto in Scientific American. Berners-Lee also coined the term Giant Global Graph to describe the next stage in the evolution of the web.
Having been into this technology since the early 2000s, it's exhilarating to see it gathering steam, with technical progress, funding, and use cases piling up into a snowball effect. It is also amusing to see graph-washing commence. In essence, progress in graph is happening along the trajectory of progress in machine learning.
It's not so much that a major breakthrough suddenly made the technology feasible; it's more that the right conditions made it boom. Many of the concepts, formats, standards, and technologies enabling graph databases and knowledge graphs to flourish today have been developed over more than 20 years. What has brought on the perfect graph storm is a combination of factors.
As with AI, the data explosion has contributed to bringing graph to the fore. Now that Big is no longer a qualifier for Data, because we have mastered the art of storing lots of it, the question really is how to get value out of data. Leveraging connections in data is a prominent way of doing that, and graph is the best way of leveraging connections.
This is why graph databases excel in use cases that require finding connections in data, such as anti-fraud or master data management. This is why graph analytics, with algorithms such as PageRank and other centrality measures that are based on accounting for nodes and edges, can offer valuable insights into connected datasets. As the terminology still seems to be in flux for many newcomers to this field, a short history lesson, and some grounding in semantics, may be called for.
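To give a flavor of what such algorithms do, here is a minimal PageRank sketch in plain Python over a toy directed graph. The graph and parameter values are made up; production graph databases and analytics engines ship tuned implementations of this and other centrality algorithms.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph: dict mapping each node to the list of nodes it links to."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node keeps a small baseline rank...
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node, targets in graph.items():
            if targets:
                # ...and passes the rest of its rank along its outgoing edges.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / len(nodes)
        rank = new_rank
    return rank

toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(pagerank(toy_web))  # "c", the most linked-to node, gets the highest score
```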
Google has played a key role in the rise of graphs and knowledge graphs. The web itself is a prime use case for graphs, and this is how PageRank was born. And as crawling and categorizing content on the web is a very hard problem to solve without semantics and metadata, Google embraced them and coined the term Knowledge Graph in 2012. This, and the widespread adoption of schema.org that came with it, marked the beginning of the meteoric rise of graph technology and knowledge graphs.
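For reference, this is roughly what schema.org markup looks like: structured JSON-LD metadata embedded in a web page, so that crawlers can read entities and their relationships instead of guessing them from raw text. The snippet below, shown as a Python dictionary for convenience, uses made-up values.

```python
import json

# A hypothetical schema.org description of an organization, as a page
# might embed it in JSON-LD form. Entity types and properties come from
# the schema.org vocabulary; the values here are invented.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Corp",
    "url": "https://example.com",
    "founder": {"@type": "Person", "name": "Jane Doe"},
}
print(json.dumps(organization, indent=2))
```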
Knowledge graphs can address key challenges such as data governance, but ultimately they can serve as the digital substrate that unifies the philosophy of knowledge acquisition and organization with the practice of data management in the digital age. The NASAs and the Morgan Stanleys of the world are managing ontologies and utilizing knowledge graphs.
Graphs and knowledge graphs cross-cut into AI, too. Much of the AI hardware and software for the 2020s utilizes graph data structures. A combination of bottom-up, pattern matching techniques with top-down, knowledge-based approaches is the most promising way for AI to continue to make progress.
Knowledge Graph is a technology that enables other technologies to accelerate their growth, and it also enables humans to take stock of their own knowledge. This is why the future is Knowledge Graph.
To infinity and beyond
Looking back, it becomes clear how far we have come in the relatively short span of the last decade. Counter-intuitive as it may seem, however, we are not certain this is a good thing. Somewhere along the way, technological progress left the human ability to monitor, comprehend, and digest technology in the dust. At the dawn of this new decade, we seem to be engrossed in a never-ending race for more: more data, more processing power, more technology.