Running AI workloads is coming to a virtual machine near you, powered by GPUs and Kubernetes
Run:AI offers a virtualization layer for AI, aiming to facilitate AI infrastructure. It's seeing lots of traction and just raised a $75M Series C funding round. Here's how the evolution of the AI landscape has shaped its growth.
Run:AI takes your AI and runs it on the super-fast software stack of the future. That was the headline to our 2019 article on Run:AI, which had then just exited stealth. We like to think it remains accurate, and Run:AI's unconventional approach has seen rapid growth since.
Run:AI, which touts itself as an "AI orchestration platform", today announced that it has raised $75M in a Series C round led by Tiger Global Management and Insight Partners, who led the previous Series B round. The round includes the participation of additional existing investors, TLV Partners and S Capital VC, bringing the total funding raised to date to $118M.
We caught up with Omri Geller, Run:AI CEO and co-founder, to discuss AI chips and infrastructure, Run:AI's progress, and the interplay between them.
Run:AI offers a software layer called Atlas to speed up machine learning workload execution, on-premise and in the cloud. Essentially, Atlas functions as a virtual machine for AI workloads: it abstracts and streamlines access to the underlying hardware.
That sounds like an unorthodox solution, considering that conventional wisdom for AI workloads dictates staying as close to the metal as possible to squeeze as much performance out of AI chips as possible. However, some benefits come from having something like Atlas mediate access to the underlying hardware.
In a way, it's an age-old dilemma in IT, playing out once again. In the early days of software development, the dilemma was whether to program using low-level languages such as Assembly or C or higher-level languages such as Java. Low-level access offers better performance, but the flip side is complexity.
A virtualization layer for the hardware used for AI workloads offers the same benefits in terms of abstraction and ease of use, plus others that come from streamlining access to the hardware. Examples include the ability to offer analytics on resource utilization and the ability to optimize workloads for deployment on the most appropriate hardware.
Initially, Run:AI supported Nvidia GPUs, with the goal being to add support for Google's TPUs as well as other AI chips in subsequent releases. Since then, there has been ample time; however, Run:AI Atlas still only supports Nvidia GPUs. As the platform has evolved in other significant ways, this clearly was a strategic choice.
The reason, as per Geller, is simple: market traction. Nvidia GPUs are by and large what Run:AI clients are still using for their AI workloads. Run:AI itself is seeing lots of traction, with clients such as Wayve and the London Medical Imaging and AI Centre for Value Based Healthcare, across verticals such as finance, automotive, healthcare, and gaming.
However, Geller's experience from the field is that organizations are not just looking for a cost-efficient way to train and deploy models. They are also looking for a simple way to interact with the hardware, and this is a key reason why Nvidia still dominates. In other words, it's all in the software stack. This is in line with what many analysts have observed.
However, we were wondering whether the promise of superior performance might lure organizations or whether Nvidia competitors have managed to somehow close the gap in terms of their software stack evolution and adoption.
Nvidia's domination is not the only reason why Run:AI's product development has turned out the way it has. Another trend that shaped Run:AI's offering was the rise of Kubernetes. Geller thinks that Kubernetes is one of the most important pieces in building an AI stack, as containers are heavily used in data science -- as well as beyond.
It took Run:AI a while to identify that. Once they did, however, their decision was to build their software as a plugin for Kubernetes to create what Geller called "Kubernetes for AI". In order to refrain from making vendor-specific choices, Run:AI's Kubernetes architecture remained widely compatible. Geller said the company has partnered with all Kubernetes vendors, and users can use Run:AI regardless of what Kubernetes platform they are using.
Over time, Run:AI has built a notable partner ecosystem, including the likes of Dell, HP Enterprise, Nvidia, NetApp and OpenShift. In addition, the Atlas platform has also evolved both in breadth and in depth. Most notably, Run:AI now supports both training and inference workloads. Since inference typically makes for the bulk of operational costs of AI in production, this is really important.
In addition, Run:AI Atlas now integrates with a number of machine learning frameworks, MLOps tools, and public cloud offerings. These include Weights & Biases, TensorFlow, PyTorch, PyCharm, Visual Studio and JupyterHub, as well as Nvidia Triton Inference Server and NGC, Seldon, Airflow, Kubeflow and MLflow.
Even frameworks that are not pre-integrated can be integrated relatively easily, as long as they run in containers on top of Kubernetes, Geller said. As far as cloud platforms go, Run:AI works with all three major cloud providers (AWS, Google Cloud and Microsoft Azure), as well as on-premise. Geller noted that hybrid cloud is what they see in customer deployments.
Even though the reality of the market Run:AI operates in upended some of the initial planning, making the company pursue more operationalization options as opposed to expanding support for more AI chips, that does not mean there have been no advances on the technical front.
Run:AI's main technical achievements go by the names of fractional GPU sharing, thin GPU provisioning, and job swapping. Fractional GPU sharing enables running many containers on a single GPU while keeping each container isolated and without code changes or performance penalties.
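To make the idea of fractional sharing concrete, here is a minimal sketch of how several containers requesting a share of a GPU could be packed onto as few devices as possible. It is a toy first-fit placement written for illustration only: the `Gpu` class, the `place` function and the container names are hypothetical, and nothing here reflects how Run:AI Atlas actually schedules or isolates workloads.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    """Hypothetical model of one physical GPU and the shares handed out on it."""
    name: str
    # Fraction of the GPU given to each container, keyed by container name.
    allocations: dict = field(default_factory=dict)

    def free(self) -> float:
        # Rounded to avoid floating-point noise when comparing shares.
        return round(1.0 - sum(self.allocations.values()), 6)


def place(gpus, container, fraction):
    """First-fit: put the container on the first GPU with enough free share."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    for gpu in gpus:
        if gpu.free() >= fraction:
            gpu.allocations[container] = fraction
            return gpu.name
    return None  # no room anywhere; a real scheduler would queue the job


gpus = [Gpu("gpu-0"), Gpu("gpu-1")]
requests = [("notebook-a", 0.25), ("notebook-b", 0.25),
            ("train-c", 0.5), ("infer-d", 0.5)]
placements = {name: place(gpus, name, share) for name, share in requests}
print(placements)
```

In this sketch three containers end up sharing the first GPU and only the fourth spills over to the second one. The hard part that the sketch leaves out entirely is what the article describes: keeping each container isolated, with no code changes or performance penalties.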
What VMware did for CPUs, Run:AI does for GPUs, in a container ecosystem under Kubernetes, without hypervisors, as Geller put it. As for thin provisioning and job swapping, those enable the platform to identify which applications are not using allocated resources at each point in time, and dynamically re-allocate those resources as needed.
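The re-allocation idea can be sketched in a few lines as well. The example below assumes a hypothetical monitoring feed that reports, per job, the GPU share it was allocated and the share it is actually using, and computes how much could be lent to other jobs. The `reclaimable` function, the `idle_threshold` parameter and the sample numbers are all illustrative assumptions, not Run:AI's algorithm or API.

```python
def reclaimable(jobs, idle_threshold=0.05):
    """Return {job: share} that could be lent out to other jobs.

    `jobs` maps a job name to (allocated_share, used_share).
    A job at or below `idle_threshold` usage is treated as idle, so its
    whole allocation is a candidate for swapping out. Otherwise only the
    unused part of its allocation is counted.
    """
    spare = {}
    for name, (allocated, used) in jobs.items():
        if used <= idle_threshold:
            spare[name] = allocated
        elif used < allocated:
            spare[name] = round(allocated - used, 6)
    return spare


# Hypothetical snapshot: (allocated share, share currently in use).
jobs = {
    "train-c": (0.5, 0.5),      # fully busy, nothing to reclaim
    "notebook-a": (0.25, 0.0),  # idle, whole share can be lent out
    "notebook-b": (0.25, 0.1),  # partly used, the remainder is spare
}
spare = reclaimable(jobs)
print(spare, round(sum(spare.values()), 6))
```

A production system would also need to decide when to hand the resources back and how to preserve job state while it is swapped out, which this sketch does not attempt.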
All of them, Geller said, are Run:AI partners, as they represent infrastructure to run applications on. Geller sees this as a stack, with hardware at the bottom layer, an intermediate layer that acts as the interface for data scientists and machine learning engineers, and AI applications on the top layer.
Run:AI is seeing good traction, growing its Annual Recurring Revenue by 9x and staff by 3x in 2021. The company plans to use the investment to further grow its global teams and will also be considering strategic acquisitions as it develops and enhances its platform.