IBM and Bloomberg announced on Thursday that KServe is joining the LF AI & Data Foundation as its latest incubation project. KServe provides a Kubernetes Custom Resource Definition for serving machine learning models on arbitrary frameworks and underpins several IBM products, including Watson Assistant.
Bloomberg, Google, IBM, Nvidia, Seldon and other organizations collaborated with the KServe Project Community to release it as open source.
In a blog post, IBM's Animesh Singh and Bloomberg's Dan Sun and Alexa Griffith said they were speaking on behalf of the KServe community and touted the LF AI & Data Foundation for its work "building an ecosystem to sustain innovation in artificial intelligence and data open source projects."
According to the companies, KServe aims to solve production model serving use cases by providing performant, high-abstraction interfaces for common ML frameworks like TensorFlow, XGBoost, Scikit-learn, PyTorch, and ONNX.
"KServe encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting edge serving features like GPU Autoscaling, Scale to Zero, and Canary Rollouts to your ML deployments. It enables a simple, pluggable, and complete story for Production ML Serving, including prediction, pre-processing, post-processing, and explainability," Singh, Sun and Griffith said.
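For readers unfamiliar with how a Kubernetes custom resource expresses these features, the following is a minimal sketch in Python that builds an InferenceService manifest as a plain dictionary. It assumes the field names of KServe's `serving.kserve.io/v1beta1` API (`minReplicas`, `canaryTrafficPercent`, `storageUri`); the helper function, the service name, and the storage bucket are hypothetical and chosen only for illustration.

```python
import json


def build_inference_service(name, storage_uri, model_format="sklearn",
                            min_replicas=0, canary_percent=None):
    """Return an InferenceService manifest as a dict (illustrative helper).

    Field names are assumed from the KServe v1beta1 API; check them
    against the version of KServe you actually run.
    """
    predictor = {
        # minReplicas of 0 asks for scale-to-zero when the service is idle.
        "minReplicas": min_replicas,
        # The framework key (e.g. "sklearn") selects the model server.
        model_format: {"storageUri": storage_uri},
    }
    if canary_percent is not None:
        # Share of traffic routed to the newest revision in a canary rollout.
        predictor["canaryTrafficPercent"] = canary_percent
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {"predictor": predictor},
    }


# Hypothetical service name and bucket path.
manifest = build_inference_service(
    "sklearn-iris",
    "gs://example-bucket/models/iris",
    canary_percent=10,
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied to a cluster with KServe installed; the point of the sketch is only that autoscaling and rollout behavior are declared in a few fields rather than hand-built.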
Animesh Singh, CTO and director of Watson AI at IBM, said the company is both a co-founder and adopter of KServe. Singh said hundreds of thousands of models run concurrently for internet-scale AI applications like IBM Watson Assistant and IBM Watson Natural Language Understanding.
Singh added that ModelMesh from IBM, open sourced and available as part of the KServe project, solves the challenge of costly container management, effectively allowing them to run hundreds of thousands of models in a single production deployment with minimal footprint.
Nvidia senior director of product management for accelerated computing Paresh Kharya explained that the Nvidia Triton Inference Server works in lock-step with KServe to encapsulate the complexity in deployment and scaling of AI in Kubernetes via its serverless inferencing framework.
"Nvidia continues to be an active contributor to the KServe open source community project to support effortless deployment of AI machine learning models at scale," Kharya said.
KServe is also helping Bloomberg expand its use of AI in Bloomberg Terminal and other enterprise products, according to Bloomberg head of AI engineering Anju Kambadur. Kambadur explained that Bloomberg wants to move quickly from idea to prototype to production and needs to ensure that models evolve seamlessly once built to accommodate changes in data.
"This is important not just for building better products faster, but also to ensure that we unlock the creative potential of our AI researchers without burdening them with writing tons of boilerplate code. In this regard, I am both excited and grateful that KServe, which Bloomberg helped found and lead the development of, has taken such strides," Kambadur said.
Mark Winter, software engineer at popular South Korean search engine Naver Search, added that KServe has allowed them to modernize their AI serving infrastructure and provided the tools needed to handle the traffic scaling differences between day and night cycles.
"By providing a standardized interface on top of Knative and Kubernetes, KServe allows our AI researchers to focus on creating better models and putting their hard work into production without becoming experts in delivering and managing highly-available backend services," Winter said.
The announcement comes ahead of the release of KServe 0.8, which a spokesperson said would have features like a new ServingRuntime custom resource, ModelMesh multi-namespace reconciliation, improved CloudEvent and gRPC support, and KServe v2 REST API integration with TorchServe.
According to the spokesperson, the roadmap for v1.0 features a stabilized API that unifies the ModelMesh and single-model serving deployments, along with more advanced inference graph capabilities.