How to build scalable AI in 3 steps: Lessons from Capital One's ML team

"When it's time to actually go to production, and then keep that model up and running 24-7 in some mission-critical application like transaction fraud, that is the hardest part," says Abhijit Bose, head of Capital One's Center for Machine Learning and Enterprise ML Platforms.
Written by Stephanie Condon, Senior Writer

As head of Capital One's Center for Machine Learning and Enterprise ML Platforms, Abhijit Bose oversees a team tasked with using machine learning and AI to embed real-time intelligent decisions and experiences across the company, for both employees and customers. That's no small task at one of the 10 largest banks in the US.

In fact, one big area of focus for his team this year has been building the foundations for scalable machine learning. 

"Many companies struggle with deploying machine learning and scaling machine learning or AI throughout the company," Bose said to ZDNet. "You can always have pockets of people who are building things on Amazon or experimenting with a machine learning model with their own data, and then everything just falls flat. When it's time to actually go to production, and then keep that model up and running 24-7 in some mission-critical application like transaction fraud, that is the hardest part."

To tackle that challenge, Capital One has taken a threefold approach, Bose explained: "Building foundational platforms, ensuring it builds responsible and well-managed AI, and hiring the right talent."

Here's more of what Bose had to say on each of these themes: 

Building an adaptable, foundational platform

"First, we need to build first-class machine learning platforms that can scale, and that can be adaptive to the recent trends -- this space moves really fast. So, our platform foundation should be built in a way that you can adapt very quickly to different libraries coming every six months, different technologies coming. Just to give you some examples, we went from running basic containers on AWS to running fully-fledged Kubeflow pipelines on AWS. That's a very sophisticated way of running machine learning that only a few tech companies are doing. So the platform has to be foundational and then built really well. That's a big focus area for the company right now."

Make it responsible and well-managed 

"We also want to build machine learning in a responsible and well-managed way.  We have lots of controls that we apply to our current generation of machine learning models, but some of them might be manual, some of them might be a little bit more automated. 

"We want to build a lot of the controls in a way that it can also accelerate some of our deployment, but at the same time, we are being very thoughtful about it. When we need to slow down, we do slow down. We're not going to sacrifice responsible AI for the sake of speed or business value. So, a lot of research needed on explainable AI, which we are doing now. We are doing a lot of work in engineering to start taking that research into the platforms."

Bose noted that Capital One's operations run entirely on AWS infrastructure. Running completely in the cloud actually makes it easier to manage AI responsibly, he said. 

"We can think much bigger than other companies," he said. "If you think about it, a lot of companies where maybe your data may be in 10 different systems, you cannot really uniformly apply controls. You cannot even define common standards and still scale. Every project, in every one of those silos, would be very painful. So we have been thinking holistically what responsible AI means for us."

Cultivating talent

"The third thing that we have focused on is our talent. We realize that both retention and recruiting are going to be super important for us, especially given what's going on in the marketplace. 

"For our internal folks to have the career growth they need, we have a new machine learning engineer program that our current engineers can apply to. It's a combination of online courses, some instructor-led courses, and then on-the-job training. They have to actually run; they have to build models; they have to work with our infrastructure to do with the data pipelines and ML pipelines, so they really get trained to become a certified machine learning engineer. And upon completion of the training, they can be a machine learning engineer in one of our teams. 

"Plus, we're also hiring machine learning engineers to this new job family that we created externally... Once you create a job family, you have to think about their career, their performance management -- what are some of the expectations of that talent? You need to make sure there is a clear path for their growth in the company.

"In the external world, it really made our recruiting efforts a little bit easier. Because if you look at the tech companies, like Facebook or Google, they have clear data scientist roles, they have machine learning engineers. A lot of companies actually mix those skills up."
