How to get started with Machine Learning: Expert advice from the front lines

Smart companies know that artificial intelligence and machine learning have the potential to transform business, but many aren't sure where to start. Here's an expert guide based on 10 of the most burning questions asked about what it takes to launch a machine learning initiative.


Machine learning (ML) is frequently the catalyst that turns business data into accurate predictions and actionable information. For many companies, however, the barriers to entry seem daunting:

  • Data needs to be discovered, organized, and standardized
  • Compute resources need to be sourced to handle ML workloads
  • Models need to be built and trained by data scientists
  • Outputs need to be integrated into business processes

When faced with requirements that are hard to scope out and goals that require imagination, companies need expert help. Larry Pizette, senior manager and leader of the Machine Learning (ML) Solutions Lab at Amazon Web Services, knows a lot about overcoming ML barriers. He meets with customers from diverse industries, helping them plan and implement their ML initiatives across a wide range of use cases.

We had the opportunity to sit down with Larry and pose 10 questions, eliciting his insights for business leaders who want to understand ML trends, how to get a project started, and how to move from proof-of-concept to production.

Approaching ML

Q1: What's driving interest in machine learning today?

A: We're at an inflection point, where machine learning has hit the common consciousness. There are references to artificial intelligence and machine learning on television, in magazines, on websites, and in product announcements - it's everywhere. Customers are looking at ML for a variety of reasons, but they tend to share the same goals: better experiences and outcomes for their customers, improved operations, and, for commercial customers, better financial results. Business owners want to unlock the potential of their data. It's hard not to pay attention to machine learning.

Q2: Are there specific industries or functions that are strong candidates for ML?

A: We have customers from many different business areas: oil and gas, healthcare, life science, insurance, financial services, restaurants, professional sports, transportation, government, and educational institutions. Customers are coming to the ML Solutions Lab because they want to get started with ML capabilities in their organizations, and they'd like to understand the art of the possible.

For example, insurance companies want to assess risk more accurately so they can better price their products and maintain their profitability. Business leaders, government leaders, and educational leaders are all starting to think about how ML can impact their organizations. They know that they need to get started; they're just not 100 percent sure where to begin. They know that Amazon has an extensive history with machine learning, so they come to us, and we can help them get going.

Implementing ML

Q3: What steps do organizations need to take when they want to get started with ML?

A: Getting started is really a matter of three steps: learning about ML, ideating on use cases and approaches for validating the ideas, and implementing one or two high-value proofs of concept (POCs).

If your organization doesn't already have a solid foundation of ML knowledge, we recommend taking the time to educate your business and operational leaders. When customers come to us needing this knowledge, we first engage them in a machine learning discovery session, sitting down with our specially trained business development organization. As part of that step, we make sure that customers understand, at a business level, how ML works, and the importance of data to their organizations.

The next step is typically to have an ideation session and identify POCs. We like to include both business leaders -- the business owners that are looking for the outcomes that they want -- and IT folks that can help us identify the data to meet the business need. This is the point where we start identifying the details of a POC and the business outcomes.

We also dive into the importance of having an understanding of the data, and in many cases, having annotated data - that is, having data that is tagged with information about the data itself. An example would be an image and a description of the image. This step is significant, as many organizations don't know the data that they have or how they'll access it - and data is necessary for training ML models.

Proving ML's worth

Q4: So that brings us to the third step. What does a successful proof-of-concept look like?

A: We find that customers that are willing to identify impactful business cases - something important that can really make a difference to their organization - are the ones that can demonstrate results quickly with ML.

It's really important that customers are willing to prototype and do proofs of concept to get going. If they try to "boil the ocean" and have a multi-year deliverable, it's much harder to demonstrate forward progress.

Where I've seen POCs go really fast is when customers have the data and are highly motivated. With those customers, they get the data; they allocate resources to the collaboration with us; they get us going with access to the data and a clear objective; and we can show initial POC modeling results in as little as three weeks from when we get access to the data.

Q5: Once the POC stage is finished, how long does it take to implement an ML solution in production?

A: Once you've proven the potential of ML, the next step is to move the capability to production, which may include integrating it into a larger IT system.

When a customer moves a workload to production, it's typically longer in duration than the POC. It really depends on the complexity, because we are building machine learning into the overall system. If it's a standalone type of capability, it can go into production very quickly. If it has to be integrated with a much larger ecosystem of IT systems and business processes, it can take several months.

But the cool thing about it is, once you get up and running, you can keep updating your models. You're gathering data, you're gathering outcomes, and you're improving your models over time. As you gather more data and information on what "good" looks like, it's critical to iterate by re-training and re-evaluating your models -- this will help you to ensure you're getting the best outcomes possible.

As there is variability in each custom use case, our ML Solutions Lab can help our customers with scoping POCs and production deployment.

Q6: What are some examples you're especially proud of?

A: I'm particularly proud of the really cool work that we've done with professional sports leagues such as Major League Baseball (MLB). For example, we ideated with key MLB stakeholders on a compelling capability for the television audience. Once we decided upon a model that identifies a runner's chance of success for stealing second base, it took us three weeks, starting on September 1, 2018, to prepare the data and to train, evaluate, and deploy the model.

We leveraged Amazon SageMaker as a key part of this effort, including deployment into production in the cloud. SageMaker provided the requisite sub-second response time for making predictions (called inferences), and, using an auto-scaling cluster, it provided both the high availability and high performance that MLB needed. You'll be able to see this in action at future MLB games, including the 2018 World Series.

ML in Practice

Q7: What advice do you have for companies looking to embrace ML?

A: We recommend focusing on how you can deliver a better experience for your customers and identifying the business and operational outcomes desired - and then leveraging ML to meet those needs.

To help realize this potential, culturally, there needs to be an acceptance that ML is an important part of business and operations. Frequently, enterprises have different IT groups, each of which has its own area of expertise or business units that they support. Some ML initiatives may require information from across these domains, so it's important to understand the data strategy and where ML will fit in.

Q8: What role does cloud computing play in ML?

A: The cloud has really done some amazing things for machine learning, because you can try out capabilities. You can do proofs of concept. If you run into a problem, no worries. You just tried it out. There's no hardware, and you can scale up or down. You have the data there, so you can try other approaches if one doesn't work. Because you can quickly scale up significant amounts of capacity, using ML-tailored Accelerated Computing instances, you can train on vast amounts of data in minutes to hours, work that would have taken days in the past.

Q9: What makes ML from AWS unique?

A: We're trying to put machine learning into the hands of every organization and build out services so that we can help with the undifferentiated heavy lifting of machine learning.

If you look at our ML services stack, you can see that there are many different levels where you can employ AWS services. At the very highest level, we have API-based machine learning services such as Rekognition, Lex, Polly, Comprehend, Translate, and Transcribe. These services are very easy to use and do not require a data scientist.

For customers that want to go a layer deeper, we have infrastructure services, such as SageMaker, that make it easier to build, train, and deploy capabilities into the cloud. SageMaker is an amazing infrastructure service that makes it exceptionally easy to run widely used framework software such as Apache MXNet, TensorFlow, PyTorch, Caffe2, and Chainer. Also, SageMaker includes built-in algorithms that have been optimized and can improve performance by 10x over what you can run elsewhere. We work closely with companies like Intel to ensure that the hardware powering that infrastructure is custom-built and optimized for the best performance with the most popular frameworks. Also, SageMaker makes it easier for our customers to deploy ML capabilities into production. With click-of-a-button deployment for scalable, multi-availability zone environments, we've taken the IT complexity out of deploying to production.

Q10: If you were in the shoes of a business leader just starting out, what would your next steps be?

A: First, I'd look for ways to be data-driven in helping customers, optimizing operations to be most efficient, and maximizing financial outcomes. Second, I'd make sure that we're building the people capabilities within our organization to leverage ML, including hiring or training data scientists, where needed. Third, I'd look to identify or establish our organization's approach to data, including governance and repositories, and scoping the IT to support ML in the enterprise. There's no way for businesses to ignore a game-changing technology like ML, and I'm very proud to be part of a team that helps answer questions and move these initiatives forward.

To learn more about ML from AWS and Intel, please visit
