Clear the path to continuous intelligence with machine learning, consultancy urges

Continuous Delivery for Machine Learning, or CD4ML, is an approach in which "a cross-functional team produces machine learning applications ... in small and safe increments."

What do technology leaders and professionals need to do to help their organizations achieve the holy grail of continuous intelligence? Look to artificial intelligence and machine learning to pave the way. However, achieving continuous intelligence isn't an overnight sprint by any means -- many organizations aren't yet ready to bring together the adroit data management, automation, processes and skills it requires.

Photo: Joe McKendrick (National Gallery of Art, Washington, DC, July 2016)

That's the word from a three-part series published by ThoughtWorks, which advocates an approach it calls Continuous Delivery for Machine Learning (CD4ML), "a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles." 

Employing data "to produce tangible outcomes for business is the real value driver and for that, we are seeing the world moving more towards intelligence," write Ken Collier, Mark Brand and Pramod N, all with ThoughtWorks. "The use of machine intelligence to drive business outcomes is a central theme." 

For example, a hospital can employ machine intelligence to detect and prevent hospital-based infections among patients. In this scenario, data "such as vital biometrics, interactions with doctors and staff, and feeding schedules are collected by medical monitoring and other hospital systems." The challenge lies in what has to happen next: "captured data are quickly processed into digestible information, which predictive models can consume to produce insight about patients. Doctors must quickly make decisions about patient treatment and corresponding actions must be taken by caregivers. The models and hypotheses informing the insight and decisions about patient treatment should be continuously reviewed as often as needed to iterate the processes in order to ensure the best possible clinical decision-making. This requires changes in business processes, organizational collaboration, technical practices, and investment in supporting technical infrastructure."

Most organizations are not there yet. "Unfortunately, in most organizations, this cycle is time-consuming, manually intensive, and laden with friction," Collier and his co-authors caution.

CD4ML is built on the ability to flow data through model training, cleansing, prediction service development, and continuous monitoring and feedback loops, in which "new data then informs the next iteration of training the prediction model." This is a journey involving many moving parts of the enterprise, and the ThoughtWorks authors have charted its development through a five-state process:

Legacy state: This is characterized by a "conventional data warehouse architecture with some combination of enterprise data warehouse and/or a collection of subject-area data marts." The ThoughtWorks team suggests "adopting modern, adaptive data architectures to ease ongoing access to all forms of data; use of advanced analytical capabilities to generate clear predictions; and better business and technology collaboration methods to frame more effective decisions."

Data lake state: "The conversion of data into information for analysis is a substantial improvement over the legacy state," according to the authors. However, "decisions and actions remain rather ad hoc in this state, and improvements of the business experience may be sporadic and take months to implement." Decisions are taken in a "batch sense at human timescale rather than near real time as and when events unfold."

Data science state: Time to bring in the data scientists to help sift through the data. At the same time, the decision-maker is "part of insight creation journey. There is less friction between information creation, insight creation and decision makers. Science and experimentation also is another key differentiator at this stage." However, most organizations in this state still "exhibit analysis-paralysis behaviors, in which data science models, however sophisticated, remain in a proof-of-concept lab state and fail to see real world utility. Decision to action is still a hurdle at this stage."

Insight product state: Collaboration hits its stride here, as "analysts, scientists, decision- and delivery-owners are formally organized into a cross-functional product team with a clear charter to build business-aligned intelligence into insight products or directly into the data pipeline -- and to rapidly combine this intelligence with automated decisioning to drive predictable actions." Businesses "move away from being reliant only on reports and ad hoc analysis for decision making, but instead, get into the mode of tuning decisions as insight products."

Continuous intelligence state: Finally, this is where CD4ML platform thinking and a data DevOps culture become the norm. This is "continuous delivery for data," the ThoughtWorks team explains. "As data scientists create more refined and accurate models, they can easily deploy these into production as replacements for prior models. Being able to create products which learn and complete the intelligence cycle in a continuous fashion is what sets this stage apart. The loops become more seamless and most of the hurdles are removed. Loops become tighter and faster with more use and more experimentation, which is a key indicator of the health of intelligence cycle."
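
To make that loop concrete, here is a minimal Python sketch of the idea that new data informs the next round of training and that a more accurate model replaces the prior one in production. It is an illustration under stated assumptions, not anything published by ThoughtWorks: the toy ThresholdModel, the accuracy-on-a-holdout-set release rule, and every function name below are hypothetical stand-ins for whatever models and release criteria a real team would use.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# Hypothetical data shape: (feature value, true label).
Example = Tuple[float, int]


@dataclass
class ThresholdModel:
    """Toy stand-in for a prediction model: predicts 1 when x >= threshold."""
    threshold: float

    def predict(self, x: float) -> int:
        return 1 if x >= self.threshold else 0


def accuracy(model: ThresholdModel, data: Sequence[Example]) -> float:
    """Fraction of examples the model labels correctly (0.0 for empty data)."""
    if not data:
        return 0.0
    correct = sum(1 for x, y in data if model.predict(x) == y)
    return correct / len(data)


def train(data: Sequence[Example]) -> ThresholdModel:
    """Pick the observed feature value that works best as a threshold."""
    candidates = sorted({x for x, _ in data})
    best = max(candidates, key=lambda t: accuracy(ThresholdModel(t), data))
    return ThresholdModel(best)


def release_if_better(
    production: ThresholdModel,
    challenger: ThresholdModel,
    holdout: Sequence[Example],
) -> ThresholdModel:
    """Assumed release rule: replace production only on a strict holdout gain."""
    if accuracy(challenger, holdout) > accuracy(production, holdout):
        return challenger
    return production


def intelligence_loop(
    production: ThresholdModel,
    batches: Iterable[Sequence[Example]],
    holdout: Sequence[Example],
) -> ThresholdModel:
    """Each new batch of data triggers retraining and a possible release."""
    seen: List[Example] = []
    for batch in batches:
        seen.extend(batch)
        if not seen:
            continue  # nothing to train on yet
        challenger = train(seen)
        production = release_if_better(production, challenger, holdout)
    return production


if __name__ == "__main__":
    holdout = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
    batches = [[(1.5, 0), (3.5, 1)], [(2.5, 0), (2.8, 1)]]
    final = intelligence_loop(ThresholdModel(10.0), batches, holdout)
    print(final, accuracy(final, holdout))
```

The sketch releases a retrained model only when it measurably beats the one in production, which is one simple way to keep each increment "small and safe"; a real pipeline would add monitoring, versioning of data and models, and rollback.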