Microsoft looks to make product planning more science than art

Microsoft has been quietly building a platform to help its own product teams -- and ultimately, those from other companies -- turn product planning into more of a science and less of a black art.

Microsoft calls the test bed the Microsoft Experimentation Platform (ExP). Here is how the ExP team describes its mission on its Web site:

"The Experimentation Platform enables product groups at Microsoft and later on will enable developers using Windows Live to innovate using controlled experiments with live users. The platform enables testing new ideas quickly using the best-known scientific method for establishing causality between a feature and its effects: randomized experimental design. The basic methodology in controlled experiments is to expose a percentage of users to a new treatment, measure the effect on metrics of interest, and run statistical tests to determine whether the differences are statistically significant, thus establishing causality."

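As a rough illustration of the methodology quoted above (this is not ExP's code), the steps can be sketched in a few lines of Python: assign a fixed percentage of users to a treatment, then test whether the difference in a metric is statistically significant. The function names, the hash-based bucketing and the choice of a two-proportion z-test are all assumptions made for the example; the quote does not say which assignment scheme or statistical test the platform uses.

```python
import hashlib
import math


def assign_variant(user_id, experiment, treatment_pct):
    """Hypothetical helper: deterministically bucket a user into
    'treatment' or 'control' by hashing the user and experiment IDs."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100  # bucket in the range 0..99
    return "treatment" if bucket < treatment_pct else "control"


def two_proportion_z_test(conv_c, n_c, conv_t, n_t):
    """Hypothetical helper: return (z, two-sided p-value) for the
    difference in conversion rate between control and treatment."""
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = (p_t - p_c) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(assign_variant("user-123", "homepage-layout", 10))
    z, p = two_proportion_z_test(conv_c=100, n_c=1000, conv_t=150, n_t=1000)
    print(f"z = {z:.2f}, significant at 5%: {p < 0.05}")
```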
The chief experimenter behind this initiative is General Manager Ronny Kohavi. Kohavi joined Microsoft in 2005 from Amazon, where he was the director of data mining and personalization. He joined Microsoft's Natural and Interactive Services Division (NISD) to build a system that would map user activities to intent using machine-learning algorithms.

"In the first few months, I realized that few tools existed at Microsoft to run live experiments and make data-driven decisions based on user actions," Kohavi told me, via e-mail. "I was being asked to build something with limited ability to iterate quickly and test ideas with controlled experiments, so my initial reaction was to build a small system to run controlled experiments within my project."

"After the division VP left and the division was being reorganized, I decided that there is a great opportunity to build a platform for controlled experiments that would serve multiple groups at Microsoft. In March 2006 I was given the go-ahead to hire a small incubation team. In June 2006 coding started, and we ran our first two experiments a year later, in June 2007," Kohavi added.

I had a chance to ask Kohavi a few additional questions about Microsoft's Experimentation Platform via e-mail. Here is our exchange, which I edited for length:

MJF: How did the idea for this "experimentation platform" come about? Was it something the Microsoft brass encouraged to prove that the company is an innovator and not just a follower?

Kohavi: During my Amazon days, I realized how useful experimentation is in helping prioritize ideas. The two most successful innovations by my personalization teams were not on any roadmap the year before, and were initially ranked so low by myself and the team of experts that one was given as a ramp-up project to a new employee on the team, and the other given to an intern. The intern built Amazon’s Behavior-Based Search, a simple idea of surfacing products that people ultimately buy after they query for something. After several iterations, the project was shown to incrementally increase Amazon's revenue by hundreds of millions of dollars. Other people on our team had similar experiences at other companies, and we are beginning to see some great value at Microsoft. This great ROI (Return-On-Investment) is what led to the proposal of building such a platform at Microsoft.

This was not a top-down project, but a bottom-up proposal I made. A few executives, led by (Corporate Vice President of Live Platform Services) David Treadwell, saw the potential and supported the project early on. (Chief Software Architect) Ray Ozzie later said: “We have an unprecedented opportunity to run A/B tests with online users and innovate more quickly based on actual user response. Microsoft needs to shift the culture from planning the exact features to planning a set of possible features, and letting customers guide us.”

The goal for the ExP team is to accelerate the cultural change towards more use of data-driven decisions using experiments, not just to provide the technology (the platform). Our mission is to accelerate software innovation through trustworthy experimentation. We write papers, we share results to build best practices (some externally), and we teach classes. Over 150 people at Microsoft have gone through our half-day class. A lot of material is available on our site.

MJF: Is the experimentation platform focused on Live-related products/services only? Or could other product groups use it as well, say, the Windows team or the CRM team?

Kohavi: The Experimentation Platform itself is built as a set of web services, so it could be called from other services or from client software. However, we believe that controlled experiments are especially well-suited for services-based software (e.g., online properties and software plus services architectures). Because the cost of developing prototypes and testing them against live users is low in these settings, experimentation fits well with quick iterative development. In a recent paper, we described the ingredients required for successful experiments.

One good example of how services can impact classical client software is Microsoft Office Help. If users approve the better-when-connected option, help queries are sent to a service at Microsoft, which returns help articles. This allows editors to improve and write new articles based on actual requests, and it also allows development teams to improve the search algorithms by running controlled experiments. So although many people think of “Office” as classical shrink-wrapped client software, the opportunity exists to experiment and improve the experience even after the product “shipped.”

As the software plus services model further develops, and as experiences become more seamless with online services, the opportunity to run experiments increases, satisfying more of the ingredients that enable an experimentation culture.

MJF: Could you share more specifics about the platform, such as the infrastructure that powers it? Where does the ExP team sit in the MS hierarchy?

Kohavi: Our first experiments went live in June 2007 on the MSN home page and on Windows Marketplace. Several examples are described in the eMetrics 2007 talk and in the papers on the site.

The platform is built as a set of web services written in C#, browser-based user interfaces for reviewing analyses, and a client DLL that is basically a caching layer to provide the fast performance needed. Calls to the client return in under 5 msec 99.99% of the time. We described the architecture in Section 5 of

The ExP team is under Treadwell. He reports to Ozzie. This “neutral” place in the organization allows us to build the platform for use across all of Microsoft.

MJF: Do you feel the ExP platform is unique? Or was it patterned after other ventures out there?

Kohavi: The Experimentation Platform is definitely unique. There are third parties that provide software for controlled experiments, but they are based on JavaScript interfaces. While this lowers the barrier to a first experiment, our focus has been on building tight integration with other systems that can lower the cost of experimentation much more significantly, something that can only be really done with deeper integration. For example, the platform has been integrated into the MSN content management system (CMS) for the MSN home page, allowing “codeless” experiments to run. An editor or a program manager can change layout simply by using the interface they use every day, without writing a line of code (JavaScript or otherwise).

A code-based integration (unlike JavaScript) also allows running backend experiments (e.g., search algorithm changes, recommendations), which are harder to do efficiently through JavaScript changes to the UI layer.

While we may provide a JavaScript interface to the platform in the future, a culture of experimentation at scale can really flourish if the cost of running experiments is lowered to the point where instead of debating whether some button should be red or green, it’s cheap enough to experiment with both options.

MJF: Any rough timetable as to when MS will allow third party vendors to start taking advantage of ExP? Will this be a free service? Or paid?

Kohavi: We have announced our intention to open the Experimentation Platform to the developer ecosystem, and will probably start with large partners.

We’re not ready to talk about specific timelines at this point. There is enough low hanging fruit at Microsoft where experimentation can help, and this will give us the opportunity to solidify the platform and the APIs (application programming interfaces).

MJF: Other points about ExP which may not be obvious from the site, but you think are worth bringing to readers' attention?

Kohavi: There are several points worth sharing.

1. Human intuition is poor; really poor. We regularly see experts in new domains initially defining the direction until realization sets in that experts are not very good, especially with novel ideas. Despite ... strong evidence, many people continue to believe that they’re unique with their ability to nail the right set of features for the next release, or to nail the right content because it’s in their DNA.

Humans suffer from something called apophenia, the tendency to see patterns in random or meaningless data. We also attribute causality to correlational data. Controlled experiments are the scientific way to prove that a change is causing the difference in observed metrics.

2. Organizations can benefit from a critical step required for running experiments: determining what they’re optimizing. We call this the OEC, or the Overall Evaluation Criterion. Organizations that are able to state that objective clearly get immediate alignment, dismissing many debates. This is hard; it requires a lot of thought by organizational leaders, as one has to agree on measurable metrics today that will lead to future success. Short-term revenue and profit can easily be increased by displaying more ads and hurting the user experience, but a much harder exercise is defining what you’re optimizing for today that will generate long-term profits.

3. The key to building ExP is building a system that is trustworthy. Initial responses to controlled experiments may be: “Oh, you’re providing a random number generator.” An Experimentation Platform, however, must focus on much more, from careful attention to the statistics and math, to alerts and alarms when suspicious events occur, to robot detection and outlier removal. Our group is open about how we compute things, about the architecture, and about our thoughts on things like Multivariable Testing, or MVTs (they are much less useful in online settings).

The one great thing about controlled experiments is that once you experience the culture and see the value, you can’t go back! Many young companies experiment, but lose that ability as they grow larger.