Betting that machine learning gets better with practice, investors have put $75 million worth of new funding into software startup Moveworks of Mountain View, California, in order to advance its program for streamlining the help desk operations of corporations.
ZDNet wrote earlier this year about how Moveworks used "sentence embeddings" to ingest examples of things people ask their help desk systems. Moveworks's software can then respond to those requests automatically in natural language and steer the user to a resolution.
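To make the idea of matching a new request against ingested examples concrete, here is a minimal sketch. It is not Moveworks's method: the "embedding" is a plain word-count vector standing in for a learned sentence embedding, and the stored requests and resolution labels are hypothetical.

```python
import math
from collections import Counter

def embed(sentence):
    """Toy 'embedding': a sparse word-count vector (stand-in for a learned model)."""
    return Counter(sentence.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical past requests mapped to hypothetical resolution labels.
KNOWN_REQUESTS = {
    "i forgot my password": "reset_password",
    "please add me to the sales mailing list": "update_mailing_list",
    "my laptop will not connect to wifi": "network_troubleshooting",
}

def closest_intent(utterance):
    """Return the label of the stored request most similar to the utterance."""
    query = embed(utterance)
    best = max(KNOWN_REQUESTS, key=lambda s: cosine(query, embed(s)))
    return KNOWN_REQUESTS[best]
```

A learned embedding would place paraphrases close together even when they share no words, which a word-count vector cannot do. The nearest-neighbor lookup is the part this sketch is meant to show.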
The $75 million Series B financing, from Iconiq Capital, Kleiner Perkins, and Sapphire Ventures, along with a personal investment by Microsoft chair John W. Thompson, will be used for a variety of goals. It will fund a larger sales and marketing effort, to expand the company's "footprint" within corporate IT, and a dramatically larger research and development team. R&D headcount is expected to double from 60 people over the next twelve months, co-founder and CEO Bhavin Shah told ZDNet by phone.
"The funding we've gotten, now $105 million in total, is a recognition of the progress we've made so far," said Shah. The current investors were joined in this round by prior investors Lightspeed Venture Partners, Bain Capital Ventures, and Comerica Bank.
The progress of which Shah speaks includes being able to work off of a lot more data, "because we have had very high user engagement," said Vaibhav Nivargi, Moveworks's chief technical officer and a co-founder.
Moveworks now has 70 million actual trouble tickets to work from to train language models, up from about 20 million earlier this year. That gives the company 120 million sentence examples and a total of 2 billion "tokens," the word or sub-word units that are encoded as vectors in the input layer of a neural network.
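A token count and a vocabulary size are different quantities: the first counts every occurrence, the second counts distinct entries. The sketch below illustrates the difference on three hypothetical tickets, using whitespace splitting as a simplifying assumption (BERT itself splits text into sub-word pieces).

```python
from collections import Counter

def corpus_stats(sentences):
    """Count total tokens and distinct token types using whitespace splitting."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    total_tokens = sum(counts.values())   # every occurrence counts
    vocabulary_size = len(counts)         # each distinct token counts once
    return total_tokens, vocabulary_size

# Hypothetical ticket text, for illustration only.
tickets = [
    "reset my password",
    "reset my vpn token",
    "my laptop needs a password reset",
]
total, vocab = corpus_stats(tickets)
```

By the article's own figures, 2 billion tokens spread over 120 million sentences works out to roughly 17 tokens per sentence.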
"We are at a point now where we are training our own domain adapted BERT model," said Nivargi, referring to the popular "BERT" natural language processing model developed by Google that is an adaptation of the "Transformer" model of language "attention" processing.
Unlike some applications of BERT or Transformer, which merely "fine tune" an already-trained system, a relatively simple task, Moveworks is "pre-training" the BERT model, which means training the network from the start on the initial corpus of text that forms the basis of the deep learning network's fundamental statistical model of the distribution of words in language.
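For readers curious what pre-training involves, BERT's published recipe hides a fraction of the input tokens and trains the network to predict them. The following is a minimal sketch of that data-preparation step under the proportions described in the BERT paper (about 15% of tokens selected; of those, 80% masked, 10% swapped for a random token, 10% left unchanged). The function name and the sample sentence are hypothetical, and nothing here reflects Moveworks's actual pipeline.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """Build one masked-language-model example in the style of the BERT paper.

    Returns (inputs, labels); labels[i] holds the original token at each
    selected position and None everywhere else.
    """
    inputs, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            labels.append(token)                  # the model must predict this
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # 80%: replace with mask symbol
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: replace with random token
            else:
                inputs.append(token)              # 10%: keep the original
        else:
            inputs.append(token)
            labels.append(None)                   # not scored during training
    return inputs, labels

rng = random.Random(0)  # seeded so the example is repeatable
tokens = "my laptop will not connect to the office vpn".split()
inputs, labels = mask_tokens(tokens, vocab=sorted(set(tokens)), rng=rng)
```

Each token is selected independently here, so the share of masked positions in any one sentence only approximates 15%. Fine-tuning, by contrast, starts from weights already learned this way and adjusts them on a smaller labeled task.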
"We believe this domain-adaptive model will be a fundamental differentiator" for the company's software, said Nivargi.
Moveworks employs the "base" model of BERT, with 12 layers of neurons, 768 hidden units, and 110 million parameters. The 2 billion tokens describe the size of the training corpus rather than the vocabulary, which in the publicly released BERT models is about 30,000 word pieces. A corpus of that size made up entirely of help desk language perhaps makes sense for a specialized domain, as opposed to general human language usage.
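The 110 million figure can be checked with simple arithmetic. The sketch below counts weights and biases for a BERT-style encoder. The layer count and hidden size come from the article. The vocabulary size (30,522), maximum sequence length (512), segment types (2), and feed-forward width (3,072) are assumptions taken from the publicly released BERT-base uncased configuration, not from Moveworks.

```python
def bert_parameter_count(layers=12, hidden=768, vocab=30522,
                         max_positions=512, segment_types=2, ffn=3072):
    """Count weights and biases in a BERT-style encoder with a pooler head."""
    layer_norm = 2 * hidden                       # scale and shift vectors
    # Token, position, and segment embedding tables plus one layer norm.
    embeddings = (vocab + max_positions + segment_types) * hidden + layer_norm
    # Query, key, value, and output projections, each with a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Two dense layers: hidden -> ffn -> hidden, each with a bias.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    per_layer = attention + feed_forward + 2 * layer_norm
    pooler = hidden * hidden + hidden             # dense layer on the first token
    return embeddings + layers * per_layer + pooler

total = bert_parameter_count()
print(f"{total:,} parameters")  # about 109.5 million under these assumptions
```

Under these assumptions, roughly 85 million of the parameters sit in the 12 Transformer layers and about 24 million in the embedding tables, so a larger or smaller vocabulary would shift the total only through the embeddings.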
Before any training of BERT could be done, a lot of pre-processing and cleansing of the data had to be done, said Nivargi. "We had to build those pipelines, to store that securely, confidentially," he said of the troves of customer data. Moveworks has its own data centers with masses of GPUs for training; it doesn't rely on public cloud facilities except for serving up prediction results.
"A lot of effort goes into amplifying the data," said Nivargi, including performing "transfer learning" and "meta-learning." Those efforts include adding context.
"We look at who the employee is, what time of the day or what day of the week they are making a request," he explained. "That gives us more ranking signals" with which to train the system.
In addition to using BERT for the natural language tasks, the company is using GPT2 for language generation, since it works better than BERT for that purpose, said Nivargi. "That's still fairly new for us, we've been employing it for less than three months now," he added.
"We can't just say we've succeeded on the language processing," observed Nivargi, "We have to resolve the ticket, end to end — it's a matter of integrating with enterprise systems, it's a multi-front battle."
A given user utterance can result in "dozens of models" being employed, he said. Moveworks has "created composite metrics of our own" to know how the company is doing against its own baseline.
One measure of success is "task conclusion," which Moveworks tracks as the "good answer rate," or GAR, akin to the standard "recall" measure in statistics. Nivargi and team strive for a GAR of 85%.
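The article does not give Moveworks's exact formula for GAR. Taking the comparison to recall at face value, one plausible reading is the share of answerable requests that actually received a good answer, as in this sketch. The function name, the input format, and the sample data are all hypothetical.

```python
def good_answer_rate(outcomes):
    """Share of requests that received a good answer (a recall-style ratio).

    `outcomes` holds one boolean per request that should have been
    answerable; True means the answer was judged good.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(outcomes) / len(outcomes)

TARGET = 0.85  # the GAR the article says the team strives for

# Hypothetical evaluation set: 18 good answers out of 20 requests.
sample = [True] * 18 + [False] * 2
rate = good_answer_rate(sample)
meets_target = rate >= TARGET
```

Like recall, a ratio of this kind says nothing about requests the system answered when it should have stayed silent, which may be one reason the company describes its metrics as composites.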
The technology improves as the system gets more use, and usage is indeed rising, said CEO Shah. Whereas marquee customer Broadcom was using the software earlier this year to resolve 25% of trouble tickets, that share has now crossed the 40% threshold and is on its way to half of all ticket resolutions, he said.
Usage is also enhanced by the Moveworks software being proactive. Moveworks is integrated into enterprise applications such as Slack, so it can show up in a number of places. "We are for the first time reaching out to employees," said Shah. "If you're locked out of Okta, for example, and we see that, our bot will notice that, and reach out to you."
"It's a virtuous cycle," he said, "More people get exposed to the system, more things get resolved, and that brings more people into the system."
Besides continuing to engineer the product, some of the new funding will help to develop new partnerships for distribution, said Shah. Of the company's total pipeline of business, 45% is from referrals, he said. "That's really strange for enterprise software," he observed, "people don't usually talk about what they just bought, but we've been different."
As for the future of language modeling, Moveworks's adaptation of BERT and Transformer and GPT2 isn't running up against computing constraints — yet. "I imagine there will be a level where things saturate," said Nivargi, the CTO. On the plus side, the fact that a lot of domain-specific knowledge — about the enterprise, about IT, about help desks — can be hard-coded, so to speak, can make some of the machine learning work more efficiently than is the case for very large and very general natural language processing systems.
"If you encode that explicitly, you can do a lot more with these parameters with this level of data," said Nivargi.
"With Facebook, they have two billion users and all these arcane and obscure languages," he observed, alluding to the social network's recent, massive language translation models. "With us, our mission is figuring out intent, mostly in an English-language context, and then taking a user down a path to a resolution, so it's a different optimization target."