Artificial intelligence computer maker Cerebras Systems, which has built chips and computers, and now makes supercomputers dedicated to speeding up deep learning, on Tuesday announced services to speed the use of very large language models that are becoming increasingly popular for not only research but also commercial use.
"We believe that large language models are under-hyped, not over-hyped," said Cerebras co-founder and CEO Andrew Feldman in a press briefing. "We are just beginning to see the impact of them; there will be winners and new emergents in each of three layers in the ecosystem, in the hardware layer, the infrastructure layer, and the application layer."
Feldman predicted, "Next year you will see a sweeping rise in the impact of large language models in various parts of the economy."
Partnering with cloud computing service provider Cirrascale, Cerebras is offering what it calls "pay-per-model" compute time, a flat rate to train to convergence a large language model such as OpenAI's GPT-3 on clusters of its CS2 computers designed for deep learning.
The service is branded as Cerebras AI Model Studio.
Prices, ranging from $2,500 to train a 1.3-billion-parameter version of GPT-3 in 10 hours to $2.5 million to train the 70-billion-parameter version in 85 days, are on average half what users would pay to rent cloud capacity or lease machines for years to do the equivalent work. And the CS2 clusters can train up to eight times as fast as clusters of Nvidia A100 machines in the cloud.
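To put the two quoted price points on a common footing, the arithmetic can be sketched as follows. This is only an illustration derived from the figures above: the service is sold at a flat rate per model, no hourly rate has been published, and the function name is hypothetical.

```python
# Figures quoted in the article for the two ends of the price range.
# The service is flat-rate, so the hourly numbers below are derived
# for comparison only and are not published prices.
quotes = [
    {"params_billion": 1.3, "price_usd": 2_500, "hours": 10},
    {"params_billion": 70, "price_usd": 2_500_000, "hours": 85 * 24},
]


def implied_hourly_rate(price_usd, hours):
    """Flat price divided by quoted wall-clock training time."""
    return price_usd / hours


for q in quotes:
    rate = implied_hourly_rate(q["price_usd"], q["hours"])
    print(f'{q["params_billion"]}B parameters: about ${rate:,.0f} per hour')
```

On these figures, the small model works out to about $250 per hour of wall-clock time and the large one to roughly $1,225 per hour, which is consistent with larger models being trained on larger clusters.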
Cirrascale is using a mix of CS2 clusters that it owns and machines that Cerebras owns, as well as the Andromeda supercomputer, which is located at the colocation facilities of Santa Clara, California-based Colovore, where Cirrascale also has equipment installed.
The service will automatically scale the size of clusters depending on the scale of the language model, said Feldman. The company emphasizes that training performance scales linearly as machines are added.
Scaling to the largest clusters would come at a premium, said Feldman. For example, Andromeda's 16-machine cluster is four times as large as a four-way CS2 cluster, but using it would probably cost a customer five times as much because it reaches a higher level of performance.
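The trade-off Feldman describes can be made concrete with a little arithmetic. The sketch below assumes the linear scaling the company claims and takes his "probably five times" estimate at face value; the article does not say whether that multiple applies per job or per unit of time, so the code only compares the price ratio with the size ratio. The function names are hypothetical.

```python
def relative_training_time(base_machines, machines):
    """Wall-clock time relative to the base cluster, assuming linear scaling."""
    return base_machines / machines


def price_to_size_ratio(size_ratio, price_ratio):
    """How much faster the price grows than the cluster size."""
    return price_ratio / size_ratio


size_ratio = 16 / 4  # Andromeda versus a four-way CS2 cluster
price_ratio = 5.0    # Feldman's rough estimate, not a published price

print(relative_training_time(4, 16))
print(price_to_size_ratio(size_ratio, price_ratio))
```

Under those assumptions, the larger cluster finishes in a quarter of the time, while the price grows 1.25 times as fast as the machine count, which is the premium for the largest configuration.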
The most important immediate benefit of cutting the cost of large-model training may be to give access to large model development to parties that couldn't afford the sorts of enormous lease costs typically required, said Feldman.
"We've seen again and again that knowing pricing in advance, and the time it will take, are real issues for a whole class of customers, and we hope to overcome those issues," he said.
The alternative, said Feldman, is for companies to spend extensively to lease hardware for years at a time.
"If you think of the way the biggest models are being trained today, and they are all on dedicated clusters that are on several-year leases," said Feldman. "There are companies right now who have raised huge money and have tremendous valuations who in their wildest dreams have never owned hardware."
Also on Tuesday, Cerebras announced that its Andromeda supercomputer, a cluster of 16 CS2 machines that it unveiled earlier this month, will be used by Jasper, a venture-backed startup that runs large language models as a service for business applications such as generating press releases and blog posts.
Jasper, which has nearly a hundred thousand paying customers for its generative text function, serves enterprises that need to train large language models with customer data, such as a particular knowledge base, product catalog, and corporate "voice."
"They want personalized models, and they want them badly," said Dave Rogenmoser, Jasper's CEO, in the same press briefing. The idea, he said, is to get the marketing department "all talking with the same voice" and for new hires to "get up to speed all speaking with the same voice" as the rest of the company. That includes things like a model generating Facebook ads using the customary language of the client.
The ability to cut the cost of training and dramatically speed up training time of large language models "is a huge draw for us" to working with Cerebras, said Rogenmoser.
Jasper recently closed on a Series A round valuing the company at $1.5 billion, said Rogenmoser.
Using the dedicated clusters can be not only faster and cheaper, but more nuanced, said Cerebras' head of product, Andy Hock, in the same press briefing.
"One of the things we observe more broadly in the market is that many companies would like to be able to quickly research and develop these large-scale models, but the infrastructure that exists in traditional cloud just doesn't make this kind of large-scale research and development easy," Hock said.
"Being able to ask questions like, should I train from scratch [a large language model], or should I fine-tune an open-source public check-point, what is the best answer, what is the most effective use of compute to lower the cost of goods to deliver the best service to my customers -- being able to ask those questions is costly and impractical in many cases of traditional infrastructure."
The Cerebras clusters enable Jasper and others to ask those questions, he said.
Both announcements were made on the occasion of the 36th annual Conference on Neural Information Processing Systems, or NeurIPS, the premier conference of the AI field, taking place this week in New Orleans.