When Meta's Mark Zuckerberg first introduced it last fall, the metaverse, the system of avatars and virtual worlds that Zuckerberg is building and says will be the next version of the internet, met skepticism in some corners.
Richard Kerris, who runs a team of a hundred people at chip giant Nvidia building Omniverse, the company's technology for the metaverse, is not at all skeptical about that future world.
He is skeptical about one thing, though.
"The only thing I'm skeptical about is how people tend to talk about it," Kerris told ZDNet, on a recent trip through New York City to meet with developers.
"People are misinterpreting metaverse as a destination, a virtual world, a this or that," Kerris observed. "The Metaverse is not a place, it's the network for the next version of the Web.
"Just replace the word metaverse with the word network, it'll start to sink in."
The network, in the sense that Kerris uses it, is a kind of sinewy technology that will bind together rich media on many websites, especially 3D content.
"In much the same way the Web unified so many things […] the next generation of that Web, the core underlying principles of that will be 3D, and with that comes the challenge of making that ubiquitous between virtual worlds.
"The end result would be, in much the same way you can go from any device to any website without having to load something in (remember the old days: 'What browser do you have? What extension?' etc.), all that went away with HTML being ratified. When we can do that with 3D, it's going to be transformative."
No surprise coming from an executive at Nvidia, which sells the vast majority of the graphics chips (GPUs) used to render 3D, Kerris made the point that "We live in a 3D world; we think in 3D," but the Web is a 2D reality. "It's limited," he said, with islands of 3D rendering capabilities that never interconnect.
"The consistency of the connected worlds is what is the magic that's taking place," he said. "I can teleport from one world to another, and I don't have to describe it each time that I build it."
The analog to HTML for this new 3D ecosystem is something called USD, or Universal Scene Description. As ZDNet's Stephanie Condon has written, USD is an interchange framework that Pixar invented in 2012 and released as open-source software in 2016, providing a common language for defining, packaging, assembling, and editing 3D data.
(Kerris, an Apple veteran, has something of a spiritual if not actual tie to Pixar, having worked at Lucasfilm for several years in the early aughts.)
USD is capable of describing numerous things in a 3D environment, from lighting to the physics behavior of falling objects.
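To give a rough sense of what such a description looks like, here is a minimal sketch in Python that assembles a tiny scene in USD's human-readable text form. It is an illustration, not Nvidia's or Pixar's tooling: the function name `build_usda_scene`, the prim names (`World`, `Sun`, `Ball`), and all values are hypothetical, and the attribute names follow USD's published lighting and physics schemas as best understood here, without having been validated against a USD installation.

```python
def build_usda_scene(ball_radius: float = 0.5, light_intensity: float = 1000.0) -> str:
    """Return a small scene as USD text: one light and one sphere tagged as a rigid body.

    Assumption: prim names and values are made up for illustration only.
    """
    return f'''#usda 1.0
(
    defaultPrim = "World"
    upAxis = "Y"
)

def Xform "World"
{{
    def DistantLight "Sun"
    {{
        float inputs:intensity = {light_intensity}
    }}

    def Sphere "Ball" (
        prepend apiSchemas = ["PhysicsRigidBodyAPI"]
    )
    {{
        double radius = {ball_radius}
        double3 xformOp:translate = (0, 2, 0)
        uniform token[] xformOpOrder = ["xformOp:translate"]
    }}
}}
'''


if __name__ == "__main__":
    # Print the scene text; it could equally be saved to a file ending in .usda.
    print(build_usda_scene())
```

The point of the sketch is that lighting, geometry, and physical behavior sit side by side in one plain description that any USD-aware tool could, in principle, read.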
In practice, Kerris imagines the Omniverse-enabled, USD-defined metaverse as a road trip where people hop from one 3D world to the next as effortlessly as browsing traditional sites. "I can go from a virtual factory to a virtual resort to a virtual conference room to a virtual design center, to whatever," said Kerris.
Within those environments, 3D rendering will allow people to move past the cumbersome sneakernet of file sharing. "And it allows a lot more capability in what I do," he said, offering the example of product designers.
"With metaverse, and ubiquitous plumbing for 3D, we'll be in that 3D environment at the same time, and rather than sharing a Web page, we can move around. You can look at something on this side of the product. I can be looking at something else, but it's like we're in the same room at the same time."
Nvidia, said Kerris, started down the path on USD six or seven years ago "because we simulate everything we build [at Nvidia] before we build it in the physical world." Nvidia has peers in industry working on realizing the technology, including Ericsson, which wants to simulate antennae. "They all want a reality simulation," he said of companies in the USD fold.
Using the technology, said Kerris, one can go much deeper into the realm of digital twins, simulations of products and structures that allow for intervention, experimentation, and observation.
"Until the advent of consistent plumbing, it was done in a representative mode," he said, such as an illustration of a building in Autodesk. "It wasn't true to reality. I couldn't show you exactly how it would be in a windstorm," which isn't good because, as he put it, "I want to be damn straight about stuff I'm building in the physical world."
The "core base of a situation that's true to reality," using USD, will allow designers to more accurately simulate, backward and forward, including things such as tensile strength.
"I'd love to have a house that's structurally sound before I design the marble finish," he observed. "If I'm building a digital twin of a house I'm building, it's layers of stuff on there, things for structural engineers, and polish that others are going to come in and finish." The important thing is knowing it's "true to reality" for materials and things holding the structure together, he said.
By making possible those richer interactions in 3D, Kerris said, "In the same way that the Web transformed businesses, and experiences, and communication, so will the metaverse do that, and in a more familiar environment, because we all work in 3D."
Different companies are contributing to USD in different ways. For example, Nvidia has worked with Apple to define how USD describes what's called rigid body dynamics, the physics of solid objects that move and collide.
"And there's more to come," he said.
Nvidia has been developing the Omniverse tools as a "platform," what Kerris calls "the operating system for the metaverse."
"People can plug into it. They can build on top of it. They can connect to it. They can customize it — it's really at their disposal, much the same way an operating system is today."
The USD standard has come "quite far" in terms of adoption, Kerris said, with most 3D companies using it. "Every company in entertainment has a USD strategy today," he observed. "The CAD [computer-aided design] and mechanical engineering, it's coming. They either have plans or they are participating in helping to define what's necessary."
"HTML was the same way in early days," he said. It initially lacked support for video, with third-party plugins such as Adobe Flash dominating before standards evolved.
Will digital twins ignite the world's imagination about the metaverse? It seems somewhat too industrial-focused, ZDNet observed.
Ordinary people will gain interest as they realize it is connectedness, not a single destination. "As they realize it's the next generation of the Web, I can visit a remote location without the need of a headset, or [without] installing specific browsers. That's one aspect," Kerris said. "In their everyday life, as we share photos today, you'll be able to share objects. You know, your kid comes home, and they made something and they'll be able to share it with the grandparents."
"It'll just become part of what you do, whether you're buying a piece of furniture for your house and you'll go into your phone. You'll sync with the home. You'll drop the furniture in. You'll walk around it — that's the thing people will take for granted, but it's the seamless connection."
The same for designing one's custom car finish, he offered. "You'll actually be connected to the factory making that car" to check out all the aspects of it.
"It's going to change everything," he said.
There will be multiplier effects, said Kerris, as digital twins allow for trialing multiple scenarios, such as in training robots.
"Today, they would plug a computer into that robot, and input it with information" to train the robot in one physical space, he said. In a digital twin environment, with a robot in the simulated room, "You can train not only one robot but hundreds," using "hundreds of scenarios the robot could encounter."
"Now, that robot is going to be thousands of times smarter than it would have been if you'd only trained it one time." Nvidia has, in fact, been pursuing that particular approach for many years by training machine learning models for autonomous driving in simulated road environments.
Although autonomous driving hasn't reached its promised development, Kerris believes the approach is still sound. "I can build a digital twin of Palo Alto," the Silicon Valley town. "And I can have thousands of cars in that simulation, driving around, and I can use AI to apply every kind of simulation I can think of — a windstorm, a kid running out, chasing a ball, an oil slick, a dog — so that these cars in simulation are learning many thousands of times more scenarios than a physical car would."
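The idea of running many cars through many randomized situations can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Omniverse's API or Nvidia's method: the `Scenario` class, the `generate_scenarios` function, and the `severity` field are hypothetical, and the hazard list simply reuses the examples Kerris gave.

```python
import random
from dataclasses import dataclass

# Hazards taken from the examples in the quote above.
HAZARDS = ["windstorm", "child chasing a ball", "oil slick", "dog"]


@dataclass(frozen=True)
class Scenario:
    car_id: int
    hazard: str
    severity: float  # hypothetical scale: 0.0 (mild) to 1.0 (extreme)


def generate_scenarios(num_cars: int, runs_per_car: int, seed: int = 0) -> list[Scenario]:
    """Assign each simulated car a series of randomly chosen hazards."""
    rng = random.Random(seed)  # seeded so the same batch can be regenerated
    return [
        Scenario(car_id=car, hazard=rng.choice(HAZARDS), severity=round(rng.random(), 2))
        for car in range(num_cars)
        for _ in range(runs_per_car)
    ]


if __name__ == "__main__":
    scenarios = generate_scenarios(num_cars=1000, runs_per_car=5)
    print(len(scenarios))  # 5000 simulated encounters
```

A real simulator would feed each scenario into a physics and rendering engine; the sketch only shows how quickly the number of distinct encounters grows once they are generated rather than driven.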
Nvidia has been working with car maker Mercedes, combining the simulated trials with real-world driving, toward Level 5 autonomous driving, the most demanding level.
"The efficiency is pretty amazing," he said, meaning, how well the autonomous software handles the road scenarios. "By using synthetic data to train these cars, you have a higher degree of efficiency" when combining scenarios.
"I would much rather trust myself riding in a car trained in a simulated environment than [in] one trained in a physical environment." There still will be a role for the real-world data that comes from cars on the road.
As for the time frame for the vision, Kerris noted that "we are seeing it already in warehouses," which are rapidly adopting the robot-training regime. That includes Amazon, where a developer downloaded Omniverse and evangelized it within Amazon. The enterprise version of Omniverse, which is a subscription-based product, was taken up by Amazon for more extensive robot training.
Amazon currently is in production with the software for its pick-and-place robots.
"The beauty is they discovered by using synthetic data generation they were able to be more efficient with stuff rather than just rely on the camera" on the robot for object detection. Those cameras often would get tripped up by reflective packing tape on packages, Kerris said. Using synthetic Omniverse-generated data got around that limitation. That's one example of being more efficient in robotics, he said.
Consumers will probably feel the effects of such simulations in the results.
"There are a hundred thousand warehouses on the planet," Kerris said. "They are all looking at using robotics to be safer, more efficient, and to better utilize the space." People "may not be aware that's taking place, but they'll reap the benefits of it."
In some situations, consumers will "know, because they're getting things a lot faster than in [the] past," he said. "Behind the curtain, things will be much more efficient than they were six months ago." The same goes for retailers such as Kroger, which is using Omniverse tools to generate synthetic data to plan how to get produce to consumers faster.
As for self-driving cars, "The presumption that all these cars will be autonomous today, it's a bit, it's not there yet," he conceded. "But will we have autonomous taxis, and things that will take us from here to there? Oh, yeah, that's easy."
But, "For a car that drives up to you and it will drive you to New Jersey autonomously? We have a little ways to go."
As for direct consumer experiences, "People will start to see the ability to experience locations," Kerris said. Leisure industry executives are interested, for example, in how to showroom a hotel room to consumers in advance of a trip in a way better than photos. "I'm going to allow you to teleport into the room, experience it, so your decision will be based on an immersive experience. Look at the window, see what my view is going to be," Kerris said.
The impact on education "is going to be huge," Kerris said. Today, physical location means some inner-city schools might not experience lavish field trips. "An inner-city school is not exactly going to have a field trip to do a safari in Africa," he mused. "I think that virtual worlds [that] are seamlessly connected can bring new opportunities by allowing everybody to have the same experience no matter what school they're in."
An avatar of a researcher such as Jane Goodall could "inspire learning," he suggested. "Think about what that does for a student."
While emphasizing 3D, Kerris is not pushing virtual reality or augmented reality, the two technologies people tend to focus on. Those things are part of the picture, but 3D doesn't have to be with a headset on, he asserted.
For one thing, today's VR tools, such as VR videos on YouTube that use conventional VR headsets, have been quite limited, Kerris said. "It's not seamless; it's not easy; it's not like a website," he observed.
In addition to stints at Apple, Amazon, and Lucasfilm, Kerris briefly ran marketing for headset developer Avegant. Those headsets were not VR. They were made to be private, immersive movie screens attached to your face using Texas Instruments DLP projection chips. The quality of the product, Kerris reflected, "was phenomenal," but it was too expensive to make, costing $800 at retail. And the fact that a laser would project onto the retina "scared everyone," he said. (Avegant is still in business, developing a technology called liquid crystal-on-silicon.)
What needs to happen is for today's disparate virtual environments to receive that sinewy tissue of USD and related technology. "They're all disconnected," said Kerris of today's proto-metaverse, such as Oculus Rift. "If they were just simple websites, where you could bop around and go experience it, the opportunity would be much greater."
Rather than having to have an Oculus headset, "If I could experience it with this being a window into that world," he said, holding up his smartphone, "chances are a lot higher I would go check it out."
Will USD make that happen?
"Yes. That's absolutely the goal of USD, to unify 3D virtual worlds."
Still, showrooming hotel rooms doesn't sound like it will jumpstart things. When is the Tim Berners-Lee event that will make it all happen for consumers in a grassroots way?
"When did the Web become something that became ubiquitous with consumers?" he asked, rhetorically. "Well, it started with email, then I could send a picture, then, all of a sudden I could do video. It kind of evolved as it went along."
Kerris alluded to the early days of mobile websites on the iPhone, which Steve Jobs first unveiled onstage at Macworld in January 2007, when Kerris was with Apple, and later to video chat via FaceTime.
"What was the transformative thing that allowed the Web to be in everybody's pocket? It's kind of like that," he said. "It almost happened when you didn't know it, and then people take it for granted."