Nvidia researchers use deep learning to create super-slow motion videos

The researchers used Nvidia Tesla V100 GPUs and the cuDNN-accelerated PyTorch deep learning framework to train their system on more than 11,000 videos shot at 240 frames per second.


A team of Nvidia researchers this week is demonstrating how they've used deep learning to tackle a common challenge: producing a slow-motion video from existing video footage.


[Image: Nvidia's AI system fills in the missing frames between the left and right images (in green borders), enabling high-quality slow-motion playback. Image: Nvidia]

The research team, presenting their paper at the 2018 Conference on Computer Vision and Pattern Recognition (CVPR), developed a deep learning-based system that can produce slow-motion videos -- slowed down to any frame rate -- from a 30-frame-per-second video. The result is a high-quality video that looks smooth and seamless compared to existing state-of-the-art methods.

"You might want to do this because your kid's having a soccer game, and you're taking video but in hindsight... say, 'It'd be nice if I could see it in slow motion,'" Jan Kautz, senior director of visual computing and machine learning research at Nvidia, told ZDNet.

The research could be applied to professional use cases as well. For instance, professional athletes or dancers may want to slow down footage dramatically to study their form.

Kautz's team used Nvidia Tesla V100 GPUs and the cuDNN-accelerated PyTorch deep learning framework to train their system on more than 11,000 videos shot at 240 frames per second.

Once the system was trained, it could understand how high-frame-rate videos break down -- frame by frame, pixel by pixel.

With that understanding, the system can look at two sequential frames from a 30-frame-per-second video and predict what the frames in between would have looked like if it had been shot at a higher frame rate. The system then "fills in the frames in between and stacks them all together to get a new, slower video," Kautz explained.
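The "fill in and stack" idea can be illustrated with a toy sketch. Note this is not Nvidia's method: their system predicts intermediate frames with a CNN and optical flow, precisely because naive blending fails on fast motion and occlusion. The function name, frame shapes, and the linear-blend approach below are all illustrative assumptions.

```python
import numpy as np

def interpolate_frames(frame_a, frame_b, n_intermediate):
    """Synthesize n_intermediate frames between two video frames by
    naive linear blending -- a toy stand-in for the learned,
    flow-based prediction the Nvidia system actually performs."""
    frames = []
    for i in range(1, n_intermediate + 1):
        # Fractional time of the new frame between frame_a and frame_b
        t = i / (n_intermediate + 1)
        blended = (1 - t) * frame_a.astype(np.float32) + t * frame_b.astype(np.float32)
        frames.append(blended.astype(frame_a.dtype))
    return frames

# Turning 30 fps footage into an effective 240 fps means synthesizing
# 7 new frames between every pair of original frames.
a = np.zeros((4, 4, 3), dtype=np.uint8)       # stand-in dark frame
b = np.full((4, 4, 3), 255, dtype=np.uint8)   # stand-in bright frame
intermediate = interpolate_frames(a, b, 7)
```

Stacking the originals and the synthesized frames in order, then playing them back at the original frame rate, yields the slowed-down video; the research system's contribution is making those in-between frames look plausible even when objects move quickly or occlude one another.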

Producing high-quality results like this can be challenging for a couple of reasons, Kautz explained: First, some objects in videos are simply moving very fast, and it can be hard to predict where they're going. Second, when objects move in a video, they can sometimes obscure other objects -- or reveal objects that were previously obscured. Consequently, the system has to be able to deal with objects that may be visible in one frame but not the next. This method accounts for both of these challenges.

Kautz's team used a separate dataset to validate the accuracy of their system.

While there are obvious use cases for this system, it remains a research prototype. Kautz's team hasn't tried to optimize it in a way that could easily put it in the hands of users -- via a smartphone, for instance.


"The processing power required for doing this is more than what a phone would have at this point in time," he said, "but you could imagine uploading [video] to a server -- there are ways of making it work and giving it to users."

Kautz added, "Being a research organization, our goal is to push the state of the art forward, learn from the things we do, and hopefully improve products, whether they're our own products or partners' products."