An MIT algorithm has managed to produce sounds that fool human listeners, passing a sound-based version of the Turing Test for artificial intelligence.
Researchers at the Massachusetts Institute of Technology are using Alan Turing's test, developed in the 1950s, as a benchmark for whether humans can build machines whose behavior is "indistinguishable" from our own.
The academic institution has already used Turing's work to develop a system that passes a "visual" Turing Test by writing characters that fool human observers.
Now, a team from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) has created a deep-learning algorithm that passes a Turing Test for sound.
MIT's team spent several months recording approximately 1,000 videos containing at least 46,000 sounds produced by hitting, prodding and scraping various objects with a drumstick.
These videos were then fed into the algorithm, which deconstructed and analyzed the sounds' pitch, volume and other characteristics. The algorithm then "predicts" the sounds of a silent video by looking at the sound properties of each frame and stitching together bits of audio that match similar sounds in its database.
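The stitching step described above can be sketched as a nearest-neighbor lookup: for each frame's predicted sound features, retrieve the closest-matching snippet from a database and concatenate the results. This is a simplified illustration, not the team's actual code; the feature vectors, database contents and snippet length here are all stand-in assumptions.

```python
import numpy as np

# Hypothetical database of per-frame sound features (e.g. pitch, volume)
# paired with short audio snippets. In the real system the features would
# come from a trained network; here they are random stand-ins.
rng = np.random.default_rng(0)
db_features = rng.normal(size=(500, 8))               # 500 snippets, 8-dim features
db_audio = [rng.normal(size=735) for _ in range(500)]  # ~1 frame of 44.1 kHz audio each

def synthesize(predicted_features):
    """Stitch together the audio snippets whose features best match
    each predicted frame (simple nearest-neighbor retrieval)."""
    clips = []
    for f in predicted_features:
        dists = np.linalg.norm(db_features - f, axis=1)  # distance to every snippet
        clips.append(db_audio[int(np.argmin(dists))])    # keep the closest match
    return np.concatenate(clips)

video_features = rng.normal(size=(30, 8))  # predicted features for 30 frames
waveform = synthesize(video_features)
print(waveform.shape)  # 30 frames x 735 samples per frame
```

Retrieval-and-stitch approaches like this trade generative flexibility for realism: every output snippet is a real recorded sound, which helps explain why listeners found the results convincing.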
As a result, the AI was able to simulate a range of impacts -- from a dull "thud" to staccato taps and rustles.
"When you run your finger across a wine glass, the sound it makes reflects how much liquid is in it," said CSAIL PhD student Andrew Owens, leader of the research. "An algorithm that models such sounds can reveal key information about objects' shapes and material types, as well as the force and motion of their interactions with the world."
The next stage was testing. In an online study, participants saw two videos of collisions -- one with the true recorded sound, the other with the algorithm's synthesized one. When asked which sound was real, subjects picked the fake sound twice as often as the real one.
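As a back-of-the-envelope check, a two-to-one preference for the fake sound implies a "fooling rate" of about two-thirds. The tallies below are illustrative numbers consistent with that ratio, not the study's actual counts.

```python
# Hypothetical vote tallies consistent with a 2:1 preference for the fake sound
picked_fake, picked_real = 200, 100

# Fraction of trials in which the synthesized sound was judged real
fool_rate = picked_fake / (picked_fake + picked_real)
print(round(fool_rate, 3))  # 0.667
```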
The algorithm's renditions of "dirty" sounds, such as those of leaves and dirt, were particularly convincing, fooling the audience the majority of the time.
There is still room to improve the system. For example, if the drumstick moves erratically, the algorithm is more likely to miss a beat or invent a spurious one. According to the research team, the algorithm also currently applies only to "visually indicated sounds" -- those caused by physical interactions visible in the video -- and no others.
Despite these limitations, MIT is paving the way forward in the AI field by using Turing's tests as a starting point. In the future, the team wants to improve robots' ability to interact with their surroundings as a whole, including through the sense of sound.
"A robot could look at a sidewalk and instinctively know that the cement is hard and the grass is soft, and therefore know what would happen if they stepped on either of them," said Owens. "Being able to predict sound is an important first step toward being able to predict the consequences of physical interactions with the world."
The research was funded in part by the National Science Foundation and Shell. MIT's paper will be presented later this month at the annual Conference on Computer Vision and Pattern Recognition (CVPR) in Las Vegas.