When MIT CSAIL PhD student Mark Hamilton saw the "Rembrandt and Velazquez" exhibit at Amsterdam's Rijksmuseum last year, he was surprised to see that some works of art with no connection on paper can look eerily similar in reality.
The show's curators had paired Francisco de Zurbarán's The Martyrdom of Saint Serapion, a 17th-century Spanish religious painting, with Jan Asselijn's The Threatened Swan, a Dutch canvas from the same era. Although the artists never met, the two works show a clear visual resemblance.
It got Hamilton thinking about the other hidden links that could be uncovered in the history of art. The researcher and his team, in partnership with Microsoft, have now unveiled a new algorithm that takes image retrieval technology a step further, to run through millions of paintings across thousands of years and find unexpected parallels in themes, motifs, and visual styles.
Dubbed "MosAIc", the system is currently running on the databases of works from the Metropolitan Museum of Art and the Rijksmuseum. From a single image, the tool can uncover connections in whatever culture or media the user is interested in, and quickly return the works that most closely match the original query.
MosAIc, for instance, was presented with the Dutch Double Face Banyan, an anonymous item of clothing from the late 18th century, and found similarities with a Chinese ceramic figurine. The connection can be traced to the flow of porcelain and iconography from Chinese to Dutch markets between the 16th and 20th centuries.
To develop MosAIc, the research team built on an image retrieval system and the well-known "k-nearest neighbors" (KNN) algorithm, which is widely used to find objects based on similarity, for example in product recommendation.
Typically, however, image retrieval systems built on the KNN algorithm have limitations. The scope of a query is effectively restricted: in the case of paintings, users could only ask for similar artwork from a specific artist. Alternatively, they could run so-called "unconditional" queries and gradually filter their way through the results until they got an accurate answer, a process that is costly and time-consuming.
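For readers unfamiliar with KNN, the idea can be illustrated in a few lines of Python. This is a minimal sketch, not the team's code: it assumes each image has already been reduced to a numeric feature vector, and the function name `knn` and the toy two-dimensional vectors are invented for illustration.

```python
import numpy as np

def knn(query, features, k=3):
    """Return the indices of the k rows of `features` closest to `query`."""
    # Euclidean distance from the query to every stored feature vector.
    dists = np.linalg.norm(features - query, axis=1)
    # Sort by distance and keep the k smallest.
    return np.argsort(dists)[:k].tolist()

# Toy "image features": four items described by two numbers each.
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
nearest = knn(np.array([0.9, 0.1]), features, k=2)
print(nearest)
```

In a real retrieval system the vectors would typically come from a neural network and have hundreds of dimensions, but the ranking step is the same.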
Hamilton and his team instead created a conditional image retrieval (CIR) system, which delegates the filtering to the algorithm. The researchers still used the KNN algorithm, but enabled it to add "conditions", like texture, content, color or pose, while the program is running, until it reaches the closest match for the original query.
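The simplest way to picture a conditional query is as a nearest-neighbor search restricted to the items that satisfy a condition. The sketch below is a brute-force illustration under that assumption, not the researchers' implementation; the `conditional_knn` function, the culture labels and the toy vectors are all hypothetical.

```python
import numpy as np

def conditional_knn(query, features, labels, condition, k=1):
    """Nearest neighbors of `query` among items whose label passes `condition`."""
    # Keep only the items that satisfy the condition (e.g. a given culture).
    mask = np.array([condition(label) for label in labels])
    candidates = np.flatnonzero(mask)
    if candidates.size == 0:
        return []
    # Rank the surviving candidates by Euclidean distance to the query.
    dists = np.linalg.norm(features[candidates] - query, axis=1)
    order = np.argsort(dists)[:k]
    return candidates[order].tolist()

features = np.array([[0.0, 0.0], [1.0, 1.0], [1.2, 0.9], [4.0, 4.0]])
labels = ["Dutch", "Dutch", "Chinese", "Chinese"]

# Query with a Dutch item, but ask only for Chinese matches.
match = conditional_knn(features[1], features, labels,
                        lambda culture: culture == "Chinese", k=1)
print(match)
```

Filtering every item on every query is what makes this naive version slow on large collections, which is the cost a tree-based index is meant to avoid.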
The process is called a conditional KNN tree: the algorithm groups similar images together in a tree-like structure, and starting from the trunk, applies new filters as it climbs up, following the most promising branch until it finds the most accurate image.
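One way such a tree could work is sketched below. This is an assumption-laden illustration rather than the published algorithm: every node records which labels occur beneath it, so a search can skip whole branches that contain nothing matching the condition, or that are too far away to beat the best match found so far. The `Node` class, the `search` function and the sample data are invented for this example.

```python
import numpy as np

class Node:
    """A group of similar items, split in two until groups are small."""
    def __init__(self, idx, features, labels, leaf_size=2):
        self.idx = idx
        self.labels = {labels[i] for i in idx}  # labels present in this branch
        pts = features[idx]
        self.center = pts.mean(axis=0)
        self.radius = np.linalg.norm(pts - self.center, axis=1).max()
        self.children = []
        if len(idx) > leaf_size:
            # Split along the dimension where the points are most spread out.
            dim = int(np.argmax(pts.max(axis=0) - pts.min(axis=0)))
            order = idx[np.argsort(pts[:, dim])]
            half = len(order) // 2
            self.children = [Node(order[:half], features, labels, leaf_size),
                             Node(order[half:], features, labels, leaf_size)]

def search(node, query, wanted, features, labels, best=(np.inf, -1)):
    """Return (distance, index) of the closest item labeled `wanted`."""
    if wanted not in node.labels:
        return best  # nothing in this branch satisfies the condition
    if np.linalg.norm(query - node.center) - node.radius >= best[0]:
        return best  # branch is too far away to improve on the best match
    if not node.children:
        for i in node.idx:
            if labels[i] == wanted:
                d = float(np.linalg.norm(features[i] - query))
                if d < best[0]:
                    best = (d, int(i))
        return best
    # Visit the most promising (closest) branch first.
    for child in sorted(node.children,
                        key=lambda c: np.linalg.norm(query - c.center)):
        best = search(child, query, wanted, features, labels, best)
    return best

features = np.array([[0.0, 0.0], [1.0, 1.0], [1.2, 0.9],
                     [4.0, 4.0], [0.5, 0.2], [3.0, 3.5]])
labels = ["Dutch", "Dutch", "Chinese", "Chinese", "Dutch", "Chinese"]
root = Node(np.arange(len(features)), features, labels)
dist, index = search(root, features[1], "Chinese", features, labels)
print(index)
```

Because the label sets are stored in the tree itself, the condition can be chosen at query time without rebuilding the index.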
Hamilton said: "Restricting an image retrieval system to particular subsets of images can yield new insights into relationships in the visual world. We aim to encourage a new level of engagement with creative artifacts."
While recognizing that the technology does not break speed records, the team of researchers said that CIR can improve result diversity in a simple and efficient way.
The new technology is not limited to artwork queries. Hamilton and his colleagues anticipate a number of applications for the algorithm, including using MosAIc to better study deepfakes, particularly where deepfakes struggle most to model reality.
The algorithm, while working its way to the top of the tree to find an image that best matches a real picture, at the same time leaves behind – on its branches – the pictures that it believes fail to represent the original input.
By going back to those branches, the researchers could visualize which images are deepfakes, as well as which conditions, or filters, convinced the algorithm to leave them behind – typically, because the deepfake failed to accurately represent a certain element of reality, like a microphone or a hat.
Although sometimes invisible to the human eye, those "blind spots" are what distinguish a sophisticated deepfake from a genuine image.
Hamilton hopes that MosAIc will be used in many other fields ranging from social science to medicine. "These fields are rich with information that has never been processed with these techniques and can be a source for great inspiration for both computer scientists and domain experts," he said.