The group's work employs a type of AI algorithm called generative adversarial networks (GANs), the favored deep-learning technique for creating 'deep fakes', that is, manipulating real video and images to produce very convincing fakes.
Instead of deep fakes, Kaidi Cao of Tsinghua University, Jing Liao of Microsoft Research, and Lu Yuan of Microsoft AI Perception and Mixed Reality used GANs to create a cartoon generator capable of outputting caricatures like those drawn by a human artist.
The deep neural network consists of two caricature GANs, or CariGANs: CariGeoGAN, which models the geometric exaggeration of a person's face and transfers it from photo to cartoon, and CariStyGAN, which transfers the style of caricatures to a given face picture.
The researchers focus on shape exaggeration and appearance stylization on the basis that these are the two key aspects to drawing a caricature that is warped but still recognizable.
The researchers configured the CariGANs to work as a cartoon-generating app: a user can control how exaggerated the face becomes, and the style of the caricature, by tuning the parameters of the model or by supplying an example caricature for the GAN to mimic.
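One way to picture a user-tunable exaggeration level is as a blend between the facial landmarks of the photo and the landmarks of an exaggerated target shape. The sketch below is only an illustration under that assumption; the function name `exaggerate`, the `alpha` parameter, and the landmark lists are hypothetical and are not taken from the researchers' code.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def exaggerate(photo_landmarks: List[Point],
               caricature_landmarks: List[Point],
               alpha: float) -> List[Point]:
    """Blend photo landmarks toward a caricature shape.

    alpha = 0.0 keeps the original face shape, alpha = 1.0 gives the
    full caricature shape, and values above 1.0 push the exaggeration
    further. This linear blend is an illustrative assumption, not the
    published CariGAN method.
    """
    if len(photo_landmarks) != len(caricature_landmarks):
        raise ValueError("landmark lists must have the same length")
    blended = []
    for (px, py), (cx, cy) in zip(photo_landmarks, caricature_landmarks):
        # Move each point along the line from its photo position
        # to its caricature position, scaled by alpha.
        blended.append((px + alpha * (cx - px), py + alpha * (cy - py)))
    return blended


# Hypothetical two-point example: halfway between photo and caricature.
photo = [(0.0, 0.0), (2.0, 2.0)]
target = [(1.0, 0.0), (2.0, 4.0)]
print(exaggerate(photo, target, 0.5))
```

In this toy setup a slider in an app would simply map to `alpha`, with the warped landmarks then driving how the image itself is deformed.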
To capture various caricature styles and how features are exaggerated, the researchers drew on over 8,000 caricatures found on the internet. The photos were drawn from Microsoft's MS-Celeb-1M dataset, previously used in an AI project to recognize celebrities.
To test how recognizable the CariGAN caricatures are, the researchers conducted two perceptual studies. In the first, they showed human participants a selection of its caricatures and asked them to pick the correct subject from a choice of five photos of faces with similar attributes.
The study found that hand-drawn caricatures had the highest recognition rate among participants. However, CariGAN performed better on this measure than previous techniques for generating caricatures.
The second study tested how faithful CariGAN outputs were to the hand-drawn caricature styles. To do this, they showed participants eight artist-drawn caricatures in a particular style.
Then they showed one hand-drawn image of a subject and five caricatures from the CariGAN model of the same subject. Next, they asked participants to rank them from "the most similar to given caricature samples" to "the least similar to caricature".
Participants ranked the CariGAN output above the hand-drawn one 22.95 percent of the time, meaning the AI output is sometimes indistinguishable from real hand-drawn caricatures. The researchers note that an "ideal fooling rate" would be 50 percent.
Anyone interested in taking the same tests as the study's participants can try them via the supplementary materials link on the CariGAN project page.
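To make the "fooling rate" concrete, here is a minimal sketch of how such a figure could be computed from ranking data. The data layout (one pair of ranks per trial, with 1 meaning most similar to the style samples) and the function name `fooling_rate` are assumptions for illustration, and the numbers are made up rather than taken from the study.

```python
from typing import List, Tuple


def fooling_rate(trials: List[Tuple[int, int]]) -> float:
    """Fraction of trials in which the generated caricature was ranked
    above the hand-drawn one.

    Each trial is (generated_rank, hand_drawn_rank); a lower rank means
    "more similar to the given caricature samples".
    """
    if not trials:
        raise ValueError("at least one trial is required")
    fooled = sum(1 for generated, hand_drawn in trials
                 if generated < hand_drawn)
    return fooled / len(trials)


# Hypothetical rankings from four trials, not real study data.
example_trials = [(1, 2), (3, 1), (2, 4), (5, 3)]
print(fooling_rate(example_trials))  # 0.5
```

If generated and hand-drawn caricatures were truly indistinguishable, each would be ranked above the other about equally often, which is why 50 percent is the ideal value.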