SISR makes every cam a super-resolution device

Of all the startling things AI systems have learned to do in the last decade, the one that seems most like magic is single image super-resolution (SISR). Essentially, SISR uses AI to turn a single lo-res photo into a hi-res image. Pixels from nothing!

You've seen the movie where a crappy surveillance camera catches an image of a distant car. The boss says "enhance," and suddenly the license plate is readable. I thought it was Hollywood hyperbole. But for good or ill, this kind of enhancement is an incredibly active area of research with amazing results.

The latest refinement is detailed in a paper -- Single Image Super-Resolution via a Holistic Attention Network -- from researchers at Northeastern University, the State Key Laboratory of Information Security of the Chinese Academy of Sciences, and other institutions.

The most successful approach to date uses convolutional neural networks (CNNs) to learn a mapping function from the lo-res to the hi-res image. A deep CNN consists of many layers of artificial neurons. In a convolutional layer, each neuron is connected only to a small local patch of the previous layer's output, and the same small set of weights is reused across the whole image. Stacking many such layers enables the CNN to learn a complex mapping between the lo-res and hi-res image.
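To make the convolution part concrete, here is a minimal sketch in plain Python. It is only an illustration: the function name `conv2d`, the toy 4x4 "image," and the hand-picked blur kernel are all my own assumptions, and a real SISR network learns thousands of kernels rather than using one fixed by hand.

```python
def conv2d(image, kernel):
    """Slide a small kernel over a 2D list of numbers (no padding, stride 1).

    Each output value depends only on a kernel-sized patch of the input,
    and the same kernel weights are reused at every position.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x4 "image" and a 3x3 averaging kernel (hypothetical values).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
box_blur = [[1 / 9] * 3 for _ in range(3)]

features = conv2d(image, box_blur)  # a 2x2 feature map
```

A deep CNN chains many of these steps, with learned kernels and nonlinearities in between, so each layer works from the feature map produced by the one before it.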

Yet the architecture introduces a problem: Each layer sees only the output of the layer directly before it. Whatever the shallower layers captured reaches a deeper layer only as it is encoded in that single adjacent layer. Detail gets lost.

It's like joining a committee that has had many meetings, with access only to the last meeting's notes. There's a lot of history -- detail -- that you won't be privy to.

Likewise, SISR CNNs lose detail the deeper the network goes. The contribution of the paper is to detail a method to overcome this loss of history in deep CNNs.


The paper introduces an architecture that uses two new modules: a layer attention module (LAM) that looks at correlations across multiple layers, and a channel-spatial attention module (CSAM) that learns the interdependencies of features within each layer. Essentially, the new modules give you a summary of all the committee meetings you missed, with the most significant details highlighted.
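The idea of weighing correlations across layers can be sketched in a few lines of plain Python. This is a simplified illustration of attention over layers, not the paper's implementation: the names `layer_attention` and `alpha`, the flattening of each layer's features into one short vector, and the toy numbers are all assumptions of mine.

```python
import math


def softmax(values):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def layer_attention(layer_features, alpha):
    """Blend every layer's features into each layer, weighted by similarity.

    layer_features: list of N equal-length vectors, one per layer.
    alpha: how much of the blended "history" to add back to each layer.
    """
    fused = []
    for current in layer_features:
        # Score how strongly this layer correlates with every layer.
        scores = [dot(current, other) for other in layer_features]
        weights = softmax(scores)
        # Weighted mix of all layers: the highlighted meeting summary.
        mixed = [
            sum(w * other[k] for w, other in zip(weights, layer_features))
            for k in range(len(current))
        ]
        # Keep the layer's own features and add the mix on top.
        fused.append([c + alpha * m for c, m in zip(current, mixed)])
    return fused


# Two toy "layers" with two features each (hypothetical values).
layers = [[1.0, 0.0], [0.0, 1.0]]
unchanged = layer_attention(layers, alpha=0.0)
blended = layer_attention(layers, alpha=1.0)
```

With `alpha` at zero the features pass through untouched. As `alpha` grows, each layer picks up more of what the other layers saw, which is the committee-notes idea in miniature.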

But the proof is in the pudding. Here's an example of the quality of SISR results using the new modules versus older architectures.


The take

Leaving aside the implications of turning every surveillance camera into a super-resolution device, this is pretty amazing technology. If COVID-19 mask-wearing becomes a continuing part of pandemic management, I expect similar concepts could be used to "see" through masks to identify individual faces.

Yet I also see the paper as a reminder of just how young AI is as a discipline, and how much more development we can expect in the next 50 years.

Adding a vertical component to the horizontal stack of CNN layers doesn't seem like it should have been a huge conceptual leap. Yet here we are.

Comments welcome.