Video cameras are so good and storage so cheap that we can make lengthy videos of our everyday activities that even the makers don't want to watch. Techies are hard at work to enable automated editing that emphasizes the interesting bits. Here's a report from the front lines.
Video is the fastest-growing consumer of storage worldwide, and billions of people don't yet have video cameras. But it can also be a huge time sink as we watch videos whose information-rich parts are buried in minutes of repetitive or irrelevant content.
If we're going to domesticate the video explosion, we need to automate editing so videos show us what is interesting to humans. But how?
We first extract the semantic information (e.g., people, car plates, charming environments) from each frame of the input video. These data define a semantic profile of the video which we use to split the stream into relevant and non-relevant segments. For each type of segment, we calculate different speed-up rates, assigning lower rates to the relevant segments.
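To make that concrete, here is a minimal sketch of the split-and-speed-up step, not the researchers' code. It assumes each frame already has a single semantic score, and the threshold and the two rates (`slow_rate`, `fast_rate`) are made-up values for illustration; the paper computes its rates rather than hard-coding them.

```python
from itertools import groupby


def segment_by_relevance(scores, threshold):
    """Split per-frame scores into runs of (is_relevant, start, end_exclusive)."""
    segments = []
    index = 0
    for is_relevant, run in groupby(scores, key=lambda s: s >= threshold):
        length = len(list(run))
        segments.append((is_relevant, index, index + length))
        index += length
    return segments


def select_frames(scores, threshold, slow_rate=2, fast_rate=10):
    """Keep every slow_rate-th frame in relevant runs and every
    fast_rate-th frame elsewhere (both rates are illustrative)."""
    kept = []
    for is_relevant, start, end in segment_by_relevance(scores, threshold):
        step = slow_rate if is_relevant else fast_rate
        kept.extend(range(start, end, step))
    return kept


# 10 dull frames, 4 interesting ones, 10 more dull ones.
scores = [0.1] * 10 + [0.9] * 4 + [0.1] * 10
print(select_frames(scores, threshold=0.5))
```

The interesting stretch keeps half its frames while the dull stretches keep one frame in ten, which is the whole trick in miniature.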
The semantic analysis is the key piece. Some of the analysis uses well-understood tools, such as facial recognition. Another piece is assuming that content closer to the center of the frame is more important.
But another critical piece is a convolutional neural network, CoolNet, that rates the "coolness" of a video frame after being trained on web video statistics (presumably, popularity). The combination of techniques gives a value of semantic interest, which then sets the level of speed-up applied to the final output.
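Here is a rough sketch of how such a combined score might be put together. It is a guess at the general shape, not the paper's formula: the Gaussian center weighting, the `sigma` value, the box format, and the way a `coolness` number is simply added in are all assumptions for illustration. The detections would come from something like a face detector, and the coolness value from a network such as CoolNet.

```python
import math


def center_weight(cx, cy, width, height, sigma=0.3):
    """Gaussian falloff with distance from the frame center (normalized units)."""
    dx = cx / width - 0.5
    dy = cy / height - 0.5
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))


def frame_interest(detections, width, height, coolness=0.0, cool_weight=1.0):
    """Sum confidence * center weight * relative area over all detections,
    then add a weighted coolness term.

    detections: list of (confidence, x, y, w, h) boxes in pixels (assumed format).
    """
    total = 0.0
    for confidence, x, y, w, h in detections:
        weight = center_weight(x + w / 2, y + h / 2, width, height)
        area = (w * h) / (width * height)
        total += confidence * weight * area
    return total + cool_weight * coolness


# The same face box scores higher in the middle of the frame than in a corner.
centered = frame_interest([(0.9, 270, 190, 100, 100)], 640, 480)
cornered = frame_interest([(0.9, 0, 0, 100, 100)], 640, 480)
print(centered > cornered)
```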
Particle Swarm Optimization
Semantic content isn't the only important consideration for an individual frame. Parameters such as jitter, quality, and velocity are also important. Their variety and complexity mean that it is easy for users to make poor choices, so the researchers propose using Particle Swarm Optimization (PSO) to automate the selection of parameters.
The PSO algorithm is an iterative method that creates a group of particles and arranges them randomly in the search space. At every iteration, the particles' positions (parameter values) are updated to follow the local and global best particles. The solution is given by a fitness equation defined according to the problem.
Got that? Me neither.
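If code is easier to follow than prose, here is a bare-bones PSO sketch. It is the textbook version of the algorithm, not the researchers' implementation: the toy fitness function stands in for whatever equation the paper uses, the inertia and attraction coefficients are common defaults I've assumed, and I've read "local best" as each particle's own best position so far.

```python
import random


def pso(fitness, bounds, n_particles=20, iterations=100,
        inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize fitness over the box given by bounds, a list of (low, high) pairs."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Scatter the particles randomly in the search space.
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_positions = [p[:] for p in positions]
    best_values = [fitness(p) for p in positions]
    g = min(range(n_particles), key=lambda i: best_values[i])
    global_best, global_value = best_positions[g][:], best_values[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward this particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (best_positions[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            value = fitness(positions[i])
            if value < best_values[i]:
                best_values[i], best_positions[i] = value, positions[i][:]
                if value < global_value:
                    global_value, global_best = value, positions[i][:]
    return global_best, global_value


# Toy stand-in for a fitness equation: lowest at (3, -1).
best, value = pso(lambda p: (p[0] - 3) ** 2 + (p[1] + 1) ** 2,
                  [(-10, 10), (-10, 10)])
print(best, value)
```

Each particle is one candidate set of parameter values; the swarm keeps nudging candidates toward the best ones found so far until they cluster around a good answer.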
But the bottom line is a video where the important (to humans) parts are presented at rates that allow us to see and appreciate what we value, while speeding up the boring parts.
The Storage Bits take
My first YouTube video, a 1-minute guide to replacing a MacBook hard drive, used the speedup technique to condense about 10 minutes of real-time footage down to 1 minute. And it racked up almost 450,000 views. So there's market demand for shorter and smarter videos.
I'm sure YouTube would love to automagically edit millions of videos, both to lower infrastructure costs and to make their content more interesting. While this paper is not the last word in optimizing videos, it certainly points the way to a more interesting - and quicker - video future.