This CMU App Watches Boring Video So You Don't Have To
Say you have a large volume of digital video — hours of nanny-cam footage, perhaps, or a wedding reception.
And it’s boring, deadly boring.
But suppose that, somewhere on that tape, something interesting does happen. Maybe it's just five seconds’ worth of attention-worthy images, buried under a mountain of redundant and predictable ones.
How do you find and extract those brief interludes of relevance? Back in the dark ages, a human editor would have to sit through the whole thing, watching for something out of the ordinary, and snipping out the boring parts.
Now, thanks to software being developed at Carnegie Mellon University, a robot can do that tedious work for you.
Bin Zhao is a PhD student in CMU’s Machine Learning program. Together with his advisor, Prof. Eric P. Xing, he created LiveLight: an algorithm that can sift through hours of stultifying tape and show you only the parts it thinks will interest you.
"What we are trying to achieve is to sandwich this video into, for example, a one-minute trailer video," Zhao said. "So the user can actually spend much less time to watch our trailer video, but at the same time not miss any exciting or interesting parts from the original video."
To make it work, the software has to be taught to recognize novelty. That means creating what Zhao calls a “dictionary” of visual elements based on the first 20 to 30 seconds of video. Anything appearing later in the tape that doesn’t match that baseline inventory gets flagged as potentially interesting.
"So, anything that cannot be explained by this dictionary will be thought of as something novel, something interesting," he said.
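The paper behind LiveLight describes this as online dictionary learning over video features; the details below are not from the article, but the basic idea can be sketched as follows: learn a "dictionary" from features of the early footage, then score later frames by how poorly the dictionary reconstructs them. The feature vectors, dictionary size, and SVD-based learning here are illustrative stand-ins, not the actual LiveLight method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-frame feature vectors (the real system would extract
# descriptors such as color/texture histograms from video frames).
baseline = rng.normal(size=(200, 16))  # features from the first ~20-30 s

# "Dictionary": the top 8 right singular vectors of the baseline features,
# i.e. the directions that explain routine footage well.
dictionary = np.linalg.svd(baseline, full_matrices=False)[2][:8]

def novelty(frame, dictionary):
    """Reconstruction error of a frame under the learned dictionary.

    A frame the dictionary can 'explain' reconstructs with small error;
    a large residual flags the frame as potentially interesting.
    """
    coeffs, *_ = np.linalg.lstsq(dictionary.T, frame, rcond=None)
    return np.linalg.norm(frame - dictionary.T @ coeffs)

routine = rng.normal(size=16)        # resembles the baseline footage
odd = rng.normal(size=16) + 5.0      # shifted: "cannot be explained"

assert novelty(odd, dictionary) > novelty(routine, dictionary)
```

In a real system the dictionary would also be updated as the video plays, so that anything flagged once becomes "routine" if it keeps recurring.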
It shouldn’t come as a surprise that Zhao has his eye on the social media marketplace. With more people shooting video using wearable devices, he's betting consumers will see value in an app that can easily identify and post the highlights of their daily activities. Google would seem to agree, judging from the Glass maker's funding support for the project.
Two U.S. military research agencies, the Office of Naval Research and the Air Force Office of Scientific Research, also chipped in support during the research phase. For those who harbor concerns about the ability of state agencies to amass huge volumes of data unchecked by either legal or technical restrictions, that funding may cast the innovation in a slightly sinister light.
Zhao acknowledges that making surveillance more efficient is likely the most immediate application for LiveLight, which he's been testing on security footage taken from subway stations.
"The normal thing would be people just exiting the train, and then [going] through the gate and [exiting] the subway station," Zhao said. "That is normal stuff. The things that we think are interesting, the things that the algorithm captures, for example, [are] when people try to enter the station through the subway exit – meaning they are trying to ride the train without paying for it."
"Besides that," Zhao added, "the other interesting stuff are when somebody is walking around the subway exit looking suspicious."
If the idea of a computer watching you for signs of suspicious behavior is disturbing, it may help to know that the algorithm is not yet efficient enough to do so in real time. At present, an hour-long video can take the software up to two hours to analyze – in other words, no faster than a human editor watching the footage in real time.
But that could soon change. Zhao and his academic advisor are currently seeking investors for their startup company as they try to scale up the computational power that drives the software. With a beefier back-end, Zhao says, they're confident they can get the processing time for an hour's worth of video down to twenty minutes or less. That claim has gotten LiveLight noticed by surveillance firms, as well as by companies that host and stream online video and need a cheap way to analyze their content.
Meanwhile, Zhao’s team is doing internal testing of a newly developed mobile app.
Read their recently published paper and check out demo videos here.