Amid the influx of deep learning startups, one stealth-mode venture, Vicarious, has already made something of a name for itself. As reported in MIT Technology Review, the Silicon Valley-area company is working to give computers the power of imagination, or something close to it.
The article reports that this effort runs against the tide of deep learning front-runners like Google, Microsoft, and Amazon, pursuing a more evolved approach that is closer to how the human brain works. Not content with deep neural networks trained merely to recognize words, symbols, and images, Vicarious has the ambitious goal of helping a machine make sense of its real-world environment by imbuing it with an understanding of underlying concepts, for example physical substances like water.
Not many details have been disclosed yet, but Vicarious maintains that the next-generation neural-network algorithm that it is designing will be capable of emulating human-like imagination. And the company claims that such a system will be able “to learn from less data, and to recognize stimuli or concepts more easily.”
“We are really rapidly approaching the amount of computational power we need to be able to do some interesting things in AI,” said the 33-year-old CEO D. Scott Phoenix. “In 15 years, the fastest computer will do more operations per second than all the neurons in all the brains of all the people who are alive. So we are really close.”
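The CEO's claim can be sanity-checked with back-of-envelope arithmetic. The figures below are commonly cited estimates, not numbers from the article: roughly 86 billion neurons per human brain, a world population of about 7.3 billion at the time, a circa-2015 top supercomputer (Tianhe-2) at ~3.4×10^16 FLOP/s, and the rough historical trend of peak supercomputer performance doubling about once a year.

```python
# Back-of-envelope check of the CEO's claim. All constants are
# assumptions (common published estimates), not from the article.
NEURONS_PER_BRAIN = 8.6e10   # ~86 billion neurons per human brain
WORLD_POPULATION = 7.3e9     # mid-2010s world population

total_neurons = NEURONS_PER_BRAIN * WORLD_POPULATION  # ~6.3e20

# Tianhe-2, the fastest machine circa 2015, ran at ~3.4e16 FLOP/s.
# If peak performance doubles roughly once a year (the rough
# historical Top500 trend), 15 years gives ~15 doublings.
FLOPS_2015 = 3.4e16
projected_flops = FLOPS_2015 * 2 ** 15  # ~1.1e21

print(f"total neurons in all living brains: {total_neurons:.2e}")
print(f"projected FLOP/s in 15 years:       {projected_flops:.2e}")
print("claim plausible:", projected_flops > total_neurons)
```

Under those assumptions the projected machine does indeed exceed the total neuron count, though the comparison glosses over the fact that a neuron is not equivalent to one floating-point operation per second.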
One element of the company’s approach is to implement feedback connections, taking a page from how the brain’s neural system works. While feed-forward models are common in many deep learning systems, so increasingly are recurrent neural networks (RNNs), whose feedback mechanisms introduce loops. These systems are more powerful but also more complicated. Baidu has had success using this approach to develop speech recognition systems for noisy environments, as demonstrated at NVIDIA’s GTC 2015 event and laid out in this paper.
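The distinction between feed-forward and recurrent computation can be sketched in a few lines of NumPy. This is a minimal illustration of the feedback-loop idea only; the layer sizes, weight names, and update rule are generic textbook choices, not Vicarious's or Baidu's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 4, 8  # toy dimensions, for illustration only

# A feed-forward layer maps its input straight to an output:
W_in = rng.normal(scale=0.1, size=(n_hidden, n_in))

def feedforward_step(x):
    return np.tanh(W_in @ x)

# A recurrent layer adds a feedback connection: the hidden state
# from the previous time step loops back into the computation.
W_rec = rng.normal(scale=0.1, size=(n_hidden, n_hidden))

def recurrent_step(x, h_prev):
    return np.tanh(W_in @ x + W_rec @ h_prev)

# Processing a sequence: because of the feedback loop, earlier
# inputs influence later hidden states, giving the network memory.
h = np.zeros(n_hidden)
for t in range(5):
    x_t = rng.normal(size=n_in)
    h = recurrent_step(x_t, h)

print(h.shape)  # -> (8,)
```

The extra `W_rec @ h_prev` term is the entire difference: it is what makes the network "recurrent," and also what makes training harder, since gradients must flow back through the loop across time steps.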
Vicarious has been parsimonious in its disclosures, but it did demo a successful CAPTCHA solver in 2013. Even humans have a tough time correctly identifying the distorted numbers, letters, and puzzles intended to thwart bots and malicious actors, but the CEO says the new system can imagine what the characters would look like sans disguise. A paper laying out the framework for the CAPTCHA solver is expected sometime this year.
It might sound fanciful without more substantiating technical details, but there’s no doubt the startup is generating buzz and unlocking budget. It has already raised $72 million, and its investor roster is full of Silicon Valley A-listers: Dustin Moskovitz, cofounder of Facebook; Adam D’Angelo, former CTO of Facebook and cofounder of Quora; as well as Peter Thiel, Mark Zuckerberg, Jeff Bezos, and Elon Musk. Samsung and Wipro are also contributing capital.
The potential application set for the deep learning system includes visual recognition and interpretation, as well as other sophisticated human-centric skills, like language and logical reasoning.
A few more clues are found on the company’s website, which provides this boilerplate and also reveals an impressive talent pool.
We are building a unified algorithmic architecture to achieve human-level intelligence in vision, language, and motor control. Currently, we are focused on visual perception problems, like recognition, segmentation, and scene parsing. We are interested in general solutions that work well across multiple sensory domains and tasks.
Using inductive biases drawn from neuroscience, our system requires orders of magnitude less training data than traditional machine learning techniques. Our underlying framework combines advantages of deep architectures and generative probabilistic models. We use modern software engineering practices, and we strive to maintain a codebase and a culture that are a joy to work in.
For additional information, read the original piece by Will Knight, at MIT Technology Review.