Yahoo releases CaffeOnSpark deep learning software to open source community

The machine learning software is now open for further development and use by applications outside of Yahoo's ecosystem.
Written by Charlie Osborne, Contributing Writer

Yahoo has opened the gates for open-source developers to take advantage of CaffeOnSpark, the deep learning software that powers Flickr's image capabilities.


On Wednesday, Andy Feng, Jun Shi and Mridul Jain, members of Yahoo's Big ML Team, said in a blog post that CaffeOnSpark, a deep learning system, is now available to the open-source community for further development.

CaffeOnSpark is based on the Apache Spark open source cluster computing framework. Yahoo says the system bolsters Spark, which already comes with Spark MLlib, a package of non-deep learning algorithms for classifying data. CaffeOnSpark takes this further, giving Spark applications the ability to use dataframes to extract predictions and models based on collections of user-generated data.

"We believe that deep learning should be conducted in the same cluster along with existing data processing pipelines to support feature engineering and traditional (non-deep) machine learning," the team says. "We created CaffeOnSpark to allow deep learning training and testing to be embedded into Spark applications."

Yahoo says that CaffeOnSpark also eliminates unwanted data movement and enables deep learning to be conducted on Big Data clusters directly, which improves the speed and efficiency of such tasks.

Machine learning powers many of today's applications. From the IBM Watson group to Google products and Yahoo's Flickr image search service, deep learning software gives technology vendors the chance to analyze vast amounts of data and elicit patterns and predictions based on this data.

CaffeOnSpark has recently been applied to Flickr to improve image recognition capabilities through training with Hadoop clusters.

While Yahoo is not necessarily one of the frontrunner technology companies you would link with artificial intelligence, releasing CaffeOnSpark to the open-source community can only benefit the industry as a whole.

The CaffeOnSpark deep learning system is available on GitHub under an Apache 2.0 license.

Yahoo's move follows other firms which have also released AI and deep learning systems under open-source licenses. Last week, Google released TensorFlow, software designed to make it easier to develop machine learning models for large-scale experiments, to the open-source community.

In January this year, Facebook and Microsoft also issued open-source licenses for deep learning software: namely, Microsoft's Computational Network Toolkit and Facebook's deep learning modules for the Torch artificial intelligence project.
