Open source hasn't made huge inroads in web search, but Apache's Lucene/Solr platform is beginning to make gains in enterprise search, particularly in light of the acquisition binge by proprietary giants.
First, there was Microsoft's acquisition a few years back of FAST, which was integrated into Microsoft SharePoint. Then there was HP's purchase of Autonomy, followed by Oracle's pickup of Endeca and, more recently, IBM's acquisition of Vivisimo.
The CEO of Lucid Imagination, whose LucidWorks enterprise search platform is based on Lucene/Solr (which are used in combination), said fears of vendor lock-in and platform abandonment are driving more interest in open source alternatives.
"Clients are asking about open source and Lucene and Solr technologies as a hedge against the future downside of acquisitions," said Paul Doscher, CEO of Lucid Imagination, who said he was in discussion with a Gartner Group analyst on this very topic just yesterday.
"FAST won't be supported on Linux or any other platorm. We're seeing a migration away from FAST customers that don't want to implement a Microsoft stack," Doscher said. "Customers [of Microsft, Oracle and IBM] now have to be concerned about pricing and support and from a roadmap perspective. They're interested in Lucene and Solr because there's no vendor lock in. "
Lucene and Nutch were created by Doug Cutting, a former Yahoo engineer who is now an architect at Cloudera. Lucene is used by Facebook, Twitter, Groupon, Boeing, Ford and Shopzilla, to name a few.
Lucene has been around for about a decade and Solr, the related search server, for roughly half that. Apache released the alpha of Lucene 4.0 and Solr 4.0 earlier this month and Nutch 2.0 was released today.
Several current and former LucidWorks employees are closely involved in Apache's Lucene/Solr projects, including Erik Hatcher, who sits on the Apache Lucene Project Management Committee.
He sees a bright future for the open source search platform in the big data era and notes that Apple, Microsoft, Zappos, Orbitz, Wells Fargo, The Motley Fool, Cisco, the CIA, USDA and NIH are all Lucene/Solr users.
"Nutch is a web crawler. In order to crawl at large scale, it needs a large amount of space to store it, and thus it uses Hadoop for distributing the crawling jobs and storage of the crawled content. To make the content searchable, it needs to be indexed," Hatcher said via email. "Lucene is a top notch open source search library, which wrapped by Solr making it a search engine service. Crawled content from Nutch can then be sent to Solr for indexing, which internally uses Lucene for the core indexing, and searches then are directed to Solr (which again uses Lucene internally for searching). The pieces are most definitely in place to compete in enterprise search. I personally have worked with companies that have replaced the Google Search Appliance with Solr and our enterprise LucidWorks platform."
Google still seems untouchable in the web search (and advertising) arena. But the release of Apache's related Nutch 2.0 web crawling platform has some wondering if open source will one day be a viable contender, if a big enough vendor or consortium of vendors invests the capex necessary to go up against the Big G.