Apache Lucene and its high-level services wrapper, Solr, provide extremely powerful full-text search, among other search functionality, and are widely used across the internet.
Think search has nothing to do with big data? Think again, because Lucene and Hadoop have a special relationship. To start with, Doug Cutting, now chief architect at Cloudera, is the man behind both projects. Next, Lucene can work over HDFS and process data stored in HBase, and Mahout can use Lucene indexes. Lucene and Solr are also included with certain distributions of Hadoop. In fact, as I reported earlier this month, MapR's Hadoop distributions include the full LucidWorks suite, which is based on Lucene/Solr (and LucidWorks is the major commercial entity behind Lucene).
Today, Lucene/Solr 4.3.0 was released and made available for immediate download. The 4.3.0 package includes improvements in numerous areas, including query performance, spatial processing, and the read-side schema API. The release also includes numerous enhancements to Lucene's faceted search capabilities, which let users narrow search results by dimension, much as dimensions are used for drill-down analysis in data warehouses and OLAP cubes.
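For readers who haven't run into faceting before, here is a rough sketch of the idea in plain Python. It does not use Lucene or Solr at all: the documents, the field names (`category`, `year`) and the helper functions `search` and `facet_counts` are all invented for illustration, and a real search engine computes these counts from its index rather than by scanning records.

```python
from collections import Counter

# Hypothetical documents; every field name and value here is made up.
DOCS = [
    {"id": 1, "text": "lucene query performance", "category": "search", "year": 2013},
    {"id": 2, "text": "hadoop cluster tuning", "category": "big data", "year": 2013},
    {"id": 3, "text": "solr spatial query", "category": "search", "year": 2012},
    {"id": 4, "text": "hbase query patterns", "category": "big data", "year": 2012},
]

def search(docs, term, filters=None):
    """Return docs whose text contains `term` and that match every filter."""
    filters = filters or {}
    return [
        d for d in docs
        if term in d["text"].split()
        and all(d.get(field) == value for field, value in filters.items())
    ]

def facet_counts(hits, field):
    """Count how many hits fall under each value of one dimension."""
    return dict(Counter(d[field] for d in hits))

# Step 1: a plain keyword search, summarized by the "category" dimension.
hits = search(DOCS, "query")
by_category = facet_counts(hits, "category")

# Step 2: drill down into one category, then summarize by "year".
drilled = search(DOCS, "query", {"category": "search"})
by_year = facet_counts(drilled, "year")

print(by_category)
print(by_year)
```

The two steps mirror the OLAP comparison: the first set of counts tells the user which slices of the results exist, and picking one slice re-runs the same search with an added filter, producing a fresh set of counts along another dimension.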
Full details of the release, along with links to download both its Lucene and Solr components, are available at the Lucene project's home page. And if you're intrigued by all this, check out the website for Lucene/Solr Revolution, LucidWorks' first annual conference on Lucene and Solr, which was just held in San Diego.