Caption by: Jonathan Bennett
Google's Mini search appliance is aimed at smaller organisations that need to find documents on their local network or pages on their own web sites. It's recently been updated to add more features — some helpful when you're using it on your intranet, some when it's the search on your web site. It can generate a sitemap file for your site for submission to Google or other search engines that support the format, link into Google's Analytics service giving you detailed usage figures for your site, and can protect sensitive content on your intranet while still indexing it for authorised users. The OneBox feature from the larger Google search appliance has also been brought to the Mini.
We reviewed the Google Mini last year and since then the hardware hasn't changed: it's a 1U rack-mount unit that — although it has keyboard, mouse and monitor ports — you just plug into the power and the network; everything else is done with a browser.
Using the Mini hasn't changed much either: you give it the URLs where you'd like it to start, a list of URL patterns that match the content you want to index, and another set of patterns that define exclusions. Once this is done, the appliance starts crawling and gathering results, up to the document limit stipulated in your licence (50,000 in the case of our review system).
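To make the include/exclude idea concrete, here is a minimal Python sketch of that kind of filtering. It is an illustration only: the `should_crawl` helper and the example URLs are hypothetical, and shell-style wildcards stand in for the Mini's own pattern syntax, which this review does not describe.

```python
from fnmatch import fnmatchcase


def should_crawl(url, include_patterns, exclude_patterns):
    """Return True if url matches an include pattern and no exclude pattern."""
    included = any(fnmatchcase(url, p) for p in include_patterns)
    excluded = any(fnmatchcase(url, p) for p in exclude_patterns)
    return included and not excluded


# Hypothetical patterns: index the whole site, skip print views and PDFs.
include = ["http://www.example.com/*"]
exclude = ["*/print/*", "*.pdf"]

for url in [
    "http://www.example.com/news/story.html",
    "http://www.example.com/print/story.html",
    "http://other.example.org/",
]:
    print(url, should_crawl(url, include, exclude))
```

Exclusions win over inclusions in this sketch, which is the usual way such rule pairs are read: a page has to be wanted and not explicitly ruled out.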
To look at those results you define one or more Front Ends, which are different interfaces to the Mini. Each Front End can include just a subset of the crawled documents or the whole lot, and can have its appearance customised. A wizard is available if you just want to tweak the search pages, but you'll have to know XSLT to customise them heavily.
Google's Sitemap Generator feature allows you to explicitly tell the search engine's crawlers where content is on your site. It's a very useful feature for larger web sites that don't get crawled properly, but compiling the sitemap file itself can be a tortuous process, since you need to list every page on your site that you want to be indexed. The Mini can now solve this problem for you by generating a sitemap from the pages it crawls itself. Go to the Crawl Diagnostics option in the user interface, choose the list view option and hit the export button, and a sitemap is yours.
We tested this process on the whole of ZDNet UK, and found that getting a good sitemap is a bit more involved than just setting the appliance loose on your site. You need to use the tuning facilities in the Mini — including or excluding certain patterns in URIs — to get the best set of results. This is particularly important if the size of your site is close to or exceeds the document limit of the Mini. The sitemap you get from the Mini is perfectly usable, but it may not reflect exactly how you'd like your site to be crawled by Google: some of the priority numbers may not be what you're expecting, for example. However, compared with either creating a map by hand or writing a custom sitemap generator for your site it's a much, much simpler way of getting 80 percent of the same result.
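If the priority numbers in an exported sitemap are not what you want, one option is to post-process the file rather than re-crawl. The Python sketch below rewrites `<priority>` values according to URL prefixes. It rests on assumptions: that the export is a standard `<urlset>` of `<url>` entries with `<loc>` and `<priority>` children, and that it uses the sitemaps.org 0.9 namespace (the Mini's export may use a different namespace, which you can pass in). The `set_priorities` helper and the example rules are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the exported file uses this namespace; pass ns= if yours differs.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def set_priorities(sitemap_xml, rules, ns=SITEMAP_NS):
    """Rewrite <priority> for each URL using the first matching prefix rule.

    rules is a list of (url_prefix, priority) pairs, checked in order.
    URLs that match no rule are left unchanged.
    """
    ET.register_namespace("", ns)  # serialise without an ns0: prefix
    root = ET.fromstring(sitemap_xml)
    for url in root.findall(f"{{{ns}}}url"):
        loc = url.findtext(f"{{{ns}}}loc", default="").strip()
        for prefix, priority in rules:
            if loc.startswith(prefix):
                node = url.find(f"{{{ns}}}priority")
                if node is None:
                    node = ET.SubElement(url, f"{{{ns}}}priority")
                node.text = f"{priority:.1f}"
                break
    return ET.tostring(root, encoding="unicode")


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc><priority>0.5</priority></url>
  <url><loc>http://www.example.com/archive/2005.html</loc><priority>0.5</priority></url>
</urlset>"""

# More specific prefixes go first, because the first match wins.
rules = [
    ("http://www.example.com/archive/", 0.2),
    ("http://www.example.com/", 1.0),
]

print(set_priorities(sample, rules))
```

Keeping the rules as an ordered list makes the precedence explicit, which matters when one prefix is contained in another, as with the archive section here.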
The Mini appliance now also links into Google's Analytics service, so that the code necessary to track usage statistics is included automatically in the results pages for the appliance. This is done on a per-Front End basis, so that if you have, for example, one Front End for internal users and one for external users, you can track their usage separately.
The OneBox feature has long been included on the enterprise-level Google Search Appliance, but has only made it onto the Mini with the latest release. OneBox allows you to extend the appliance's capabilities by adding software modules that connect to applications such as CRM or other business intelligence systems on your network. However, in most cases this requires extra infrastructure, such as a server on which to host OneBox modules — all the Mini appliance does is to aggregate the results from those modules.
You can configure what triggers a OneBox search: anything from a keyword at the start of a search string to a regular expression. On the Mini, the number of OneBox results returned for a single search is capped, and the cap is configurable from one to four. These results are shown at the top of the page, above the normal search results.
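The two trigger styles can be sketched in a few lines of Python. This is not the appliance's implementation or configuration format: `make_trigger`, `select_modules`, the module names and the order-number pattern are all hypothetical, and the cap is modelled as simple truncation because the review does not say how the Mini applies it.

```python
import re


def make_trigger(kind, value):
    """Return a function that reports whether a query fires the trigger."""
    if kind == "keyword":
        # Fires only when the query starts with the keyword as a whole word.
        regex = re.compile(r"\s*" + re.escape(value) + r"(?=\s|$)", re.IGNORECASE)
        return lambda query: regex.match(query) is not None
    if kind == "regex":
        # Fires when the pattern appears anywhere in the query.
        regex = re.compile(value)
        return lambda query: regex.search(query) is not None
    raise ValueError(f"unknown trigger kind: {kind}")


def select_modules(query, modules, max_results=4):
    """Return the names of triggered modules, capped at max_results (1 to 4)."""
    if not 1 <= max_results <= 4:
        raise ValueError("max_results must be between 1 and 4")
    fired = [name for name, trigger in modules if trigger(query)]
    return fired[:max_results]


# Hypothetical modules: a staff directory and an order lookup.
modules = [
    ("phonebook", make_trigger("keyword", "phone")),
    ("orders", make_trigger("regex", r"\bORD-\d{5}\b")),
]

print(select_modules("phone jane smith", modules))
print(select_modules("status of ORD-12345", modules))
```

A keyword trigger is the more predictable of the two for users, since they opt in by typing the keyword; a regular expression lets the appliance recognise structured input such as an order number without any prompting.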
The recent updates to the Google Mini make it far more useful in an intranet, since it now can index sensitive documents without the risk of exposing their content to the wrong people; the OneBox feature, while not straightforward, does allow you to integrate other systems into the search results. Although there are cheaper ways of setting up search for your intranet or your web site, there are few, if any, simpler than the Google Mini — and now it's that bit more helpful.