Information taxonomy is a Web developer's best friend because it can help reach those two most elusive goals of effective Web design: user satisfaction and return on investment. Conversely, even the most efficient search engine cannot completely overcome problems caused by poorly conceived or completely absent information taxonomy.
Although search engines have become much more sophisticated in recent years, finding information on the Internet is still often a hit-or-miss proposition. You may get zero hits or you may get millions. The second scenario is like trying to find a needle in a haystack, or perhaps like trying to find the right needle in a haystack filled with a million other needles. The volume of unstructured Web content keeps growing, which makes the problem worse. Even on an intranet Web site, where the content scope is much smaller, frustration will ensue if the site is not well organized.
Before the advent of the Internet, you could rely on librarians when you needed serious information. Being professionally trained in information retrieval, librarians know how to construct complex queries using Boolean logic, pluses, minuses, or other symbols. The queries are executed against structured information that is properly categorized and labeled with call numbers. But once the Web came along, many people thought the answer to readily accessible information was to dump data on the information superhighway as fast as possible, without regard for its organization.
Enter the search engine, laboring hard to extract information from the wilds of the Web. Unfortunately, most of us do not have degrees in library science, so we're limited to conducting crude keyword searches. Now here we are, years after the Internet's arrival, facing haystack after haystack of electronic information and often becoming frustrated. Yes, more content is available on the Web. But if that content is not structured, we can't find what we need easily, can't find it at all, or, worse yet, find the wrong information.
What does lack of structure cost?
The cost of unstructured Web content is hard to quantify because the person asking a question rarely knows in advance whether the answer is available on the Web or where it resides. However, we can qualify the cost as:
Taxonomy intersects with many other areas of Web development, including Web site design, content management, and Web search processes. Let's take a look at the benefits of taxonomy in the context of these areas.
The benefits of taxonomy in Web site design
As I described in a previous article, there are two aspects of taxonomy: taxonomy structure and taxonomy view. Web site design mainly involves taxonomy view, which presents Web content logically by grouping information into topics. By applying taxonomy view in Web site design, we can create a positive Web navigation experience through intuitive organization and labeling. When Web content is logically arranged and clearly labeled, site visitors can navigate and locate information easily and therefore will keep coming back.
The benefits of taxonomy in content management
Libraries organize books and journals according to the Library of Congress Classification System or the Dewey Decimal System. Each item is tagged with a set of standard attributes, such as call number, subject heading, title, and author. Books and journals can then be stored and easily retrieved using manual or computerized card catalogs based on these different types of labels.
Similarly, an enterprise can organize its information resources (documents, Web pages, etc.) using taxonomy structure. Taxonomy structure provides a hierarchical classification system that's based on a defined scope and context. During the content management process, information resources can be categorized and tagged consistently according to the taxonomy structure's standard nomenclature. The taxonomy structure is made available to content managers by providing a list of hierarchical categories during the content management workflow.
As a result, information stored across the enterprise in a content management system is associated with one or more categories. These categorized information resources can later be made available for more effective retrieval on the Web through a taxonomy view or a search engine, much like a traditional library catalog. The ultimate goal of a content management system is to make enterprise content available whenever and wherever it is needed. Thus, by integrating a taxonomy framework into content management processes, we increase the utility of the content management system.
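To make the idea of tagging against a taxonomy structure concrete, here is a minimal Python sketch. It is not drawn from any particular content management product: the Category class, the tag function, the sample category names, and the resource ID are all hypothetical, chosen only to show a resource being associated with more than one category while unknown categories are rejected.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hierarchical taxonomy structure (hypothetical model)."""
    name: str
    children: dict = field(default_factory=dict)

    def add_path(self, path):
        """Create every category along `path`, a sequence of names."""
        node = self
        for name in path:
            node = node.children.setdefault(name, Category(name))
        return node

    def has_path(self, path):
        """Return True if `path` names an existing category."""
        node = self
        for name in path:
            if name not in node.children:
                return False
            node = node.children[name]
        return True


def tag(taxonomy, catalog, resource_id, path):
    """Associate a resource with a category, enforcing the standard nomenclature."""
    path = tuple(path)
    if not taxonomy.has_path(path):
        raise ValueError("unknown category: " + " > ".join(path))
    catalog.setdefault(resource_id, set()).add(path)


# Sample taxonomy; the category names are illustrative only.
taxonomy = Category("root")
taxonomy.add_path(("Computers & Internet", "Software", "Natural Language Processing"))
taxonomy.add_path(("Business", "Best Practices"))

# One resource may belong to several categories.
catalog = {}
tag(taxonomy, catalog, "whitepaper-17", ("Computers & Internet", "Software"))
tag(taxonomy, catalog, "whitepaper-17", ("Business", "Best Practices"))
```

Rejecting paths that are not in the taxonomy is what keeps tagging consistent: content managers choose from the list of categories rather than inventing labels as they go.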
The benefits of taxonomy in the Web search process
Both taxonomy view and taxonomy structure are involved at different stages of the Web search process. Prior to search execution, a search engine spiders and indexes Web content within a target scope. Some search engines, such as Autonomy, can use taxonomy structure to learn characteristics of each category by analyzing a sample set of documents. This learning capability can help fine-tune the search relevancy.
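The learning step can be illustrated with a deliberately simple Python sketch. This is not Autonomy's algorithm, which the article does not describe; it assumes a plain word-frequency profile per category and a word-overlap score, and the function names and sample texts are made up for illustration.

```python
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def learn_profiles(samples):
    """Build a word-frequency profile per category from sample documents.

    `samples` maps a category name to a list of sample texts.
    """
    return {
        category: Counter(word for text in texts for word in tokens(text))
        for category, texts in samples.items()
    }


def classify(text, profiles):
    """Return the category whose profile best overlaps the text's words."""
    words = tokens(text)

    def score(profile):
        total = sum(profile.values())
        # Share of the profile's word occurrences matched by the text.
        return sum(profile[w] for w in words) / total if total else 0.0

    return max(profiles, key=lambda category: score(profiles[category]))


# Made-up sample set: two categories, two short documents each.
samples = {
    "Software": ["compiler parser software release", "software library api"],
    "Hardware": ["processor memory chip", "disk drive hardware chip"],
}
profiles = learn_profiles(samples)
```

With these samples, a call such as classify("a new parser library for software", profiles) selects the Software category. A production engine would use far richer statistics, but the principle of learning each category from examples is the same.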
During search execution, taxonomy structure can be exposed as taxonomy view by providing multilevel hierarchical categories on a Web site, similar to Yahoo's directory listing. Site visitors can browse a list of Web content for each category or drill down to a specific category and then execute a search within the selected category scope. For instance, they might search within Computers & Internet > Software > Natural Language Processing.
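Browsing and scoped searching as described above can be sketched in a few lines of Python. The documents, titles, and function names below are hypothetical, and the keyword match is a naive substring test rather than a real search engine; the point is only that a category path acts as a prefix filter on the search scope.

```python
# Made-up documents, each tagged with one category path.
DOCS = [
    {"title": "Parsing English Sentences",
     "category": ("Computers & Internet", "Software", "Natural Language Processing"),
     "text": "A parser for English grammar"},
    {"title": "Spreadsheet Tips",
     "category": ("Computers & Internet", "Software", "Office"),
     "text": "Formulas and grammar of spreadsheet macros"},
    {"title": "Grammar for Writers",
     "category": ("Arts", "Writing"),
     "text": "English grammar basics"},
]


def in_scope(category, scope):
    """True if `category` is `scope` itself or one of its subcategories."""
    return category[:len(scope)] == tuple(scope)


def browse(docs, scope):
    """List the titles of all content filed under the selected category."""
    return [d["title"] for d in docs if in_scope(d["category"], scope)]


def search(docs, query, scope=()):
    """Naive keyword search restricted to the selected category scope."""
    terms = query.lower().split()
    return [
        d["title"]
        for d in docs
        if in_scope(d["category"], scope)
        and all(term in d["text"].lower() for term in terms)
    ]
```

Searching for "grammar" with no scope matches all three documents, while the same query scoped to Computers & Internet > Software > Natural Language Processing returns only the first.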
After search execution, search results can be organized based on one or more system-defined taxonomy structures. For example, Convera's RetrievalWare can be configured to provide multiple ways to classify search results, such as by country, language, or subject. This type of capability greatly alleviates the haystack problem. Instead of sifting through thousands of hits one by one, a site visitor can filter by country, language, or subject. For example, a search on "content management system" on the Yahoo Web site (Search the Web, as of April 20, 2003) returned 3,790,000 hits. If these hits are organized by subject, such as best practices, tools, and so on, navigating the results to arrive at the desired answer is much more manageable and less time-consuming.
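As a rough illustration of organizing hits after the search, the following Python sketch groups and filters a small result list by taxonomy attributes. It does not use RetrievalWare's actual interface, which this article does not show; the hit records, field names, and functions are assumptions made for the example.

```python
from collections import defaultdict

# Made-up search hits, each carrying taxonomy attributes where known.
HITS = [
    {"title": "Choosing a CMS", "subject": "tools",
     "language": "English", "country": "US"},
    {"title": "CMS Rollout Lessons", "subject": "best practices",
     "language": "English", "country": "UK"},
    {"title": "CMS-Auswahl", "subject": "tools",
     "language": "German", "country": "Germany"},
    {"title": "Untitled page"},  # a hit that was never categorized
]


def group_by_facet(hits, facet):
    """Group hit titles under each value of one attribute, e.g. 'subject'."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit.get(facet, "unclassified")].append(hit["title"])
    return dict(groups)


def filter_hits(hits, **facets):
    """Keep only the hits that match every requested attribute value."""
    return [
        hit["title"]
        for hit in hits
        if all(hit.get(name) == value for name, value in facets.items())
    ]
```

Calling group_by_facet(HITS, "subject") buckets the titles under tools, best practices, and unclassified, and filter_hits(HITS, language="English", subject="tools") narrows the list to a single title. Uncategorized content ends up in the leftover bucket, which is one reason consistent tagging during content management pays off at search time.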
By integrating taxonomy throughout the Web search process, we can provide a more efficient search experience. Less time is wasted in failed searches or in finding the wrong information, thereby facilitating an effective decision-making process.
Builder.com originally published this article on 1 July 2003.