
From facts on Google-retrieved Web pages to historical graphs

Written by Russell Shaw, Contributor

A newly published Google Patent application entitled "Displaying Facts On A Linear Graph" appears to explain a method for extracting factual information from web pages retrieved via Google searches, and then organizing that info into tabular forms.

So why would this technology be advantageous? A good answer appears right in the patent application itself:

The World Wide Web and other information storage and retrieval systems contain a great deal of information. With the advent of search engines and other similar tools it has become relatively easy for a user to locate particular information. For example, one can obtain a wealth of information about World War II by simply searching for the phrase "World War II" on the Web.

Information on the web is not always in a format that is easy for the user to comprehend. A web site describing biographic information for a person might not present the information in a graphical manner. Some web sites, like sites providing stock prices and other financial information, provide limited graphing abilities.

For example, a person can interact with such sites to create and manipulate graphs showing stock prices over time. Even on these sites, however, the person is restricted to a limited set of graphing options.

Moreover, the sites do not permit the user to graph and compare data from across different web sites. For example, there might be three different web sites respectively describing major US battles of WWII, WWII battles from 1939-1941, and Russian WWII battles. These sites contain related information and a user might find it beneficial to display information from all three sites in a single, easy to understand format.

Therefore, there is a need in the art for a way to enable the user to organize and view ordered information from web pages in a way that makes it easier to comprehend.

The best way to show how this would work is to walk through Figure 3 and its accompanying text.

FIG. 3 is a high-level block diagram illustrating modules within a presentation engine 300 according to one embodiment. As used herein, the term "module" refers to computer program logic and/or data for providing the specified functionality. A module can be implemented in hardware, firmware, and/or software. Some embodiments have different and/or additional modules than those shown in FIG. 3. Moreover, the functionalities can be distributed among the modules in a different manner than described here.

A presentation engine 300 presents objects and facts in an automated and customizable manner. That is, the presentation engine 300 provides flexible tools that enable a user to automatically create customized views of the facts associated with one or more objects. The presentation engine 300 thus allows the user to view information in a way that is more comprehensible to the user.

In one embodiment, the modules of the presentation engine 300 are implemented as a JavaScript program that executes on client devices such as personal computers, mobile telephones, personal digital assistants (PDAs), handheld gaming devices, etc.

The JavaScript program interfaces with the data processing system 106 to access data and functionalities provided by it. The JavaScript program itself is controlled by a web browser, operating system, or other entity executing on the client device. In one embodiment, the JavaScript program uses Asynchronous JavaScript and XML (AJAX) technology to interface with the data processing system 106 and obtain data. In other embodiments, the presentation engine 300 is implemented using different coding techniques and/or technologies, relies solely on client-side data, and/or executes on the server.

An object access module 310 receives objects and facts from the repository 115 and/or another source. In one embodiment, the object access module 310 is an object requester 152 and receives objects from the service engine 114 in response to search queries executed by a user.

For example, assume that the repository 115 stores a set of objects related to United States presidents, other politicians, and related public figures, including at least one object for President Bill Clinton. Also assume that the user executes a query for objects or facts that match the phrase "Bill Clinton." In response, the object access module 310 receives objects or facts related to Bill Clinton, including the object for Bill Clinton, an object for Hillary Clinton, an object for Monica Lewinsky (don't go there, reader. -Russ) and facts and objects for other people, events, and/or places related to Bill Clinton.

In another embodiment, the object access module 310 interfaces with an Internet search engine, such as the search engine provided by GOOGLE INC. of Mountain View, Calif. The object access module 310 provides a search query to the search engine and receives in return web pages and/or other electronic documents satisfying the query.

In one embodiment, the object access module 310 and/or another module allows the user to select particular returned objects (or documents) and designate the objects for further analysis. In this manner, the user can execute multiple different searches on the repository 115 and designate objects from the different searches for further analysis.

In another embodiment, the user does not need to designate objects; all objects returned in response to a query are automatically designated for analysis.

In one embodiment, a collection module 312 receives the objects designated for further analysis. The collection module 312 can store multiple collections of objects simultaneously. In one embodiment, the user specifies the collection in which the collection module 312 stores the objects.

In other embodiments, the collection module 312 stores designated objects in a default collection if the user does not specify a particular collection. The collection module 312 provides an interface allowing the user to manipulate the collections and the objects within the collections. For example, the user can view the objects within different collections, and add and remove objects.
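The collection behavior described above can be sketched as a small class. This is a minimal illustration, not Google's code: the class name, the `designate`/`designate_all`/`remove`/`view` methods, and the `"default"` collection name are all hypothetical, and objects are represented by plain string IDs.

```python
from collections import defaultdict
from typing import Iterable, List, Optional

DEFAULT_COLLECTION = "default"  # hypothetical name for the default collection


class CollectionModule:
    """Sketch of a collection module holding several named collections at once."""

    def __init__(self) -> None:
        # collection name -> object IDs in the order they were designated
        self._collections = defaultdict(list)

    def designate(self, object_id: str, collection: Optional[str] = None) -> None:
        """Store an object in the named collection, or in the default one."""
        items = self._collections[collection or DEFAULT_COLLECTION]
        if object_id not in items:  # avoid duplicates within a collection
            items.append(object_id)

    def designate_all(self, results: Iterable[str], collection: Optional[str] = None) -> None:
        """The 'no manual designation' embodiment: keep every query result."""
        for object_id in results:
            self.designate(object_id, collection)

    def remove(self, object_id: str, collection: Optional[str] = None) -> None:
        # .get() does not create a missing collection, unlike indexing a defaultdict
        items = self._collections.get(collection or DEFAULT_COLLECTION, [])
        if object_id in items:
            items.remove(object_id)

    def view(self, collection: Optional[str] = None) -> List[str]:
        return list(self._collections.get(collection or DEFAULT_COLLECTION, []))

    def names(self) -> List[str]:
        return sorted(self._collections)


if __name__ == "__main__":
    module = CollectionModule()
    module.designate_all(["Bill Clinton", "Hillary Clinton"], "clinton")
    module.designate("China")  # no collection given, so it lands in "default"
    module.remove("Hillary Clinton", "clinton")
    print(module.view("clinton"), module.names())
```

Keeping several named collections side by side is what lets a user run different searches and pool selected results, as the patent describes.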

Although much of this discussion focuses on a collection containing objects related to Bill Clinton, a collection can hold arbitrary and heterogeneous objects. For example, a collection can contain objects for the atomic elements, baseball teams, the actors M. Emmet Walsh and Harry Dean Stanton, and the country China. Some heterogeneous objects may have attributes in common, while other objects might not have any common attributes.

Moreover, in one embodiment, the collection module 312 stores data and/or references to data other than objects from the repository 115. These data are not necessarily structured in the same manner as the objects, or even structured at all. For example, the collection module 312 can store web pages or links to web pages on the Internet received in response to a search.

In addition, the collection module 312 can store data describing attributes of consumer products offered for sale, such as data describing prices and features. Likewise, the collection module 312 can also hold facts extracted from their associated objects.

In one embodiment, the data of collection module 312 describe facts having positions in inherent linear orders. A fact with a position in an inherent linear order is one where the value of the fact has a defined position in an inherent linear sequence. For example, values that are dates have inherent positions in the time dimension.

Likewise, values that are prices, populations, or other numeric values have positions in inherent linear orders based on their numeric values (e.g., from low to high numbers). In contrast, a value that is a name or other arbitrary text string does not have a position in an inherent linear order (except perhaps alphabetical order).

In one embodiment, the inherent order is a predefined order, such as an order of steps associated with a manufacturing process or a decision process. Often, the name of an attribute indicates that its fact has a position in an inherent linear order.

For example, the word "date" in the name "date of birth" suggests that the corresponding value is a date and, therefore, the fact has a position in an inherent linear order in the time dimension.
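Here's a minimal sketch of how a program might decide whether a fact has a position in an inherent linear order, checking the value's type first and falling back on the attribute name as a hint. The keyword list in `DATE_HINTS` is an assumption; the patent itself only offers "date of birth" as an example.

```python
from datetime import date
from typing import Optional

# Hypothetical keywords: an attribute whose name contains one of these is
# assumed to hold a date (the patent's own example is "date of birth").
DATE_HINTS = ("date", "born", "inauguration", "elected", "died")


def linear_order_kind(attribute: str, value: object) -> Optional[str]:
    """Return 'time', 'numeric', or None if the fact has no inherent linear order."""
    if isinstance(value, date):  # also covers datetime, a subclass of date
        return "time"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "numeric"  # prices, populations, and other numbers
    if any(hint in attribute.lower() for hint in DATE_HINTS):
        return "time"  # the attribute name suggests the value is a date
    return None  # names and other arbitrary text


if __name__ == "__main__":
    print(linear_order_kind("date of birth", "31 Mar. 1948"))
    print(linear_order_kind("population", 1_000_000))
    print(linear_order_kind("name", "Al Gore"))
```

The name-based fallback matters because extracted values are often raw text such as "31 Mar. 1948" rather than typed dates.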

A storage module 314 stores the collections and/or other data used by the presentation engine 300 for its operation. The storage module 314 acts as a place where other modules in the presentation engine 300 can store and retrieve information.

In one embodiment, the storage module 314 is a logical construct that uses storage allocated from virtual memory on the client device on which the presentation engine 300 is executing. In other embodiments, some or all of the storage provided by the storage module 314 is located on a server connected to the client via the network 104. For example, the collection module 312 can store collections in the storage module 314.

The storage module 314 stores the data describing the collections and references to the objects within the collections in a memory on the client. However, the objects themselves are stored on a server accessible via the network 104 and retrieved from the server when necessary or desired.
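That split (collection data and object references in client memory, the objects themselves on a server) can be sketched as a small cache that fetches objects lazily. The `fetch` callable stands in for whatever network request a real client would make; it, the class name, and the method names are hypothetical.

```python
from typing import Callable, Dict, List


class StorageModule:
    """Keeps collections as lists of object references in client memory and
    retrieves the referenced objects from a server only when they are needed."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch                        # stand-in for a network request
        self._collections: Dict[str, List[str]] = {}
        self._cache: Dict[str, dict] = {}          # objects already retrieved

    def save_collection(self, name: str, object_ids: List[str]) -> None:
        self._collections[name] = list(object_ids)  # references only

    def load_objects(self, name: str) -> List[dict]:
        objects = []
        for object_id in self._collections.get(name, []):
            if object_id not in self._cache:
                self._cache[object_id] = self._fetch(object_id)  # first use only
            objects.append(self._cache[object_id])
        return objects


if __name__ == "__main__":
    requests = []

    def fake_server(object_id: str) -> dict:
        requests.append(object_id)
        return {"id": object_id}

    store = StorageModule(fake_server)
    store.save_collection("clinton", ["Bill Clinton", "Al Gore"])
    store.load_objects("clinton")
    store.load_objects("clinton")
    print(requests)  # each object was requested from the "server" once
```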

A user interface (UI) generation module 316 generates a UI for the user. Generally, the UI allows the user to view and manipulate the objects and/or other data within the collection module 312 or otherwise accessible to the presentation engine 300.

In addition, the UI allows the user to control other aspects of the presentation engine 300, such as executing a search for new objects and designating objects or other data for storage in a collection. In one embodiment, the UI is generated automatically based on preferences established by the user and/or other parameters.

For example, the UI generation module 316 can be configured to automatically generate a timeline, map, and/or other type of graph based on the facts of objects returned in response to a search query. The UI is displayed on a display device of the client and contains buttons, list boxes, text boxes, images, hyperlinks, and/or other tools with which the user can control the presentation engine and UIs.

In one embodiment, the UI generation module 316 includes other modules for generating specific UI elements. These modules include a search interface module 318 for providing a search interface that the user can use to generate searches on the objects in the repository.

In one embodiment, the search interface UI is a text box in which the user inputs textual queries. Other embodiments support different types of queries and query input techniques.

In one embodiment, the search interface module 318 provides search queries to the service engine 114, which executes the query against the repository 115, Internet search engine, and/or other collections of data and returns the results to the object access module 310.

Depending upon the embodiment, additional processing of the query, such as term expansion, is performed by the search interface module 318 and/or the service engine 114.

The query syntax supported by the search interface module 318 and/or service engine 114 allows the user to provide terms/words that an object (or fact associated with an object) must contain in order to be returned in response to the query.

For example, a search for the words "Bill Clinton" will return all of the objects having the words "Bill" and "Clinton" in any of their facts. In one embodiment, the query syntax supports synonym expansion, so that a search for "Bill" will also match "William." A user can provide quotes around the query term to force searching for the exact term.

In addition, the query syntax supports type-specific restrictions to limit results to a particular object type domain. For example, a query that includes "type:person" limits results to objects containing facts about a person and a query that includes "type:country" limits results to objects containing facts about a country.
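A simplified matcher for the query syntax described above might look like the following. It treats unquoted words as required terms with synonym expansion, quoted text as an exact phrase, and `type:` as a restriction on the object's type. The `SYNONYMS` table and the object layout (a `type` plus a dict of `facts`) are assumptions made for illustration.

```python
import re
from typing import Dict, Set

# Hypothetical synonym table; a real system would use a much larger one.
SYNONYMS: Dict[str, Set[str]] = {"bill": {"william"}, "william": {"bill"}}

# Each match is either a quoted phrase (group 1) or a bare token (group 2).
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


def matches(obj: dict, query: str) -> bool:
    """True if the object satisfies every term and restriction in the query."""
    values = [str(v).lower() for v in obj["facts"].values()]
    words = {w for v in values for w in re.findall(r"\w+", v)}
    for phrase, token in TOKEN.findall(query):
        if phrase:
            # Quoted term: exact phrase inside a single fact, no synonyms.
            if not any(phrase.lower() in v for v in values):
                return False
        elif token.startswith("type:"):
            if obj.get("type") != token[len("type:"):]:
                return False
        else:
            # Bare word: the word or one of its synonyms must appear in some fact.
            word = token.lower()
            if not ({word} | SYNONYMS.get(word, set())) & words:
                return False
    return True


if __name__ == "__main__":
    clinton = {"type": "person", "facts": {"name": "William Clinton"}}
    print(matches(clinton, "Bill Clinton"))          # matches via the synonym
    print(matches(clinton, '"Bill Clinton"'))        # exact phrase is required
    print(matches(clinton, "Clinton type:country"))  # wrong object type
```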

Further, an embodiment of the query syntax allows the user to specify attributes of interest. These attributes are displayed in association with the object. For example, the query attribute {"date of birth"} specifies the date of birth attribute and the query attribute {inauguration} specifies the date of inauguration attribute.

In one embodiment, the presentation engine 300 automatically displays these attributes on a graph upon establishing the collection of objects in response to the query. The presentation engine 300 can be configured to show other attributes on the graph as well.

In one embodiment, the query results are returned ranked by relevance. The syntax allows the user to specify the maximum number of objects to match and thereby allows the user to restrict results to only the most relevant matches.

Further, the syntax allows the user to specify the maximum number of facts to show for an object. For example, the query: Bill Clinton attribute {"date of birth"} type:person/max entities=10, max facts=1 returns the facts of objects for 10 people, and shows only the date of birth fact for those objects.

In contrast, the query name:"Bill Clinton" attribute {"date of birth"}/max entities=1, max facts=15 returns the object for one person, Bill Clinton, and shows his date of birth and up to 14 other significant facts about him.
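To make the syntax concrete, here's a rough sketch that splits such a query into its search text, its attributes of interest, and its limits, then applies those limits to a ranked result list. The grammar is inferred only from the two example queries above (for instance, that everything after the first "/" is options); the function names and the (name, facts) result layout are hypothetical.

```python
import re
from typing import Dict, List, Tuple

ATTRIBUTE = re.compile(r'attribute\s*\{\s*"?([^"}]+)"?\s*\}')
LIMIT = re.compile(r"max\s*(entities|facts)\s*=\s*(\d+)")

Fact = Tuple[str, str]           # (attribute, value)
Result = Tuple[str, List[Fact]]  # (object name, facts), ranked by relevance


def parse_query(query: str) -> Tuple[str, List[str], Dict[str, int]]:
    """Split a query into search text, attributes of interest, and limits."""
    text, _, options = query.partition("/")
    attributes = [a.strip() for a in ATTRIBUTE.findall(text)]
    text = " ".join(ATTRIBUTE.sub(" ", text).split())  # drop attribute{...} clauses
    limits = {key: int(n) for key, n in LIMIT.findall(options)}
    return text, attributes, limits


def apply_limits(ranked: List[Result], attributes: List[str],
                 limits: Dict[str, int]) -> List[Result]:
    """Keep the most relevant objects and at most `facts` facts for each,
    listing facts for the attributes of interest first."""
    top = ranked[: limits.get("entities", len(ranked))]
    limited = []
    for name, facts in top:
        # sorted() is stable, so the remaining facts keep their original order.
        ordered = sorted(facts, key=lambda fact: fact[0] not in attributes)
        limited.append((name, ordered[: limits.get("facts", len(ordered))]))
    return limited


if __name__ == "__main__":
    text, attrs, limits = parse_query(
        'Bill Clinton attribute {"date of birth"} type:person/max entities=10, max facts=1')
    print(text, attrs, limits)
    # Bill Clinton type:person ['date of birth'] {'entities': 10, 'facts': 1}
```

With max facts=1 and "date of birth" listed first, each object keeps only its birth date, which is the behavior the first example query describes.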

A linear graph presentation module 320 generates UI elements displaying facts of objects on linear (one-dimensional) graphs. In one embodiment, the linear graph is a timeline and shows dates described by the facts of the objects being graphed. In other embodiments, the linear graph is based on a linear order other than time, such as on price or another numeric value, alphabetical order, etc.

In one embodiment, the linear graph presentation module 320 provides UI tools allowing the user to sort and/or colorize the graphed data to make the results easier to understand. For example, in the search "type: `US president`" the user might decide to display different presidents' events in different colors, to better contrast their life spans and periods of activity. Alternatively, the user might use the module 320 to cause each attribute to take on a different color (consistent across all objects) to better highlight when important events occurred.
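The two coloring modes amount to choosing which field of a fact picks the color. A minimal sketch, with a made-up palette and facts represented as hypothetical (object, attribute, value) tuples:

```python
from itertools import cycle
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str]  # (object name, attribute, value)
PALETTE = ["red", "blue", "green", "orange", "purple"]  # hypothetical palette


def assign_colors(facts: List[Fact], by: str = "object") -> List[Tuple[Fact, str]]:
    """Give every fact a color keyed by its object or by its attribute, so the
    same object (or attribute) gets the same color everywhere on the graph."""
    if by not in ("object", "attribute"):
        raise ValueError("by must be 'object' or 'attribute'")
    field = 0 if by == "object" else 1
    palette = cycle(PALETTE)  # reuse colors if there are more keys than colors
    colors: Dict[str, str] = {}
    for fact in facts:
        key = fact[field]
        if key not in colors:  # only advance the palette for a new key
            colors[key] = next(palette)
    return [(fact, colors[fact[field]]) for fact in facts]


if __name__ == "__main__":
    facts = [("Bill Clinton", "date of birth", "1946"),
             ("Bill Clinton", "inauguration", "1993"),
             ("George W. Bush", "inauguration", "2001")]
    print([color for _, color in assign_colors(facts, by="object")])
    print([color for _, color in assign_colors(facts, by="attribute")])
```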

Similarly, an embodiment of the linear graph presentation module 320 allows the user to choose the types of facts shown on the linear graph. For example, for a query including "type: country" the module 320 presents a default graph that views important dates such as when countries were founded, when revolutions occurred, when key leaders assumed power, etc.

The user can use the module 320 to switch the view to a different set of facts, such as gross domestic product (GDP), thereby causing the linear graph presentation module 320 to show the GDPs of the countries in the data set in numeric order. In one embodiment, this GDP view contains several entries for each country, for example showing GDPs for several different years and/or GDPs assuming different exchange rates.
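Switching the graphed fact type can be as simple as filtering by attribute and re-sorting on the new linear order. In this sketch the `Fact` record, its `qualifier` field (for a year or exchange-rate basis), and the values in the example are all hypothetical placeholders, not real GDP figures.

```python
from typing import List, NamedTuple, Optional


class Fact(NamedTuple):
    obj: str                         # object the fact belongs to, e.g. a country
    attribute: str                   # e.g. "GDP"
    value: float                     # numeric value that defines the linear order
    qualifier: Optional[str] = None  # e.g. the year a GDP figure applies to


def linear_view(facts: List[Fact], attribute: str) -> List[Fact]:
    """Facts for the chosen attribute, ordered from low to high value. An object
    can appear several times, once per year or exchange-rate basis."""
    return sorted((f for f in facts if f.attribute == attribute), key=lambda f: f.value)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    facts = [
        Fact("Country A", "GDP", 3.0, "2004"),
        Fact("Country B", "GDP", 1.5, "2004"),
        Fact("Country A", "GDP", 2.5, "2003"),
        Fact("Country B", "population", 40.0),
    ]
    for fact in linear_view(facts, "GDP"):
        print(fact.obj, fact.qualifier, fact.value)
```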

A map presentation module 322 generates UI elements displaying facts of objects on maps. The facts of objects specify geographic locations, such as countries, places of births and deaths, etc. either explicitly (e.g., latitude/longitude) or implicitly (e.g., by name).

The map presentation module 322 generates maps showing these geographic locations. The maps may be stored in the storage module 314 and/or downloaded from a server on the Internet or another network 104 when required.

In one embodiment, the UI generation module 316 displays multiple types of graphs (including maps) simultaneously. Thus, the UI generation module 316 can generate a UI that displays dates from an object on a timeline, and simultaneously displays locations from the object on a map.

Likewise, the UI generation module 316 can display multiple instances of the same graph type simultaneously.

Now let's see how the presentation engine described above would translate into, say, a historical graph charting some of the events imported into the engine. Here is Figure 5, along with some of its explanatory text:

FIG. 5 illustrates a linear graph presented by the presentation engine 300 according to one embodiment. FIG. 5 illustrates a sample graph that is a timeline 510. For purposes of example, assume that the timeline 510 is automatically presented in response to a query on objects in the repository 115 that match the string "Bill Clinton." Further, the query is limited to five results and a maximum of 50 facts.

The timeline 510 has a scale covering 65 years, from 1940 to 2005. The timeline itself is represented by a horizontal line 512, and each year is indicated by a hash mark 514, a short vertical line intersecting the horizontal line. Icons (identified with reference numeral 516) on the timeline 510 identify dates from facts of objects identified in response to the query.

In one embodiment, the icon 516 is a rectangle. A single icon 516 at a location on the timeline indicates that a single fact occurs at the date represented by that location. If there are multiple facts having dates at a single location on the timeline, the icons representing the facts are stacked vertically. For example, in the sample "Bill Clinton" query there are many facts having dates within the year 2000. These facts are represented by a stack 518 of icons at the location on the timeline 510 representing the year 2000.
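The placement rules in these two paragraphs (a fixed year scale, with icons that share a year stacked vertically) can be sketched as a layout function. The 650-pixel width, the per-year granularity, and the (label, year) input format are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def layout_timeline(facts: List[Tuple[str, int]], start: int = 1940, end: int = 2005,
                    width: int = 650) -> List[Tuple[str, int, int]]:
    """Place each (label, year) fact on a horizontal timeline.

    Returns (label, x, level) tuples: x is the year's pixel offset along the
    line, and level counts how many icons are already stacked at that year."""
    scale = width / (end - start)          # pixels per year
    stacks: Dict[int, int] = defaultdict(int)  # year -> icons already placed there
    placed = []
    for label, year in facts:
        if not start <= year <= end:
            continue  # outside the visible range of the timeline
        level = stacks[year]
        stacks[year] += 1
        placed.append((label, round((year - start) * scale), level))
    return placed


if __name__ == "__main__":
    for icon in layout_timeline([("Bill Clinton: date of birth", 1946),
                                 ("Al Gore: date of birth", 1948),
                                 ("fact 1", 2000), ("fact 2", 2000)]):
        print(icon)
```

The two facts dated 2000 share an x position and get levels 0 and 1, which is the vertical stacking the patent describes for that year.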

In one embodiment, each icon 516 on the timeline 510 has a visual characteristic that indicates the source of the fact identified by the icon. The visual characteristic is a shape, color, shading, graphical image, and/or other distinctive characteristic. In another embodiment, the icon 516 is an image associated with the information being displayed, such as a birthday cake for the date of birth or a picture of Bill Clinton for a date associated with Bill Clinton.

In one embodiment, the icons 516 are based on images obtained from source documents, possibly resized. In other embodiments, they are stock images. In addition, a key 520 associated with the timeline 510 describes the meaning of the icons. In FIG. 5, each icon 516 represents a different person associated with Bill Clinton, and the key 520 identifies the person associated with the icon.

Thus, the key 520 references Bill Clinton, Al Gore, Hillary Clinton, William Cohen, and George W. Bush. Adjacent to each name is an icon corresponding to the icon 516 appearing on the timeline 510 for facts associated with that person.

In one embodiment, a user viewing the timeline 510 uses a cursor or other technique to select icons on the timeline and/or perform other manipulations of the timeline. When an icon 516 is selected, the presentation engine 300 displays information about the fact identified by the icon.

In one embodiment, the information is displayed beneath the timeline 510 at a location adjacent to the selected icon. The information includes the name of the object or other source in which the fact is found, the attribute of the fact, and the value for the attribute. In one embodiment, the name includes an embedded hypertext link that the user can use to reference more information from the object or other source of the fact.

The attribute, for example, is "Date of Birth," "Elected," "Inauguration," or "Impeached." The value is the date (e.g., "Jan. 20, 1993") and/or a string of text that contains a date (e.g., "On June 1st, Congress voted to . . . "). In FIG. 5, icon 516 is selected and the information 520 is "Al Gore" (the name of the object), "Date of Birth" (the attribute for the fact), and "31 Mar. 1948" (the value of the fact).

The presentation engine 300 provides a set of controls 522 that the user uses to manipulate the timeline 510. In one embodiment, the controls are formed of four arrows respectively pointing left, right, up, and down.

The user selects the left and right arrows to move along the axis of the timeline 510 and selects the up and down arrows to zoom in/out of the timeline. Other embodiments provide different and/or additional controls.
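The four-arrow control scheme maps naturally onto a visible window over the timeline. In this hypothetical sketch, left/right pan by a quarter of the visible span and up/down halve or double it around the center; the patent doesn't specify step sizes, or which of up and down zooms in, so those choices are assumptions.

```python
from typing import Tuple


class TimelineView:
    """Visible window [start, end] on a timeline, driven by four arrow controls."""

    def __init__(self, start: float, end: float, min_span: float = 1) -> None:
        self.start, self.end, self.min_span = start, end, min_span

    def press(self, arrow: str) -> Tuple[float, float]:
        span = self.end - self.start
        center = (self.start + self.end) / 2
        if arrow in ("left", "right"):
            # Pan along the axis by a quarter of what is currently visible.
            shift = span / 4 if arrow == "right" else -span / 4
            self.start += shift
            self.end += shift
        elif arrow in ("up", "down"):
            # Assumed mapping: up zooms in (halve span), down zooms out (double it).
            new_span = max(span / 2, self.min_span) if arrow == "up" else span * 2
            self.start = center - new_span / 2
            self.end = center + new_span / 2
        else:
            raise ValueError(f"unknown control: {arrow}")
        return self.start, self.end


if __name__ == "__main__":
    view = TimelineView(1940, 2000)
    for arrow in ("right", "up", "down", "left"):
        print(arrow, view.press(arrow))
```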

But we're not done yet. Figure 6 shows a full implementation, so let's look at it along with its accompanying text:

FIG. 6 illustrates a linear graph presented by the presentation engine 300 according to one embodiment. The graph of FIG. 6 is a timeline 610 produced in response to the query "Bill Clinton" attribute{"date of birth"} with a maximum of 16 results and two facts per result.

This query identifies facts associated with 16 objects in the repository 115 that contain the exact text "Bill Clinton" and indicates that the "date of birth" attribute is of interest. In response, the query produces objects for 16 people related to Bill Clinton, including Al Gore, Monica Lewinsky, and Martha Raye (who received the Presidential Medal of Freedom from Bill Clinton in 1993).

Since the query indicates that the "date of birth" attribute is of interest, the presentation engine 300 automatically places the facts having this attribute on the timeline 610. The key 620 and icons on the graph show that Martha Raye was born in 1916, and Monica Lewinsky was born in 1973. In addition, the icons show a cluster of people born in the 1940-1950 time period who are contemporaries of Bill Clinton.
