If you need to absorb the information in a long document, then ideally you should read all of it. An executive summary will give you an overview and list the key points, but to get the full picture you really need to read every word. That’s fine in theory, but in the real world time is short -- hence the need for executive summaries in the first place. If an important document doesn’t have one of these, Corpora Software's Summarize! hopes to plug the gap, by using linguistic analysis to produce a summary automatically.


The software works with plain text, Microsoft Word, PDF and HTML documents. As well as installing its own client software, Summarize! places toolbar icons in Word, Internet Explorer and Outlook, providing quick access to summaries from within these applications. When used with email, it produces not only a summary of the selected email, but also of any attachments. However, if you use another browser (Mozilla Firefox for example), or a non-Microsoft word processor, you don’t get the shortcut.

Within supported applications, selecting the Summarize! toolbar icon creates a summary of the current document, email or Web page, which then appears in the Summarize! application window. From within Summarize! itself, you can select documents using a file browser or Web sites by typing their URL; you can also paste in copied text.

According to its help file, Summarize! works by first extracting the main themes of a document, and then analysing each sentence to see how well it relates to those themes, ranking the sentences and choosing those that best reflect the themes. The software can identify different words that relate to the same concept, and, importantly, includes rules to ensure that its output flows as naturally as possible.
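
To make the approach described in the help file more concrete, here is a minimal Python sketch of a generic theme-based extractive summarizer. It is not Corpora's algorithm, which is not published in detail: the stopword list, the use of the most frequent words as 'themes', and all function names are our own assumptions for illustration. It also makes no attempt to group different words under one concept or to smooth the flow of the output.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list (an assumption, not the product's).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "by", "this", "be", "are", "was"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-case words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def extract_themes(sentences, n_themes=5):
    """Treat the n most frequent non-stopwords as the document's themes."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    return {w for w, _ in counts.most_common(n_themes)}


def summarize(text, n_sentences=2, n_themes=5):
    """Rank sentences by the share of their words that are themes, then
    return the best ones in their original document order."""
    sentences = split_sentences(text)
    themes = extract_themes(sentences, n_themes)
    scored = []
    for i, s in enumerate(sentences):
        words = tokenize(s)
        score = sum(1 for w in words if w in themes) / len(words) if words else 0.0
        scored.append((score, i))
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:n_sentences]
    return [sentences[i] for i in sorted(i for _, i in top)]
```

Even this toy version shows why plain running text suits the technique better than tables or headings: a sentence is only scored on the words it contains, so a heading or table cell with few theme words is easily dropped.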

You can set a variety of preferences, including on-screen fonts and colours, and the length of summaries (either by specifying the number of words or selecting a percentage of the original document). You can also configure different profiles -- lists of terms that the software will use to skew its summaries in a specific direction -- and switch between them as required.
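
As a rough illustration of how these two settings might interact, the Python sketch below turns a word count or percentage into a word budget, boosts sentences that mention profile terms, and then fills the budget with the highest-scoring sentences. The function names, the additive boost and the greedy selection are all assumptions made for the example; the review has no information on how Summarize! implements either feature.

```python
import re


def word_budget(text, words=None, percent=None):
    """Summary length in words, from an absolute count or a percentage."""
    total = len(text.split())
    if words is not None:
        return min(words, total)
    if percent is not None:
        return max(1, round(total * percent / 100))
    raise ValueError("specify either words or percent")


def profile_boost(sentence, profile_terms, weight=1.0):
    """Extra score for a sentence: weight times the number of distinct
    profile terms it contains (hypothetical scoring rule)."""
    tokens = set(re.findall(r"[a-z']+", sentence.lower()))
    return weight * len(tokens & {t.lower() for t in profile_terms})


def select_with_budget(scored_sentences, budget):
    """Greedily pick the highest-scoring sentences that fit the word budget.

    scored_sentences is a list of (score, sentence) pairs in document order;
    the chosen sentences are returned in that same order."""
    order = sorted(range(len(scored_sentences)),
                   key=lambda i: (-scored_sentences[i][0], i))
    chosen, used = [], 0
    for i in order:
        length = len(scored_sentences[i][1].split())
        if used + length <= budget:
            chosen.append(i)
            used += length
    return [scored_sentences[i][1] for i in sorted(chosen)]
```

Switching profiles would then amount to passing a different list of terms to `profile_boost` before the selection step.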

Summaries can be saved as XML, HTML or plain text, and can be emailed from within Summarize! as well as printed. You can choose to view just the summary, or the original document with the summary sections highlighted. The latter option could be useful if you want to use a summary as a guide to reading an entire document. Similarly, the software can identify a list of key terms that occur within a document, and then highlight these in the summary.


The most important thing about a tool like this is that it functions well enough to give you absolute confidence in it. There's no point relying on it as an everyday tool if the summaries it produces are off the mark.

Our tests were tough, but they replicate the kinds of conditions under which we feel Summarize! will be used in the real world. First, we asked it to produce a ten percent summary of a 33-page white paper in PDF format: Mobile Working: a Buyer’s Guide for SMEs.

Several aspects of the summary concerned us. For example, it dealt poorly with information stored in tables, extracting row headings and contents but failing to relate them to one another. It missed out some sections of the document, deciding that they were not relevant. A pair of case studies with their own section heading in the original document were included in the summary but the heading was not, so they seemed out of context.

We tried again with a second PDF document, this time over a hundred pages long, containing numerous tables as well as some charts and graphs. This document was structured ‘report style’ with a hierarchy of numbered paragraphs and an executive summary. Again we asked for a ten percent summary.

As before, the summary dealt poorly with content stored in tables. It also failed to identify the executive summary as such, missed noting the end of the summary and start of the main document, and, despite the very orderly nature of the paper, did not use the numbered paragraph system as a guide to its own summary production. As a result of these failings, the structure and continuity of the summary were compromised.

We then asked for a 30 percent summary of the same report, which resulted in the inclusion of more, but not all, of the numbered paragraphs from the original. A 30 percent summary of the buyer’s guide was also more useful, but it still missed section headings and other information that's crucial for understanding the document in context.

Next we asked for a 30 percent summary of a ZDNet product review and got a more positive result based around what was essentially straight text with just a couple of headings. Nuances from the original document were lost, but the summary itself was cogent and readable.

We completed these tests without producing a profile based on specific keywords. A profile will skew a summary towards the terms you prefer, and so functions as a sort of search tool. However, profiles will not deal with the general operational issues raised above.

Some obvious usability features have not been included in the software. On the plus side, Summarize! creates a list of what it considers to be a document’s key terms; if you select one, it is highlighted in the document summary. But it's a pity you can’t easily toggle between summary and full document to see the summary information in context, or add a key term to a profile with a simple right click.

Our tests suggest that the success or otherwise of Summarize! will depend on the type of document you present it with, and what you expect it to produce.

The software seems to deal poorly with tabulated information, to ignore graphs and charts, and to have trouble maintaining the continuity provided by document headings. These are all things that come easily to a human when skimming documents. In this respect, Summarize! is no match for a quick run-through of a document yourself, armed with that low-tech solution -- a highlighter pen.

Based on our tests, we would not like to rely on Summarize! to produce executive summaries of important documents. However, it could prove useful for searching longer documents for specific information, aided by user-defined profiles.

