Xerox is expected to announce new document processing software this week
that could improve the efficiency of content management systems.
According to a statement from the company, scientists from Xerox Research
Center Europe will announce new software Thursday that can examine the contents
of an electronic document and then classify it by subject.
The software, which Xerox intends to license to other
technology companies, could be used to automatically route documents into a
content management system. Content management is a fast-growing
category of business applications that store and catalog corporate text, ranging
from e-mail messages to regulatory filings.
Xerox's categorizing software could improve the efficiency of such systems by
automating the storage of documents and making it easier for workers to find the
documents they need. The system uses a hierarchical method that recognizes
relationships between one category and another.
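Xerox's statement does not describe how the hierarchical method works internally, so the following is only a minimal sketch of the general idea of top-down classification over a category tree. The `Category` class, the keyword-overlap scoring and the sample taxonomy are all hypothetical stand-ins for illustration; the actual software is described as using machine learning, not hand-written keyword lists.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical category tree."""
    name: str
    keywords: set = field(default_factory=set)
    children: list = field(default_factory=list)


def score(category, tokens):
    """Count how many document tokens match this category's keywords."""
    return sum(1 for t in tokens if t in category.keywords)


def classify(root, text):
    """Walk the tree from the root, descending into the best-scoring child
    at each level, and stop when no child matches any token.
    Returns the list of category names from the root to the chosen node."""
    tokens = text.lower().split()
    path = [root.name]
    node = root
    while node.children:
        best = max(node.children, key=lambda c: score(c, tokens))
        if score(best, tokens) == 0:
            break  # no child fits; keep the document at the current level
        path.append(best.name)
        node = best
    return path


# Illustrative taxonomy only; the names and keywords are assumptions.
taxonomy = Category("corporate", children=[
    Category("legal", {"filing", "regulatory", "contract"}, [
        Category("regulatory-filings", {"sec", "filing", "regulatory"}),
        Category("contracts", {"contract", "vendor", "signature"}),
    ]),
    Category("support", {"customer", "ticket", "refund"}),
])

print(classify(taxonomy, "Quarterly regulatory filing for the SEC"))
```

Because each decision is made among siblings only, a document that fits a parent category but none of its children stays at the parent level instead of being forced into a poor match.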
"A misshelved book in a library might as well be lost," Xerox researcher Eric
Gaussier said in the statement. "It's the same with documents that haven't been
properly categorized; the document itself may have to be re-created...Our new
software...will ensure that documents are properly classified for future
retrieval and that the right information gets into the right hands as quickly as
possible."
The technology could also be used to automatically route e-mail messages to
the correct person in an organization, Xerox said.
The software uses machine-learning techniques to minimize setup and to
recognize new categories of documents as they emerge, Xerox said. The Java-based
code can parse documents in more than 20 languages and work with systems based
on Unix, Linux and Windows.
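The statement gives no detail on how new categories are recognized. One common way to approximate the idea, shown here purely as an assumption, is to compare a document with a word-count profile of each known category and flag it as a possible new category when nothing is similar enough. The function names, the sample profiles and the 0.2 threshold are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def assign(text, profiles, threshold=0.2):
    """Return the most similar known category, or None when the best
    similarity falls below the threshold (a candidate new category)."""
    vec = vectorize(text)
    best_name, best_sim = None, 0.0
    for name, profile in profiles.items():
        sim = cosine(vec, profile)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None


# Illustrative category profiles; the contents are assumptions.
profiles = {
    "invoices": vectorize("invoice payment due amount invoice total"),
    "hiring": vectorize("candidate interview resume offer hiring"),
}

print(assign("payment due on invoice", profiles))
print(assign("server outage postmortem", profiles))
```

Documents that come back as `None` could be set aside and grouped later; a cluster of similar unassigned documents would suggest that a new category has emerged.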