AWS makes Textract generally available for extracting text from documents

Amazon says no machine learning expertise is needed to use the service, which automatically extracts text and data from tables or forms.

Written by Stephanie Condon, Senior Writer May 29, 2019 at 4:02 p.m. PT

Amazon Web Services on Wednesday announced the general availability of Textract, a fully managed service that uses machine learning to automatically extract text and data from documents, including from tables and forms. Textract was one of multiple AI-powered tools and services unveiled at last year's AWS re:Invent conference that require no machine learning expertise to use.

Typically, companies use optical character recognition (OCR) software to extract text and data from files like contracts, tax documents, expense reports or patient forms. However, traditional OCR technologies can't recognize common layouts like forms and tables. They consequently generate a lengthy and often inaccurate text dump.

By comparison, AWS has called Textract an "OCR++" service. It can, for instance, see a document with a table and recognize that the data belongs in rows and columns. "It's able to identify there's a table and able to lay out for you what that table should look like so you can use and read that data," AWS CEO Andy Jassy said at re:Invent.

Textract's API supports multiple image formats including scans, PDFs and photos, and customers can use it with database and analytics services like Amazon Elasticsearch Service, Amazon DynamoDB and Amazon Athena. They can also use it with other machine learning services like Amazon Comprehend, Comprehend Medical, Amazon Translate or Amazon SageMaker.
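For readers who want a feel for what "laying out a table" means in practice, here is a minimal Python sketch. It assumes the block structure Textract returns from its AnalyzeDocument call (TABLE blocks that point to CELL blocks, which in turn point to WORD blocks) and rebuilds each table as rows and columns. The helper names (tables_from_blocks, analyze_image) and the sample data are hypothetical; the sample is hand-written for illustration and is not real Textract output. The analyze_image function uses the boto3 SDK and would need AWS credentials, so it is defined here but not called.

```python
from collections import defaultdict


def analyze_image(path, region="us-east-1"):
    """Send one image to Textract and return its blocks (needs AWS credentials)."""
    import boto3  # third-party AWS SDK, imported lazily so the parser below runs without it

    client = boto3.client("textract", region_name=region)
    with open(path, "rb") as f:
        response = client.analyze_document(
            Document={"Bytes": f.read()},
            FeatureTypes=["TABLES", "FORMS"],
        )
    return response["Blocks"]


def tables_from_blocks(blocks):
    """Rebuild every TABLE block as a list of rows, each row a list of cell strings."""
    by_id = {block["Id"]: block for block in blocks}

    def child_ids(block):
        # Collect the Ids listed under CHILD relationships, if any.
        ids = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                ids.extend(rel["Ids"])
        return ids

    tables = []
    for table in (b for b in blocks if b["BlockType"] == "TABLE"):
        cells = defaultdict(dict)  # row index -> {column index -> text}
        for cell_id in child_ids(table):
            cell = by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [
                by_id[i]["Text"]
                for i in child_ids(cell)
                if by_id[i]["BlockType"] == "WORD"
            ]
            cells[cell["RowIndex"]][cell["ColumnIndex"]] = " ".join(words)
        n_cols = max((col for row in cells.values() for col in row), default=0)
        # Row and column indexes are 1-based; missing cells become empty strings.
        tables.append(
            [[cells[r].get(c, "") for c in range(1, n_cols + 1)] for r in sorted(cells)]
        )
    return tables


# Hand-written illustrative blocks (an assumption, not captured from the service).
SAMPLE_BLOCKS = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3", "w4"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w5"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Cost"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Taxi"},
    {"Id": "w4", "BlockType": "WORD", "Text": "fare"},
    {"Id": "w5", "BlockType": "WORD", "Text": "18.50"},
]

if __name__ == "__main__":
    for row in tables_from_blocks(SAMPLE_BLOCKS)[0]:
        print(row)
```

With real credentials, tables_from_blocks(analyze_image("receipt.png")) would follow the same path, where "receipt.png" is a placeholder file name. The resulting rows could then be handed to a database or analytics service such as those named above.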

Customers using the service already include The Globe and Mail, PwC, Healthfirst, UiPath, Teradact, Ripcord, BluePrism and Alfresco.

Textract is now available in the US East (Ohio), US East (N. Virginia), US West (Oregon) and EU (Ireland) regions. AWS will bring it to additional regions in the coming year.
