Protecting sensitive information in PDF documents

Adobe's Lori DeFurio explains how you can share confidential electronic documents with peace of mind.
Written by Lori DeFurio, Contributor on
With the increasing exchange of e-documents via e-mail and the Internet, it is important to make sure that documents are exchanged securely. When companies and government bodies accidentally share confidential or sensitive information, the result can be not only financial loss but also damage to their credibility.

Therefore, before you distribute a PDF, you may want to examine it for sensitive content or private information that you do not want to share with your recipients.

Acrobat 8 Professional includes features that let you remove sensitive or confidential information from PDF files, including information you may not even know is there. In PDF files that contain text, you can search for words and phrases and mark them for redaction automatically; in image-only PDF files, you can manually select areas for redaction. When Acrobat 8 applies redaction, the marked text or selected image areas are permanently and securely removed from the PDF file.

You can use the Examine Document command (also available in Acrobat 8 Standard) to find and remove hidden content from a PDF document. This new feature scans your document and alerts you to hidden information you may not be aware of, including hidden text, metadata, comments, file attachments, and other elements.

Redacting sensitive content
Redaction, by definition, means removing information from documents. In the days of paper, X-Acto knives were used to cut text out of a page, which was then photocopied with a black sheet of paper behind it. More recently, companies have gone through paper documents with a black marker to 'erase' content that is private and confidential.

In the electronic world, you can mark text or select image areas for permanent and secure removal from the PDF file. The new redaction and metadata removal tools in Acrobat 8 Professional help mitigate the risk of unintended disclosure when you submit private and confidential documents to outside parties such as vendors or, worse, opposing counsel or the courts.

The Redaction tool lets you permanently remove (redact) visible text and images from PDFs. In place of the removed items, you can show redaction marks that appear as colored boxes, or you can leave the area blank. You can also specify custom text or redaction codes to appear over the redaction marks.

Note: If you want to locate and remove specific words or phrases, use the Search And Redact tool instead.

Text marked for redaction (left), and redacted (right)

How do I redact sensitive information in a document?
1. Choose View > Toolbars > Redaction.
Optional: To set the appearance of redaction marks, click Redaction Properties.

2. Select the Mark For Redaction tool.

3. Mark items you want to remove by doing any of the following:

  • Double-click to select a word or image.
  • Press Ctrl/Control as you drag to select a line, a block of text, an object, or an area. Use this method to select areas of a page in a scanned document.

To preview how your redaction marks will look, hold the pointer over the marked area.

4. To redact the marked items, click Apply Redactions in the Redaction toolbar.

5. Click OK to remove the items. The items aren’t permanently removed from the document until you save it.
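The two-stage workflow above (mark, then apply and save) is what separates real redaction from simply drawing a black box over text. The toy model below is purely illustrative and does not reflect how Acrobat stores PDF content; the class and method names (`TextRun`, `Page`, `mark`, `apply_redactions`, `extract_text`) are hypothetical. It treats a page as a list of text runs and shows that text under an unapplied mark can still be extracted, while applying the marks deletes it.

```python
from dataclasses import dataclass, field


@dataclass
class TextRun:
    text: str
    x: float      # left edge of the run on the page (hypothetical units)
    width: float


@dataclass
class Page:
    runs: list[TextRun]
    # Areas marked for redaction, as (x_start, x_end) pairs.
    marks: list[tuple[float, float]] = field(default_factory=list)

    def mark(self, x_start: float, x_end: float) -> None:
        # Marking only records an area; the underlying text runs are untouched,
        # much like a black box drawn on top of the text.
        self.marks.append((x_start, x_end))

    def extract_text(self) -> str:
        # What a recipient could copy or search: every run still on the page.
        return " ".join(run.text for run in self.runs)

    def apply_redactions(self) -> None:
        # Applying deletes every run that overlaps a marked area, then clears the marks.
        def overlaps(run: TextRun) -> bool:
            return any(run.x < end and run.x + run.width > start
                       for start, end in self.marks)

        self.runs = [run for run in self.runs if not overlaps(run)]
        self.marks.clear()
```

For example, after `page.mark(45, 90)` on a page whose runs are "Patient:" (x=0), "Jane Doe" (x=45) and "Diagnosis: flu" (x=100), `extract_text()` still returns "Jane Doe"; only after `apply_redactions()` does it return "Patient: Diagnosis: flu".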

Alternatively, you can use the Search And Redact tool to find and remove words or phrases in one or more PDFs that contain searchable text.

Note: The Search And Redact tool doesn’t search secured (encrypted) PDFs.

1. Choose View > Toolbars > Redaction, and select the Search And Redact tool.

2. In the text box, type the word or phrase you want to remove.

3. Specify if you want to search the current PDF or PDFs in another location.

4. Select Whole Words Only and Case-Sensitive if you want to apply these conditions to the search.

5. Click Search And Redact.

6. In the search results, click the plus sign next to the document name to see all occurrences of the word or phrase. Then, select the occurrences you want to mark for redaction:

  • To select all occurrences on the list, click Check All.
  • To select individual occurrences, click the check box for each one you want to redact. Click the text next to a check box to view the occurrence on the page.
  • To mark none of the occurrences, close the Search window or click New Search to start over.

7. If you selected occurrences that you want to mark for redaction, click Mark Checked Results For Redaction.

The Search window closes, and the items you checked on the list are shown marked for redaction.

If you haven't saved the file, you can select redaction marks in the document and press Delete to remove the redaction mark. The redaction marks become permanent after you save the file.

8. To remove the marked items, click Apply Redactions in the Redaction toolbar, and then click OK. The items aren’t permanently removed from the document until you save it.

9. When Acrobat asks whether you want to search for and remove hidden information in the document by using the Examine Document feature, click Yes to do so. Otherwise, click No.

10. Choose File > Save, and specify a filename and location. If you don’t want to overwrite the original file, save the file to a different name, location, or both.
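The Whole Words Only and Case-Sensitive options in step 4 narrow which occurrences are found. The sketch below illustrates the general meaning of these two options using Python regular expressions; the function name `find_occurrences` is hypothetical, and Acrobat's exact matching rules (for hyphens, punctuation, or line breaks, for instance) may differ.

```python
import re


def find_occurrences(text: str, phrase: str, whole_words: bool = False,
                     case_sensitive: bool = False) -> list[tuple[int, str]]:
    """Return (offset, matched_text) for each occurrence of phrase in text."""
    # Escape the phrase so characters such as '.' or '(' are matched literally.
    pattern = re.escape(phrase)
    if whole_words:
        # \b requires a word boundary at each end, so "Smith" won't match
        # "Smithson". This assumes the phrase starts and ends with a letter or digit.
        pattern = rf"\b{pattern}\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return [(m.start(), m.group()) for m in re.finditer(pattern, text, flags)]
```

On the text "Smith met smith and Smithson.", a plain search for "Smith" finds three occurrences, Whole Words Only drops the one inside "Smithson", and adding Case-Sensitive leaves only the first "Smith".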

Redaction limitations, issues and best practices
Keep in mind the following when taking on projects that require redaction:

1. After you apply redactions, Acrobat offers to run Examine Document. This lets you find and remove additional hidden text, metadata, and other content.

2. When you choose File > Save, you are prompted for a filename. Acrobat does not rename the document for you, so take care not to overwrite the unredacted original.

3. Redactions must be applied to each document individually.

4. Search And Redact finds text only in searchable documents. Use Document > OCR Text Recognition to prepare the file first. You can run OCR as a batch in Acrobat Professional.

5. Search And Redact does not offer pattern matching, so it cannot find, for example, every number string in a particular format; you have to search for each specific word or phrase.

6. Carefully review all documents prior to submission in discovery. A two-person review team will catch many more errors than a single person.
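Because Search And Redact only looks for the exact words or phrases you type, one possible workaround for item 5 is to find pattern matches elsewhere and then enter each distinct match as its own search. The sketch below assumes you already have the document's text as a plain string (how you extract it is outside its scope), and its SSN-style pattern and the function name `literal_search_terms` are hypothetical examples. A match that is split across a line break in the extracted text would be missed, so this supplements, rather than replaces, manual review.

```python
import re

# Example pattern (an assumption): US Social Security-style numbers such as 123-45-6789.
# The \b anchors keep it from matching inside longer runs of digits.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def literal_search_terms(text: str, pattern: re.Pattern[str] = SSN_LIKE) -> list[str]:
    """Return each distinct match once, in first-seen order, for use as literal search terms."""
    seen: dict[str, None] = {}  # dict preserves insertion order and removes duplicates
    for match in pattern.finditer(text):
        seen.setdefault(match.group(), None)
    return list(seen)
```

For the text "Claimant 123-45-6789; spouse 987-65-4321; claimant again 123-45-6789; ref 1123-45-67890.", this returns the two distinct numbers and ignores the longer reference number.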

Examining a PDF for hidden content
If you want to examine every PDF for hidden content before you close it or send it in email, specify that option in the Documents preferences (choose Edit > Preferences [Windows] or Acrobat > Preferences [Mac OS], and select Documents on the left).

1. Choose Document > Examine Document.
If items are found, they are listed in the Examine Document dialog box with a selected check box beside each item.

2. Make sure check boxes are selected only for the items you want to remove from the document. These can include:

Metadata
This item includes information about the document and its contents, such as the author's name, keywords, and copyright information, that search utilities can use. To view metadata, choose File > Properties.

File Attachments
Files of any format can be attached to the PDF as an attachment. To view attachments, choose View > Navigation Panel > Attachments.

Annotations And Comments
This item includes all comments that were added to the PDF using the comment and markup tools, including files attached as comments. To view comments, choose View > Navigation Panel > Comments.

Form Field Logic Or Actions
This item includes form fields (including signature fields), and all actions and calculations associated with form fields. If you remove this item, all form fields are flattened and can no longer be filled out, edited, or signed.

Hidden Text
This item indicates text in the PDF that is either transparent, covered up by other content, or the same color as the background. To view hidden text, click Preview. Click the double arrow buttons to navigate pages that contain hidden text, and select options to show hidden text, visible text, or both.

Hidden Layers
PDFs can contain multiple layers that can be shown or hidden. Removing hidden layers removes these layers from the PDF and flattens remaining layers into a single layer. To view layers, choose View > Navigation Panel > Layers.

Bookmarks
Bookmarks are links with representational text that open specific pages in the PDF. To view bookmarks, choose View > Navigation Panel > Bookmarks.

Embedded Search Index
An embedded search index speeds up searches in the file. To determine if the PDF contains a search index, choose Advanced > Document Processing > Manage Embedded Index. Removing indexes decreases file size but increases search time for the PDF.

Deleted Hidden Page And Image Content
PDFs sometimes retain content that has been removed and which is no longer visible, such as cropped or deleted pages, or deleted images.

3. Click Remove All Checked Items to delete the selected items from the file, and click OK.

Note: When you remove checked items, additional items are automatically removed from the document: digital signatures; document information added by third-party plug-ins and applications; and special features that enable Adobe Reader users to review, sign, and fill in PDF documents.

4. Choose File > Save, and specify a filename and location. If you don't want to overwrite the original file, save the file to a different name, location, or both.

The selected content is permanently removed when you save the file. If you close the file without saving it, you must repeat this process, making sure to save the file.
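After saving, you may want a quick second check alongside File > Properties. The sketch below is only a heuristic: it scans a saved PDF's raw bytes for a few Info-dictionary keys and XMP tags that often carry identifying details. The marker list and function names (`metadata_markers`, `check_file`) are hypothetical. PDFs can store objects in compressed streams, so an empty result does not prove the file is clean; a non-empty result simply tells you to look again.

```python
from pathlib import Path

# Info-dictionary keys and XMP tags that commonly carry identifying details.
# This list is an illustrative assumption, not an exhaustive inventory.
MARKERS = [b"/Author", b"/Keywords", b"/Subject", b"/Title", b"<dc:creator", b"<pdf:Keywords"]


def metadata_markers(pdf_bytes: bytes) -> list[str]:
    """Return the markers that appear verbatim in the raw (uncompressed) bytes of a PDF."""
    return [marker.decode("ascii") for marker in MARKERS if marker in pdf_bytes]


def check_file(path: str) -> list[str]:
    """Read a saved PDF from disk and report any metadata markers found in it."""
    return metadata_markers(Path(path).read_bytes())
```

Because it performs a plain substring test, this check can also report a marker that is only part of a longer name, which is acceptable for a "look again" prompt.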

Lori DeFurio is a developer evangelist on Adobe Systems' Acrobat Team.
