NSA issues guidance on redacting Word, PDF

By SHAUN WATERMAN, UPI Homeland and National Security Editor

WASHINGTON, Jan. 23 (UPI) -- The National Security Agency has issued technical guidance for U.S. officials on redacting or editing sensitive documents for release following a series of embarrassing incidents in which so-called metadata stored in electronic formats like Microsoft Word or Adobe PDF files has been accidentally exposed.

Both types of files are "complex, sophisticated computer data formats," reads the guidance document produced by the NSA's Information Assurance Division, which is responsible for the integrity of U.S. government computer networks.


The document, called "Redacting with confidence: How to safely publish sanitized reports converted from Word to PDF," says that these files can "contain many kinds of information, such as text, graphics, tables, images, (and) meta-data."

Metadata is information associated with the file, like a note of the author and the date the file was created.

This "complexity makes (documents in these and other formats) potential vehicles for exposing information unintentionally, especially when downgrading or sanitizing classified materials," the NSA concludes.

Although the document, dated December 2005 and posted on the Web site of the Federation of American Scientists last week, provides no concrete illustrations, there were at least two occasions last year when exactly such unintentional exposure of U.S. official documents took place.


Reporters checking the metadata for the 35-page "National Strategy for Victory in Iraq" that President Bush unveiled last November found its author to be a National Security Council adviser named Peter Feaver.

Feaver is a Duke University political scientist who was recruited to join the White House staff in June 2005 after he and several colleagues presented the administration with an analysis of polls about the Iraq war.

Another kind of metadata is the so-called "undo stack," a list of every editing change made in the file, saved by the program so that the changes can be reversed using the "undo" function.

On April 30 last year, U.S.-led coalition forces in Baghdad posted on the Web a redacted version of their report into the shooting death of Italian special agent Nicola Calipari at a checkpoint on the city's notorious airport road.

Calipari was shot by U.S. troops when the car he was in approached the checkpoint. He was escorting freed Italian journalist Giuliana Sgrena, who had been held hostage by Iraqi insurgents.

Military officials redacted key information about checkpoint procedures and events on the night in question from the report before posting it to the Web. But a few clicks of the mouse were all it took to restore the redacted parts.


"The key concept for understanding the issues that lead to the inadvertent exposure is that information hidden or covered in a computer document can almost always be recovered," says the NSA.

In the case of the Calipari report, the officers who prepared it apparently believed that when a document was converted to a PDF format, the "undo stack" disappeared.

"It was believed that once a document was converted to a PDF, it would not be able to be reversed [to] allow the information to be viewed," Army Lt. Col. Steven Boylan, who led the post-mortem into the accidental release, told Government Computer News last year.

In fact, the NSA document says, as "numerous people have learned to their chagrin, merely converting a Microsoft Word document to PDF does not remove all metadata automatically."

Indeed, because there is software designed to make the two programs work together, the "undo stack" and all the file's other metadata are copied over into the new format.

The document goes on to say that Microsoft Word is "used throughout the (Department of Defense) and the Intelligence Community for preparing documents, reports, (and) notes," whereas "Adobe PDF is used very extensively by all parts of the U.S. Government and military services for disseminating and distributing documents ... over computer networks and the Internet ... PDF is often used as the format for downgraded or sanitized documents."


In other words, the kind of file conversion that was so disastrously misunderstood in the Calipari case is quite common.

Boylan told Government Computer News that in the future, documents would be redacted physically and then scanned into PDFs so that inadvertent exposures of classified material would not happen again.

But according to the NSA, that is not now necessary.

"The way to avoid exposure," the document says, "is to ensure that sensitive information is not just visually hidden or made illegible, but is actually removed from the original document."

The final version of the original document is then copied and pasted into a brand new document (thus purging the "undo stack" and other potentially sensitive metadata), and then finally converted into a PDF.
