Dec. 29 (UPI) -- Researchers have developed a way to mine electronic government records for evidence of significant historical events.
For journalists and historians, sorting through government records is a tedious but important job. Though many government records aren't released until long after a news event has come and gone, once published, the documents can shed new light on historical happenings.
Data scientists at Columbia University in New York have developed a way to identify important and revealing government documents more quickly and efficiently.
Researchers tested their new data mining methodology on newly available declassified records dated between 1973 and 1977. The records include 1.4 million cables, as well as metadata related to 400,000 documents delivered via diplomatic pouch.
Each document is tagged with codes relevant to its subject matter. UNGA marks documents related to the United Nations General Assembly, for example. Cables related to South Vietnam are coded VS. Finland-related messages are tagged FI.
When plotted along a timeline, these codes reveal spikes corresponding with world events. UNGA spikes correspond with the annual assembly, while a large uptick in VS messages corresponds with the fall of Saigon in April of 1975.
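The idea of plotting tag codes along a timeline can be illustrated with a short sketch. This is not the researchers' code: the function names, the toy data and the rule of flagging any month with more than three times the median count are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import date
from statistics import median


def monthly_counts(cables, tag):
    """Count cables carrying `tag` for each (year, month)."""
    counts = Counter()
    for sent, tags in cables:
        if tag in tags:
            counts[(sent.year, sent.month)] += 1
    return counts


def find_spikes(counts, factor=3.0):
    """Return months whose count exceeds `factor` times the median month.

    The threshold rule is a hypothetical stand-in, not the published method.
    Months with no tagged cables are absent from `counts` and are ignored.
    """
    if not counts:
        return []
    baseline = median(counts.values())
    return sorted(m for m, n in counts.items() if n > factor * baseline)


# Toy data (invented): each cable is a (date, set of tag codes) pair.
toy_cables = []
for month, n in [(1, 2), (2, 2), (3, 3), (4, 10), (5, 2), (6, 2)]:
    toy_cables += [(date(1975, month, 1), {"VS"})] * n
toy_cables.append((date(1975, 4, 2), {"FI"}))

vs_counts = monthly_counts(toy_cables, "VS")
print(find_spikes(vs_counts))  # [(1975, 4)]
```

In the toy data, the VS count jumps from two or three cables a month to 10 in April 1975, so that month is the only one flagged. A real analysis would need a more careful baseline than a single median.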
These spikes are rather obvious. For more useful insights, scientists compared the messaging timeline with corresponding background activity -- the news as it was being reported at the time.
The technique allowed researchers to rank the significance of world events, and to identify important events that were missed by journalists at the time.
For example, scientists identified several spikes in government records related to administrative problems, such as transport issues and changes to the visa records system.
"The ranking finds a wide range of other important events, such as the Carter administration's prioritization of human rights, the president of Egypt, Anwar Sadat's surprise visit to Israel in 1977, the Southeast Asian 'Boat People' crisis of 1975-76, the 1973 Yom Kippur War and Portugal's withdrawal from Angola in 1975-76 and so on," as reported by the MIT Technology Review.
The researchers published the results of their data mining analysis online at arXiv.org.