Computer scientists at Concordia University in Montreal report that their system, called BlogSum, has potentially vast applications, gauging consumer preferences and voter intentions by sorting through Web sites, examining real-life self-expression and conversation, and producing summaries of the information.
"Huge quantities of electronic texts have become easily available on the Internet, but people can be overwhelmed, and they need help to find the real content hiding in the mass of information," said Leila Kosseim, one of the lead researchers at Concordia's Computational Linguistics Laboratory.
Computer analysis of informally written language poses unique challenges, the researchers said, because blogs, forums and the like contain opinions, emotions and speculations, not to mention spelling errors and poor grammar.
BlogSum uses "discourse relations" to crunch the data, they said, filtering and ordering sentences into coherent summaries.
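The article does not detail BlogSum's algorithm, but the general idea of extractive summarization, scoring sentences, filtering the weak ones, and re-ordering the rest for readability, can be sketched in a few lines. The frequency-based scoring below is a common textbook stand-in, not Concordia's actual method, and the sample review text is invented for illustration:

```python
import re
from collections import Counter

def summarize(text, max_sentences=2):
    """Toy extractive summarizer: score sentences by average word
    frequency, keep the top ones, and restore original order so the
    summary reads coherently."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'[a-z]+', text.lower()))

    def score(sentence):
        tokens = re.findall(r'[a-z]+', sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    # Pick the highest-scoring sentences...
    top = set(sorted(sentences, key=score, reverse=True)[:max_sentences])
    # ...then emit them in their original order, a crude stand-in for
    # the discourse-level ordering the researchers describe.
    return ' '.join(s for s in sentences if s in top)

review = ("The new phone is great. I love the camera. "
          "Battery life could be better. Overall a solid phone.")
print(summarize(review))
```

A real system would replace the frequency score with discourse-aware features (question detection, topic relevance, rhetorical relations between sentences), but the filter-then-order pipeline is the same shape.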
This study is an example of natural language processing, which combines artificial intelligence and linguistics to enable computers to derive meaning from human language.
"The field of natural language processing is starting to become fundamental to computer science, with many everyday applications -- making search engines find more relevant documents or making smart phones even smarter," Kosseim said.