Oct. 24 (UPI) -- Computers are getting better at translation, but they're not yet as capable as humans, who can capture nuance in language and translate text to different effect, depending on the audience.
To develop more sophisticated translation algorithms, capable of translating for style, researchers at Dartmouth College turned to the Bible.
Because the Bible has been translated so many times, the text and its many derivatives offer machine learning algorithms a uniquely vast dataset from which to learn.
Researchers used the many translations of the Bible's 31,000 verses to produce more than 1.5 million unique translation pairings. The dataset allowed the algorithms to learn how the same text can be translated myriad ways -- each offering a unique style.
"The English-language Bible comes in many different written styles, making it the perfect source text to work with for style translation," Keith Carlson, a doctoral student at Dartmouth, said in a news release.
Because the Bible and its many translations are so expertly indexed, algorithms for aligning text -- to ensure each snippet of translation represents the same verse -- were unnecessary.
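Because every verse already carries the same reference in each version, building translation pairs can be as simple as a lookup by verse ID. The following is a minimal sketch of that idea in Python, not the researchers' code: the version names, verse IDs and placeholder text are hypothetical, and the function name `build_pairs` is an assumption made for illustration.

```python
from itertools import permutations

# Hypothetical toy data: version name -> {verse ID -> text}.
# The strings are placeholders, not real Bible wording.
versions = {
    "formal": {"1:1": "formal wording of verse one",
               "1:2": "formal wording of verse two"},
    "plain":  {"1:1": "plain wording of verse one",
               "1:2": "plain wording of verse two"},
    "modern": {"1:1": "modern wording of verse one"},  # lacks verse 1:2
}

def build_pairs(versions):
    """Pair every verse with its counterpart in every other version.

    Returns (source_version, target_version, source_text, target_text)
    tuples. Verses are matched purely by their shared ID, so no
    text-alignment step is needed.
    """
    pairs = []
    # permutations yields both directions, e.g. formal->plain and plain->formal
    for src, tgt in permutations(versions, 2):
        shared_ids = versions[src].keys() & versions[tgt].keys()
        for verse_id in sorted(shared_ids):
            pairs.append((src, tgt,
                          versions[src][verse_id],
                          versions[tgt][verse_id]))
    return pairs

pairs = build_pairs(versions)
```

In this toy example, a verse missing from one version is simply skipped for pairings involving that version, which is one plausible way a real dataset could end up with fewer pairs than every-version-by-every-verse would suggest.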
Researchers characterized differences in style by training the algorithms to recognize passive or active voice, as well as vocabulary. These factors helped the algorithms recognize different translations as more or less simple or formal.
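To give a rough sense of how voice and vocabulary can be turned into numbers, here is a crude Python sketch. It is an illustrative assumption, not the method the Dartmouth team used: the passive-voice check is a simple heuristic (a form of "to be" followed by a word ending in "ed" or "en") that will miss irregular participles and produce some false positives, and the function name `style_features` is hypothetical.

```python
import re

# Forms of "to be" that commonly introduce a passive construction.
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}

def style_features(text):
    """Return rough style indicators for a passage of English text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"avg_word_len": 0.0, "type_token_ratio": 0.0, "passive_hits": 0}

    # Heuristic only: count "be" forms directly followed by an -ed/-en word.
    passive_hits = sum(
        1 for prev, cur in zip(words, words[1:])
        if prev in BE_FORMS and cur.endswith(("ed", "en"))
    )
    return {
        # Longer words loosely suggest more complex vocabulary.
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Share of distinct words; higher means more varied vocabulary.
        "type_token_ratio": len(set(words)) / len(words),
        "passive_hits": passive_hits,
    }

passive = style_features("The ball was kicked by the boy.")
active = style_features("The boy kicked the ball.")
```

Comparing such scores across two versions of the same verse gives one simple way to judge which is plainer, though a learned model would pick up far subtler cues.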
To help the algorithms recognize the full spectrum of linguistic styles, researchers used 34 stylistically distinct Bible translations, with the King James Bible representing the most complex end of the spectrum and the "Bible in Basic English" the least complex.
After augmenting two commonly used machine learning algorithms and training them on the Bible dataset, researchers applied them to passages from Moby Dick. The algorithms successfully translated passages from the Herman Melville novel for different audiences, including versions for young readers and non-native English speakers.
Researchers detailed their new algorithms this week in the journal Royal Society Open Science.
"Text simplification is only one specific type of style transfer. More broadly, our systems aim to produce text with the same meaning as the original, but do so with different words," said Carlson.