It’ll have language teachers the world over ripping up their vocab books: near-real-time speech conversion from one language to another has just become a reality.
9 November 2012 / Paul Marks, senior technology correspondent
Microsoft Research has demonstrated not only how to convert spoken English into Mandarin with just a few seconds’ delay – but also how to output that Mandarin speech in the vocal style of the original speaker.
The technology was demonstrated by Microsoft’s research chief Rick Rashid in Tianjin, China, on 25 October – but the news has taken a while to trickle out.
Rashid spoke just eight English sentences into the lab’s new speech-recognition, translation and generation system, yet the company reports the Mandarin output wowed a crowd of 2000 students and academics (jump to 7:30 in the video above to hear the output).
The system’s advanced capability stems from a blizzard of improvements at all stages of the speech-to-speech process. Software like Nuance’s Dragon Naturally Speaking has quietly blazed the trail for speech recognition in offices – and now products based on it, like Apple’s Siri iPhone assistant, can recognise spoken questions and search for answers on the web. Microsoft’s Kinect has a speech interface too.
While such systems go wrong a lot – typically erring on one out of every four or five words, says Rashid – they now have a better way to recognise what people are saying. Microsoft’s trick is to use a novel neural networking (machine learning) system that reduces word-recognition errors to one in seven or eight. That means the translation engine, Bing Translate, has a far better chance of creating intelligible Mandarin text to feed into the speaking engine.
But the real prize here is the generation of Mandarin speech in a voice like the speaker’s: if you can preserve the speaker’s vocal cadence in the translation, their meaning will be more apparent and the conversation will be all the more effective. This was done by having Rashid train a machine-learning algorithm for a full hour, rather than through the quick recitation of a stock page of text that software like Dragon Naturally Speaking asks for.
It’s a great start, by the look of things. “In a few years,” Rashid told his audience, who rapturously applauded each line of machine-spoken Mandarin, “we hope we’ll be able to break down the language barriers between people.”
Article from New Scientist