"What we wanted to do was translate the whole web into every major language," says Luis von Ahn, cofounder of Duolingo. For all its vaunted accessibility, the web is still fundamentally fractured along the fault lines of language. "Right now a very large fraction of it is in English. If you don't know English you can't access it. But there are fractions in other languages too," he says. That means, for example, that news reports from around the globe, or great works of literature and scholarship, are unavailable to those without the necessary language skills.
When von Ahn began thinking about translating the entire web, he naturally considered letting the computers do it. But machine translation just isn't very good and, in his estimation, probably won't be for another 20 to 30 years. Computers alone wouldn't be enough. Instead, he turned to a familiar approach: crowdsourcing. After all, he'd already developed projects reliant on communities of data-processing users; one of those became Google Image Labeler, helping to index the search giant's image collection, and another became reCAPTCHA, which helps digitize books.
But crowdsourcing translation seemed like a different kind of problem. Almost anyone could provide image keywords or decipher a distorted bit of text. That didn't require specialized knowledge. Translation, though, demands bilingual fluency, which dramatically narrowed the size of his potential crowd. And even if you could reach those potential translators, what incentive would they have to contribute? "We realized we're going to need people to help us translate the whole web," he says. A lot of people. And they would need a motivation to keep translating.
Duolingo originated as a way to solve both of those problems by appealing to the more than one billion people who want to learn another language: they could learn as they translated. The site, which von Ahn founded with one of his Carnegie Mellon PhD students, Severin Hacker, goes public today after extended testing in private beta. Its 125,000 active users have already translated about 75 million sentences, according to the company, and since launching in late 2011 von Ahn and his team have been tweaking the system to keep translators engaged. "We're serving two masters here," he says. "We're trying to get the translations, but we're trying to get people to actually learn a language, otherwise they won't come back."
Here's how it works. A would-be translator chooses a language: English, German, French, or Spanish (with Portuguese, Italian, and Chinese "coming later"). Introductory lessons explain the basics (the use of masculine and feminine nouns in Spanish, for example), and then users begin building a bilingual vocabulary. Novices get simpler sentences (like the much-beloved "Where is the library?"), while more fluent users get more complex tasks. Duolinguists also rate one another's work; not everyone can provide great translations, von Ahn admits, "but if you get five to ten people doing the same sentence, one's going to be really good." Iteration and feedback are often key to successful crowdsourcing, but, again, Duolingo isn't just about building a better translation. It's also about making better translators, by actually teaching users a new language.
That was a big unknown when von Ahn and his team began the project. Could people really learn while translating? Would it be satisfying enough to keep them motivated? The private beta gave the Duolingo team time to tinker with its approach. "We've been tuning the system to find the best place both for the translations and for the language learning," he says.
Opening to the public will be a new test, as Duolingo had kept fairly tight control over the private beta. Asked about any potential problems, von Ahn says, "We have to be more careful with people trying to poison the system. We actually haven't had any trouble with this at all." The fix for misbehaving individuals, of course, is more use of the crowd; in this case that means having more people checking and translating the same sentence. With enough iteration and feedback, good translations should win out.
And more users mean the potential to translate more websites. Right now the focus is on Creative Commons sites, but von Ahn suggests commissioned translations as one way of keeping Duolingo free for users. If Duolingo had a million users, von Ahn estimates, it would take just 80 hours to render the entire English Wikipedia into Spanish. That may not be the whole web, but it'd be a pretty good start.