
Multilingualism and the Commons

Wojciech Gryc | Sapporo | Local Context Global Commons
July 31, 2008 8:10 AM
In a web of open content and collaboration, it is often assumed that simply setting your information "free" is all that is needed to promote it to the world. Often -- especially at the international level -- it takes much more effort. Such was the discussion of the Open Content, Open Translation: Multilingual Solutions session of the Local Context, Global Commons lab.

The session itself was a series of presentations, with Chris Salzberg moderating and focusing on the difficulties of working in a multilingual web. Leonard Chien presented on Chinese translation on the Global Voices site, and Hanako Tokita did the same for Japanese. Kyo Kageura talked about his QRedit tool for translating English texts into Japanese, and Mohammad Daoud presented on how machine translation is being used by the Digital Silk Road (DSR) project. Shun Fukazawa finished with a presentation on the Japanese Wikipedia and Wikia sites.

While the discussions covered diverse topics, several themes recurred, and everyone agreed that the commons is a multicultural, multilingual environment. While statistics are difficult to come by, it appears that less than a third of the web's users speak English as a first language, and only about a third of all websites are in English. Unfortunately, building a multilingual web is more complex than simply plugging in an automated translation service. Computers have yet to understand local context and cultural references, and still lack a proper grasp of grammar.

One example of a multilingual project is Global Voices' Project Lingua, which translates blog posts on the main GV website into 15 different languages. This is done through a global network of volunteers who choose posts of interest and then translate them. This is an important aspect of the GV project: most posts on the site focus on politics, and translation allows people to share news and opinions from multiple viewpoints, cultures, and regions. A similar process is employed by language-specific Wikipedia groups, such as the Japanese-language Wikipedia group.

Translation is extremely difficult, especially in a distributed context. For example, when translating from English to Chinese, one has to decide whether Traditional or Simplified Chinese will be used. Furthermore, a volunteer from Taiwan may use different characters or metaphors to describe events than a volunteer from Beijing. As such, volunteer management is often more structured and complex than one would initially assume.
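The Traditional/Simplified split can be made concrete with a toy sketch. The character mappings below are real, but the converter is a deliberately minimal illustration; actual conversion tools handle thousands of character mappings plus the regional vocabulary differences mentioned above, which no character table can capture.

```python
# A few genuine Simplified -> Traditional character mappings.
# Real converters need thousands of these, plus context-aware
# handling of regional word choices (e.g. Taiwan vs. Beijing usage).
SIMPLIFIED_TO_TRADITIONAL = {
    "国": "國",  # country
    "马": "馬",  # horse
    "体": "體",  # body
}

def to_traditional(text):
    """Convert a Simplified Chinese string character by character."""
    return "".join(SIMPLIFIED_TO_TRADITIONAL.get(ch, ch) for ch in text)

print(to_traditional("中国"))  # -> 中國
```

The character-by-character approach shown here is exactly why naive conversion fails: two volunteers may agree on the script yet still disagree on which *words* to use, a choice no lookup table resolves.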

User interfaces are a key challenge for effective translation. While open source packages like MediaWiki and WordPress support multilingual interfaces and posts, the translation process itself requires more than just a dictionary and somewhere to type. Translators often use sites like Wikipedia or extensively search the web to understand the cultural context of certain sayings and metaphors. For example, how would you translate "It was raining cats and dogs" into Swahili -- and more importantly, how would an African volunteer figure out that this is a metaphor for "heavy rain"?

This brings up a difficult problem in user interface design: How can software developers build user interfaces for translation that not only provide support for using dictionaries and typing text, but actually help search for the meanings of analogies, supported fonts, verb conjugations, and other language-specific features? For example, how would a translator converting an English text to Japanese know when to use a formal (polite) or informal verb conjugation, when the original writer never even had to consider such a choice? Luckily, tools like QRedit are already trying to solve the problem.
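One feature such a tool might offer can be sketched in a few lines: scanning an English source text for known idioms and surfacing their plain meaning, so a volunteer is not left guessing that "raining cats and dogs" is figurative. The glossary entries here are illustrative and not taken from QRedit or any real tool.

```python
# Hypothetical idiom glossary: maps English idioms to plain-language
# meanings a translator can work from. A real tool would draw on a
# large, community-maintained resource rather than a hard-coded dict.
IDIOM_GLOSSARY = {
    "raining cats and dogs": "raining very heavily",
    "piece of cake": "something very easy",
}

def flag_idioms(text):
    """Return (idiom, plain meaning) pairs found in the source text."""
    lowered = text.lower()
    return [(idiom, meaning)
            for idiom, meaning in IDIOM_GLOSSARY.items()
            if idiom in lowered]

hits = flag_idioms("It was raining cats and dogs all night.")
print(hits)  # -> [('raining cats and dogs', 'raining very heavily')]
```

Building the glossary itself is the hard part, which is precisely the argument for open, community-maintained language resources made below.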

The case for openness is extremely strong. Good dictionaries are rarely free or open, making it difficult to build free, high-quality tools. This is especially true for domain-specific terminology, such as medical or scientific concepts. Building such an open dictionary would require a large number of volunteers, including native speakers, domain experts, and linguists.

And with all the focus on human translation, communities of translators, and user interfaces for multilingual websites, there is still a case for machine translation. Human translation is often expensive and time-consuming; translating time-sensitive materials or large swaths of the web is often impossible, and a poor software-based translation is still better than no translation at all. The Digital Silk Road, for example, has been working with machine translation tools to help its translators translate archived content. More importantly, this is being done by simply using the numerous multilingual articles found on Wikipedia -- a freely and openly available resource.
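The idea of mining Wikipedia for translation data can be sketched roughly as follows: an article title and the title of its counterpart in another language edition form a ready-made translation pair. The link data below is hard-coded for illustration (a real system would parse Wikipedia dumps or query its API), and the helper names are my own, not the DSR project's.

```python
# Illustrative interlanguage link pairs: an English article title and
# the title of the linked article on the Japanese Wikipedia. These two
# pairs are real links; a production system would harvest millions
# of such pairs from Wikipedia database dumps.
INTERLANGUAGE_LINKS = {
    "Silk Road": "シルクロード",
    "Translation": "翻訳",
}

def build_lexicon(links):
    """Turn interlanguage link pairs into a bidirectional lookup table."""
    lexicon = {}
    for en, ja in links.items():
        lexicon[en] = ja
        lexicon[ja] = en
    return lexicon

lex = build_lexicon(INTERLANGUAGE_LINKS)
print(lex["Silk Road"])  # -> シルクロード
print(lex["翻訳"])        # -> Translation
```

Because the source data is openly licensed, a lexicon bootstrapped this way can itself be shared openly -- the feedback loop the next paragraph describes.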

It is this ecology of collaboration that allows the commons movement to feed and grow on itself. The Japanese Wikipedia volunteers and Global Voices' Project Lingua build multilingual websites, while machine translation tools like those of the DSR can then use these to help approximate translations for new content. To perfect those translations, one can use some of the new user interfaces and plugins being developed, or build new communities around them. While challenges to this cycle do exist, they are not insurmountable.

