What's different about website translation?

Website translation differs from standard document translation in several clear ways. Some of the differences are obvious and some are quite obscure, but all of them need to be accounted for to create an effective web presence.

Translating within a website file format

The original language of the web is HTML, but many other languages and formats are in use (ASP and JSP pages, XML, Flash, etc.). The major difference between translating websites and translating ordinary text documents is that the translator must be able to work within the source code of the web page. The translator therefore needs a good understanding of how web pages are built, in order to edit the text inside the source code while protecting the code itself.

The alternative is to double-handle the translated copy: the content is translated in a standard text document, and a second person (the webmaster) then edits it into the HTML. This opens the process to a variety of problems: operator error (pasting text in the wrong place), data corruption caused by incompatible operating systems or application versions, and basic inefficiency in time and cost.

For this reason we work only in the source code, provided we can also see the published pages in the source language, which give us an overview of the context and layout.
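To illustrate what working within the source code means in practice, here is a minimal Python sketch built on the standard library's html.parser. It splits a page into markup, which is passed through unchanged, and text segments, which are handed to a translation function. The names SegmentExtractor and translate_html are hypothetical, the translate callback simply stands in for the translator, and the sketch ignores translatable attribute text such as alt and title. It is an illustration of the principle, not a description of any particular tool or workflow.

```python
from html.parser import HTMLParser


class SegmentExtractor(HTMLParser):
    """Split an HTML page into protected markup and translatable text (sketch)."""

    SKIP = {"script", "style"}  # element contents that must never be translated

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; exactly as written
        super().__init__(convert_charrefs=False)
        self.tokens = []  # (kind, raw_text) pairs; kind is "markup" or "text"
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        self.tokens.append(("markup", self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        self.tokens.append(("markup", self.get_starttag_text()))

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        self.tokens.append(("markup", f"</{tag}>"))

    def handle_data(self, data):
        # whitespace-only runs and script/style bodies are treated as markup
        translatable = data.strip() and not self._skip_depth
        self.tokens.append(("text" if translatable else "markup", data))

    def handle_entityref(self, name):
        self.tokens.append(("markup", f"&{name};"))

    def handle_charref(self, name):
        self.tokens.append(("markup", f"&#{name};"))

    def handle_comment(self, data):
        self.tokens.append(("markup", f"<!--{data}-->"))

    def handle_decl(self, decl):
        self.tokens.append(("markup", f"<!{decl}>"))

    def handle_pi(self, data):
        self.tokens.append(("markup", f"<?{data}>"))


def translate_html(source, translate):
    """Return `source` with only its text segments passed through `translate`."""
    parser = SegmentExtractor()
    parser.feed(source)
    parser.close()
    out = []
    for kind, raw in parser.tokens:
        if kind == "text":
            core = raw.strip()
            lead = raw[: len(raw) - len(raw.lstrip())]
            trail = raw[len(raw.rstrip()):]
            out.append(lead + translate(core) + trail)  # keep original spacing
        else:
            out.append(raw)
    return "".join(out)


page = '<p class="intro">Hello <b>world</b> &amp; welcome</p>'
print(translate_html(page, str.upper))
```

Here str.upper plays the role of the translator, so the tags, the class attribute and the &amp; entity come back untouched while the visible words change. Two limitations worth noting: closing tags are re-emitted in lower case, and a sentence interrupted by an entity or inline tag reaches the translator as separate fragments, which a real tool would need to handle.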

Language encoding and character sets (charset)

On the web, each page declares a character encoding (called the "charset") and, separately, the language of its content (typically with the lang attribute on the html tag). The language tag tells the browser which language to expect ("de" for German or "ko" for Korean), while the charset declaration tells the browser which system was used to encode the characters of that language.
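As a rough illustration of these two separate declarations, the Python sketch below reads the lang attribute of the html tag and the charset from either the HTML5 form (a meta tag with a charset attribute) or the older http-equiv Content-Type form. DeclarationReader and read_declarations are hypothetical helpers. Note that a browser also considers the server's Content-Type header and any byte order mark, which normally take precedence over the meta tag; the sketch looks only at the page source.

```python
from html.parser import HTMLParser


class DeclarationReader(HTMLParser):
    """Collect the language and charset a page declares about itself (sketch)."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self.charset = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # html.parser lower-cases attribute names
        if tag == "html" and self.lang is None:
            self.lang = attrs.get("lang")  # e.g. "de" or "ko"
        elif tag == "meta" and self.charset is None:
            if attrs.get("charset"):
                # HTML5 form: <meta charset="UTF-8">
                self.charset = attrs["charset"]
            elif (attrs.get("http-equiv") or "").lower() == "content-type":
                # Older form: content="text/html; charset=..."
                for part in (attrs.get("content") or "").split(";"):
                    key, _, value = part.strip().partition("=")
                    if key.lower() == "charset":
                        self.charset = value.strip()


def read_declarations(html):
    """Return the (lang, charset) pair declared in an HTML string."""
    reader = DeclarationReader()
    reader.feed(html)
    reader.close()
    return reader.lang, reader.charset


print(read_declarations('<html lang="de"><head><meta charset="UTF-8"></head></html>'))
print(read_declarations(
    '<html lang="ja"><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=Shift_JIS"></head></html>'
))
```

The second page shows why the two declarations are independent: lang says the text is Japanese, while charset says which of the Japanese encodings its bytes use.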

Some languages have multiple ways to encode the same characters. For example, Japanese can be represented by one of the following three encodings:

Shift_JIS

EUC-JP

ISO-2022-JP

Each is a different system for encoding Japanese characters, and all three should work in most browsers, but none of them can also encode Korean or Thai.
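The Python sketch below makes this concrete using the standard shift_jis, euc_jp and iso2022_jp codecs. It shows the same Japanese word producing different bytes in each encoding, the garbled text that results when bytes are read with the wrong encoding, and that none of the three can represent Korean. The sample words are arbitrary choices for illustration.

```python
text = "日本語"  # the word "Japanese", written in Japanese

# The same three characters produce different bytes in each legacy encoding.
encoded = {codec: text.encode(codec) for codec in ("shift_jis", "euc_jp", "iso2022_jp")}
for codec, data in encoded.items():
    print(f"{codec:<11} {data.hex(' ')}")

# Each encoding decodes its own bytes back to the original text...
for codec, data in encoded.items():
    assert data.decode(codec) == text

# ...but reading Shift_JIS bytes as EUC-JP garbles the text.
print(encoded["shift_jis"].decode("euc_jp", errors="replace"))

# None of these Japanese encodings can represent Korean at all.
for codec in encoded:
    try:
        "한국어".encode(codec)
    except UnicodeEncodeError:
        print(f"{codec}: cannot encode Korean")
```

This is why a mismatch between the declared charset and the actual bytes produces unreadable pages, and why a page in one of these encodings cannot mix in Korean or Thai text.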

There is an alternative called Unicode. The goal of the Unicode system is to give a unique code to every character in every language. Google, for example, uses Unicode (in its UTF-8 encoding) as its charset. For multilingual websites we usually recommend Unicode for its simplicity and robustness: a single page can contain Japanese, Korean and Thai side by side without switching encodings. For a thorough description of the Unicode system and how it works, please see Alan Wood's Unicode site.
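A short Python sketch of why a single Unicode encoding is simpler: one UTF-8 byte string can hold Japanese, Korean, Thai and German together and decode back unchanged. Each character has exactly one Unicode code point, and UTF-8 stores it in one to four bytes. The sample words and variable names are arbitrary choices for illustration.

```python
samples = {
    "Japanese": "日本語",
    "Korean": "한국어",
    "Thai": "ภาษาไทย",
    "German": "Übersetzung",
}

# One encoding covers every script, so the mixed text round-trips intact.
page_text = " | ".join(samples.values())
data = page_text.encode("utf-8")
assert data.decode("utf-8") == page_text

for language, word in samples.items():
    # One code point per character; UTF-8 spends 1 to 4 bytes on each.
    points = " ".join(f"U+{ord(ch):04X}" for ch in word)
    print(f"{language:<9} {len(word.encode('utf-8')):>2} bytes  {points}")
```

In a web page this corresponds to declaring the charset as UTF-8 in the page head (and serving the page with a matching Content-Type header), after which every language on the site can share the same encoding.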