Plain text files - in most cases with a .txt
extension - contain exclusively textual information. There is no clearly
defined way to inform the computer which language they contain. In (very)
simple terms, that means the computer will by default assume the text is
written in the same language the computer itself uses.
If you are Russian, it is very likely that your computer works in Russian too: the menus are in Russian, the files you open will be in Russian, and so on. In most cases, the computer makes the right assumption about the contents of files in general: they all contain Russian, and nothing that Russian characters could not display.
Now, if you are a Russian translator who translates from Japanese, the Japanese files you receive, if they are plain text files, will most probably be treated by the computer as files containing Russian, because nothing in the file itself tells the computer which language it is written in. The Japanese file contents could be:
OmegaTとは、コンピュータを利用した翻訳ツールです。
Because it expects the contents to be Russian, your text editor could very well display it like this:
OmegaTВ∆ВЌБAГRГУГsГЕБ[Г^ВрЧШЧpµšЦ|ЦуГcБ[ГЛВ≈ВЈБB
However, this is far from Russian: it is Japanese text wrongly displayed as Russian characters.
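
The effect described above can be reproduced in a few lines of Python. This is only a sketch: it assumes the Japanese file was saved in Shift_JIS and that the Russian system decodes it as Mac Cyrillic. Both encodings are assumptions chosen for illustration, and a real system may use others.

```python
# Illustration only: Shift_JIS (Japanese side) and Mac Cyrillic (Russian side)
# are assumed encodings, not something the file itself records.
text = "OmegaTとは、コンピュータを利用した翻訳ツールです。"

# The bytes as they would be stored in the plain text file.
raw = text.encode("shift_jis")

# Decoding with the wrong assumption yields Cyrillic-looking garbage.
garbled = raw.decode("mac_cyrillic")

# Decoding with the encoding that was actually used restores the text.
correct = raw.decode("shift_jis")

print(garbled)
print(correct)
```

The bytes in the file never change. Only the assumption used to read them does.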
Like any other application, OmegaT is subject to this problem too. By default, it can only assume that plain text files can be displayed using the system defaults. That works well when, for instance, the computer works in French and the files are in English, or when the computer is German and you deal with Italian files.
Why would that work with English and French but not with Russian and Japanese? Because English and French share a common character set, namely Latin-1 or some variation of it. Until recently, Russian and Japanese did not share any character set. Most current Russian character sets do not cover Japanese, and vice versa. The result is what you have seen above.
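
As a rough check of which languages a character set covers, you can try to encode a sample in it. The sketch below uses Python's `latin-1` and `cp1251` codecs. The helper name `can_encode` and the sample words are made up for this illustration, and `cp1251` is just one of several Russian encodings.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample words, one per language.
samples = {
    "English": "translation",
    "French": "traduction assistée",
    "Russian": "перевод",
    "Japanese": "翻訳",
}

for language, sample in samples.items():
    print(
        f"{language:9}"
        f" latin-1: {can_encode(sample, 'latin-1')!s:6}"
        f" cp1251: {can_encode(sample, 'cp1251')}"
    )
```

English and French both fit into Latin-1, which is why a French system usually displays English files without trouble. The Japanese sample fits into neither of the two encodings.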
The Japanese client works with a Japanese computer and creates text files that contain Japanese. The character set selected by the client computer will depend on the operating system and on other settings, but it is highly unlikely that the chosen (Japanese) character set will be correctly interpreted by the Russian computer.
How the textual information in the specified character set is physically transmitted (i.e. which numeric codes the computer uses to interpret and display text) depends on the encoding. When the computer reads the file, it "decodes" the information according to the encoding and displays it according to the character set. Roughly speaking, one encoding corresponds to one character set...
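
To make the notion of "numeric codes" more concrete, the following sketch prints the bytes that a single Cyrillic letter turns into under several encodings. The choice of letter and of encodings is arbitrary and only serves as an illustration.

```python
letter = "Я"  # CYRILLIC CAPITAL LETTER YA

# The same character is stored as different numeric codes in each encoding.
encoded = {
    encoding: letter.encode(encoding)
    for encoding in ("cp1251", "koi8_r", "iso8859_5", "utf-8")
}

for encoding, data in encoded.items():
    print(f"{encoding:10} {data.hex()}")

# Conversely, the same numeric code means a different character
# when it is decoded with another encoding.
misread = encoded["cp1251"].decode("latin-1")
print(misread)
```

A program that reads the file has to know which of these encodings was used. The bytes alone do not say.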
There are basically three ways to address this problem in OmegaT. They all involve the application of file filters in the Options menu.
- Specify the encoding for your plain text files (i.e. files with a .txt extension): in the Text files section of the file filters dialog, change the Source File Encoding from <auto> to the encoding that corresponds to your source .txt file.
- Change the extension of your plain text source files (from .txt to .jp for Japanese plain texts, for instance): in the Text files section of the file filters dialog, add a new Source Filename Pattern (for example *.jp) and select the appropriate parameters for the source and target encoding.
- Save your files in UTF-8 and change their extension from .txt to .utf8. OmegaT will automatically interpret the file as a UTF-8 file.
By default, OmegaT provides the following short list of extensions to make it easier for you to deal with some plain text files:
- .txt files are automatically (<auto>) interpreted by OmegaT as being encoded in the computer's default encoding.
- .txt1 files are files in ISO-8859-1, which covers most Western European languages.
- .txt2 files are files in ISO-8859-2, which covers most Central and Eastern European languages.
- .utf8 files are interpreted by OmegaT as being encoded in UTF-8 (an encoding that covers almost all languages in the world).
You can check that yourself by selecting the item File Filters in the menu Options. For example, when you have a Czech text file (very probably written in the ISO-8859-2 encoding), you just need to change the extension from .txt to .txt2 and OmegaT will interpret its contents correctly. And of course, if you want to be on the safe side, consider converting this kind of file to Unicode, i.e. to the .utf8 file format.
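
If you prefer to do the conversion outside a text editor, a small script can do it. The following Python sketch is not part of OmegaT. The function name `convert_to_utf8` is hypothetical, and you have to know (or guess correctly) the encoding of the source file yourself.

```python
from pathlib import Path


def convert_to_utf8(source: Path, source_encoding: str) -> Path:
    """Re-encode a plain text file as UTF-8 next to the original.

    The copy gets a .utf8 extension; the original file is left untouched.
    source_encoding must be the encoding the file was really saved in.
    """
    # Work on bytes so that line endings are preserved exactly.
    text = source.read_bytes().decode(source_encoding)
    target = source.with_suffix(".utf8")
    target.write_bytes(text.encode("utf-8"))
    return target
```

For a Czech file named, say, readme.txt, calling `convert_to_utf8(Path("readme.txt"), "iso8859_2")` would write readme.utf8 in the same folder. If the stated source encoding is wrong, the script either fails with a decoding error or silently produces garbled text, so it is worth opening the result once to check it.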