Why the glossary cannot be used properly (OmegaT support)

Thread poster: frankleng

frankleng
China
Local time: 17:52
English to Chinese

Mar 24, 2010

I learnt how to create a glossary for OmegaT.
An English-to-Chinese glossary is recognized by OmegaT.
However, a Chinese-to-English glossary is not. Why? Is there a setting I need to change in OmegaT?
Can anyone tell me about this, please? Thank you very much.

Didier Briel

France
Local time: 11:52
English to French

Chinese requires the use of tokenizers

Mar 25, 2010

frankleng wrote:
An English-to-Chinese glossary is recognized by OmegaT.
However, a Chinese-to-English glossary is not. Why? Is there a setting I need to change in OmegaT?
Can anyone tell me about this, please? Thank you very much.

By default, OmegaT uses Sun's tokenizer to identify words. While it works for many languages, it doesn't work for Chinese, which is written without spaces between words. This means only isolated words (not inside a sentence) can be found.

To improve the result, it is possible to install a specific tokenizer.
I recommend reading Marc Prior's Howto on OmegaT tokenizers.

Once you have the plugin installed, you can use the
org.omegat.plugins.tokenizer.LuceneChineseTokenizer
tokenizer.

This should improve word recognition.
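To see why the tokenizer matters for glossary lookup, here is a simplified Python illustration. It is not OmegaT's code: it assumes, for illustration only, that glossary matching compares a segment's tokens with the glossary terms, and it uses plain whitespace splitting as a stand-in for a tokenizer that relies on spaces between words. char_ngram_tokens is a hypothetical, deliberately crude stand-in for a Chinese-aware tokenizer; it is not how the Lucene tokenizer actually works.

```python
def whitespace_tokens(segment: str) -> list[str]:
    """Split on whitespace only; stands in for a space-based word tokenizer."""
    return segment.split()


def char_ngram_tokens(segment: str, max_len: int = 4) -> set[str]:
    """Hypothetical stand-in for a CJK-aware tokenizer:
    every run of 1..max_len consecutive characters becomes a token."""
    tokens = set()
    for start in range(len(segment)):
        for length in range(1, max_len + 1):
            if start + length <= len(segment):
                tokens.add(segment[start:start + length])
    return tokens


def glossary_hits(tokens, glossary: dict[str, str]) -> dict[str, str]:
    """Return the glossary entries whose source term appears as a token."""
    token_set = set(tokens)
    return {term: target for term, target in glossary.items() if term in token_set}


if __name__ == "__main__":
    zh_glossary = {"术语表": "glossary"}
    sentence = "请打开术语表文件。"
    print(glossary_hits(whitespace_tokens(sentence), zh_glossary))  # no match: one big token
    print(glossary_hits(whitespace_tokens("术语表"), zh_glossary))  # isolated word: match
    print(glossary_hits(char_ngram_tokens(sentence), zh_glossary))  # match inside the sentence
```

With whitespace splitting, the term is found only when it is the whole segment, which mirrors the "isolated words" behaviour described above; once the sentence is broken into smaller units, the term is found inside the sentence as well.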

If you need further help, I recommend subscribing to the OmegaT Yahoo support group.

Didier

frankleng
China
Local time: 17:52
English to Chinese

TOPIC STARTER

Thanks

Mar 25, 2010

Thank you very much.

frankleng
China
Local time: 17:52
English to Chinese

TOPIC STARTER

You are right, but a new problem occurs

Mar 25, 2010

Your method works. However, the menu and context are not displayed properly: Chinese text shows up as garbage characters.
Do you know how OmegaT uses fonts, and how to change them so that the text displays properly, please? Or is there anything else I can do?

frankleng
China
Local time: 17:52
English to Chinese

TOPIC STARTER

Thanks, solved

Mar 25, 2010

Thanks, the problem is solved now.
I'd like to record the solution here so that the next newcomer knows how to deal with it.

I ran the OmegaT-tokenizers.sh script in a terminal: just change its properties to make it executable, then double-click it and choose to run it.
Then I saw this in the terminal:
------------------------------
18152: Info: OmegaT-2.0.5_2 (Fri Mar 26 07:11:56 CST 2010) Locale zh_CN
18152: Info: Java: Sun Microsystems Inc. ver. 1.6.0_10, executed from '/usr/lib/jvm/java-6-sun-1.6.0.10/jre' (LOG_STARTUP_INFO)
18152: Info: Docking Framework version: 2.1.4
18152: Info: Hunspell loaded successfully from /home/frank/OmegaT/./native/libhunspell-i386.so
18152: Info: Event: application startup (LOG_INFO_EVENT_APPLICATION_STARTUP)
--------------------------------------------
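The line that matters in this output is the "Java: ... executed from '...'" entry: it shows which Java runtime actually started OmegaT, and therefore which JRE's font directory is used. If you want to pull that path out of saved terminal output, a minimal Python sketch could look like this. It assumes the log format of the OmegaT 2.0.5 build shown above (other versions may print it differently), and jre_from_log is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches the startup line "Java: <vendor> ver. <version>, executed from '<path>'"
# as printed by the OmegaT 2.0.5 build above (assumed format; may differ elsewhere).
_JAVA_LINE = re.compile(r"Java: .*?executed from '([^']+)'")


def jre_from_log(log_text: str) -> Optional[str]:
    """Return the JRE directory reported in OmegaT's startup output, or None."""
    match = _JAVA_LINE.search(log_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = ("18152: Info: Java: Sun Microsystems Inc. ver. 1.6.0_10, executed from "
              "'/usr/lib/jvm/java-6-sun-1.6.0.10/jre' (LOG_STARTUP_INFO)")
    print(jre_from_log(sample))  # /usr/lib/jvm/java-6-sun-1.6.0.10/jre
```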

I found that OmegaT-tokenizers.sh runs OmegaT with the system Java in /usr/lib/jvm/java-6-sun-1.6.0.10/jre, not the one in OmegaT//lib/jre.
So I created a folder called "fallback" under /usr/lib/jvm/java-6-sun-1.6.0.10/jre/fonts/ and copied a font file called uming.ttc into it.
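The same step can be scripted. Below is a minimal Python sketch, assuming you pass in the JRE's font directory yourself: the poster used jre/fonts, while Sun's Java 6 font-configuration documentation describes the fallback folder as lib/fonts/fallback inside the JRE, so check which one your installation actually has. install_fallback_font is a hypothetical helper, not part of OmegaT or Java, and writing under /usr/lib/jvm normally requires root privileges.

```python
import shutil
from pathlib import Path


def install_fallback_font(fonts_dir: str, font_file: str) -> Path:
    """Copy font_file into <fonts_dir>/fallback, creating the folder if needed.

    Returns the path of the copied font. An existing file with the same
    name is overwritten, so running it twice is harmless.
    """
    fallback_dir = Path(fonts_dir) / "fallback"
    fallback_dir.mkdir(parents=True, exist_ok=True)
    source = Path(font_file)
    target = fallback_dir / source.name
    shutil.copy2(source, target)  # copies data and file metadata
    return target
```

For the setup described here, the call would be something like install_fallback_font("/usr/lib/jvm/java-6-sun-1.6.0.10/jre/fonts", "/usr/share/fonts/truetype/arphic/uming.ttc"); the source path is only an assumption (a common location for the AR PL UMing font on Debian/Ubuntu), so adjust it to wherever uming.ttc lives on your system. OmegaT still has to be restarted for the JVM to pick up the new font.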

I restarted OmegaT-tokenizers.sh and it works: Chinese text is displayed properly, and the glossary is recognized too.

Thank you, Didier. Your idea was very helpful.

[Edited at 2010-03-25 23:51 GMT]

Didier Briel

France
Local time: 11:52
English to French

Maybe a font issue

Mar 26, 2010

frankleng wrote:
Your method works. However, the menu and context are not displayed properly: Chinese text shows up as garbage characters.

What do you mean by "context"? Is it the Fuzzy Matches pane?

Is this situation a result of using the tokenizer, or was it the case before?

Do you know how OmegaT uses fonts, and how to change them so that the text displays properly, please? Or is there anything else I can do?

If you are speaking of the font for the "content" (i.e., Editor, Fuzzy Matches, etc.), it can be selected in Options/Fonts...

You must select a font that supports both your source and your target language.

For the menu, the font is selected automatically according to the user interface language used, and should always be able to display the required characters.

I have no problem here, either with zh_CN or zh_TW, with or without the tokenizer.

Again, for more detailed answers and explanations, you should go to the Yahoo support group, where you are more likely to find other users with similar configurations.

Didier
