How to update previous translation memory? (OmegaT support)


Thread poster: Harklas

Harklas

Feb 16, 2010

OmegaT 2.0.5_02

I start a project called Test1, import the file "weekdays.ods" and translate Monday as Moon Day. Then I "Create Translated Document" and close the project.

I start a project called Test2, import the file "weekdays2.ods", and in folder TM I put the translation memory created by Test1.

It now suggests that I translate Monday to Moon Day, but I go direct to Tuesday and translate it as "Tyr's Day", after which I "Create Translated Document" and close the project.

I start a project called Test3, import the file "weekdays3.ods", and in folder TM I put the translation memory created by Test2.

It now suggests that I translate Tuesday to Tyr's Day like I did last time, but I go to Monday to see what it suggests. It suggests nothing: the TM I put in Test2 wasn't included in the TM produced by Test2.

I need to somehow merge the translation memory created by Test1 and Test2, so I can have one growing TM for all projects, and not an ever-growing folder with one TM file for each project I have done ...

I'd be most grateful for a solution.


Samuel Murray

Netherlands
Member (2006)
English to Afrikaans

@Harklas

Feb 17, 2010

Harklas wrote:
It suggests nothing, the TM I put in Test 2 wasn't in the TM produced by Test2. ... I need to somehow merge the translation memory created by Test1 and Test2, so I can have one growing TM for all projects, and not an ever-growing folder with one TM file for each project I have done.

Background information:

1. OmegaT has no TM merge function built-in.
2. The TMs that you place in the /tm/ folder will be consulted during translation.
3. The TMs that appear in the root of your project folder contain all of the segments (and only those segments) that appear in your source and target files.
4. The TM that OmegaT reads from *and* writes to, is called project_save.tmx, and it is in your project's /omegat/ folder.

Applicable to your situation:

1. You can merge TMs using another program, if you like.
2. Or, you can re-use the TM called "project_save.tmx". Simply replace the empty one that is created at the start of each new project with the old one that you had saved from previous translations. Remember to close the project in OmegaT before replacing the file (otherwise OmegaT will overwrite it).
3. Unfortunately, the project_save.tmx file must have exactly that name. So if you want to save it somewhere (e.g. under a client's name), you can rename it to client123_project_save.tmx, but whenever you put it in a new project's /omegat/ folder, it needs to be renamed "project_save.tmx" before OmegaT will use it.
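
As a rough illustration of the re-use approach in points 2 and 3, the renaming and copying step could be scripted. The Python sketch below is not part of OmegaT; the function name `seed_project_memory`, the `.bak` backup name, and the example paths are assumptions made for illustration. It should only be run while the project is closed in OmegaT.

```python
import shutil
from pathlib import Path


def seed_project_memory(saved_tmx: Path, project_dir: Path) -> Path:
    """Copy a saved TMX into a project's omegat folder as project_save.tmx.

    Assumption: the project is closed in OmegaT while this runs; otherwise
    OmegaT may overwrite the copied file.
    """
    target_dir = project_dir / "omegat"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "project_save.tmx"
    if target.exists():
        # Keep the file being replaced (backup name chosen for this sketch).
        shutil.copy2(target, target.with_name("project_save.tmx.bak"))
    # The copy must end up under exactly the name project_save.tmx.
    shutil.copy2(saved_tmx, target)
    return target


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    seed_project_memory(Path("client123_project_save.tmx"), Path("Test2"))
```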

What would be really nice is if OmegaT could detect multiple TMX files in the /omegat/ folder, and merge them when it detects them, into a single project_save.tmx file.

Didier Briel

France
English to French

To merge, you can use TMXMerger

Feb 17, 2010

Samuel Murray wrote:
1. You can merge TMs using another program, if you like.

For that, you can use TMXMerger.
You can get it from OmegaT Resources

It's a simple command line tool.

Didier

Samuel Murray

Netherlands
Member (2006)
English to Afrikaans

LOL @ Didier

Feb 17, 2010

Didier Briel wrote:
It's a simple command line tool.

If there is a list of oxymorons from the opensource world, this item must be in the top 5. Reworded: the more cryptic something is, the easier it is to use.

I found this article very informative (if a little geeky):
http://www.burgaud.com/open-command-window-here/

Didier Briel wrote:
For that, you can use TMXMerger.
You can get it from http://www.omegat.org/en/resources.html

Have you ever used TMXMerger, Didier? It seems very useful. How does it work?

Didier Briel

France
English to French

Simple means: no bells and whistles

Feb 17, 2010

Samuel Murray wrote:

Didier Briel wrote:
It's a simple command line tool.

If there is a list of oxymorons from the opensource world, this item must be in the top 5. Reworded: the more cryptic something is, the easier it is to use.

Pardon my French.
I didn't mean to write it is easy to use. I meant it has no bells and whistles, even for a command line tool.

Have you ever used TMXMerger, Didier?

Yes, a few times.
I don't often need to merge TMXs.

It seems very useful. How does it work?

Supposing one knows how to use a command line:
java -jar TMXMerger-1.0.jar first-tmx-to-merge second-tmx-to-merge final-merged-tmx
Actually, if you just launch TMXMerger with no TMX (i.e., java -jar TMXMerger-1.0.jar), it gives you reasonable instructions.
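
For anyone who prefers not to type the command by hand, the same call can be wrapped in a small script. This Python sketch only assembles the command line quoted above and hands it to Java; the function names are made up for illustration, and it assumes that `java` is on the PATH and that TMXMerger-1.0.jar sits in the current folder.

```python
import subprocess
from typing import List


def build_merge_command(first: str, second: str, merged: str,
                        jar: str = "TMXMerger-1.0.jar") -> List[str]:
    """Return the argument list for the TMXMerger call quoted in this post."""
    # "java -jar" is needed because the .jar file is run by the Java runtime,
    # not directly by the operating system.
    return ["java", "-jar", jar, first, second, merged]


def run_merge(first: str, second: str, merged: str,
              jar: str = "TMXMerger-1.0.jar") -> int:
    """Run the merge and return the exit code of the java process."""
    completed = subprocess.run(build_merge_command(first, second, merged, jar))
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_merge_command("old.tmx", "new.tmx", "merged.tmx")))
```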

Didier

Samuel Murray

Netherlands
Member (2006)
English to Afrikaans

Thanks, Didier

Feb 17, 2010

Didier Briel wrote:
Supposing one knows how to use a command line:
java -jar TMXMerger-1.0.jar first-tmx-to-merge second-tmx-to-merge final-merged-tmx
Actually, if you just launch TMXMerger with no TMX (i.e., java -jar TMXMerger-1.0.jar), it gives you reasonable instructions.

I did launch it using a command-line window, but it gave me no instructions whatsoever. I also tried adding the usual -h, --h, -help, --help, /?, /h and /help switches, all to no avail.

I must add that I did not use "java -jar". It is not obvious to me to do that -- with most command-line utilities in Windows, you just type the name of the program. I'm not sure why Java should be different. Using "java -jar" would be obvious to people who regularly use or have previously launched Java programs from the command line.

Harklas

THREAD STARTER

Merged TMX shrank to half size ...

Feb 17, 2010

Didier Briel wrote:

Supposing one knows how to use a command line:
java -jar TMXMerger-1.0.jar first-tmx-to-merge second-tmx-to-merge final-merged-tmx

Thanks! I used this Java tool for the two memories. It worked smoothly: I put the two memories and the Java tool in the same folder, held SHIFT while right-clicking in the window (I have Vista), and then typed the line you gave me.

But while my customer's old TMX was 9246 KB and my new TMX was 11 KB, the merged TMX is 4169 KB.

It looks like half of the old TMX was destroyed somewhere in the process.

Any experience with this?

In the meantime, I'll see if Murray's workaround causes the same result ...

Didier Briel

France
English to French

Size is no indication of the content of a TMX

Feb 17, 2010

Harklas wrote:
But while my customer's old TMX was 9246 KB and my new TMX was 11 KB, the merged TMX is 4169 KB.

It looks like half of the old TMX was destroyed somewhere in the process.

Any experience with this?

Where does your customer's TMX come from?

If it is from Trados, for instance, it usually contains a huge list of fonts, etc., which is useless (in OmegaT), but takes a lot of space. This is deleted when merging TMXs.

As you are under Windows, you can use Olifant to check the number of Translation Units of your TMXs, which is the important thing.
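
If Olifant is not at hand, the number of translation units can also be counted with a few lines of Python. This is a minimal sketch, assuming a well-formed TMX file whose units are plain `<tu>` elements without an XML namespace; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def count_translation_units(tmx_path: str) -> int:
    """Count the <tu> elements in a TMX file."""
    count = 0
    # iterparse reads the file incrementally, which helps with large TMXs.
    for _event, elem in ET.iterparse(tmx_path, events=("end",)):
        if elem.tag == "tu":
            count += 1
            elem.clear()  # drop the unit's children once it has been counted
    return count


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(count_translation_units("merged.tmx"))
```

Comparing this figure for the two input files and for the merged file says more about what a merge kept than comparing file sizes does.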

Didier

Samuel Murray

Netherlands
Member (2006)
English to Afrikaans

Olifant, of course!

Feb 17, 2010

Didier Briel wrote:
As you are under Windows, you can use Olifant to check the number of Translation Units of your TMXs, which is the important thing.

I forgot about Olifant! Well, you can open one TMX file in Olifant, and then select "Import" and import the second TMX file, and this will give you the same result as merging. It would just be nice if one could use a small application.

The size reduction isn't necessarily a problem. I just tested it by merging two nearly identical TMs. The two TMs had roughly 800 TUs each, 550 KB each. The combined TM using Olifant was roughly 1600 TUs large (950 KB), but after removing duplicates, it was roughly 800 TUs again (obviously). Merging the same two TMs using TMXMerger gave me a file that was only 250 KB large, but it contained nearly the same number of TUs as the Olifant file that had duplicates removed. This means that TMXMerger automatically removes duplicates when merging.

The downside of TMXMerger was that it removed the user ID and creation date of each TU. At the time when TMXMerger was written, OmegaT did not support these attributes. Perhaps it's time for an updated version of TMXMerger that supports these attributes...
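
To make the duplicate removal concrete, here is a minimal Python sketch of a merge that drops exact duplicates but keeps each TU's attributes (such as creationid and creationdate). It is not TMXMerger's actual algorithm, whose rules are not described in this thread: the function names and the definition of "duplicate" (same languages and same segment text) are assumptions, and the DOCTYPE line of the input is not carried over to the output.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tu_key(tu):
    """Identify a TU by its (language, segment text) pairs."""
    pairs = []
    for tuv in tu.findall("tuv"):
        lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
        seg = tuv.find("seg")
        text = "".join(seg.itertext()) if seg is not None else ""
        pairs.append((lang.lower(), text))
    return tuple(sorted(pairs))


def merge_tmx(first_path, second_path, merged_path):
    """Merge two TMX files, skipping exact duplicates; return the TU count."""
    tree = ET.parse(first_path)  # the header of the first file is reused
    body = tree.getroot().find("body")
    seen = set()
    for tu in list(body.findall("tu")):
        key = tu_key(tu)
        if key in seen:
            body.remove(tu)  # duplicate inside the first file
        else:
            seen.add(key)
    second_body = ET.parse(second_path).getroot().find("body")
    for tu in second_body.findall("tu"):
        key = tu_key(tu)
        if key not in seen:
            seen.add(key)
            body.append(tu)  # whole element, attributes included
    tree.write(merged_path, encoding="UTF-8", xml_declaration=True)
    return len(body.findall("tu"))
```

With this definition, two units with the same source text but different translations are both kept; a real tool may well decide differently.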

Harklas

THREAD STARTER

And Olifant counts Translation Units!

Feb 17, 2010

Thanks a lot! I downloaded it from here: http://sourceforge.net/projects/okapi/files/Olifant%20(Stable)/ (the latest one, #22)

And it showed me that the TUs had indeed increased.

When summing up the two TMs, I got a figure 10 lines higher than when counting the lines in the merged TM.

But that's a diff I'm sure I and the customer can live with. Bet those were just duplicates.

As for the other issues, user ID etc, let's just see if we get any feedback ...

Thanks a lot; it was indeed beneficial for our business to find you guys.


trobinson

Updating previous translation memory

Nov 16, 2010

Samuel Murray wrote:

Background information:

1. OmegaT has no TM merge function built-in.
2. The TMs that you place in the /tm/ folder will be consulted during translation.
3. The TMs that appear in the root of your project folder contain all of the segments (and only those segments) that appear in your source and target files.
4. The TM that OmegaT reads from *and* writes to, is called project_save.tmx, and it is in your project's /omegat/ folder.

Applicable to your situation:

1. You can merge TMs using another program, if you like.
2. Or, you can re-use the TM called "project_save.tmx". Simply replace the empty one that is created at the start of each new project with the old one that you had saved from previous translations. Remember to close the project in OmegaT before replacing the file (otherwise OmegaT will overwrite it).

I found this explanation very helpful; however, simply copying the previous project_save.tmx file to the /omegat folder resulted in no matches. Does it need to be copied to both the /omegat folder and the /tm folder of the new project?

Samuel Murray

Netherlands
Member (2006)
English to Afrikaans

Was OmegaT closed when you copied?

Nov 16, 2010

avastor wrote:
I found this explanation very helpful, however, simply copying the previous project_save.tmx file to the /omegat folder resulted in no matches.

Was OmegaT closed (i.e. not running) when you copied the file? If not, then OmegaT will overwrite the copied file with a blank file as soon as you start doing something in OmegaT.

trobinson

Tags in OmegaT

Nov 18, 2010

Samuel Murray wrote:
Was OmegaT closed (i.e. not running) when you copied the file? If not, then OmegaT will overwrite the copied file with a blank file as soon as you start doing something in OmegaT.

Thanks, I'm sure I must have done that or something equally idiotic. I wonder if I might pick your brain further. After first loading a .docx file for translation, there are many long strings of tags surrounding various bits of text. As it's not clear what these tags are, and as the translated text will usually be completely different in terms of both the number of words and the word order, I'm wondering how best to deal with these.
For example, word word word word word word word word word.
In every segment there are many such tags. How to deal with these in the translation? Thanks again in advance.

Didier Briel

France
English to French

Use the "Latest" version

Nov 19, 2010

avastor wrote:
After first loading a .docx file for translation, there are many long strings of tags surrounding various bits of text. As it's not clear what these tags are, and as the translated text will usually be completely different in terms of both the number of words and the word order, I'm wondering how best to deal with these.
For example, {/w23}{/w16}{w24}{w25}{w26/} word word word {/w31}{/w24}{w32}{w33}{w34/}{w35/} word {/w39}{/w32}{w40}{w41} word word word word word. word word word word word.
In every segment there are many such tags. How to deal with these in the translation?

I have replaced the "less than" and "greater than" characters in your example with curly braces to make them visible.
The "Latest" version has a tag reduction feature.
Your example would become {t0/} word word word {t1/} word {t2/} word word word word word. word word word word word.
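
The idea behind tag reduction can be illustrated with a short Python sketch. It is not OmegaT's actual implementation, only a guess at the effect shown in the example: every run of adjacent tags is collapsed into one numbered placeholder. The curly-brace notation follows the substituted example in this post, and the function name `reduce_tags` is hypothetical.

```python
import re
from itertools import count

# One or more adjacent tags such as {w24}, {/w24} or {w26/}.
TAG_RUN = re.compile(r"(?:\{/?w\d+/?\})+")


def reduce_tags(segment: str) -> str:
    """Replace each run of adjacent tags with a single {tN/} placeholder."""
    numbers = count()  # numbering restarts at 0 for every segment
    return TAG_RUN.sub(lambda _match: "{t%d/}" % next(numbers), segment)


if __name__ == "__main__":
    print(reduce_tags("{/w23}{/w16}{w24}{w25}{w26/} word word word "
                      "{/w31}{/w24}{w32}{w33}{w34/}{w35/} word"))
```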

Some of the unnecessary tags are also caused by Word itself (because of spell checking, for instance).
I recommend reading HOWTO: Translating Word 2007 (Office Open XML, .docx) files in OmegaT.

Didier

Anders Dalström
Sweden
Member (2008)
English to Swedish

Olifant - segments from imported tmx are all blank

Apr 11, 2011

Thanks for the useful info in this thread. I have a problem though: I have just tried to use Olifant to merge two TMX files, one which the client provided containing previous translations and the other the TMX file which OmegaT created after I finished the current translation, but the segments from the new/second TM all show as blank, i.e. I can see the segments from the TM the client sent me (1900 segments) but the 200 new segments which I have translated are all blank. I have tried to use al...
