Translation preprocessing software?
Thread poster: Andreas Nieckele
Andreas Nieckele  Identity Verified
Brazil
Local time: 13:58
English to Portuguese
Oct 9, 2009

Hello everyone,

Does anyone know if there is such a thing as translation "preprocessing" software?

Basically, what I would like to do is analyze a document before I start translating, so that it would list the most frequently used terms and expressions throughout the document and perhaps even compare it with a glossary and produce a list containing the terms and their equivalent translations. Preferably it should be possible to export this list so I could discuss the terminology with the client beforehand, possibly saving me time later.

I already own Trados 2007 and as far as I know there is no such functionality. But maybe that's because I have a 64-bit system and couldn't get Multiterm to work (I managed to get Workbench and TagEditor working, however). I didn't see a similar function in Wordfast PRO either.

I appreciate any recommendations. Thanks.


 
Attila Piróth  Identity Verified
France
Local time: 17:58
Member
English to Hungarian
Term extraction Oct 9, 2009

Wordfast's free add-on, PlusTools, contains a function called "+Extract" that does exactly this: it extracts words and word combinations that occur several times in the document. It is customizable to some extent: you can set the maximum length of expressions to look for (e.g., max. 10 words), the minimum number of occurrences (e.g., at least 3 times in the document), and you can also add a list of ignored stopwords ("and", "of", "to", "in", etc.). PlusTools uses a purely statistical approach, so it presents a list that is usually quite long and has to be trimmed manually.
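For anyone who wants to see what a purely statistical extraction amounts to, here is a minimal Python sketch. It is not how +Extract is implemented; the function name `extract_candidates`, the default values and the small stopword set are assumptions made for illustration. It counts every word sequence up to a maximum length, skips sequences that start or end with a stopword, and keeps those that reach a minimum number of occurrences.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real list would be much longer.
STOPWORDS = {"and", "of", "to", "in", "the", "a"}

def extract_candidates(text, max_words=10, min_count=3, stopwords=STOPWORDS):
    """Return (term, count) pairs for word sequences of 1..max_words words
    that occur at least min_count times, most frequent first."""
    # Simple tokenizer: lowercased runs of word characters.
    # Note that sequences may span sentence boundaries in this sketch.
    words = re.findall(r"\w+", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip candidates that start or end with a stopword.
            if gram[0] in stopwords or gram[-1] in stopwords:
                continue
            counts[" ".join(gram)] += 1
    return [(term, c) for term, c in counts.most_common() if c >= min_count]
```

Even on a short text the output contains overlapping entries ("control", "valve", "control valve"), which is why the resulting list still has to be trimmed by hand.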

See this thread for more options and opinions.

Kind regards,
Attila


 
Samuel Murray  Identity Verified
Netherlands
Local time: 17:58
Member (2006)
English to Afrikaans
Some ideas Oct 9, 2009

Andreas Nieckele wrote:
Basically, what I would like to do is analyze a document before I start translating, so that it would list the most frequently used terms and expressions throughout the document and perhaps even compare it with a glossary and produce a list containing the terms and their equivalent translations.


AntConc, Tenka Text and Corsis are names of free programs (some of them on Sourceforge) that come to mind. ExtPhr32 can also do term extraction (not sure about double-byte languages). Comparing your extracted list against a glossary can be done using certain UnxUtils tools, but I can't remember which (and don't bother unless you're a tinkerer).
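For tinkerers, the glossary comparison does not need any particular tool. A minimal Python sketch, assuming the extracted terms are a list of strings and the glossary is a source-to-target dictionary (the function name `match_glossary` and the sample entries are made up for illustration):

```python
def match_glossary(candidates, glossary):
    """Split candidate terms into those with a glossary translation
    and those still to be researched. Matching is case-insensitive."""
    lookup = {src.lower(): tgt for src, tgt in glossary.items()}
    known, unknown = {}, []
    for term in candidates:
        key = term.lower()
        if key in lookup:
            known[term] = lookup[key]
        else:
            unknown.append(term)
    return known, unknown

# Hypothetical English-to-Portuguese sample data.
known, unknown = match_glossary(
    ["Control valve", "flange"],
    {"control valve": "válvula de controle"},
)
```

Here `known` holds the terms that already have an equivalent, and `unknown` is the list worth discussing with the client. Only exact matches are found; inflected forms or plurals would need extra handling.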





[Edited at 2009-10-09 18:24 GMT]


 
Uwe Schwenk (X)
Local time: 11:58
English to German
Across Oct 9, 2009

Across actually contains a workflow for this, whereby the first step is to extract terminology candidates (you select by checking the words in the list presented and define whether each one is a term or a stop word).

The next step would be to translate the terms followed by the document translation, etc.

When you have the terms translated, you can export them, for example as a CSV file, and send them to the client with the suggested translations.
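Outside Across, the same kind of CSV export can be produced with a few lines of Python. This is only a sketch: the function name `terms_to_csv` and the column names are assumptions, not the format Across writes.

```python
import csv
import io

def terms_to_csv(rows):
    """Render (source term, suggested translation, occurrences) rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    # Hypothetical header row; rename the columns to suit the client.
    writer.writerow(["source_term", "suggested_translation", "occurrences"])
    writer.writerows(rows)
    return buffer.getvalue()

# Made-up sample row for illustration.
print(terms_to_csv([("control valve", "válvula de controle", 3)]))
```

To save the result to disk, write the returned text to a file opened with `newline=""` and an explicit encoding such as `utf-8-sig`, which tends to open more reliably in spreadsheet programs.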


Uwe


 
Grzegorz Gryc  Identity Verified
Local time: 17:58
French to Polish
DVX Lexicon Oct 10, 2009

Attila Piróth wrote:

Wordfast's free add-on, PlusTools contains a function called "+Extract" that does exactly this: it extracts words and word combinations that occur several times in the document.

DVX can create a so-called Lexicon.

It is customizable to some extent: you can set the maximum length of expressions to look for (e.g., max. 10 words), the minimum number of occurrence (occurs at least 3 times in the document),

DVX behaves in a similar way, i.e. the string length may be limited before the analysis. After the analysis, you may delete the longest/shortest strings and the most/least frequent strings (both functions are configurable).
The list may be resolved automatically using the existing termbases/TMs.
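The trimming step can also be reproduced on any exported term list. The following Python sketch is not DVX's implementation; the function name `trim_candidates` and its parameters are assumptions. It drops entries that are too rare, too frequent, too short or too long.

```python
def trim_candidates(counts, min_count=2, max_count=None, min_words=1, max_words=5):
    """Keep only the entries whose frequency and length in words
    fall inside the given bounds. `counts` maps term -> occurrences."""
    kept = {}
    for term, count in counts.items():
        n_words = len(term.split())
        # Frequency bounds: drop rare entries and, optionally, overly common ones.
        if count < min_count or (max_count is not None and count > max_count):
            continue
        # Length bounds, measured in words.
        if n_words < min_words or n_words > max_words:
            continue
        kept[term] = count
    return kept
```

Raising `min_count` for larger projects is an easy way to keep the list manageable.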

and you can also add a list of ignored stopwords ("and", "of", "to", "in", etc.).

Function not available in DVX.

PlusTools uses a purely statistical approach then, and presents a usually quite long list that has to be trimmed manually.

True.
That's why I always strip the less frequent segments (the minimum number of occurrences varies depending on the project size).

DVX is considerably faster than +Extract.

Cheers
GG

[Edited at 2009-10-10 09:08 GMT]


 
Grzegorz Gryc  Identity Verified
Local time: 17:58
French to Polish
Multiterm Extract Oct 10, 2009

Andreas Nieckele wrote:


I already own Trados 2007 and as far as I know there is no such functionality. But maybe that's because I have a 64-bit system and couldn't get Multiterm to work
A terminology extraction feature is available in Multiterm Extract which must be purchased separately.

BTW.
The 64-bit issue was partially fixed in the newest Trados releases, AFAIK.
A workaround for the older versions was published somewhere on Proz, search the archives.

Cheers
GG

[Edited at 2009-10-10 09:08 GMT]


 
Simon Cole  Identity Verified
United Kingdom
Local time: 16:58
Member (2008)
French to English
Trados 2009 Oct 10, 2009

Trados Studio 2009 does exactly this as part of a Batch Task - Analyse File(s). You can specify how many times the segment repeats, etc. The extracted segments can be pre-translated and added to the TM before hitting the main document. As a T2007 user, the upgrade costs a lot less than buying the full application.

Unless you've been in a cave since June, you will of course be aware of the debate here and elsewhere about the stability of T2009 - I have managed to tame it a bit by avoiding certain practices that provoke crashes (adding terms to a Termbase; using too many keyboard shortcuts in quick succession; accidentally hitting Ctrl instead of Shift while typing, which can invoke any shortcut...). Still looking forward to the release of a functional SP1.


 
Grzegorz Gryc  Identity Verified
Local time: 17:58
French to Polish
Subsegment level Oct 10, 2009

Simon Cole wrote:

Trados Studio 2009 does exactly this as part of a Batch Task - Analyse File(s).

No, T2009 Studio does not do this in any serious way.

I.e. we are discussing preprocessing at the subsegment level here.
Of course, theoretically, you can export frequent segments, resolve 'em and use the AutoSuggest creator, but you must isolate at least 25000 segments... not easy...
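The difference between the two levels is easy to see in a small Python sketch. This does not reflect how Trados works internally; the function names and the naive sentence splitter are assumptions for illustration. The first function counts whole sentences that repeat verbatim, the second counts n-word phrases that repeat anywhere, including inside otherwise different sentences.

```python
import re
from collections import Counter

def repeated_segments(text, min_count=2):
    """Segment level: whole sentences that repeat verbatim."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    segments = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return {s: c for s, c in Counter(segments).items() if c >= min_count}

def repeated_phrases(text, n=2, min_count=2):
    """Subsegment level: n-word phrases that repeat anywhere in the text."""
    words = re.findall(r"\w+", text.lower())
    grams = Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return {g: c for g, c in grams.items() if c >= min_count}

sample = "Open the control valve. Close the control valve."
print(repeated_segments(sample))   # no sentence repeats as a whole
print(repeated_phrases(sample))    # "control valve" is found at phrase level
```

In the sample, no sentence occurs twice, so a segment-level analysis reports nothing, while the phrase-level count still surfaces "control valve".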

Cheers
GG


 
claude
Thailand
Local time: 17:58
English to French
Textanz and word macros Oct 12, 2009

I found a small tool called Textanz which does the same as some of the tools previously mentioned, i.e. it finds repetitions and sorts them by number of words or by frequency.
But as I worked for a long time on specialized domains and specific documents (patents), I created some basic macros in Word that do "search/replace" jobs, with the replaced words appearing in red. It is good preparation before using a document analysis tool, as all the "and, or, is, are, with, before, after, containing, comprising, it is preferred that, furthermore, etc." and many expressions often found in these documents are already translated. After some time it got quite handy, as it was a perfect complement to a specific analysis.
I guess in some other context, software for example, you would put all the "click", "press button", "to open" in this kind of macro.
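The same search/replace idea can be sketched in Python on plain text. This is not the Word macro itself: the function name `pretranslate` and the sample phrase pairs are assumptions, and double square brackets stand in for the red font colour.

```python
import re

def pretranslate(text, replacements):
    """Replace known stock phrases with their translations and mark
    each replacement with [[...]] so it stays visible for review."""
    if not replacements:
        return text
    # Longest phrases first, so a long phrase wins over a shorter one it contains.
    phrases = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    lookup = {p.lower(): t for p, t in replacements.items()}
    return pattern.sub(lambda m: "[[" + lookup[m.group(1).lower()] + "]]", text)

# Hypothetical English-to-French stock phrases for patent texts.
stock = {"comprising": "comprenant", "furthermore": "en outre"}
print(pretranslate("Furthermore, a device comprising a valve.", stock))
```

The replacements ignore capitalisation and grammatical agreement, so the marked passages still need a human check during translation.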


 

