(Part of) the IATE database can now be downloaded as a massive TBX!
Thread poster: Michael Beijer
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 09:59
Member (2006)
English to Afrikaans
+ ...
@Erik Jul 15, 2014

Erik Freitag wrote:
Following Samuel's suggestion (thanks for that!), I've just tried to remove all languages except NL, EN and DE in Edit Pad Pro.


I did three languages at a time. That's how I managed to do it.
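For anyone who would rather not do this with regex passes in a text editor, here is a rough Python sketch of the same idea. It is not the EditPad Pro method described above. It assumes the usual TBX layout (`termEntry` elements containing `langSet` children with an `xml:lang` attribute); the function names and the sample entry are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample entry; the element names are an assumption about the TBX layout.
SAMPLE = """<martif type="TBX"><text><body>
<termEntry id="SAMPLE-1">
<langSet xml:lang="en"><tig><term>fishing quota</term></tig></langSet>
<langSet xml:lang="fr"><tig><term>quota de capture</term></tig></langSet>
<langSet xml:lang="nl"><tig><term>vangstquotum</term></tig></langSet>
</termEntry>
</body></text></martif>"""


def prune_languages(entry, keep):
    """Remove every <langSet> child whose xml:lang is not in `keep`."""
    for lang_set in list(entry.findall("langSet")):
        if lang_set.get(XML_LANG, "").lower() not in keep:
            entry.remove(lang_set)
    return entry


def filtered_entries(source, keep):
    """Stream the file and yield each pruned <termEntry> as an XML string."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "termEntry":
            prune_languages(elem, keep)
            # Skip entries that have no wanted language left.
            if elem.find("langSet") is not None:
                yield ET.tostring(elem, encoding="unicode")
            elem.clear()  # free the memory held by this entry


if __name__ == "__main__":
    for xml_text in filtered_entries(io.BytesIO(SAMPLE.encode("utf-8")), {"en", "nl"}):
        print(xml_text)
```

Because the file is parsed as XML rather than edited as text, tags cannot end up cut in half. The yielded entries would still need to be wrapped in the original TBX header and footer to form a complete file.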


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 08:59
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
Xbench? Jul 15, 2014

Have you guys tried the latest build of Xbench? You just import the TBX, select the two languages you want, and then export them as a tab-delimited UTF-8 text file (or TMX or Excel).

Here is German > English, e.g., as a tabbed text file, which I just created for a colleague on the CafeTran mailing list: https://www.dropbox.com/s/1cs2q0qcskiy5n2/IATE_DE-EN.zip
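For those without Xbench, the same kind of two-column export can be sketched in Python. This is not what Xbench does internally, only an illustration. It assumes the usual TBX layout (`termEntry` > `langSet` with `xml:lang` > `term`), and the function names and sample entry are invented.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample entry with two German synonyms.
SAMPLE = """<martif type="TBX"><text><body>
<termEntry id="SAMPLE-1">
<langSet xml:lang="en"><tig><term>fishing quota</term></tig></langSet>
<langSet xml:lang="de"><tig><term>Fangquote</term></tig><tig><term>Fischereiquote</term></tig></langSet>
</termEntry>
</body></text></martif>"""


def terms_by_language(entry):
    """Map each language code in a <termEntry> to its list of terms."""
    langs = {}
    for lang_set in entry.iter("langSet"):
        code = lang_set.get(XML_LANG, "").lower()
        for term in lang_set.iter("term"):
            text = (term.text or "").strip()
            if text:
                langs.setdefault(code, []).append(text)
    return langs


def tbx_to_rows(source, src_lang, tgt_lang):
    """Yield one 'source<TAB>target' row per term pair, streaming the file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "termEntry":
            continue
        langs = terms_by_language(elem)
        for src in langs.get(src_lang, []):
            for tgt in langs.get(tgt_lang, []):
                yield f"{src}\t{tgt}"
        elem.clear()  # free the memory held by this entry


if __name__ == "__main__":
    for row in tbx_to_rows(io.BytesIO(SAMPLE.encode("utf-8")), "en", "de"):
        print(row)
```

Entries that lack either language produce no rows here. Synonyms are expanded into one row per source/target combination, which is a design choice of this sketch rather than a requirement of any CAT tool.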

Michael

[Edited at 2014-07-15 18:47 GMT]


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 09:59
Member (2006)
English to Afrikaans
+ ...
@Erik Jul 15, 2014

Erik Freitag wrote:
I've downloaded your file and tried to import it with MultiTerm Convert, but no luck. Error message: '<' is an unexpected token. The expected token is'='. Line 286973, position 9. (God how I love error messages that I can't copy and paste!)


Yes, I checked it with XML ValidatorBuddy and found that some tags deep in the file were truncated, which makes those entries invalid XML (though still perfectly understandable to a human). It is unfortunate that your term converter stops at the first error instead of simply skipping that particular term and carrying on with the import.

The original TBX file does not have these errors. The errors must have been introduced by Edit Pad Pro.
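A quick way to find such a spot without a dedicated validator is to let a parser stream through the file and report where it gives up. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name and the two sample byte strings are invented for illustration. It only checks well-formedness, not TBX validity, and it stops at the first error just like the converter does.

```python
import io
import xml.etree.ElementTree as ET

# Invented samples: one well-formed, one with a tag cut off mid-name.
GOOD = b"<body><termEntry><term>ok</term></termEntry></body>"
BROKEN = (
    b"<body>\n"
    b"<termEntry><term>ok</term></termEntry>\n"
    b"<termEntry><ter\n"
    b"<term>cut off</term></termEntry>\n"
    b"</body>"
)


def first_xml_error(source):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            elem.clear()  # nothing is kept; we only care whether parsing succeeds
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


if __name__ == "__main__":
    print(first_xml_error(io.BytesIO(GOOD)))
    print(first_xml_error(io.BytesIO(BROKEN)))
```

For a real file, pass the path instead of a `BytesIO` object, e.g. `first_xml_error("my_file.tbx")` (a hypothetical filename).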


[Edited at 2014-07-15 18:51 GMT]


 
Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 09:59
Member (2006)
English to Afrikaans
+ ...
@Michael Jul 15, 2014

Michael Beijer wrote:
Have you guys tried the latest build of Xbench? You just import the TBX, select the two languages you want, and then export them as a tab-delimited UTF-8 text file (or TMX or Excel).


The sample file you gave does not have the term IDs. Did you exclude them deliberately or does Xbench export the file without the IDs by itself?


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 08:59
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
everything and the kitchen sink Jul 15, 2014

Samuel Murray wrote:

Michael Beijer wrote:
Have you guys tried the latest build of Xbench? You just import the TBX, select the two languages you want, and then export them as a tab-delimited UTF-8 text file (or TMX or Excel).


The sample file you gave does not have the term IDs. Did you exclude them deliberately or does Xbench export the file without the IDs by itself?


Sorry about that, here is another version with everything left in. I also left ‘Include segment even if source or target text is missing’ switched ON this time.

https://www.dropbox.com/s/65znuoeii132067/IATE_DE-EN-(all%20metadata).zip (229MB unzipped)

Michael


 
Giovanni Guarnieri MITI, MIL
Giovanni Guarnieri MITI, MIL  Identity Verified
United Kingdom
Local time: 08:59
Member (2004)
English to Italian
importing failed... Jul 15, 2014

as in it only imported the English entries, but not the Italian ones... obviously, I must have made a mistake somewhere...

 
Giovanni Guarnieri MITI, MIL
Giovanni Guarnieri MITI, MIL  Identity Verified
United Kingdom
Local time: 08:59
Member (2004)
English to Italian
I know what I did wrong... Jul 16, 2014

didn't add "Italian" as Index field... converting again now...

EDIT: 883326 entries converted... now importing into MultiTerm... fingers crossed!

[Edited at 2014-07-16 10:22 GMT]


 
Giovanni Guarnieri MITI, MIL
Giovanni Guarnieri MITI, MIL  Identity Verified
United Kingdom
Local time: 08:59
Member (2004)
English to Italian
well, it worked... Jul 16, 2014

but out of over 400,000 English entries, only 168,000 were available in Italian... rather disappointing...
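To see the coverage per language before going through a full conversion and import, one could count how many entries contain each language. Below is a minimal Python sketch, assuming the usual TBX layout (`termEntry` elements with `langSet` children carrying `xml:lang`); the function name and the sample data are made up.

```python
import io
from collections import Counter
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample: two entries, only one of which has Italian.
SAMPLE = """<martif type="TBX"><text><body>
<termEntry id="SAMPLE-1">
<langSet xml:lang="en"><tig><term>fishing quota</term></tig></langSet>
<langSet xml:lang="it"><tig><term>contingente di pesca</term></tig></langSet>
</termEntry>
<termEntry id="SAMPLE-2">
<langSet xml:lang="en"><tig><term>customs union</term></tig></langSet>
</termEntry>
</body></text></martif>"""


def entries_per_language(source):
    """Count, for each language code, the entries that contain it."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "termEntry":
            # A set, so an entry with several synonyms counts once per language.
            langs = {ls.get(XML_LANG, "").lower() for ls in elem.iter("langSet")}
            counts.update(langs)
            elem.clear()  # free the memory held by this entry
    return counts


if __name__ == "__main__":
    counts = entries_per_language(io.BytesIO(SAMPLE.encode("utf-8")))
    for lang, n in sorted(counts.items()):
        print(lang, n)
```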

 
DavidHardy
DavidHardy
Local time: 08:59
Portuguese to English
Language choice Suggestion Jul 21, 2014

Hello everybody,

Some great work here and thanks. Homework to do!

The IATE download page gives the email contact as [email protected]

I have emailed them to suggest that they consider including a "choose languages" option (or even subject domain) as Eurovoc does at http://eurovoc.europa.eu/drupal/?q=download/list_pt&cl=en


(The DGT Translation Memory resources also seem to have been made available in a way that keeps file sizes within reasonable limits: http://ipsc.jrc.ec.europa.eu/index.php?id=197. Perhaps splitting by subject domain would be the way to do this with the IATE database.)



I would suggest that others email them with these suggestions too. That might make the database accessible to more translators: those who can handle importing TBX files into their CAT tool, but not necessarily cutting and pasting TBX headers and splitting the file.


 
Tamatoa Audouin
Tamatoa Audouin  Identity Verified
French Polynesia
Local time: 21:59
Member (2014)
Tahitian to French
+ ...
IATE Tbx file converted as tab-delimited .txt files for OmegaT Aug 24, 2014

Hi all,
I'm using OmegaT and would like to use the IATE .tbx file to create glossaries for each language pair, starting with the EN-FR pair.

I was able to discard the entries for all unwanted languages with EditPad Pro as explained by Samuel Murray - Regex removal of languages in Edit Pad Pro.

As I didn't know in what format I should save the file, I chose to save it as an .xml file with EditPad Pro.

After countless hours of browsing on the Internet I'm left without any decent solution.

Does anyone know how to make this .xml file work in OmegaT as a glossary?

Thanks in advance,
Tamatoa

[Edited at 2014-08-24 03:42 GMT]

[Edited at 2014-08-25 01:36 GMT]


 
Michael Beijer
Michael Beijer  Identity Verified
United Kingdom
Local time: 08:59
Member (2009)
Dutch to English
+ ...
TOPIC STARTER
@tamatoa Aug 24, 2014

I'm not exactly sure what format you want the extractions in, but I suggest you send Henk Sanderson an email. He has written a great little Unix script (or series of scripts) that converts the massive IATE .tbx file into a beautiful tab-delimited UTF-8 text file for use as a CafeTran glossary, or for import into any other CAT tool, really.

He can be found here: http://www.proz.com/translator/1672159

His version contains all the metadata! He has also converted the domain codes to their actual names, where possible.

Michael

[Edited at 2014-08-24 11:14 GMT]


 
Henk Sanderson
Henk Sanderson  Identity Verified
Netherlands
Local time: 09:59
German to Dutch
+ ...
Much improved and tidied-up language pairs extracted Oct 15, 2014

I spent months improving the extraction of language pairs: removing unwanted content (HTML strings that are a nuisance in a termbase), solving the synonym problems that affect the Xbench extraction, and making it easier to import the language pairs into various CAT tools by formatting the output to the requirements of SDL Studio, DVX, CafeTran and others, to mention just some of the improvements.
For details see santrans.net


 
Ron Willems
Ron Willems  Identity Verified
Netherlands
Local time: 09:59
Member
English to Dutch
Very useful Oct 20, 2014

Henk Sanderson wrote:

I spent months improving the extraction of language pairs: removing unwanted content (HTML strings that are a nuisance in a termbase), solving the synonym problems that affect the Xbench extraction, and making it easier to import the language pairs into various CAT tools by formatting the output to the requirements of SDL Studio, DVX, CafeTran and others, to mention just some of the improvements.
For details see santrans.net


Thanks Henk, from your files it took me only 30 mins to join everything (English-Dutch) in my preferred format: no-frills two-column tab-delimited text (for use in Xbench). Now I have a 30 MB .txt file with about 720,000 high-quality IATE translations (including synonyms, I don't know the exact number of unique source entries). A true delight.
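The number of unique source entries in a two-column tab-delimited file like this is easy to work out. Here is a small Python sketch; the function name and the sample rows are invented, and it assumes one `source<TAB>target` pair per line.

```python
import io

# Invented sample rows: three pairs, two distinct source terms.
SAMPLE = (
    "fishing quota\tvangstquotum\n"
    "fishing quota\tvisquotum\n"
    "customs union\tdouane-unie\n"
)


def glossary_stats(lines):
    """Return (number of term pairs, number of unique source terms)."""
    pairs = 0
    sources = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        source, _target = line.split("\t", 1)
        pairs += 1
        sources.add(source)
    return pairs, len(sources)


if __name__ == "__main__":
    pairs, unique_sources = glossary_stats(io.StringIO(SAMPLE))
    print(pairs, unique_sources)
```

For a real file, something like `glossary_stats(open("iate_en_nl.txt", encoding="utf-8"))` would do, where the filename is hypothetical. Source terms are compared exactly as written, so variants that differ only in case count as separate entries.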

[Edited at 2014-10-20 08:34 GMT]


 