(Part of) the IATE database can now be downloaded as a massive TBX! Thread poster: Michael Beijer
| Samuel Murray Netherlands Local time: 09:59 Member (2006) English to Afrikaans + ...
Erik Freitag wrote: Following Samuel's suggestion (thanks for that!), I've just tried to remove all languages except NL, EN and DE in Edit Pad Pro. I did three languages at a time. That's how I managed to do it. | | | Michael Beijer United Kingdom Local time: 08:59 Member (2009) Dutch to English + ... TOPIC STARTER
Have you guys tried the latest build of Xbench? You just import the TBX, select the two languages you want, and then export them as tab-delimited UTF-8 text file (or TMX or Excel). Here is German > English, e.g., as a tabbed text file, which I just created for a colleague on the CafeTran mailing list: https://www.dropbox.com/s/1cs2q0qcskiy5n2/IATE_DE-EN.zip Michael
[Edited at 2014-07-15 18:47 GMT] | | | Samuel Murray Netherlands Local time: 09:59 Member (2006) English to Afrikaans + ...
Erik Freitag wrote: I've downloaded your file and tried to import it with MultiTerm Convert, but no luck. Error message: '<' is an unexpected token. The expected token is'='. Line 286973, position 9. (God how I love error messages that I can't copy and paste!) Yes, I checked it with XML ValidatorBuddy and found that for some tags deep in the file the tag was truncated, so that that entry would be invalid XML (but still perfectly understandable to a human). It is unfortunate that your term converter stops at the first sign of an error and does not simply ignore the error and drop that particular term from the import. The original TBX file does not have these errors. The errors must have been introduced by Edit Pad Pro.
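For anyone who wants to find the first broken tag without a dedicated validator, here is a minimal Python sketch. It only checks well-formedness (not TBX validity) and reports the line and column of the first error, much like the "Line 286973, position 9" message quoted above. The function name `first_xml_error` is made up for this example; it is not part of MultiTerm, XML ValidatorBuddy or any other tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET


def first_xml_error(source):
    """Return None if `source` is well-formed XML, else (line, column, message).

    `source` may be a file path or a binary file object. The document is
    streamed element by element, so a very large TBX is never loaded whole.
    """
    try:
        for _event, elem in ET.iterparse(source, events=("end",)):
            elem.clear()  # drop parsed content; we only care about errors
    except ET.ParseError as err:
        line, column = err.position  # position of the first parse error
        return line, column, str(err)
    return None
```

Calling `first_xml_error("IATE_export.tbx")` (the file name is a placeholder) returns `None` for a clean file, or the position of the first truncated tag so the offending entry can be inspected by hand.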
[Edited at 2014-07-15 18:51 GMT] | | | Samuel Murray Netherlands Local time: 09:59 Member (2006) English to Afrikaans + ...
Michael Beijer wrote: Have you guys tried the latest build of Xbench? You just import the TBX, select the two languages you want, and then export them as tab-delimited UTF-8 text file (or TMX or Excel). The sample file you gave does not have the term IDs. Did you exclude them deliberately or does Xbench export the file without the IDs by itself? | |
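As an alternative to a GUI export, the same kind of tab-delimited file, with the term IDs kept, can be sketched in a few lines of Python. This is not how Xbench does it. It assumes the common TBX layout (`termEntry` elements with an `id` attribute, containing `langSet` elements tagged with `xml:lang`, each holding `term` elements) and no XML namespace on the tags; those assumptions have not been checked against the IATE file here. The names `tbx_pairs` and `write_tsv` are made up for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tbx_pairs(source, src_lang, tgt_lang):
    """Yield (entry_id, source_term, target_term) tuples from a TBX stream.

    Entries lacking either language yield nothing. Synonyms produce one
    row per source/target combination.
    """
    for _event, entry in ET.iterparse(source, events=("end",)):
        if entry.tag != "termEntry":  # assumed element name, see lead-in
            continue
        terms = {src_lang: [], tgt_lang: []}
        for lang_set in entry.iter("langSet"):
            lang = lang_set.get(XML_LANG)
            if lang in terms:
                for term in lang_set.iter("term"):
                    text = "".join(term.itertext()).strip()
                    if text:
                        terms[lang].append(text)
        entry_id = entry.get("id", "")
        pairs = [(entry_id, s, t)
                 for s in terms[src_lang] for t in terms[tgt_lang]]
        entry.clear()  # free the entry's children before moving on
        yield from pairs


def write_tsv(rows, out):
    """Write rows as tab-delimited lines; stray tabs in a field become spaces."""
    for row in rows:
        out.write("\t".join(field.replace("\t", " ") for field in row) + "\n")
```

Usage would be along the lines of `write_tsv(tbx_pairs("IATE_export.tbx", "de", "en"), open("IATE_DE-EN.txt", "w", encoding="utf-8"))`, where both file names are placeholders.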
Michael Beijer United Kingdom Local time: 08:59 Member (2009) Dutch to English + ... TOPIC STARTER everything and the kitchen sink | Jul 15, 2014 |
Samuel Murray wrote: Michael Beijer wrote: Have you guys tried the latest build of Xbench? You just import the TBX, select the two languages you want, and then export them as tab-delimited UTF-8 text file (or TMX or Excel). The sample file you gave does not have the term IDs. Did you exclude them deliberately or does Xbench export the file without the IDs by itself? Sorry about that, here is another version with everything left in. I also left ‘Include segment even if source or target text is missing’ switched ON this time. https://www.dropbox.com/s/65znuoeii132067/IATE_DE-EN-(all%20metadata).zip (229MB unzipped) Michael | | | importing failed... | Jul 15, 2014 |
as in it only imported the English entries, but not the Italian ones... obviously, I must have made a mistake somewhere... | | | I know what I did wrong... | Jul 16, 2014 |
didn't add "Italian" as Index field... converting again now... EDIT: 883326 entries converted... now importing into MultiTerm... fingers crossed!
[Edited at 2014-07-16 10:22 GMT] | | | well, it worked... | Jul 16, 2014 |
but out of over 400.000 English entries, only 168.000 were available in Italian... rather disappointing... | |
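Before spending time on a conversion, it can help to check how many entries actually exist in both languages. The sketch below counts, for every language, the entries that also contain a chosen source language. It makes the same assumptions about the TBX layout as any generic TBX reader (`termEntry` and `langSet` elements, `xml:lang` attributes, no namespace), and `language_coverage` is a made-up name.

```python
from collections import Counter
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def language_coverage(source, src_lang):
    """Count entries per language, restricted to entries containing `src_lang`.

    counts[src_lang] is the number of entries with the source language;
    counts[other] is how many of those also have a term set in `other`.
    """
    counts = Counter()
    for _event, entry in ET.iterparse(source, events=("end",)):
        if entry.tag != "termEntry":  # assumed element name
            continue
        langs = {ls.get(XML_LANG) for ls in entry.iter("langSet")}
        if src_lang in langs:
            counts.update(langs)
        entry.clear()  # free the entry's children before moving on
    return counts
```

With `counts = language_coverage("IATE_export.tbx", "en")` (placeholder file name), comparing `counts["it"]` with `counts["en"]` gives the English-Italian overlap directly.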
DavidHardy Local time: 08:59 Portuguese to English Language choice Suggestion | Jul 21, 2014 |
Hello everybody, Some great work here and thanks. Homework to do! The IATE download page gives the email contact as [email protected] I have emailed them to suggest that they consider including a "choose languages" option (or even subject domain) as Eurovoc does at http://eurovoc.europa.eu/drupal/?q=download/list_pt&cl=en (The DGT Translation Memory resources also seem to have been made available trying to keep file sizes within reasonable limits - http://ipsc.jrc.ec.europa.eu/index.php?id=197 so perhaps subject domains would be the way to do this with the IATE database). I would suggest emailing them with this/these suggestion(s), then this might be accessible to more translators who can handle importing tbx files into their CAT Tool, but not necessarily cutting and pasting tbx headers and splitting the file. | | | Tamatoa Audouin French Polynesia Local time: 21:59 Member (2014) Tahitian to French + ... IATE Tbx file converted as tab-delimited .txt files for OmegaT | Aug 24, 2014 |
Hi all, I'm using OmegaT and would like to use the IATE .tbx file to create glossaries for each language pair, starting with the EN-FR pair. I was able to discard the entries for all unwanted languages with EditPad Pro as explained by Samuel Murray - Regex removal of languages in Edit Pad Pro. As I didn't know in what format I should save the file, I chose to save it as an .xml file with EditPad Pro. After countless hours of browsing on the Internet I'm left without any decent solution. Does anyone know how to make this .xml file work in OmegaT as a glossary? Thanks in advance, Tamatoa
[Edited at 2014-08-24 03:42 GMT]
[Edited at 2014-08-25 01:36 GMT] | | | Michael Beijer United Kingdom Local time: 08:59 Member (2009) Dutch to English + ... TOPIC STARTER
I'm not exactly sure what format you want the extractions in, but I suggest you send Henk Sanderson an email. He has written a great little Unix script (or series of scripts) that converts the massive IATE .tbx file into a beautiful tab-delimited UTF-8 text file for use as a CafeTran glossary, or import into any other CAT tool really. He can be found here: http://www.proz.com/translator/1672159 His version contains all the metadata! He has also converted the domain codes to their actual names, where possible. Michael
[Edited at 2014-08-24 11:14 GMT] | | | Much improved and tidied-up language pairs extracted | Oct 15, 2014 |
I spent months on improving upon extraction of language pairs, removing unwanted content (html-strings disturbing in a termbase), solving the synonym problems harming the Xbench extraction, and facilitating import of the language pairs into various CAT-tools by specifically formatting the output to the requirements of SDL Studio, DVX, CafeTran and others, just to mention some of the improvements. For details see santrans.net | |
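To give an idea of the kind of tidying described above, here is a small Python sketch that strips HTML remnants from terms and drops duplicate or half-empty rows from a two-column list. It is an illustration only, not the scripts behind santrans.net, and the names `clean_term` and `dedupe_pairs` are made up for this example.

```python
import html
import re

# Matches things that look like HTML/XML tags, e.g. <b>, </i>, <a href="...">.
# A bare "<" followed by a space or digit (as in "x < 5") is left alone.
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")


def clean_term(text):
    """Decode HTML entities, remove tag-like strings, collapse whitespace."""
    text = html.unescape(text)
    text = TAG_RE.sub("", text)
    return " ".join(text.split())


def dedupe_pairs(pairs):
    """Yield cleaned (source, target) pairs once each, skipping empty sides."""
    seen = set()
    for src, tgt in pairs:
        row = (clean_term(src), clean_term(tgt))
        if all(row) and row not in seen:
            seen.add(row)
            yield row
```

The cleaned pairs can then be written out one per line with a tab between source and target, which is the plain two-column layout most CAT tools accept for glossary import.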
Ron Willems Netherlands Local time: 09:59 Member English to Dutch
Henk Sanderson wrote: I spent months on improving upon extraction of language pairs, removing unwanted content (html-strings disturbing in a termbase), solving the synonym problems harming the Xbench extraction, and facilitating import of the language pairs into various CAT-tools by specifically formatting the output to the requirements of SDL Studio, DVX, CafeTran and others, just to mention some of the improvements. For details see santrans.net Thanks Henk, from your files it took me only 30 mins to join everything (English-Dutch) in my preferred format: no-frills two-column tab-delimited text (for use in Xbench). Now I have a 30 MB .txt file with about 720,000 high-quality IATE translations (including synonyms, I don't know the exact number of unique source entries). A true delight.
[Edited at 2014-10-20 08:34 GMT]