Can your CAT tool open my test TMX file?
Thread poster: Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Local time: 07:05
Member 2006
English to Afrikaans
Mar 19, 2013

G'day everyone

I'd like to know if my test TMX file (3 segments) can be opened in several other CAT tools. If you're willing to help me, please download the test TMX file and see whether your CAT tool can read it and whether you can access all three segments (they have the numbers 1, 2 and 3 at the end). I booby-trapped segment #2 and I want to see which CAT tools fall for it (it won't harm your computer). Also, if you have TMX validation tools or converters, I'd love to know whether they have any problems with the file.

http://wikisend.com/download/460404/commatest2%20WfMemory.zip

Thanks
Samuel



[Edited at 2013-03-19 17:02 GMT]


 
Bernard Lieber  Identity Verified
Local time: 07:05
English to French
CAT Tools Mar 19, 2013

Hi Samuel,

Which CAT tools have you already tested it with?

First test with the latest build of DéjàVuX2: no problem, it imports all 3 segments without any issue. I can't test with Studio 2011 as it's not my language combination.

[Edited at 2013-03-19 19:17 GMT]


 
Joakim Braun  Identity Verified
Sweden
Local time: 07:05
German to Swedish
Yes Mar 19, 2013

Yes, my application Xoterm can read this.

When opened in a text editor it also looks like a perfectly good TMX file to me.
What do you mean, "booby-trapped"?


 
Anna Villegas
Mexico
Local time: 00:05
English to Spanish
SDL Trados 2007 Mar 19, 2013

The rain in Spain, falls mainly on the plains1.
Die reent in Spanje, val saggies op die blanje.

The rain in Spain, falls mainly on the plains2.
Die reent in Spanje‚ val saggies op die blanje.

The rain in Spain, falls mainly on the plains3.
Die reent in Spanje, val saggies op die blanje.

Is this right?



 
nrichy (X)
France
Local time: 07:05
French to Dutch
Studio 2011 Mar 19, 2013

Opens without problems and without indicating the language codes. Same result as for Carvallo.

WFC 6.03t in Word 2010: no problem.

20130319~175434 SM 0 EN-ZA The rain in Spain, falls mainly on the plains1. AF-ZA Die reent in Spanje, val saggies op die blanje.
20130319~175422 N 0 EN-ZA The rain in Spain, falls mainly on the plains2. AF-ZA Die reent in Spanje‚ val saggies op die blanje.
20130319~175422 N 0 EN-ZA The rain in Spain, falls mainly on the plains3. AF-ZA Die reent in Spanje, val saggies op die blanje

WP Pro 3.1.3: I had to indicate language codes (English for South Africa and Afrikaans for South Africa). Same results as above.

Does this help?


 
FarkasAndras  Identity Verified
Local time: 07:05
English to Hungarian
Passes Mar 19, 2013

The TMXCheck TMX verifier, which IIRC I got from LISA's now defunct website, doesn't find fault with it.
What's the trick BTW? It looks like a totally plain & straightforward TMX to me.


 
Paz González  Identity Verified
Chile
English to Spanish
Fluency Translation 2013 Mar 19, 2013

Hi Samuel,

I managed to open them on the third attempt. The first time it opened just 1 segment, the second time just 2, and the third time all 3. I must tell you that I'm still learning to work with CAT tools, so this was a great test for me. I don't know whether I did it right the first and second times; maybe I didn't, and that's why it only worked the third time.

With Fluency you can import the file without problems, but to open it you have to specify the language pair.

Does this help?


 
Samuel Murray  Identity Verified
Netherlands
Local time: 07:05
Member 2006
English to Afrikaans
TOPIC STARTER
Thanks, everyone Mar 19, 2013

Samuel Murray wrote:
I'd like to know if my test TMX file (3 segments) can be opened in several other CAT tools.


Thanks, everyone.

I have since learnt that the problem I had with one tool is likely specific to that tool. The tool in question is Virtaal. The comma in the target field of the second segment in the TMX is not a comma, but it looks like one. It is a character whose Unicode number is 201A. Virtaal thinks it's character number 001A. Virtaal also thinks that character number 221A is character 001A (and this probably applies to other characters whose numbers end in "1A" as well). It is a fairly serious bug, IMO, because it causes Virtaal to think that the file ends at that character. And I have no idea when it is going to be fixed.


 
victor_lo (X)
Local time: 14:05
Heartsome Translation Studio 8.2.0 Mar 20, 2013

Heartsome Translation Studio 8.2.0 imports all 3 segments successfully.

However, the language code af-ZA is not included by default, so I had to add it in Tools | Options | Languages. Also reported to their tech support, hope they will fix it later.


 
Ambrose Li  Identity Verified
Canada
Local time: 01:05
English
Little-endian UCS-2 files Mar 20, 2013

The file in question is little-endian UCS-2 (or UTF-16), so I would say that Virtaal is not, technically speaking, treating U+201A as U+001A. It is seeing ASCII code 0x1A, period. It does not even know what the next byte (the second half of the Unicode character) is. In short, this piece of software is not Unicode compliant.
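
To illustrate the point about byte order, here is a minimal Python sketch. It assumes a reader that scans the file byte by byte and stops at 0x1A (the legacy SUB / Ctrl-Z end-of-file marker). The helper `naive_read` is hypothetical and only imitates the symptom; it is not Virtaal's actual code.

```python
# In UTF-16 little-endian the low byte comes first, so both U+201A and
# U+221A start with the byte 0x1A.
for ch in ("\u201a", "\u221a"):
    encoded = ch.encode("utf-16-le")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")


def naive_read(data: bytes) -> bytes:
    """Hypothetical byte-oriented reader that treats 0x1A as end of file."""
    end = data.find(b"\x1a")
    return data if end == -1 else data[:end]


# Target text of segment 2, with U+201A in place of the comma.
segment = "Die reent in Spanje\u201a val saggies op die blanje."
raw = segment.encode("utf-16-le")

# Everything from the look-alike comma onwards is lost.
print(naive_read(raw).decode("utf-16-le"))
```

A reader that decodes the bytes as UTF-16 first and only then inspects characters would never see 0x1A on its own here.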

 
Samuel Murray  Identity Verified
Netherlands
Local time: 07:05
Member 2006
English to Afrikaans
TOPIC STARTER
Again, please Mar 20, 2013

G'day everyone

Okay, I wanted to test if the Unicode character U+201A is a problem for CAT tools, and it isn't.

However, if you would be so kind, I would also like to know how CAT tools respond to the Unicode character U+001A in a TMX file. This is an invalid character in XML.

Please download this second TMX file:
http://wikisend.com/download/981020/substitute%20test%20WfMemory.zip
and open it in your CAT tool, and if it successfully opens, export it to TMX again, to see if the exported TMX contains the same content as the original TMX file. The character U+001A is present in all three segments, just before the word "saggies", in three different forms.

So far, I've tested it in XML Validator, TMX Validator, Wordfast Pro, Virtaal, OmegaT, Wordfast Classic, and Trados 2007.

Only Wordfast Classic and Trados 2007 accept the file. Wordfast Classic removed the entities from the first two segments but retained the invalid character in the third segment. When I exported a TMX from Wordfast Classic again, none of the segments contained the invalid character. I can't see inside Trados's TM, but when I exported a TMX from Trados again, none of the segments contained the invalid character. Wordfast Classic and Trados 2007 are forgiving, then, when reading.
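
A "forgiving" reader of this kind could be approximated by stripping the offending characters before the text reaches an XML parser. The Python sketch below is only an assumption about one way to do that; it is not how Wordfast Classic or Trados 2007 work internally, and the function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Raw characters that XML 1.0 forbids: C0 controls other than tab, LF and CR,
# plus U+FFFE and U+FFFF.
_INVALID_RAW = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")
# Numeric character references such as &#26; or &#x1A;
_NUMERIC_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def _is_valid_xml_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_xml_chars(text: str) -> str:
    """Drop invalid raw characters and references that point at them."""

    def drop_bad_ref(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return match.group(0) if _is_valid_xml_char(cp) else ""

    return _INVALID_RAW.sub("", _NUMERIC_REF.sub(drop_bad_ref, text))


dirty = "<seg>val &#x1A;saggies, val &#26;saggies, val \x1asaggies</seg>"
clean = strip_invalid_xml_chars(dirty)
print(ET.fromstring(clean).text)
```

Silently dropping characters changes the content, so a tool doing this would ideally log what it removed.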

So far, this experiment has taught me that XML is intolerant of invalid characters even in their traditional entity form.
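
This can be checked with any conforming XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module. The three sample forms of U+001A (hexadecimal reference, decimal reference, raw character) are an assumption for illustration and may not match the exact forms used in the test file.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "hex reference": "<seg>val &#x1A;saggies</seg>",
    "decimal reference": "<seg>val &#26;saggies</seg>",
    "raw character": "<seg>val \x1asaggies</seg>",
    "U+201A, for comparison": "<seg>Spanje\u201a val saggies</seg>",
}

for label, text in samples.items():
    print(f"{label}: well-formed = {is_well_formed(text)}")
```

Only the last sample, which contains U+201A rather than U+001A, should be accepted.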

Thanks
Samuel


 
Bernard Lieber  Identity Verified
Local time: 07:05
English to French
DéjàVux2 Mar 20, 2013

I imported your TMX as a project file rather than as a TM. It displays the following: Die reent in Spanje, val saggies op die blanje. There is an arrow after "val". On export, the arrow is replaced by "val SUB" in each segment.

I also tried Alchemy Publisher 3.0, attaching your TMX as a TM, and the three segments are displayed correctly. Sorry, I had forgotten to uncheck your first TMX (which provides a 98% match); Publisher can't load your second TMX at all.

[Edited at 2013-03-20 12:00 GMT]

[Edited at 2013-03-20 14:24 GMT]


 
Ambrose Li  Identity Verified
Canada
Local time: 01:05
English
WfA Mar 20, 2013

WordFast Anywhere reports “Uploaded memory has no valid translation units”.

 
Samuel Murray  Identity Verified
Netherlands
Local time: 07:05
Member 2006
English to Afrikaans
TOPIC STARTER
@Li Mar 21, 2013

Ambrose Li wrote:
WordFast Anywhere reports “Uploaded memory has no valid translation units”.


I wonder if WFA rejected the entire TM or only the three segments that were invalid (as there were only three segments in it, all will be rejected in this case).


 
Ambrose Li  Identity Verified
Canada
Local time: 01:05
English
WFA details Mar 21, 2013

I did a little test involving one of my real TMs. WFA basically imported the whole test TM, but with the U+001A character removed.

So I’m now a little perplexed as to why it had refused to import your test TM.


 

