New free & open source aligner (for Windows, OS X and linux)
Thread poster: FarkasAndras
Piotr Bienkowski  Identity Verified
Poland
Local time: 21:56
English to Polish
So it's Hunalign? Apr 25, 2013

FarkasAndras wrote:

To the best of my knowledge, LF Aligner never merges lines*. I.e. every line break is a segment delimiter. Of course hunalign merges segments as it sees fit (merge several segments in one language and pair them up with one longer segment in the other language), but that's a different matter.
When LF Aligner asks you whether you want to revert to the unsegmented file versions, you can open the XXXXX_seg.txt files to see how the segmentation went.
If you're seeing merged lines, send me the log and the source files and I'll have a look.

* Unless you're using the pdf ('p') filetype.


I haven't seen a prompt about reverting to the unsegmented versions, but I will read the prompts more closely.

I'm using txt files that I extract from XML files and preprocess before alignment.

Please be assured that I find LF Aligner very useful, it's one of the gems of free software.

Regards,

Piotr


 
FarkasAndras  Identity Verified
Local time: 21:56
English to Hungarian
TOPIC STARTER
hunalign Apr 26, 2013

If you want to see what's going on exactly, make a copy of the segmented files after the segmentation is completed, before you choose whether to use the segmented file versions or revert to the unsegmented form of the texts. Then complete the alignment and compare the files (raw, segmented and aligned).
The segmenter doesn't merge lines, and hunalign only merges them as much as it needs to in order to bring the two texts into sync.
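The comparison of raw and segmented files suggested above can be partly automated. Below is a minimal Python sketch that checks whether the segmented file only splits raw lines and never joins two of them. It assumes both files are plain text with one line (or segment) per line, and that segmentation only inserts line breaks and adjusts whitespace at the split points; `find_first_merge` is a hypothetical helper written for illustration, not part of LF Aligner.

```python
def find_first_merge(raw_lines, seg_lines):
    """Return the index of the first non-empty raw line that cannot be rebuilt
    by joining whole consecutive segments, or None if every line can be.

    Whitespace is ignored, on the assumption that the segmenter only adds or
    removes whitespace where it splits a line.
    """

    def squash(text):
        # Drop all whitespace so "a. b." and "a." + "b." compare equal.
        return "".join(text.split())

    # Blank lines are skipped on both sides, so indices count non-empty lines.
    raw = [squash(line) for line in raw_lines if line.strip()]
    seg = [squash(line) for line in seg_lines if line.strip()]

    j = 0  # index of the next unused segment
    for i, target in enumerate(raw):
        built = ""
        # Consume whole segments until we have at least as much text as the raw line.
        while j < len(seg) and len(built) < len(target):
            built += seg[j]
            j += 1
        if built != target:
            # Either a segment spans two raw lines (a merge) or the text differs.
            return i
    if j < len(seg):
        # Leftover segments that do not belong to any raw line.
        return len(raw)
    return None
```

To use it, read the raw file and the matching XXXXX_seg.txt copy with `readlines()` and pass both lists; `None` means every raw line was rebuilt from whole segments.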


 
Piotr Bienkowski  Identity Verified
Poland
Local time: 21:56
English to Polish
What can be done when alignment in Excel is interrupted? Apr 30, 2013

Thanks for your patience and explanations.

Now I have another problem. I had to restart my system because I lost my Internet connection and couldn't recover it without rebooting.

I saved the unfinished Excel file but I had to kill the console program when restarting the system.

Is there a way to recover a TMX from the Excel file when I have finished reviewing it?

Regards,

Piotr


 
FarkasAndras  Identity Verified
Local time: 21:56
English to Hungarian
TOPIC STARTER
Sure Apr 30, 2013

Piotr Bienkowski wrote:

Thanks for your patience and explanations.

Now I have another problem. I had to restart my system because I lost my Internet connection and couldn't recover it without rebooting.

I saved the unfinished Excel file but I had to kill the console program when restarting the system.

Is there a way to recover a TMX from the Excel file when I have finished reviewing it?

Regards,

Piotr

Just use the TMX maker in aligner/other_tools when you're done with the review.


 
Piotr Bienkowski  Identity Verified
Poland
Local time: 21:56
English to Polish
Old version bug alert Jun 9, 2013

Users of LF Aligner versions prior to 4.04 should seriously consider upgrading to the latest version, if they care about the results of their work and client feedback.

My client thought I was doing sloppy alignment work and was not very happy, to put it mildly, but during "troubleshooting" it turned out that the procedure that exported XLS to TMX (the TMX maker?) was to blame. So far I have found several confirmed cases of TUs that were not exported to TMX the same way as they had been aligned in the XLS file.

Andras is doing a great service to the translator community by providing the Aligner, but nobody's perfect, and that's what upgrades are for.

Regards,

Piotr


 
FarkasAndras  Identity Verified
Local time: 21:56
English to Hungarian
TOPIC STARTER
specifics? Jun 9, 2013

What bug did you experience exactly? I do remember fixing numerous bugs in the tmx maker (they're listed in the changelog) but none of them matches your description.

Note that the behaviour of the tmx maker changed in recent versions: by default, it now skips segments that only have text in one language.
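For readers curious what a TMX maker does with the reviewed sheet, here is a minimal Python sketch. It is not LF Aligner's TMX maker code: it assumes the sheet was saved as tab-delimited text with the source in the first column and the target in the second, and the `skip_one_sided` flag only mimics the default described above (skipping segments that have text in one language only). The function name and the header values such as `creationtool` are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace with the reserved "xml:" prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tab_text_to_tmx(lines, src_lang, tgt_lang, skip_one_sided=True):
    """Build a TMX 1.4 document from "source<TAB>target" lines.

    With skip_one_sided=True, rows that have text in only one language
    are dropped; completely empty rows are always dropped.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "tab-delimited text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        src = cols[0].strip()
        tgt = cols[1].strip() if len(cols) > 1 else ""
        if not src and not tgt:
            continue
        if skip_one_sided and not (src and tgt):
            continue
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            # ElementTree escapes &, < and > in element text automatically.
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Passing `skip_one_sided=False` keeps one-sided rows as translation units with an empty `seg`, which is roughly the older behaviour the post contrasts with.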


 
Piotr Bienkowski  Identity Verified
Poland
Local time: 21:56
English to Polish
+ ...
Specifics Jun 9, 2013

FarkasAndras wrote:

What bug did you experience exactly? I do remember fixing numerous bugs in the tmx maker (they're listed in the changelog) but none of them matches your description.

Note that the behaviour of the tmx maker changed in recent versions: by default, it now skips segments that only have text in one language.


Some segments that were perfectly aligned in the sheet were not aligned correctly in the TMX file: for example, a translation broken off at a hyphen, with the source nowhere to be found.

I am happy to report, though, that the latest stable build I downloaded from SourceForge does not have this bug. I tested it on Excel files that had earlier produced bad TMX files with the TMX maker.

Regards,

Piotr


 
FarkasAndras  Identity Verified
Local time: 21:56
English to Hungarian
TOPIC STARTER
Odd Jun 9, 2013

Piotr Bienkowski wrote:

Some segments that were perfectly aligned in the sheet were not aligned correctly in the TMX file: for example, a translation broken off at a hyphen, with the source nowhere to be found.

I am happy to report, though, that the latest stable build I downloaded from SourceForge does not have this bug. I tested it on Excel files that had earlier produced bad TMX files with the TMX maker.

Regards,

Piotr


That's interesting. I haven't seen this bug, and nobody else has reported it. If you have the files at hand, please send them (txt and tmx) to lfaligner (gmail) and let me know which version produced bad TMX files for you.


 
Noe Tessmann  Identity Verified
Local time: 21:56
English to German
batch alignment of EC proposals of 2012/2013 possible? Jun 11, 2013

Hi Andras,

the direct alignment of web pages without downloading them is really the killer feature I have been waiting for for a long time. I have already put it on the MemoQ feature wish list several times. This is a great achievement.

I am not much into coding, so I'll just ask whether it would be possible to batch-align all Commission proposals from 2012 and 2013 to complement the DGT EU TM.



Thanks in advance

Noe


 
FarkasAndras  Identity Verified
Local time: 21:56
English to Hungarian
TOPIC STARTER
Not really Jun 11, 2013

Noe Tessmann wrote:

Hi Andras,

the direct alignment of web pages without downloading them is really the killer feature I have been waiting for for a long time. I have already put it on the MemoQ feature wish list several times. This is a great achievement.

I am not much into coding, so I'll just ask whether it would be possible to batch-align all Commission proposals from 2012 and 2013 to complement the DGT EU TM.



Thanks in advance

Noe


Hi, that is a feature I left out on purpose. LF Aligner only supports batch alignment of offline files (files saved to your computer). If I enabled batch downloads, it would be too easy to overload servers if too many people used the feature. Also, some webmasters frown on automated downloading even if it isn't bringing their servers to their knees, so it's not always easy to tell what's OK and what's not. I believe that the people running the EUR-Lex site don't have a problem with batch downloads, but I decided to play it safe.
BTW I believe the DGT-TM does not contain COM proposals. It only contains adopted legislation.

FYI I'm in the process of generating multilingual EU mega-TMs right now. Contact me if you're interested.


 
transooner
China
non-European punctuation marks for splitting a paragraph into sentences Jul 14, 2013

The sentence splitter seems to have been made for European languages. What if I want to add new punctuation marks at which to split a paragraph into sentences, such as ";"?

 
FarkasAndras  Identity Verified
Local time: 21:56
English to Hungarian
TOPIC STARTER
What language? Jul 14, 2013

transooner wrote:

The sentence splitter seems to have been made for European languages. What if I want to add new punctuation marks at which to split a paragraph into sentences, such as ";"?


Well, the segmenter was indeed originally made for European languages. I wrote a completely different, much more primitive segmenter for Chinese and Japanese because the original segmenter didn't work with these languages at all. What language would you need this for? I can modify the zh/jp segmenter pretty easily.


 
transooner
China
sentence splitter, pdf Jul 15, 2013

It's Chinese. I hope you can tell us where to modify it (split-sentences.perl?), because sometimes it's better to make ";" an "anchor" after which to split sentences and sometimes it's not, so being able to change it ourselves would be more convenient.

I have a pdf problem which has already taken me hundreds of hours. I'm wondering if you could, from a programmer's perspective, shed some light on the issue.

The layout of the source and translated files is the same, that is, each paragraph in the source has a counterpart in the translated file. But there are tables and blank spaces which make things tricky. Where the text is supposed to run from left to right, PDF-to-txt converters arbitrarily make it run from top to bottom. The same happens with tables. If I convert the files to doc(x), the tools that come with LF Aligner make the same kind of mistake. Plus, I don't want the numbers in the tables.

Could you share your wisdom with me? Thanks for making such a great tool.


 
FarkasAndras  Identity Verified
Local time: 21:56
English to Hungarian
TOPIC STARTER
zh segmenter Jul 15, 2013

transooner wrote:

It's Chinese. I hope you can tell us where to modify it (split-sentences.perl?), because sometimes it's better to make ";" an "anchor" after which to split sentences and sometimes it's not, so being able to change it ourselves would be more convenient.

I have a pdf problem which has already taken me hundreds of hours. I'm wondering if you could, from a programmer's perspective, shed some light on the issue.

The layout of the source and translated files is the same, that is, each paragraph in the source has a counterpart in the translated file. But there are tables and blank spaces which make things tricky. Where the text is supposed to run from left to right, PDF-to-txt converters arbitrarily make it run from top to bottom. The same happens with tables. If I convert the files to doc(x), the tools that come with LF Aligner make the same kind of mistake. Plus, I don't want the numbers in the tables.

Could you share your wisdom with me? Thanks for making such a great tool.


Split-sentences.perl is the default "European" segmenter. The Chinese segmenter is in LF_aligner_XXX.pl, so that's what you would need to modify (and then possibly repackage into an exe). I could look into doing this next week, but if ; is a sentence separator only some of the time, it may not be worth doing. I imagine it's not a case of "separator in this document but not in that document", but more like "separator in this sentence but not in that one". If there is some simple algorithm that works most of the time (split if ; occurs before/after XYZ characters), that can be done.
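A rule of the "split if ; occurs before/after XYZ characters" kind could look like the Python sketch below. It is not the segmenter in LF_aligner_XXX.pl (which is Perl) and not a patch for it. It assumes that the full-width marks 。！？ always end a sentence, keeps closing quotes and brackets attached to the sentence they close, and treats ； (or ;) as a break only when the clause ending there is at least `semicolon_min_len` characters long. That length threshold is a hypothetical heuristic chosen purely for illustration; `split_zh` is likewise a hypothetical name.

```python
HARD_ENDS = "。！？"     # always end a sentence
CLOSERS = "”’」』）"     # closing marks that stay with the preceding sentence
SEMICOLONS = "；;"       # full-width and ASCII semicolons


def split_zh(paragraph, semicolon_min_len=None):
    """Split a Chinese paragraph into sentences.

    semicolon_min_len=None never splits at a semicolon; otherwise a semicolon
    ends a segment only if that segment is at least this many characters long.
    """
    sentences, start, i, n = [], 0, 0, len(paragraph)
    while i < n:
        ch = paragraph[i]
        cut = ch in HARD_ENDS or (
            ch in SEMICOLONS
            and semicolon_min_len is not None
            and i + 1 - start >= semicolon_min_len
        )
        i += 1
        if cut:
            # Pull closing quotes/brackets into the sentence they close.
            while i < n and paragraph[i] in CLOSERS:
                i += 1
            piece = paragraph[start:i].strip()
            if piece:
                sentences.append(piece)
            start = i
    tail = paragraph[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Raising `semicolon_min_len` keeps short, list-like clauses together, while `None` (the default) disables semicolon splitting entirely. A real rule would probably need to look at the characters around the semicolon as well, as suggested above.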

WRT the pdf files, the problem is caused by the nature of the pdf format and possibly bad pdf authoring. There may not be a simple solution. You could try the other two pdf conversion modes (see readme). As a last resort, you could try OCR.


 
Ronja Addams-Moring  Identity Verified
Finland
Local time: 22:56
Finnish to Swedish
+ ...
Thank you, very impressed after first test Sep 3, 2013

I just tried out LF Aligner 4.04 for the first time and I must say that I am impressed. Thank you very much for this promising addition to my toolbox.

The source file in Finnish was a single-column DOCX and the target file in Swedish was a two-column PDF, which I converted on-line to DOCX and then fixed the layout and edited for about an hour to make sure that the files were as alike as I could make them.

I used the GUI tool to edit the alignment file and only needed to merge once before creating the TMX. I'm actually looking forward to checking the TMX and correcting the (relatively few and unsurprising) inaccuracies that were inherited from the file conversion PDF -> DOCX.

Environment: Windows 8, CAT tool where the TMX will be used: OmegaT.


 