https://www.proz.com/forum/general_technical_issues/170341-pdfs_and_new_cat_tools.html

PDFs and new CAT tools
Thread poster: Miroslav Jeftic
Miroslav Jeftic  Identity Verified
Local time: 17:17
Member (2009)
English to Serbian
May 10, 2010

Recently I've noticed that the latest versions of several CAT tools (from SDL, Alchemy, etc.) promise PDF support. I haven't tried any of them, but it doesn't sound very convincing to me. Has anyone tried them out? Is there any truth in all that, or is OCR still the way to go?

I don't doubt that a text-only PDF will probably go well, but I would really like to hear whether anyone has tried to load a complex PDF (text with lots of pictures, tables, pictures in tables, etc.) and what kind of result it produced in the end.

[Edited at 2010-05-10 09:53 GMT]


 
Stanislav Pokorny  Identity Verified
Czech Republic
Local time: 17:17
English to Czech
Very limited May 10, 2010

Hi Miroslav,
In my experience, the PDF filter (in fact an OCR add-in) in SDL Studio works quite well in the following scenario:
- PDF with a text layer
- text only or a picture now and then
- no complex tables
- no tight layout
- small size

It won't work for scanned PDFs, PDFs with a tight layout, or large (several MB) PDFs. Moreover, the converted file is usually full of tags, most of them unnecessary, of course. So I still prefer the traditional method:
1. Getting the editable source files, if possible.
2. If the client fails to provide them, I run OCR, "clean" the converted text by removing any redundant formatting and, finally, translate.
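Step 2 of this workflow (cleaning OCR output before translating) can be partly automated. Below is a minimal Python sketch, assuming the OCR result has been exported as plain text. The function name `clean_ocr_text` and its rules (re-joining words hyphenated at line ends, unwrapping hard line breaks inside paragraphs, collapsing repeated spaces) are hypothetical illustrations, not Stanislav's actual procedure or a feature of any particular OCR product.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Hypothetical cleanup pass for plain-text OCR output (illustrative only)."""
    # Normalise Windows/old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Re-join words hyphenated at a line end: "transla-\ntion" -> "translation".
    # Caveat: this also merges genuine hyphenated compounds that break at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Split into paragraphs on blank lines (which OCR output usually preserves).
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for para in paragraphs:
        # Unwrap hard line breaks inside the paragraph and collapse runs of whitespace.
        joined = " ".join(para.split())
        if joined:
            cleaned.append(joined)
    # Keep exactly one blank line between paragraphs.
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "This is a transla-\ntion test   with\nhard breaks.\n\n\nSecond   paragraph.\n"
    print(clean_ocr_text(sample))
    # This is a translation test with hard breaks.
    #
    # Second paragraph.
```

Unwrapping matters because a hard line break in the middle of a sentence can make a CAT tool split that sentence into two segments. The trade-off is that this sketch also flattens any table or column layout into running text, so it suits the simple, text-only scenario described above rather than complex layouts.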


 
Sushan Harshe
India
Local time: 20:47
English to Hindi
In Studio 2009, it works as follows May 10, 2010

Hi Miroslav,

It is very simple to open a .pdf file in Studio 2009:

http://www.public.fotki.com/legalads/pdf-to-studio/1.html

http://www.public.fotki.com/legalads/pdf-to-studio/2.html

http://www.public.fotki.com/legalads/pdf-to-studio/3.html

http://www.public.fotki.com/legalads/pdf-to-studio/5.html

http://www.public.fotki.com/legalads/pdf-to-studio/6.html

The links to screenshots of the process are above, but I don't know why the images I took and uploaded specially for you are not showing.

Anyway, it's a public album!

Regards,

Sushan

[Edited at 2010-05-10 11:20 GMT]

[Edited at 2010-05-10 11:27 GMT]


 
Miroslav Jeftic  Identity Verified
Local time: 17:17
Member (2009)
English to Serbian
TOPIC STARTER
:) May 10, 2010

Thanks, Stanislav! I guess it is as I thought: we are still far away from good support for 10 MB+ of scanned pages, unfortunately.

 
Kristyna Marrero  Identity Verified
United States
Local time: 11:17
Try the latest version of WORDFAST ANYWHERE with support for scanned PDFs Apr 12, 2011

Hi Miroslav,

Last week, we released a new version of Wordfast Anywhere that supports scanned PDFs. Using server-side OCR technology, translators can upload scanned PDFs and convert them to RTF for translation.

Wordfast Anywhere is the world's leading web-based translation memory tool. It is offered free to all translators. As always, all content that you upload remains completely confidential inside of your private, password-protected workspace. We invite you to try Wordfast Anywhere today at www.FreeTM.com.

Hope this helps,

Kristyna Marrero
Director, Sales & Marketing


 
Miroslav Jeftic  Identity Verified
Local time: 17:17
Member (2009)
English to Serbian
TOPIC STARTER
:) Apr 12, 2011

Hi Kristyna,

Actually, I tried Wordfast Anywhere a few days ago, I think, and while it was OK with the simpler PDFs I uploaded, as soon as I tried one of my "difficult" ones it returned a conversion error.


 
Michal Glowacki  Identity Verified
Poland
Local time: 17:17
Member (2010)
English to Polish
CATs don't like PDFs Apr 13, 2011

As far as I know, even if a currently developed CAT tool "handles" PDFs, the best you can get is the same result as using your own OCR or a TXT copy of the text. I wouldn't expect this to change any time soon. And no wonder: we need to remember that PDF was actually designed to be uneditable. I think most of the boasting about PDF handling is just marketing and sales talk, which crumbles easily when put into real use.

 
Miroslav Jeftic  Identity Verified
Local time: 17:17
Member (2009)
English to Serbian
TOPIC STARTER
:) Apr 13, 2011

Michal Glowacki wrote:

As far as I know, even if a currently developed CAT tool "handles" PDFs, the best you can get is the same result as using your own OCR or a TXT copy of the text. I wouldn't expect this to change any time soon. And no wonder: we need to remember that PDF was actually designed to be uneditable. I think most of the boasting about PDF handling is just marketing and sales talk, which crumbles easily when put into real use.


Fully agree


 

