PDFs and new CAT tools
Thread poster: Miroslav Jeftic

Miroslav Jeftic  Identity Verified
Local time: 06:02
Member (2009)
English to Serbian
+ ...
May 10, 2010

Recently I've noticed that the latest versions of several CAT tools (from SDL, Alchemy, etc.) promise PDF support. I haven't tried any of them, but it doesn't sound very convincing to me. Has anyone tried them out? Is there any truth to it, or is OCR still the way to go?

I don't doubt that a text-only PDF will probably convert well, but I would really like to hear whether anyone has tried loading a complex PDF (text with lots of pictures, tables, pictures in tables, etc.) and what kind of result it produced in the end.

[Edited at 2010-05-10 09:53 GMT]


 

Stanislav Pokorny  Identity Verified
Czech Republic
Local time: 06:02
English to Czech
+ ...
Very limited May 10, 2010

Hi Miroslav,
in my experience, the PDF filter (in fact an OCR add-in) in SDL Studio works quite well in the following scenario:
- PDF with a text layer
- text only, or with just the occasional picture
- no complex tables
- no tight layout
- small file size

It won't work for scanned PDFs, PDFs with a tight layout, or large PDFs (several MB). Moreover, the converted file is usually full of tags, most of them unnecessary. So I still prefer the traditional method:
1. Get the editable source files, if possible.
2. If the client cannot provide them, run OCR, "clean" the converted text by removing any redundant formatting, and finally translate.
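
The "clean" step in point 2 could be sketched roughly as below for the plain-text part of the job. This is only an illustration under stated assumptions, not the procedure described above: the function name `clean_ocr_text` is hypothetical, it assumes the OCR result has been exported as plain text, and it only repairs line breaks and spacing (it does not touch fonts, styles, or tags in a word-processor file).

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Tidy plain text exported from an OCR tool before translation.

    Hypothetical helper: it handles line breaks and spacing only.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words that were split with a hyphen at the end of a line.
    # Caveat: this also joins genuine compounds broken at a line end
    # ("well-\nknown" becomes "wellknown"), so review the result.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Blank lines are treated as paragraph breaks.
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for para in paragraphs:
        # Collapse any run of spaces, tabs, or newlines inside a
        # paragraph into a single space.
        para = re.sub(r"\s+", " ", para).strip()
        if para:
            cleaned.append(para)

    # One blank line between paragraphs keeps segmentation predictable.
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "The  quick brown fox jum-\nped over\nthe lazy dog.\n\n\nSecond\tparagraph."
    print(clean_ocr_text(sample))
```

Merging hard line breaks matters because a CAT tool would otherwise treat each OCR line as a separate segment, which hurts translation memory matches.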


 

Sushan Harshe
India
Local time: 10:32
English to Hindi
+ ...
In Studio 2009, it works as follows May 10, 2010

Hi Miroslav,

It is very simple to open a .pdf in Studio 2009:

[img]http://www.public.fotki.com/legalads/pdf-to-studio/1.html[/img]

[img]http://www.public.fotki.com/legalads/pdf-to-studio/2.html[/img]

[img]http://www.public.fotki.com/legalads/pdf-to-studio/3.html[/img]

[img]http://www.public.fotki.com/legalads/pdf-to-studio/5.html[/img]

[img]http://www.public.fotki.com/legalads/pdf-to-studio/6.html[/img]

The links to the snapshots of the process are above, but I don't know why the snaps, which I took and uploaded specially for you, are not showing.

Anyway, it's a public album!

Regards,

Sushan

[Edited at 2010-05-10 11:20 GMT]

[Edited at 2010-05-10 11:27 GMT]


 

Miroslav Jeftic  Identity Verified
Local time: 06:02
Member (2009)
English to Serbian
+ ...
TOPIC STARTER
:) May 10, 2010

Thanks, Stanislav! I guess it is as I thought: we are still far from good support for 10 MB+ of scanned pages, unfortunately.

 

Kristyna Marrero  Identity Verified
United States
Local time: 00:02
Try the latest version of WORDFAST ANYWHERE with support for scanned PDFs Apr 12, 2011

Hi Miroslav,

Last week we released a new version of Wordfast Anywhere that supports scanned PDFs. Using server-side OCR technology, translators can upload scanned PDFs and convert them to RTF for translation.

Wordfast Anywhere is the world's leading web-based translation memory tool. It is offered free to all translators. As always, all content that you upload remains completely confidential inside of your private, password-protected workspace. We invite you to try Wordfast Anywhere today at www.FreeTM.com.

Hope this helps,

Kristyna Marrero
Director, Sales & Marketing


 

Miroslav Jeftic  Identity Verified
Local time: 06:02
Member (2009)
English to Serbian
+ ...
TOPIC STARTER
:) Apr 12, 2011

Hi Kristyna,

Actually, I tried Wordfast Anywhere a few days ago, I think. While it was OK with the simpler PDFs I uploaded, as soon as I tried one of my "difficult" ones it returned a conversion error.


 

Michal Glowacki  Identity Verified
Poland
Local time: 06:02
Member (2010)
English to Polish
+ ...
CATs don't like PDFs Apr 13, 2011

As far as I know, even if a current CAT tool "handles" PDFs, the best you can get is the same result as when using your own OCR or a TXT copy of the text. I wouldn't expect this to change any time soon. And no wonder: we need to remember that PDF was actually designed to be uneditable. I think most boasting about PDF handling is just marketing and sales talk, which crumbles easily in real use.

 

Miroslav Jeftic  Identity Verified
Local time: 06:02
Member (2009)
English to Serbian
+ ...
TOPIC STARTER
:) Apr 13, 2011

Michal Glowacki wrote:

As far as I know, even if a current CAT tool "handles" PDFs, the best you can get is the same result as when using your own OCR or a TXT copy of the text. I wouldn't expect this to change any time soon. And no wonder: we need to remember that PDF was actually designed to be uneditable. I think most boasting about PDF handling is just marketing and sales talk, which crumbles easily in real use.


Fully agree


 

