how to translate a PDF format document in SDL Trados or SmartCat?
Thread poster: yangjiaojiao
yangjiaojiao
yangjiaojiao
United States
Local time: 02:57
Chinese to English
Jan 18, 2017

Hello,

I have run into a problem while doing freelance translation, and any ideas on how to solve it would be very helpful.
The question is "how to translate a PDF format document in SDL Trados or SmartCat".

I got an assignment from one of my clients. The document is a scanned PDF containing some pictures and Chinese handwriting that cannot be recognized. The client requires the final version to be a bilingual Word document.

So I first tried to open this PDF document in SDL Trados. I followed the method provided by SDL Trados, https://www.youtube.com/watch?v=H2_BydE8xyw, and did exactly what they recommended: go to File-Options-File type-PDF-Converter and set the rest as the video recommends. Then I tried to open the PDF document in SDL Trados. Unfortunately, I got a bunch of unrecognized symbols that cannot be translated.

Then I tried to convert the PDF document into Word. I used Google Docs for the conversion, but it failed: most of the converted characters were incorrect. For example, Google Docs converted the Chinese character "人" into "入". Then I tried some free online converters, but they did not do a good job either. They inserted the PDF into the Word file as a picture, which cannot be edited.

Later I tried to open this PDF document in SmartCat, and it did work. However, I can't use the translation memory and glossary that I built in SDL Trados while working in SmartCat, so I want to export my SDL Trados TM and glossary to SmartCat. The formats don't meet SmartCat's requirements, though. I was able to export the SDL TM to XML format and upload it to SmartCat, but the problem occurred when I tried to export the SDL glossary. SmartCat only accepts XLSX or XML files for glossary import, so I first need to export my SDL glossary to Excel. How do I do that? I don't have SDL MultiTerm. I also tried http://producthelp.sdl.com/kb/Articles/3448.html, but there does not seem to be an easy way to do it.

Any ideas on how to deal with this would be very helpful.

Thank you.

Jiaojiao
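
For anyone who wants to check a TM export before uploading it to SmartCat, here is a minimal sketch. It assumes the export is a TMX file (the post above only says "XML format"), and the function name `read_tmx_pairs`, the sample content and the language codes are made up for illustration. It only reads the file and lists source/target pairs; it does not talk to Trados or SmartCat.

```python
import xml.etree.ElementTree as ET

# TMX 1.4 stores the language in xml:lang; ElementTree exposes it under this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical one-unit TMX sample, only for demonstration.
SAMPLE_TMX = """<tmx version="1.4">
  <header/>
  <body>
    <tu>
      <tuv xml:lang="zh-CN"><seg>你好</seg></tuv>
      <tuv xml:lang="en-US"><seg>Hello</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs for the given language codes."""
    root = ET.fromstring(tmx_text)
    source_lang = source_lang.lower()
    target_lang = target_lang.lower()
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Older TMX versions use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text that sits around inline tags.
                segments[lang] = "".join(seg.itertext()).strip()
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


print(read_tmx_pairs(SAMPLE_TMX, "zh-CN", "en-US"))
```

If the list comes back empty for a real export, the language codes in the file probably differ from the ones passed in, which is worth checking before blaming the import.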


 
Frank Zou
Frank Zou  Identity Verified
China
Local time: 14:57
Member (2016)
Chinese to English
AFAIK Jan 18, 2017

As far as I know, you can NOT translate a scanned PDF in Trados. In such cases, I usually find a print shop, have a typist type up the text, and then do the translation myself. However, if the PDF was converted from another document format, it can be translated in Trados 2015.
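
A rough way to tell the two kinds of PDF apart before opening anything in a CAT tool is sketched below. This is only a crude heuristic and an assumption on my part, not something Trados does: text in a PDF needs a font resource, so a file whose raw bytes contain no `/Font` marker is probably an image-only scan. Newer PDFs can keep their font entries inside compressed object streams, so a "scanned" result can be wrong, and a scan that already has an OCR text layer will look editable. The function name and the sample bytes are hypothetical.

```python
def looks_scanned(pdf_bytes: bytes) -> bool:
    """Guess whether a PDF is image-only by looking for a /Font marker.

    Crude heuristic: no /Font marker in the raw bytes suggests there is no
    text layer. It can misfire on PDFs that compress their object streams.
    """
    return b"/Font" not in pdf_bytes


# Hypothetical fragments standing in for real files.
image_only = b"%PDF-1.4 1 0 obj << /Type /XObject /Subtype /Image >> endobj"
with_text = b"%PDF-1.4 1 0 obj << /Type /Font /Subtype /Type1 >> endobj"

print(looks_scanned(image_only))
print(looks_scanned(with_text))
```

For a real file, read it in binary mode first, for example `looks_scanned(open("job.pdf", "rb").read())`, where `job.pdf` is a placeholder name. Trying to select text in a PDF viewer remains the more reliable test.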

 
Teresa Freixinho
Teresa Freixinho  Identity Verified
Brazil
Local time: 03:57
English to Portuguese
PDF files processed by CAT tools Jan 19, 2017

SDL Studio 2015 can handle editable PDF files. At the end it produces a Word file. I've tested it. However, it doesn't work wonders. Sometimes you have to work hard on formatting after translation. I mean, Studio only cares about extracting text from the PDF. If your PDF file has bullets, tables and a lot of formatting, you will have to handle these details yourself. So it is always preferable to produce a Word copy of the PDF file, format it well and then import it into Studio.

 
Frank Zou
Frank Zou  Identity Verified
China
Local time: 14:57
Member (2016)
Chinese to English
Yes Jan 19, 2017

Teresa Freixinho wrote:

SDL Studio 2015 can handle editable PDF files. At the end it produces a Word file. I've tested it. However, it doesn't work wonders. Sometimes you have to work hard on formatting after translation. I mean, Studio only cares about extracting text from the PDF. If your PDF file has bullets, tables and a lot of formatting, you will have to handle these details yourself. So it is always preferable to produce a Word copy of the PDF file, format it well and then import it into Studio.


Yes. I do that quite often. There are two major problems: too many tags and messed-up formatting.
Solution: use tools to remove most of the tags, and deal with the formatting after the translation is done.

A critical problem could be that, for a docx file converted from PDF, "generate target translation" may fail after the translation is done in the editor, which is quite frustrating. My workaround is to convert the PDF-converted docx file into an rtf file before I add it to Studio.


 
Teresa Freixinho
Teresa Freixinho  Identity Verified
Brazil
Local time: 03:57
English to Portuguese
Sure Jan 19, 2017

Sure, I also think that this is the best procedure, Eradicate. "Tag soup" alone can be solved with TransTools, which is free software. However, since you have to work hard on formatting anyway, it's much more advisable to do everything in advance and import the duly formatted Word file into Studio. I believe there is no problem if the PDF is a plain (text-only) document without tables or anything else. I did a test with a medium-size PDF file with bullets and some tables and was not impressed with the final result.


 
Ben Senior
Ben Senior  Identity Verified
Germany
Local time: 07:57
German to English
Emma's blog Jan 19, 2017

Yesterday Emma Goldsmith published her latest blog article on how to translate PDF files in Studio. It's well worth a look and as usual extremely well written. Here's the link
https://signsandsymptomsoftranslation.com/2017/01/18/pdf/

Ben


 
Emma Goldsmith
Emma Goldsmith  Identity Verified
Spain
Local time: 07:57
Member (2004)
Spanish to English
Thanks Jan 19, 2017

Ben Senior wrote:

Yesterday Emma Goldsmith published her latest blog article on how to translate PDF files in Studio. It's well worth a look and as usual extremely well written. Here's the link
https://signsandsymptomsoftranslation.com/2017/01/18/pdf/

Ben


Thanks for your kind words, Ben.
Just a few points re other comments in this thread:
- Studio can process editable PDFs from 2011 onwards, and scanned PDFs in 14 languages from 2015 onwards. Chinese isn't one of those languages.
- Errors when generating the target translation: I recommend always doing "save as" as soon as you open your document in the Editor window. That way you know from the start whether you'll have problems saving at the end of the translation.
- MultiTerm is a separate program (with a separate download) but comes bundled with all Studio versions except Studio Starter.
- To export a termbase to Excel, it's easiest to use the Glossary Converter app: http://appstore.sdl.com/app/glossary-converter/195/


 
Teresa Freixinho
Teresa Freixinho  Identity Verified
Brazil
Local time: 03:57
English to Portuguese
Hi Emma! Jan 19, 2017

Hi Emma,

After posting my comment above I thought about that article you wrote on PDF files processed by Studio. I am a big fan of your blog. You do a great job!

Greetings from a Brazilian colleague!


 
Georgi Kovachev
Georgi Kovachev  Identity Verified
Bulgaria
Local time: 08:57
Member (2010)
English to Bulgarian
TransPDF Jan 20, 2017

You could also consider TransPDF (http://www.iceni.com/transpdf.htm?utm_source=TransPDF%20Users&utm_campaign=79ea23f462-EMAIL_CAMPAIGN_2017_01_17&utm_medium=email&utm_term=0_65abd5abd9-79ea23f462-68156569) – they have recently added OCR to the functionality of their software.

I hope Chinese is a supported language, though there is no information on their page.


 
yangjiaojiao
yangjiaojiao
United States
Local time: 02:57
Chinese to English
TOPIC STARTER
the OCR in SDL Trados 2016 doesn't recognize Chinese. Jan 26, 2017

Ben Senior wrote:

Yesterday Emma Goldsmith published her latest blog article on how to translate PDF files in Studio. It's well worth a look and as usual extremely well written. Here's the link
https://signsandsymptomsoftranslation.com/2017/01/18/pdf/

Ben


Thank you Ben, that is a very good blog. Unfortunately, the blog says the OCR function is only available for certain languages: Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish and Turkish. I do not think it works for Chinese. Indeed, I have never been able to successfully open a PDF with Chinese as the source language in Trados.
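
The language list above can be turned into a quick pre-flight check before sending a scanned PDF to Studio. This is just a sketch built from the list quoted in this thread. The set and function names are made up, and the list may change in later Studio versions.

```python
# OCR languages for scanned PDFs, as listed in this thread (14 in total).
STUDIO_OCR_LANGUAGES = {
    "danish", "dutch", "english", "finnish", "french", "german", "italian",
    "norwegian", "polish", "portuguese", "russian", "spanish", "swedish",
    "turkish",
}


def studio_ocr_supports(language: str) -> bool:
    """Return True if the source language is on the thread's OCR list."""
    return language.strip().lower() in STUDIO_OCR_LANGUAGES


for lang in ("Chinese", "German"):
    print(lang, studio_ocr_supports(lang))
```

If the check fails, as it does for Chinese, the scan has to go through a separate OCR tool or be retyped before it reaches Studio.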


 
yangjiaojiao
yangjiaojiao
United States
Local time: 02:57
Chinese to English
TOPIC STARTER
Thanks Jan 26, 2017

Georgi Kovachev wrote:

You could also consider TransPDF (http://www.iceni.com/transpdf.htm?utm_source=TransPDF%20Users&utm_campaign=79ea23f462-EMAIL_CAMPAIGN_2017_01_17&utm_medium=email&utm_term=0_65abd5abd9-79ea23f462-68156569) – they have recently added OCR to the functionality of their software.

I hope Chinese is a supported language, though there is no information on their page.



I just logged in to that website. I did not find any PDF-to-Word conversion there. They have an option to upload my document for translation. By the way, they offer Chinese (Hong Kong, Singapore and Taiwan) but not Mandarin Chinese.


 
James Reiser (X)
James Reiser (X)  Identity Verified
United States
Local time: 02:57
Chinese to English
ABBYY FineReader 12 and Scanned PDFs Jan 26, 2017

I recently had a client who provided PDFs containing images of Chinese documents that appeared to have been taken with a smartphone. I used ABBYY FineReader 12 to process these images, and it did a surprisingly good job of getting most of the text into a Word document that I could then load into SDL Trados 2017. I had to do some post-editing in FineReader, and the tool was very helpful in flagging possibly garbled characters. You can shop around, as prices for the software vary.

 

