Help requested: PDF to Word: How to copy tons of numbers
Thread poster: David Jessop
David Jessop
David Jessop  Identity Verified
Laos
Member
Spanish to English
Apr 6, 2009

Hello,

I am working on a 10,000-word Spanish to English translation in which perhaps 4,000 of the "words" are actually numbers and statistics from a whole mess of tables dispersed throughout the document. To make matters even more challenging, they don't copy or end up in a text dump with Acrobat Reader. Does anyone have an idea of how to semi-automate this process so I do not spend all my time in Word typing out numbers into the translated document? Is the only option to open Illustrator, select the text and copy it as raw text, or is there a better way? Any feedback is appreciated.

Best,
David


 
Vladislav Badalov
Vladislav Badalov  Identity Verified
Russian Federation
Local time: 14:14
Russian to English
Fine Reader is the answer Apr 6, 2009

Hi, David,

FineReader will surely recognise all your numbers and even tables!


 
José Henrique Lamensdorf
José Henrique Lamensdorf  Identity Verified
Brazil
Local time: 08:14
English to Portuguese
In memoriam
OCR or InFix Apr 6, 2009

David Jessop wrote:
... they don't copy or end up in a text dump with Acrobat Reader...


There are two major kinds of PDF files: distilled and scanned.

From what you said above, I guess yours has been scanned. In this case you'll need OCR (Optical Character Recognition) software, such as OmniPage, ABBYY, ReadIris or some other, to convert those "pictures" back into text.

If your file has been distilled, i.e. converted into PDF by, e.g., "printing" from MS Word (or any other program) to Acrobat Distiller, you can use InFix, a PDF editor from http://www.iceni.com. You can keep the tables as they are and edit only the text, directly in the PDF itself.


 
Mihaela BUFNILA
Mihaela BUFNILA  Identity Verified
Romania
Local time: 14:14
English to Romanian
Overwriting Apr 6, 2009

If you have the original text in Word, you might just overwrite the original with the target text and therefore leave the numbers just as they are.

If you have the original text in a picture format or on paper, you might use ABBYY FineReader http://www.abbyy.com to solve this.

HTH


 
Sergei Leshchinsky
Sergei Leshchinsky  Identity Verified
Ukraine
Local time: 14:14
Member (2008)
English to Russian
FineReader Apr 6, 2009

but add "Numbers" to the list of languages used in OCR for better results.

 
Sangeeta Joshi
Sangeeta Joshi  Identity Verified
India
Local time: 16:44
German to English
Try Select Table and Copy Table Option Apr 6, 2009

Unless the PDF file is a scanned document, I usually use the Select Table and Copy Table options in Acrobat Reader itself when I have to work on tables with lots of numerical figures.

 
Heinrich Pesch
Heinrich Pesch  Identity Verified
Finland
Local time: 14:14
Member (2003)
Finnish to German
Communicate with the customer Apr 6, 2009

Tell them if they send you the original file (pdf is never an original file format) they can save lots of money as you only have to translate the text and can leave the numbers alone.

If they are not reasonable: FineReader is a possibility, but the formatting will be changed and it might look strange. But if the content in the tables must be translated, there is no other way.

If you do not need to change anything in the table you might use snapshot software to take a picture from each table and insert it into a Word file.

Regards
Heinrich


 
Viktoria Gimbe
Viktoria Gimbe  Identity Verified
Canada
Local time: 07:14
English to French
Use AutoUnbreak Apr 6, 2009

AutoUnbreak lets you copy content from within a PDF and retain the formatting, too. AutoUnbreak copies tables, no problem. The output is an RTF file.

http://digital.hollmen.dk/products/autounbreak/index.htm

Make sure you read the instructions on the website, though, to ensure you don't run into any trouble.


 
Mikhail Popov
Mikhail Popov
Montenegro
Local time: 13:14
English to Russian
Solid Converter PDF Apr 6, 2009

Solid Converter is a nice program if your PDF document was distilled, so that it contains text symbols and numbers, not pictures.
If your document is just a package of scanned papers, use ABBYY FineReader - it's the best program in that case.


 
Marcus Geibel
Marcus Geibel
Germany
Local time: 13:14
English to German
Copy and paste Apr 6, 2009

Do you need to edit these numbers?
If not, you can use an Acrobat Reader tool to copy and paste them as pictures into your Word file. Here's how:

(I only have a German version, so I do not know whether the menu items are named exactly as I translate them.)

Go to the "Tools" (German: Werkzeuge) menu, choose "Select and zoom" (Auswählen und zoomen, should be the first option in the pulldown menu) and then "Snapshot tool" (Schnappschuss-Werkzeug, should be the bottom item).

Then go to the text you want to copy and draw a selection frame around it by placing the selection tool in one corner and - with the mouse button pressed - moving the cursor over the entire text. Release the mouse button when all the text to be copied is within the frame; it will then be copied to the clipboard.

From there you can simply paste it into your Word document.
There, you can edit it like any graphic (right-click and select from the context menu).

Hope this helps.


 
trebla
trebla
Canada
Local time: 07:14
French to English
PDF Problems Apr 7, 2009

Whoever invented PDF should be taken out and shot!

I spend more of my valuable time massaging these &^%$# PDF files than I can count - all of it unpaid for, of course.

When a file is a real mess (i.e. it won't convert to Word without a struggle), I either use PDF Converter from Nuance or print the pages out, mask what I don't want, and scan them in with OmniPage. Then I select everything and press Ctrl+Shift+N to get rid of all the formatting.

Finally, I go through and reconstruct the original in Word, putting the original formatting back in.


 
Viktoria Gimbe
Viktoria Gimbe  Identity Verified
Canada
Local time: 07:14
English to French
Questions to trebla Apr 7, 2009

I am wondering about two things:

1. Why don't you charge for the extra time required to massage those PDFs?
2. Why don't you simply process your original PDF file using OmniPage? OmniPage is really efficient and can produce great results, although you really have to invest some time to learn to use it right. You seem to be going to a lot of unnecessary trouble to create your editable versions...

[Edited at 2009-04-07 15:36 GMT]


 
FarkasAndras
FarkasAndras  Identity Verified
Local time: 13:14
English to Hungarian
burn at the stake Apr 7, 2009

trebla wrote:

Whoever invented PDF should be taken out and shot!

I spend more of my valuable time massaging these &^%$# PDF files than I can count - all of it unpaid for, of course.

When a file is a real mess, (i.e. it won't convert to Word without a struggle), I either use PDF Converter from Nuance or print the pages out, mask what I don't want, and scan them in with OmniPage. Then I block everything on, and press CTRL/SHIFT N to get rid of all the formatting.

Finally, I go through and reconstruct the original in word, putting the original formatting back in.


Fully agree with the sentiment. PDF is my worst enemy.
If I can't copy and paste (scanned doc) I usually just resign myself to having to work from the pdf itself. I'm not really a fan of OCR in general, although that's starting to change.


 
David Jessop
David Jessop  Identity Verified
Laos
Member
Spanish to English
TOPIC STARTER
Thanks! Apr 16, 2009

Thank you for everyone's tips! I ended up using ABBYY FineReader. This did the trick for some of the tables, but others came out really badly when transferred to Word. I had to spend a lot of time recreating them.

Best,
David
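
For tables that only survive OCR as plain lines of space-aligned text, one possible shortcut is to turn them into tab-separated rows, which Word can then rebuild with Convert Text to Table. This is only a minimal sketch: the function name `ocr_lines_to_tsv` and the sample data are made up for illustration, and it assumes that columns are separated by two or more spaces and that no cell contains such a gap itself.

```python
import re


def ocr_lines_to_tsv(text: str) -> str:
    """Turn space-aligned table text into tab-separated rows.

    Assumption: a run of two or more whitespace characters marks a
    column boundary; single spaces stay inside a cell.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between table rows
        cells = re.split(r"\s{2,}", line)
        rows.append("\t".join(cells))
    return "\n".join(rows)


# Hypothetical sample of what a text dump of a table might look like.
sample = """\
Region        2007        2008
Norte         1.234,5     1.310,2
Sur           987,0       1.002,4
"""

print(ocr_lines_to_tsv(sample))
```

The output can be pasted into Word, selected, and converted with tabs as the separator. The numbers are left exactly as they appear in the source, so any change of decimal or thousands separators still has to be done by hand.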


 

