Help requested: PDF to Word: How to copy tons of numbers
Thread poster: David Jessop
Hello, I am working on a 10,000-word Spanish to English translation in which perhaps 4,000 of the "words" are actually numbers and statistics from a whole mess of tables dispersed throughout the document. To make matters even more challenging, they don't copy or end up in a text dump with Acrobat Reader. Does anyone have an idea of how to semi-automate this process so I do not spend all my time in Word typing out numbers into the translated document? Is the only option to open Illustrator, select the text and copy it as raw text, or is there a better way? Any feedback is appreciated. Best, David

Fine Reader is the answer | Apr 6, 2009
Hi, David, FineReader will surely recognise all your numbers and even tables!
David Jessop wrote: ... they don't copy or end up in a text dump with Acrobat Reader...

There are two major kinds of PDF files: distilled and scanned. From what you said above, I guess yours has been scanned. In this case you'll need OCR (Optical Character Recognition) software, such as OmniPage, ABBYY, ReadIris or some other, to convert those "pictures" back into text. If your file has been distilled, i.e. it was converted into PDF by, e.g., "printing" from MS Word (or any other program) to Acrobat Distiller, you can use InFix, a PDF editor from http://www.iceni.com. You may keep the tables as they are and edit only the text in the PDF itself.
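The distilled-versus-scanned distinction in the post above can be checked roughly before choosing a tool. What follows is only a minimal Python sketch under stated assumptions: the function names `has_font_marker` and `describe` and the file name `source.pdf` are hypothetical, and the check is a crude heuristic (searching the raw file for a `/Font` marker), not a rule from any of the tools mentioned in this thread. A PDF that stores its dictionaries in compressed object streams can hide the marker, so a negative result is inconclusive, and a scan that already carries a hidden OCR text layer will also show fonts.

```python
from pathlib import Path


def has_font_marker(pdf_bytes: bytes) -> bool:
    """Rough heuristic (an assumption, not a guarantee): pages with real
    text normally reference a /Font resource; image-only scans often do not."""
    return b"/Font" in pdf_bytes


def describe(pdf_bytes: bytes) -> str:
    """Turn the heuristic into a cautious hint about which route to try."""
    if has_font_marker(pdf_bytes):
        return "font resources found: try copy/paste or a PDF converter first"
    return "no font marker found: possibly a scan, OCR may be needed"


if __name__ == "__main__":
    # "source.pdf" is a placeholder file name for illustration only.
    path = Path("source.pdf")
    if path.exists():
        print(describe(path.read_bytes()))
```

If the hint points to a scan, the OCR route described in the thread applies; otherwise the copy or conversion tools are worth trying first.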
If you have the original text in Word, you might just overwrite the original with the target text and leave the numbers just as they are. If you have the original text in a picture format or on paper, you might use ABBYY FineReader (http://www.abbyy.com) to solve this. HTH
but add "Numbers" to the list of languages used in OCR for better results.

Try Select Table and Copy Table Option | Apr 6, 2009
Unless the PDF file is a scanned document, I usually use the Select Table and Copy Table options in Acrobat Reader itself when I have to work on tables with lots of numerical figures.

Heinrich Pesch, Finland
Communicate with the customer | Apr 6, 2009
Tell them that if they send you the original file (PDF is never an original file format), they can save lots of money, as you only have to translate the text and can leave the numbers alone. If they are not reasonable, FineReader is a possibility, but the formatting will be changed and it might look strange. But if the content in the tables must be translated, there is no other way. If you do not need to change anything in the table, you might use snapshot software to take a picture of each table and insert it into a Word file. Regards, Heinrich

Use AutoUnbreak | Apr 6, 2009
AutoUnbreak lets you copy content from within a PDF and retain the formatting, too. AutoUnbreak copies tables, no problem. The output is an RTF file. http://digital.hollmen.dk/products/autounbreak/index.htm Make sure you read the instructions on the website, though, to ensure you don't run into any trouble.
Mikhail Popov, Montenegro
Solid Converter PDF | Apr 6, 2009
Solid Converter is a nice program if your PDF document was distilled, so that it contains text symbols and numbers, not pictures. If your document is just a package of scanned papers, use ABBYY FineReader. It's the best program in that case.

Copy and paste | Apr 6, 2009
Do you need to edit these numbers? If not, you can use an Acrobat Reader tool to copy and paste them as pictures into your Word file. Here's how (I only have a German version, so I do not know whether the items are named exactly as I translate them): Go to the "Tools" (German: Werkzeuge) menu, choose "Select and zoom" (Auswählen und zoomen, should be the first option in the pull-down menu) and then "Snapshot tool" (Schnappschuss-Werkzeug, should be the bottom item). Then go to the text you want to copy and draw a selection frame around it by placing the selection tool in one corner and, with the mouse button pressed, moving the cursor over the entire text. Release the mouse button when all the text to be copied is within the frame; it will then be copied to the clipboard. From there you can simply paste it into your Word document, where you can edit it like any graphic (right-click and select from the context menu). Hope this helps.

trebla, Canada
Whoever invented PDF should be taken out and shot! I spend more of my valuable time massaging these &^%$# PDF files than I can count, all of it unpaid for, of course. When a file is a real mess (i.e. it won't convert to Word without a struggle), I either use PDF Converter from Nuance or print the pages out, mask what I don't want, and scan them in with OmniPage. Then I select everything and press CTRL+SHIFT+N to get rid of all the formatting. Finally, I go through and reconstruct the original in Word, putting the original formatting back in.

Questions to trebla | Apr 7, 2009
I am wondering about two things: 1. Why don't you charge for the extra time required to massage those PDFs? 2. Why don't you simply process your original PDF file using OmniPage? OmniPage is really efficient and can produce great results, although you really have to invest some time to learn to use it right. You seem to be going to a lot of unnecessary trouble to create your editable versions...
[Edited at 2009-04-07 15:36 GMT]

burn at the stake | Apr 7, 2009
trebla wrote: Whoever invented PDF should be taken out and shot! I spend more of my valuable time massaging these &^%$# PDF files than I can count, all of it unpaid for, of course. When a file is a real mess (i.e. it won't convert to Word without a struggle), I either use PDF Converter from Nuance or print the pages out, mask what I don't want, and scan them in with OmniPage. Then I select everything and press CTRL+SHIFT+N to get rid of all the formatting. Finally, I go through and reconstruct the original in Word, putting the original formatting back in.

Fully agree with the sentiment. PDF is my worst enemy. If I can't copy and paste (scanned doc), I usually just resign myself to having to work from the PDF itself. I'm not really a fan of OCR in general, although that's starting to change.

David Jessop, Laos, TOPIC STARTER
Thank you for everyone's tips! I ended up using ABBYY FineReader. This did the trick for some of the tables, but others came out really badly when transferred to Word. I had to spend a lot of time recreating them. Best, David