How to extract text from Acrobat Reader
Thread poster: Céline Graciet
Céline Graciet
English to French
Jan 15, 2003

Hi everyone, I'm trying to extract text from a PDF file, but to no avail. It won't let me highlight or select any of it. I even tried scanning it and sending the image to Word, but the result was utter mumbo jumbo. I hope the more technically minded among you will come to my rescue!

 
Natalie
Poland
Member (2002)
English to Russian

MODERATOR
SITE LOCALIZER
Hi Celine, Jan 15, 2003

I am pretty sure that your PDF file is in fact a graphical (image-only) PDF, so the best approach would be to open it in an OCR application capable of reading PDFs (for example, FineReader 6) and convert it to text. You may contact me privately if you need any technical help.



Best,

Natalia


 
Endre Both
Germany
English to German
No way to get around scanning Jan 15, 2003

...or, more precisely, reading the PDF into character recognition (OCR) software, if your PDF is an all-graphics file (indicated by the impossibility of highlighting text).

The results of course depend on your OCR software and the settings you apply before recognition.

In any case, the procedure is likely to involve a lot of work (I've just spent a few hours on a similar task) and only pays off if the text contains lots of repetitions and you can use a CAT tool afterwards. Otherwise, just use a printout and type the translation into Word.
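
One rough way to put a number on "lots of repetitions" once the OCR text is available is sketched below. It is only an illustration: the sentence splitting is a crude stand-in for what a CAT tool does, the function names are made up for this example, and no particular threshold is implied by the thread.

```python
import re
from collections import Counter


def split_segments(text: str) -> list[str]:
    """Split text into rough sentence-like segments.

    Assumption: a segment ends at '.', '!' or '?' followed by whitespace,
    or at a line break. Real CAT segmentation rules are more elaborate.
    """
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    # Normalise whitespace and case so trivially different copies match.
    return [" ".join(p.split()).lower() for p in parts if p.strip()]


def repetition_rate(text: str) -> float:
    """Return the share of segments that repeat an earlier identical segment."""
    segments = split_segments(text)
    if not segments:
        return 0.0
    counts = Counter(segments)
    repeated = sum(n - 1 for n in counts.values())
    return repeated / len(segments)
```

A high rate suggests the OCR effort may pay off with a CAT tool; a rate near zero points towards simply typing the translation from a printout.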



A basic rule of mine, BTW: no discounts for repetitions in PDF texts.



Feel free to get in touch with me directly if you think I can help you.



Endre

EB Communications


 
TService (X)
English to German
Three possible reasons. Jan 16, 2003

1) There are some kinds of protected PDFs around, allowing you to view the contents only and preventing any attempt to copy.

Solution: Request an unprotected version.

2) Some PDFs cannot be opened correctly with the free version of Acrobat Reader.

Solution: Get Acrobat 5 - but it's quite costly.

3) Some PDFs just show "garbage" when copied and pasted into another application.

Solution: Contact me; I wrote a tiny algorithm to decode that "garbage" using MS Access.
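
The algorithm mentioned for case 3 is not shown in the thread, so the following is only a guess at the general idea, written in Python rather than MS Access. It assumes the "garbage" is a consistent one-to-one character substitution (as can happen with custom font encodings) and that you can retype a short passage by hand to learn the mapping. The function names are hypothetical.

```python
def build_table(garbled_sample: str, correct_sample: str) -> dict[int, str]:
    """Learn a character substitution table from an aligned sample pair.

    Assumption: both samples have the same length and line up
    character by character.
    """
    if len(garbled_sample) != len(correct_sample):
        raise ValueError("samples must be aligned character by character")
    table: dict[int, str] = {}
    for bad, good in zip(garbled_sample, correct_sample):
        known = table.setdefault(ord(bad), good)
        if known != good:
            # The same garbled character stood for two different letters,
            # so this is not a simple substitution.
            raise ValueError(f"{bad!r} maps to both {known!r} and {good!r}")
    return table


def decode(garbled: str, table: dict[int, str]) -> str:
    """Apply the learned table; characters not in the table stay unchanged."""
    return garbled.translate(table)
```

For instance, a table learned from the pair "Khoor" / "Hello" turns "Khoor!" back into "Hello!". If the garbling is not a consistent substitution, this approach will not help and OCR is the fallback.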


 
monitor
English to German
more than one solution Jan 16, 2003

Hi Céline

- First, you should try to find out whether your PDF is copy-protected. If this is the case, save the file under a new file name, which in most cases removes the protection. To do so you need the full Adobe Acrobat, not just the Reader.

- In Adobe Acrobat you can save the text directly by exporting it to an RTF file.

- You should also consider Gemini solo, a file / image extraction tool from inceni.com. It can be downloaded for free as a trial version (restricted usage), but it works.

Hope this is all fine for you

Kind Regards

Marcel

The protection cannot be removed using Acrobat Reader!
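
For anyone who wants to check for copy protection without opening Acrobat, here is a minimal sketch. It relies on two facts from the PDF specification: a protected file carries an `/Encrypt` entry, and bit 5 of the permission value `/P` in the encryption dictionary controls copying and extracting text. The byte search is a crude heuristic (the string could in principle appear elsewhere in a file), and the function names are made up for this example.

```python
# Bit 5 (counting from 1 at the low-order end) of the /P value means
# "copy or otherwise extract text and graphics".
COPY_BIT = 1 << 4


def has_encrypt_entry(pdf_bytes: bytes) -> bool:
    """Heuristic: report whether the raw file mentions an /Encrypt entry."""
    return b"/Encrypt" in pdf_bytes


def copy_allowed(p_value: int) -> bool:
    """Check the copy/extract permission bit of a /P value.

    /P is stored as a signed 32-bit integer, so it is usually negative;
    Python's & operator handles negative integers correctly here.
    """
    return bool(p_value & COPY_BIT)
```

To use it, read the file with `open(path, "rb").read()` and pass the bytes to `has_encrypt_entry`. Reading the actual `/P` value out of the file is left to a proper PDF library.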



 
Céline Graciet
English to French
TOPIC STARTER
thanks! Jan 16, 2003

Following some of your advice, I downloaded a freeware OCR program. OK, it didn't work (it wouldn't save my document as a Word doc), but it was good to try! It's called WebOCR and seems really good, if you can make it work...

 
dkalinic
Croatian to German
In memoriam
Abbyy FineReader works fine with PDF files Jan 16, 2003

You might try using Abbyy FineReader. It reads PDF files and exports them as Word documents. The graphics are preserved too.



Greetings,

Davor


 
monitor
English to German
Abbyy is it!!! Jan 17, 2003

After the last comment I went to the bookstore, bought FineReader and installed it on my notebook.

I took a 24-page corporate brochure in PDF and had it imported and extracted into Word 2000.

Wow!!! Never seen that before. Buy version 6.0 with that new feature and you are safe, once and forever.

Marcel


 
Simona Oliva (X)
France
French to Italian
click on a button Feb 10, 2003

Hi Celine,



This reply might come too late, but I just found a button in Acrobat Reader called "select a text" (there is a T and a small square on the right-hand side). If you click on it, you will be able to highlight the text you need, then right-click with your mouse and finally copy and paste it into a Word doc.

Hope it helps.

Simona


 
Matthew Coulson
Albanian to English
PDF tools Feb 11, 2003

pdf2txt will convert the text of a PDF to a plain text file. This can be helpful, but you lose all formatting when doing this. It is fairly inexpensive at $38.00 for a license. There is a free trial as well. For more info see:

http://www.verypdf.com/pdf2txt/pdf2txt.htm



You can also use pstotext. It is a bit more difficult to use, so if you aren't very tech-savvy it probably isn't for you. You need to install GhostScript and GhostView (both free) on your system, then pstotext, and then run the extract function. This doesn't handle every type of PDF, but it will handle many of them. You can find out more about it at:

http://www.research.compaq.com/SRC/virtualpaper/pstotext.html



A list of other tools can be found at:

http://www.pdfzone.com/toolbox/toolfilter.html

This page tells you all you wanted to know about PDFs but would rather never have to learn.



All tools that do a word count from a PDF, including Adobe Acrobat, share one weakness: a PDF can be nothing more than a scanned page without any OCR. Such a PDF is just a picture, so there is no way to get a word count from this type of file without running an OCR program yourself.
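
The point about scanned pages can be turned into a quick check. The sketch below assumes you already have the text that some extraction tool returned for each page, as a list of strings (how you obtain it is up to the tool you use). Pages with next to no text are flagged as probable scans that need OCR. The function names and the 20-character threshold are arbitrary choices for this example.

```python
def pages_needing_ocr(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 1-based numbers of pages whose extracted text is nearly empty.

    Assumption: a page with fewer than `min_chars` non-whitespace
    characters is probably a scanned image without a text layer.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        visible = "".join(text.split())  # drop all whitespace
        if len(visible) < min_chars:
            flagged.append(number)
    return flagged


def word_count(page_texts: list[str]) -> int:
    """Count whitespace-separated words across all pages."""
    return sum(len(text.split()) for text in page_texts)
```

If every page is flagged, any word count taken from the extracted text is meaningless and the file has to go through OCR first.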