How can I count words in PDF files?
Thread poster: suesimons
suesimons  Identity Verified
Portuguese to English
Apr 12, 2006

I'm sure this has been asked before but how do I count the words in a .pdf document?

[Subject edited by staff or moderator 2006-04-12 18:48]


 
Giles Watson  Identity Verified
Italy
Italian to English
In Memoriam
It certainly has... Apr 12, 2006

... and there are plenty of links here:

http://www.proz.com/post/326526#326526

You can find more relevant messages by typing "pdf" or "pdf count" in the "Search forums" box in the top right-hand corner of this page.

HTH

Giles


 
Kristine Sprula (Lielause)  Identity Verified
Latvia
Member 2005
English to Latvian
It depends... Apr 12, 2006

suesimons wrote:

I'm sure this has been asked before but how do I count the words in a .pdf document?

[Subject edited by staff or moderator 2006-04-12 18:48]


If the file was created as a PDF from a text document, one option is to copy the text into Word; another is to use a dedicated word-count program.
But if the file was created as an image (the text was scanned and then saved as a PDF), the only option is to count manually.
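As a sketch of the "copy the text out and count it" route for text-based PDFs: once the text has been pasted or exported to plain text (for example with Poppler's `pdftotext file.pdf file.txt`, a tool not mentioned in this thread), counting whitespace-separated tokens gives a figure close to what a word processor reports. Skipping punctuation-only tokens is an assumption of this sketch, not necessarily what Word does. An image-only (scanned) PDF typically yields little or no extractable text, which is a quick sign you are in the second case above.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain at least one letter or digit.

    Tokens made only of punctuation or symbols (a lone "-" or "•") are skipped;
    this is a choice made for this sketch, not a documented Word rule.
    """
    return sum(1 for token in text.split() if any(ch.isalnum() for ch in token))
```

Usage: `count_words(open("document.txt", encoding="utf-8").read())`, where `document.txt` is a hypothetical file holding the extracted text.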

Regards,
Kristine


 
Marisa Condurso de Nohara  Identity Verified
Argentina
English to Spanish
Word and ABBYY Apr 12, 2006

Kristine Lielause wrote:

... one option is copying text to Word.....

But if the file has been created as a picture.....

Kristine


I would like to add something to Kristine's suggestion:

I usually do it by copying and pasting into Word, but be careful! Some words may run together, and text inside embedded objects won't be counted. So before clicking "Word Count", look over the whole Word document to separate any joined words, and count the objects separately.


Secondly, when the PDF was created as an image, you could use ABBYY FineReader to turn "objects" (a copied PDF page) into "words", but to tell the truth, I am not sure what happens with very long PDF texts. I usually use FineReader for individual images with words in them.
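The advice to look over the pasted text for joined words can be partly automated. The sketch below flags candidates using two heuristics chosen here for illustration (a lowercase letter directly followed by an uppercase one, and a punctuation mark wedged between letters). They are assumptions, not rules from Word or ABBYY, and will also flag legitimate words such as "iPhone" or web addresses, so every hit still needs a human look.

```python
import re

# A punctuation mark squeezed between letters, e.g. "end.Next" (heuristic, assumed).
_PUNCT_JOIN = re.compile(r"[^\W\d_][.,;:!?][^\W\d_]{2,}")


def suspect_joined_words(text: str) -> list[str]:
    """Return tokens that look like two words glued together during copy-paste."""
    suspects = []
    for token in text.split():
        # Ignore punctuation at the edges of the token ("this." -> "this").
        word = token.strip(".,;:!?\"'()[]")
        # Heuristic 1: lowercase immediately followed by uppercase ("endNext").
        case_join = any(a.islower() and b.isupper() for a, b in zip(word, word[1:]))
        # Heuristic 2: punctuation between letters with no space after it.
        punct_join = _PUNCT_JOIN.search(word) is not None
        if case_join or punct_join:
            suspects.append(token)
    return suspects
```

For instance, `suspect_joined_words("The endNext page and this.Other one")` flags `endNext` and `this.Other`, while abbreviations such as "e.g." are left alone.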

Hope it helps!
McN


 
ddelvecchio
English to Italian
Practicount Apr 22, 2006

Hello!!

If the file isn't an image, I use Practicount&Invoice, a really nice, simple program that counts words in every type of document.
It also generates invoices and many other things.

You can download a shareware version here:
http://www.practiline.com/download.htm

Bye!!
Davide


 
aitteam
Ukraine
Member 2009
English to Ukrainian
Word count in pdf, images, and 30 more file formats Jul 28, 2009

Hello,

We have just released a new version of our word count software. It is called AnyCount and is used by more than 5000 people worldwide. I am sure colleagues on the forum will give their opinion on its pros and cons.

I will only mention the new feature of version 7: word count in BMP, JPG, PNG, and GIF files.

Best,
Vladimir.


 
Anna Villegas
Mexico
English to Spanish
Try this one Jul 29, 2009

http://www.globalrendering.com/download.html

It is a good tool.



 
Samuel Murray  Identity Verified
Netherlands
Member 2006
English to Afrikaans
Here's how Jul 29, 2009

suesimons wrote:
How do I count the words in a .pdf document?


1. In your PDF viewer, press Ctrl+A and Ctrl+C, and then in MS Word, press Ctrl+V. If you can see the text, count it. If you can't see the text, go to step 2.

2. Use a good, expensive OCR program to convert the PDF into MS Word, and then use CompleteWordCount to count the text. If you don't want to use OCR, go to step 3.

3. Count the way we counted in the old days: count the words in a few typical lines, then multiply that average by the average number of lines per page and by the number of pages.

http://www.shaunakelly.com/word/CompleteWordCount/
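Step 3 is simple arithmetic: total words ≈ average words per sampled line × lines per page × number of pages. For example, sample lines of 10, 12 and 11 words average 11 words; at 40 lines per page over 5 pages, the estimate is 11 × 40 × 5 = 2,200 words. A minimal Python sketch (the function and parameter names are hypothetical):

```python
from statistics import mean


def estimate_word_count(sample_line_counts, lines_per_page, pages):
    """Estimate a document's word count from a few hand-counted lines.

    sample_line_counts: words counted by hand in a few typical lines.
    lines_per_page: average number of text lines on a page.
    pages: number of pages in the document.
    """
    if not sample_line_counts:
        raise ValueError("count at least one sample line")
    words_per_line = mean(sample_line_counts)
    return round(words_per_line * lines_per_page * pages)
```

Sampling lines from several pages (not just the first) keeps headings and half-empty lines from skewing the average.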


 
Michael GREEN  Identity Verified
France
English to French
Agree with Samuel Jul 30, 2009

... on all points.

I would just add that if the file is an image, I usually print it and then scan it using the OCR function of my scanner.

In any event, pdf files necessarily mean extra time taken to prepare the source files, and I invoice that extra time to my customers (having made it clear that this is how I work before the order is confirmed).


 
Tony M
France
Member
French to English
SITE LOCALIZER
Only for text-based PDFs? Jul 31, 2009

Tadzio Carvallo wrote:
Try this one:
http://www.globalrendering.com/download.html


Yes, but as far as I can ascertain from that website, it still only seems to count words in PDF files created directly from native text formats; so it still can't solve the problem of what to do when the PDF is in fact an image from some scanned document etc.

Like Michael G., I have occasionally had to resort to printing out the file and then OCRing it, which really does seem a roundabout way of doing things! Poorer-quality originals are also a problem, particularly with fine print. However, the absolute accuracy of the OCR matters little, as long as on average it produces about the right number of words; in my experience it's a case of 'swings and roundabouts', and the end result is usually accurate enough. After all, it is hardly cost-effective to spend a lot of time producing a wordcount that is accurate to the word, since any discrepancy is likely to be fairly small.

As I translate mainly from FR > EN, I sometimes agree with the customer to base my charging on the target word count + a percentage; generally, 10% seems about right for FR > EN, though in a statistical analysis I once did of quite a large number of files, I noticed variations from –16% to +5% in the FR > EN wordcount difference, so the variability is quite large! But I find most customers don't argue with 10% (they can see for themselves that the EN takes up less space!), and it's not really worthwhile wasting time trying to get greater accuracy.
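This billing rule can be written out as arithmetic. The 10% uplift is the convention agreed with customers, not a measured constant. The second function assumes that the reported spread of -16% to +5% means EN = FR × (1 + d) with d in that range, so FR = EN / (1 + d); that reading of the figures is an assumption of this sketch. For a 1,000-word English target, the +10% rule gives 1,100 billable words, while the observed spread puts the actual French count between about 952 and 1,190 words.

```python
def source_estimate_from_target(target_words: int, uplift: float = 0.10) -> int:
    """Billable source-equivalent count: target word count plus an agreed uplift."""
    return round(target_words * (1 + uplift))


def implied_source_range(target_words: int, low: float = -0.16, high: float = 0.05):
    """Range of source counts implied by the observed EN-vs-FR differences.

    Assumes EN = FR * (1 + d) with low <= d <= high, hence FR = EN / (1 + d).
    Returns (smallest, largest) plausible source counts.
    """
    return (round(target_words / (1 + high)), round(target_words / (1 + low)))
```

The width of that range is why a fixed, agreed percentage is more practical than trying to reconstruct the exact source count.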

In passing, I'd just like to mention one customer who requested specifically that I not reduce my EN translation by more than 5% compared to the FR, for DTP reasons! Better still, this particular customer pays me by target wordcount anyway! However, in the particular field I was working in, it was actually extremely difficult to comply!


 
Igor Moshkin
Russian Federation
English to Russian
FineCount Jul 31, 2009

Try FineCount - http://www.tilti.com/tilti-com.software.finecount?pc_code=F97961DA6D40A&ver=2.5.1.1766
It's free, though it requires registration. In addition to word counts, this software provides plenty of other useful information, including invoices.


 
CHEN-Ling  Identity Verified
Chinese to English
OCR Aug 13, 2009

Michael GREEN wrote:
... on all points.

I would just add that if the file is an image, I usually print it and then scan it using the OCR function of my scanner.

In any event, pdf files necessarily mean extra time taken to prepare the source files, and I invoice that extra time to my customers (having made it clear that this is how I work before the order is confirmed).


This is what I wanted to say. Actually, a single OCR program, such as Shocr 7.0, is enough. I usually first save the PDF file as TIFF files, then open the saved TIFF files in Shocr 7.0 and convert them into text. Finally, I copy the text into a Word file and count it.
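The same render-then-OCR-then-count workflow can be scripted with free command-line tools. This sketch swaps in Poppler's `pdftoppm` (to render each page as a TIFF) and Tesseract (for OCR) in place of Shocr 7.0; both are assumptions, must be installed and on the PATH, and non-English sources need the matching Tesseract language data (for example `chi_sim` for Simplified Chinese). The result is an estimate, since OCR errors add and drop words.

```python
import glob
import os
import subprocess
import tempfile


def render_command(pdf_path: str, out_prefix: str, dpi: int = 300) -> list[str]:
    """Command that renders every PDF page to a TIFF image with pdftoppm."""
    return ["pdftoppm", "-tiff", "-r", str(dpi), pdf_path, out_prefix]


def count_words(text: str) -> int:
    """Whitespace-delimited word count, a rough stand-in for Word's count."""
    return len(text.split())


def ocr_word_count(pdf_path: str, lang: str = "eng") -> int:
    """Render pages to TIFF, OCR each page with Tesseract, and sum the word counts."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        subprocess.run(render_command(pdf_path, prefix), check=True)
        total = 0
        # pdftoppm names its output prefix-1.tif, prefix-2.tif, ... (possibly zero-padded).
        for tif in sorted(glob.glob(prefix + "-*.tif*")):
            # Passing "stdout" as the output base makes tesseract print the text.
            result = subprocess.run(
                ["tesseract", tif, "stdout", "-l", lang],
                check=True,
                capture_output=True,
                text=True,
            )
            total += count_words(result.stdout)
        return total
```

Usage: `ocr_word_count("scan.pdf", lang="chi_sim")`, where `scan.pdf` is a hypothetical file name.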


 
Pierre Fleutot
Argentina
English to French
Excellent Dec 29, 2009

Tadzio Carvallo wrote:

http://www.globalrendering.com/download.html

It is a good tool.



SO easy to use (no installation needed). Ten out of ten!


 
Tam Nguyen
Vietnam
English to Vietnamese
count words in PDF Jan 22, 2010

PierreF wrote:

Tadzio Carvallo wrote:

http://www.globalrendering.com/download.html

It is a good tool.



SO easy to use (no installation needed). Ten out of ten!


Tried it! That's OK and easy.


 
Virginia canvas
United States
French to English
Second vote for PractiCount Mar 26, 2010

We have been using PractiCount for a couple of years and love it. It's easy to use. It's versatile and customizable. And the counts are quite accurate (even for Asian chars, lines, pages, etc.). It counts almost every file format I have needed: Word, Excel, PPT, PDF, HTML..... PractiCount also offers the flexibility to export reports or professional-looking invoices, if you need those features.

A customer just sent us a new PDF document complete with embedded CAD drawings.
I tried the standard route of Adobe Acrobat's save-as function. Low-ball word count because most of the images remained images - not editable text.

Next I tried the OCR tool ABBYY PDF Transformer (another tool I love!!). Fair results. At least ABBYY converted most of the images to text, but it still looked incomplete for estimating purposes.

Then I resorted to PractiCount. Somehow PractiCount came up with a count 2,000 words higher than either of the other two approaches.


Note: Over the years, I have found that the success of OCR tools varies with the nature of the image and layout. ABBYY seems to be among the best (especially for foreign or multi-language docs and for retaining layout that a translator can use). But not always. Sometimes OmniPage or another OCR tool simply has better luck for a creative design layout. It seems to be a matter of trial and error with those scans or embedded images.

Good luck,
- Virginia Anderson

Oregon Translation, LLC
Building cooperative relationships with translators.
Apply as a translator here: www.oregontranslation.com


 