How to count the words of an e-book (HTML format)
Thread poster: Sandrine Rizzo (X)
Sandrine Rizzo (X)  Identity Verified
France
Local time: 15:56
Spanish to French
+ ...
Feb 12, 2014

Dear all,

This week I ran into a technical issue: a prospect sent me an e-book as a link to a web address, and every page of the book had that very same address. That made it impossible to handle it like an ordinary website, i.e. by saving each page for a word count and CAT-tool translation.

Can any of you give me tips on how to get a reliable word count, and also on how to translate this kind of document?

For your information, the customer refused to send me the content in another format, so it was a kind of "do or die" request.

Thank you in advance for your help,
Sandrine


 
Sergei Leshchinsky  Identity Verified
Ukraine
Local time: 16:56
Member (2008)
English to Russian
+ ...
In any CAT-tool Feb 12, 2014

If you have HTML -- it is a simple format supported by all CAT tools.

If you have a link to the web page, it is NOT the e-book.
Try using HTTrack or a similar tool to extract the text.

It is difficult to say anything without seeing the website...
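
For pages that do save with their text intact, a word count can also be scripted outside a CAT tool. The following is a minimal sketch in Python using only the standard library. The class name `TextExtractor`, the function `count_words_in_html` and the sample markup are illustrative and not from this thread, and the figure it gives will differ somewhat from what a CAT tool reports, since each tool has its own counting rules.

```python
import re
from html.parser import HTMLParser

# Approximate definition of a "word": letters/digits, optionally joined
# by an apostrophe or hyphen (so "It's" and "e-book" count once each).
WORD_RE = re.compile(r"\w+(?:['’-]\w+)*")


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words_in_html(html_text):
    """Return an approximate word count for the text in an HTML string.

    The <title> text is included in the count.
    """
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    # Joining with a space keeps adjacent paragraphs from merging into one word.
    text = " ".join(parser.chunks)
    return len(WORD_RE.findall(text))


if __name__ == "__main__":
    sample = (
        "<html><head><title>My book</title>"
        "<style>p {color: red}</style></head>"
        "<body><p>Hello, world! It's a test.</p>"
        "<script>var x = 1;</script></body></html>"
    )
    print(count_words_in_html(sample))  # prints 7
```

If the saved file contains only the page titles, as can happen with dynamically generated pages, the count will reflect only those titles, which is itself a quick way to spot that the extraction failed.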

[Edited at 2014-02-12 11:55 GMT]


 
Sandrine Rizzo (X)  Identity Verified
France
Local time: 15:56
Spanish to French
+ ...
TOPIC STARTER
unsuccessful extraction Feb 12, 2014

Thank you Sergei for your reply.

As recommended, I downloaded HTTrack and tried to extract the page, without success.
The extracted file has no content except for the page titles, and the same is true when it is imported into a CAT tool. Saving the page as text gave the same result. The page seems to be some kind of "screenshot" of the content.

Actually, the link leads to this kind of address: http://customername.com

Any other ideas?

For non-disclosure reasons, I do not want to give the real address here, but I could send it to you by private mail if you think you could work something out with it.

Sandrine


 
Joakim Braun  Identity Verified
Sweden
Local time: 15:56
German to Swedish
+ ...
Ajax Feb 12, 2014

This is probably an Ajax-based HTML interface where you never see the actual URLs.
The back-end data might be in all kinds of formats, including but not limited to HTML.

Ask the customer to provide the text.
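
If the customer does hand over the back-end data and it turns out to be JSON (purely an assumption here; nothing in this thread says what format the site uses), a rough count is still possible. The sketch below, in Python, walks a parsed payload and counts the words in every string value. The function name `count_words_in_json` and the sample payload are hypothetical.

```python
import json
import re

# Approximate definition of a "word", as in most simple counters.
WORD_RE = re.compile(r"\w+(?:['’-]\w+)*")


def count_words_in_json(value):
    """Recursively count words in all string values; dict keys are ignored."""
    if isinstance(value, str):
        return len(WORD_RE.findall(value))
    if isinstance(value, dict):
        return sum(count_words_in_json(v) for v in value.values())
    if isinstance(value, list):
        return sum(count_words_in_json(v) for v in value)
    return 0  # numbers, booleans and null contain no translatable text


if __name__ == "__main__":
    # Hypothetical payload; a real one would have its own structure.
    raw = (
        '{"page": 1, "title": "Chapter one", '
        '"paragraphs": ["It was a dark night.", "Nobody spoke."]}'
    )
    print(count_words_in_json(json.loads(raw)))  # prints 9
```

String values that are not translatable text (identifiers, URLs and the like) would inflate the count, so the result is only a first approximation.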



[Edited at 2014-02-12 13:24 GMT]


 
Rolf Keller
Germany
Local time: 15:56
English to German
Copy and Paste? Feb 12, 2014

If all pages have the very same URL, they cannot be saved as files because no such files exist. The web server generates a new page on the fly when you click the Next Page or Previous Page button. As HTTrack doesn't click those buttons, it will not work.

You could try to copy page by page via Copy-and-Paste from the screen into an open Word document. Then reformat it if necessary.


 
Joakim Braun  Identity Verified
Sweden
Local time: 15:56
German to Swedish
+ ...
Don't think so Feb 12, 2014

Rolf Keller wrote:

You could try to copy page by page via Copy-and-Paste from the screen into an open Word document. Then reformat it if necessary.


For quoting on a book of perhaps 200 pages? It's not worth it.
Ask the customer to provide the text. If they're a customer worth having, they'll be happy to provide it.

[Edited at 2014-02-12 14:31 GMT]


 
Rolf Keller
Germany
Local time: 15:56
English to German
Quoting based on estimation Feb 12, 2014

Joakim Braun wrote:

For quoting on a book of perhaps 200 pages? It's not worth it.


Not for quoting but for translating. For quoting I'd estimate the quantity (based on 3 "typical" pages) and multiply the result by 1.2 (because of the uncertainty and because of the additional work). And I'd inform the client that not providing an editable file implies higher cost. Topic: "Customer Education".
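
The estimate described above can be written as a small calculation. The sketch below is in Python; the function name, the three sample counts and the 200-page total (borrowed from the "perhaps 200 pages" mentioned earlier in the thread) are illustrative assumptions, not figures from the actual book.

```python
def estimate_word_count(sample_page_counts, total_pages, margin=1.2):
    """Estimate a book's word count from a few hand-counted sample pages.

    sample_page_counts: word counts of the sampled "typical" pages
    total_pages:        number of pages in the whole book
    margin:             multiplier covering uncertainty and extra work
    """
    if not sample_page_counts:
        raise ValueError("need at least one sample page")
    average_per_page = sum(sample_page_counts) / len(sample_page_counts)
    return round(average_per_page * total_pages * margin)


if __name__ == "__main__":
    # Hypothetical counts for three typical pages of a 200-page book.
    samples = [310, 280, 340]
    # Average 310 words/page x 200 pages x 1.2 = 74400
    print(estimate_word_count(samples, total_pages=200))  # prints 74400
```

Sampling pages from different parts of the book (text-heavy and illustration-heavy ones alike) should make the average more representative.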


 
Tony M
France
Local time: 15:56
Member
French to English
+ ...
SITE LOCALIZER
Copy/paste or screen capture Feb 12, 2014

If the text can be selected with your mouse, then you can copy and paste it page by page into a document.

If the text only exists as an image, then you can use something like the Windows 'capture' tool to select just the text area of the page and save it as an image file, which you can then run through OCR.

I have done this in the past, albeit only for a short document!

I can't see a glittering alternative; obviously e-book publishers aren't going to want you to copy their text easily into a format that could then be printed out...

If I were you, I'd be inclined to subcontract the donkey work out to someone whose time is less valuable than your own, maybe a computer-savvy kid who'd like to earn some pocket money!

Just a thought, though: if your customer is unable to provide the original text, are you sure they actually have the right to have it translated? It depends on what their intended use is, naturally.


 
Sandrine Rizzo (X)  Identity Verified
France
Local time: 15:56
Spanish to French
+ ...
TOPIC STARTER
thanks Feb 13, 2014

Thank you all for your useful contributions. Although an effective solution has not yet been found, I discovered new tools and new technical tips!

The idea of capturing the pages and running them through OCR seems interesting, and of course it should be charged for, as it means extra work.

When I asked the customer for an editable format, he refused, and I in turn declined to work any further on his quotation request because of his unethical behaviour: he is the kind of customer who squeezes agencies dry to get lower prices while making no effort himself. To be honest, this was more of a relief than a loss!

Have a nice day!
Sandrine


 

