Translation from scanned images
Thread poster: Tatsuhiro Sugihira
Tatsuhiro Sugihira  Identity Verified
United States
Local time: 08:28
English to Japanese
Aug 15, 2013

Hi guys,

I have been asked to quote for a Japanese-to-English translation of a scanned copy of a manual-type document.
I have given my rate, but they seem to be asking for the total amount.

I tried to convert the original Japanese PDF file into a Word file using Nitro Pro 8 (which has a feature to convert PDF files into other file formats), but was unsuccessful. (I think it doesn't recognize Japanese.)
It seems that a manual word count or some kind of calculation using a formula is required.

So, assuming the translation rate is USD 0.10 per source Japanese character, how much should I charge?
Also, assume that I might ultimately have to type out the Japanese text from the scanned PDF files first and then translate it into English.
Additionally, the original file includes different font sizes, different text orientations, charts with words, and so on.

Please share your opinion or your experience with similar assignments.

Thank you in advance for any comments.

-Tatsu


 
Mårten Engelberg  Identity Verified
Switzerland
Local time: 17:28
Member (2003)
English to Swedish
You shouldn't have to type... Aug 16, 2013

...all of it: for Optical Character Recognition I used the free http://en.wikipedia.org/wiki/Nuance_PDF_Reader for a number of years, in which opened PDFs are (were?) uploaded to their web site and then emailed back as doc etc. Now I'm using their PDF Converter 8 (50 USD), which is offline and therefore maybe a safer alternative. Also, the free way is (was?) sometimes slower when a lot of people are using it or there is a lot of volume in one file.

I can't remember how good the results for Japanese were in the free Reader, but in the paid version they've been good most times, and only really bad when the PDF was almost illegible (which too often is the case with stuff from Japanese clients, bless their hearts...).

Or you can set a price per PDF page, of course. But it's still going to be less accurate an estimate than OCR+cleaning/estimating.

Either way, best of luck, ganbare!
Mårten

[Edited at 2013-08-16 00:23 GMT]


 
Srini Venkataraman
United States
Local time: 10:28
Member (2012)
Tamil to English
exception Aug 16, 2013

I think that if the PDF was made from a JPG, then OCR may not work.

 
Tatsuhiro Sugihira  Identity Verified
United States
Local time: 08:28
English to Japanese
TOPIC STARTER
Trying OCR, so far no success Aug 16, 2013

Mårten,

Thank you for mentioning the OCR process.
I didn't know about it and am trying it right now.
So far I have tried OCR in a few programs, without success.


Srini,

Yes, you are kind of right about that.

Actually, I'm not quite sure... OCR isn't working well because:
-the file is in Japanese (a non-alphabetic language)
-the PDF is based on scanned images.

I'm downloading an OCR tool that specializes (or at least claims to) in East Asian languages.
If this doesn't work... then I might have to go the manual way... =/


 
Elina Sellgren  Identity Verified
Finland
Local time: 18:28
Member (2013)
English to Finnish
Copy+paste? Aug 16, 2013

I have 'highlighted' text in PDF files (with the mouse), then copied and pasted it into Word. All the formatting disappears, but you should be able to get the word count that way, if that's the main thing you need. Not sure how well it works with Japanese characters though.

 
Branka Ramadanovic  Identity Verified
Bosnia and Herzegovina
Local time: 17:28
English to Croatian
I normally Aug 16, 2013

try to charge more for PDF originals, although I do not work in Chinese, because this usually requires additional work of this or that kind. Or, I ask the client to supply an editable version.

Best,
Branka


 
Sandra Peters-Schöbel
Germany
Local time: 17:28
Member (2007)
English to German
the worst... Aug 16, 2013

Hi,
I often get this kind of document as well (certificates, sent by fax to the agency and afterwards emailed to me). Normally I use ABBYY FineReader Professional for converting PDF to Word, which works quite well.
But none of the methods mentioned work if you have a scanned document, because the whole text is saved as one picture.
You cannot copy any part of it or use the 'extract text' function or similar.
So you can neither use a CAT tool nor give an exact quote. I don't think it has anything to do with the Japanese characters.

But if the formatting is difficult, converting is useless most of the time anyway. The layout work afterwards is so difficult that you are faster translating in a new Word document and formatting afterwards.

But here is how I quote:
I simply count the words on a full page and assume the same count for the other pages, plus an additional fee for all the layout work (because this can take quite some time...)

When quoting this way (which is in my favor), I always tell the client that with a Word document I could give an exact quote, maybe give a small discount on repetitions, and work much faster... They simply have to learn that we cannot work with a badly scanned PDF or fax but need the original document (which in the case of a PDF is often a PowerPoint or Word file).
You could also offer to invoice on the target word count, but remember to add your layout work to the price...

Kind regards
Sandra


 
Samuel Murray  Identity Verified
Netherlands
Local time: 17:28
Member (2006)
English to Afrikaans
Use the old-fashioned count method Aug 16, 2013

Tatsu02 wrote:
I have given my rate quote, but they seem to be asking for the total amount.


Take a few random lines, count the number of characters in them, then multiply the average characters per line by the number of lines. The price will be slightly inflated, but you can always offer a discount in the end, if you feel guilty about it.
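
The sampling method above could be sketched roughly like this in Python. The sampled line lengths and the total line count are made-up illustration values, and `estimate_quote` is a hypothetical helper name; only the USD 0.10 per source character comes from the original question.

```python
from statistics import mean


def estimate_quote(sample_line_counts, total_lines, rate_per_char):
    """Estimate total characters and price from a few hand-counted lines.

    sample_line_counts: characters counted in each randomly picked line
    total_lines:        number of text lines in the whole document
    rate_per_char:      price per source character
    """
    if not sample_line_counts:
        raise ValueError("need at least one sampled line")
    avg_chars_per_line = mean(sample_line_counts)
    estimated_chars = round(avg_chars_per_line * total_lines)
    estimated_price = round(estimated_chars * rate_per_char, 2)
    return estimated_chars, estimated_price


# Hypothetical figures: five sampled lines, 1,200 lines in the document.
chars, price = estimate_quote([38, 42, 35, 40, 45], total_lines=1200, rate_per_char=0.10)
print(f"about {chars} characters, about USD {price}")
```

Sampling more lines, and picking them from pages with different layouts, should make the average less sensitive to short headings or chart labels.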


 
Sheila Wilson  Identity Verified
Spain
Local time: 16:28
Member (2007)
English
Just make sure you're paid for your time Aug 16, 2013

Tatsu02 wrote:
ultimately I might have to type Japanese text first from viewing scanned pdf files first, then translate the text into English.

Surely, you'll just read the phrases in Japanese and type them in the target language if you can't convert it, won't you? As someone else mentioned, CAT tools are probably going to be useless on this job even if the file can be converted. So I can't personally see any circumstances where this would be necessary, or even advisable, but if you do it, make sure the client knows about it and pays for the time taken.

Tatsu02 wrote:
Additionally, the original file includes, different font sizes, different text orientation, charts with words and etc etc...

Does the client need all that formatting? That's going to take time if you're knowledgeable about these things, and lots of time if you aren't. Ask the client what their requirements are, bearing in mind that perfection will cost extra. If it's an agency, you can bet your bottom dollar that they are charging their client more for that formatting!

Remember, there's absolutely no interest in you working for half your normal hourly rate simply because the client wants something complicated that you can't deliver. The client must pay you correctly or go elsewhere. You're better off refusing the job and spending the free time researching how to deal with the next similar request (as you're doing here), so that next time you can approach the job differently. Sometimes, we just have to say "No". No client would ever pay me what it would take for me to deal with this type of job (in my pair, of course!), so I just politely refuse such jobs. Somewhere out there, there will be someone who can reconstruct that document in a flash, using all sorts of IT tricks that I know nothing about, and will charge their normal per-word rate plus 5% or so for formatting. They're welcome to the job.


 
Simin Tan  Identity Verified
Local time: 23:28
Chinese to English
Target word rate would also work Aug 16, 2013

In cases like this, I use a target word rate (typically 1.5x source word rate for ZH-->EN) and impose a "premium" for extra work OCR-ing, etc.
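
As a rough sketch of that pricing approach in Python: the 1.5x multiplier is the figure mentioned above for ZH-->EN, while the word count, the source rate, and the size and form of the premium (here a percentage surcharge) are assumptions for illustration only. `target_rate_quote` is a hypothetical helper name.

```python
def target_rate_quote(target_words, source_rate, multiplier=1.5, premium=0.0):
    """Price a job on the translated (target) word count.

    target_words: word count of the finished English text
    source_rate:  usual per-source-word rate
    multiplier:   target rate as a multiple of the source rate
    premium:      assumed surcharge for OCR and similar extra work, as a fraction
    """
    target_rate = source_rate * multiplier
    base_price = target_words * target_rate
    return round(base_price * (1 + premium), 2)


# Hypothetical figures: 10,000 target words, USD 0.08 source rate, 10% premium.
print(target_rate_quote(10000, 0.08, premium=0.10))
```

Because the target count is only known after the translation is finished, a quote given in advance would still need an estimated word count.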

 
Tatsuhiro Sugihira  Identity Verified
United States
Local time: 08:28
English to Japanese
TOPIC STARTER
Tried all these OCR programs Aug 16, 2013

Of all the OCR programs I tried, ABBYY FineReader came closest to success.
(It's on discount right now too. =D)
Its OCR accuracy is around 60-70%, I think.
I don't really like that, because I have to go back and forth to check whether the OCR is correct.

The reason I wanted to convert the PDF into a proper Word file first is that a CAT tool might be beneficial on this project, considering that it's technical (with repeated terms) and fairly large in volume.
I'm just thinking about the benefit to the client of using accurate and consistent terms.
Another benefit would be that the translation could be reused for future reviews (which is also for the client).

Well, I decided to provide the total project fee based on a per-page rate, and I also explained what work will be done and what the final product will be.

Thanks guys for all your posts! =]


 
Łukasz Gos-Furmankiewicz  Identity Verified
Poland
Local time: 17:28
English to Polish
Manual and semi-manual solutions Aug 16, 2013

You can always just type, whether or not you tell the client. Obviously, you can't type a long text in time to offer a reasonably quick quotation.

Samuel's solution based on average counts per line, page etc. is also good, especially if you can find some standard formula to rely on for credibility. Alternatively, you can simply tell the client that the alternative is manually counting the words, so you suggest this or that method of approximation. Almost all clients should be reasonable and understand, and you really don't need to go to great lengths to avoid any remote possibility of charging a cent or two too much. Remember the approximate solution is just that, an approximation, and one that aims to make lives easier by skipping the full manual count. So don't make it difficult.

Also, Sandra may be right in that just simply counting the stuff manually may be less time consuming than finding sophisticated ways around the problem. Sometimes it really takes less time to do the footwork than to avoid it.

Also, yeah, target count. I use target count in such situations. So do my agencies. There are some people who don't really understand this, but they'd normally realise that they aren't experts, so they shouldn't be too difficult to deal with. If they are, well, just put your foot down. You're the pro there.

Oh, and avoid the kind of OCR that's more trouble than it's worth. If OCR increases your workload instead of reducing it, dump the OCR.

Also, you could probably hire a student for typing if you need to. Get yourself a walk in the sunshine in the meantime.


 

