Poll: Who should be in charge of using an OCR tool to prepare the source text for translation?
Thread poster: ProZ.com Staff
ProZ.com Staff
SITE STAFF
May 24, 2016

This forum topic is for the discussion of the poll question "Who should be in charge of using an OCR tool to prepare the source text for translation?".

This poll was originally submitted by Amanda DesJardins.



 
Christopher Schröder
United Kingdom
Member (2011)
Swedish to English
I'm sorry May 24, 2016

I can't think of anything offensive to say about this poll.

If you have a problem with that, please do have the courtesy to take me to task publicly or privately rather than going behind my back and getting a moderator to do it for you.

And before I get deleted again for breach of rule 645.4(b)(ii): The project manager, of course.


 
neilmac
Spain
Local time: 19:30
Spanish to English
I can, but won't May 24, 2016

Chris S wrote:

I can't think of anything offensive to say about this poll.

If you have a problem with that, please do have the courtesy to take me to task publicly or privately rather than going behind my back and getting a moderator to do it for you.

And before I get deleted again for breach of rule 645.4(b)(ii): The project manager, of course.


I invariably find your comments amusing and in general think we need a bit more "offensiveness" and calling out of BS on the site.

My issue with this particular poll is that it assumes too many things. For example, that source texts should need to be OCR'd in the first place.


 
Philippe Etienne
Spain
Local time: 19:30
Member
English to French
Other May 24, 2016

When working through an agency, the PM.
When working with an end-customer, the translator, so that you have control over how good and translator-friendly the OCR output is. And it's an extra item to charge or include in your fee.

Philippe


 
Post removed: This post was hidden by a moderator or staff member because it was not in line with site rule
Christine Andersen
Denmark
Local time: 19:30
Member (2003)
Danish to English
Other May 24, 2016

Ideally someone who understands the language...

Preferably the project manager, but if the result is going to be a dog's dinner of garbled Greek and mangled formatting, then I would rather do it myself.

Or simply translate from the PDF and accept that OCR cannot cope with everything.

Danish has three extra letters compared with English, and even some of the ordinary brackets and other things get mangled by OCR, so it may be useless anyway. I dread to think what it makes of languages like Czech or Polish.

Of course, if I do the OCR, I charge for my time!

One client actually re-typed 4000 words for me, and sent me the first three pages for approval (and so I could get started) while she typed the rest. Now THAT was a person who understood the problem. There were very few typos, AND she paid my top rate for the translation. I'm not allowed to name her here, but Hola! I still remember you!

Still chuckling about your comments yesterday, Chris! At least you were in good company - it is not every day that Jack's comments get deleted.

[Edited at 2016-05-24 09:27 GMT]


 
Anton Konashenok
Czech Republic
Local time: 19:30
French to English
Theory vs. practice May 24, 2016

While in theory it's the PM who should perform OCR, in practice I have yet to see a PM (or PM's technician) who uses OCR tools properly. In fact, I've been toying with the idea of offering a course in OCR to translation agencies.

Actually, just a couple of days ago a PM proudly sent me a Cyrillic text OCRed into the Latin alphabet.

[Edited at 2016-05-24 09:45 GMT]


 
Thomas Pfann
United Kingdom
Local time: 18:30
Member (2006)
English to German
Other N/A May 24, 2016

In those rare cases where an OCR tool is needed to prepare the source text for translation, it should be done by whoever gets paid to do it.

 
Muriel Vasconcellos
United States
Local time: 10:30
Member (2003)
Spanish to English
The PM - or just skip it May 24, 2016

If the OCR tool is sophisticated and the PM knows how to use it and cleans up the text, then it's a blessing to have an electronic copy to work with, but I've been faced with many OCR'd texts that were impossibly chopped up, with every few words clumped into separate text boxes. Plus, the margins and indents are almost always weird. Much as I hate PDFs, there are times when they would be easier than working with OCR output.

Just this month I had a 74-page OCR and my client arranged to have it digitized for me - I assume, using OCR. So, if the client knows how to do it, maybe that's even better, as they will gain some respect for the challenges we have to face.


 
José Henrique Lamensdorf
Brazil
Local time: 14:30
English to Portuguese
In memoriam
Other - open for negotiation May 24, 2016

It depends on who is better equipped to do it. We all want to have the best possible outcome.

One agency I work for has spared no money in getting the best OCR software available, and their PM is skilled in making it work. Now and then I find a couple of typos in, say, ten pages, and she'll admit they got it in hard copy, scanned it, and did OCR. To me, it looked like an original DOC file.

The most typical OCR flag is mixing "rn" (RN) and "m" (M) when a spell checker will accept both as valid words, e.g. gaRNer and gaMer.
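The "rn" versus "m" trap described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name `rn_m_confusables` and the sample word list are hypothetical, and a real check would run against a full dictionary for the language in question. The idea is to list every word containing "rn" that is still a valid word once "rn" is read as "m", since a spell checker accepts both spellings.

```python
def rn_m_confusables(words):
    """Return sorted (word_with_rn, word_with_m) pairs where both are in the list."""
    vocab = {w.lower() for w in words}
    pairs = set()
    for word in vocab:
        # Try every occurrence of "rn" in the word, not just the first one.
        start = word.find("rn")
        while start != -1:
            candidate = word[:start] + "m" + word[start + 2:]
            if candidate in vocab:
                pairs.add((word, candidate))
            start = word.find("rn", start + 1)
    return sorted(pairs)


# Hypothetical sample vocabulary, for illustration only.
SAMPLE_WORDS = [
    "garner", "gamer", "burn", "bum",
    "modern", "modem", "stern", "stem", "translate",
]

if __name__ == "__main__":
    for with_rn, with_m in rn_m_confusables(SAMPLE_WORDS):
        print(f"{with_rn} <-> {with_m}")
```

Words on such a list are the ones worth checking by eye against the scan, because no spell checker will flag them.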

Some other clients have worse OCR software than mine, so I do it. No point in charging for it, because it's a relatively quick process that can be done in the background... while my computer (Pentium D - 2.8 GHz) is supposedly NOT a speed demon, and runs under Windows XP. From what I've seen so far, doing it with an i5 under Windows 10 should be an uphill drag.

For direct clients, I know that it will be better for me if I do it, as part of the job.

The MAJOR problem in such cases is not OCR, but scanning!

Now and then I hear a lawyer's secretary rejoicing over the phone, saying "don't know why, but our scanner is sooo much faster today, that I'm sending you a PDF instead of a messenger with that 60-page contract". Whenever I hear something like this, I already know what I'll be getting... a PDF scanned at 72 DPI. On the Acrobat Reader screen I'll see unreadable spots that a Bulgarian woman would call "kukunikas" - no idea whether she made up this word, or if it means anything in her language.

Have you ever seen the reverse side of a generic maritime bill of lading? It contains about 5,000 words within one letter-sized page! I scanned it at 600 dpi, and OCR via OmniPage was 100% perfect. I found each and every mistake also present in the original text.
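A rough bit of arithmetic shows why the 72 DPI and 600 dpi scans mentioned above behave so differently. The Python sketch below relies on one certain fact (a typographic point is 1/72 of an inch) and one assumption: the 6 pt font size for fine print is a hypothetical value chosen for illustration, not a measurement of any actual bill of lading.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 of an inch


def glyph_height_px(font_size_pt, scan_dpi):
    """Approximate height in pixels of a line of type at a given scan resolution."""
    return font_size_pt * scan_dpi / POINTS_PER_INCH


if __name__ == "__main__":
    fine_print_pt = 6  # hypothetical fine-print size, assumed for illustration
    for dpi in (72, 600):
        height = glyph_height_px(fine_print_pt, dpi)
        print(f"{dpi} dpi: about {height:.0f} px per line of type")
```

Under that assumption, each character gets only a handful of pixels at 72 DPI, which leaves an OCR engine very little to work with, while the 600 dpi scan gives it several times more detail in each direction.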


 
Mario Chavez (X)
Local time: 13:30
English to Spanish
Say it ain't so! May 24, 2016

Chris S wrote:

I can't think of anything offensive to say about this poll.

If you have a problem with that, please do have the courtesy to take me to task publicly or privately rather than going behind my back and getting a moderator to do it for you.

And before I get deleted again for breach of rule 645.4(b)(ii): The project manager, of course.


Welcome to the club, Chris. Telling someone face to face (or by other available means, like the phone, letter, or email) what we think of him/her or his/her statements is not only honest. It's good manners.


 
Mario Chavez (X)
Local time: 13:30
English to Spanish
Extra item to charge May 24, 2016

Philippe Etienne wrote:

When working through an agency, the PM.
When working with an end-customer, the translator, so that you have control over how good and translator-friendly the OCR output is. And it's an extra item to charge or include in your fee.

Philippe


In agreement.

Most times, I do the OCR for free as part of the whole enchilada because I have an established relationship with the client.

Other times, I have to persuade the customer or project manager not to use OCR to provide me with the text because a) I can do it better, b) I can type it faster and better and/or c) I need the native (original) files. This scenario happens with PDF files and some well-meaning clients (or PMs) think they're doing you a favor by OCR'ing it for you.


 
Mario Chavez (X)
Local time: 13:30
English to Spanish
OCR technologies are a mixed bag May 24, 2016

If the document is in PDF format and the layout is very simple (one column, few if any text boxes or lists), then I'll let the PM, client or their grandma send me an OCR'd copy to work with. Otherwise I politely refuse their misguided efforts.

OCR technologies look like magic to the uninitiated or to whoever doesn't know how character recognition works, let alone when it is applied to foreign languages and complex layouts. By complex layout I mean anything more than one column and two typefaces.


 
Edward Potter
Spain
Local time: 19:30
Member (2003)
Spanish to English
Not a big fan of OCRing May 24, 2016

I'm a touch typist and I go quite fast with a good old hard copy of my PDF.

Many times a CAT tool slows me down since I have to double-check what automatically gets put in my target field. Add the OCR manipulation of the text, and I lose even more time. And then, there are the inevitable defects from the OCR conversion.

OCR has its place, but more often than not it doesn't do me a lot of good.


 
Katrin Bosse (X)
Germany
Local time: 19:30
Dutch to German
Agreed! May 24, 2016

Thomas Pfann wrote:

In those rare cases where an OCR tool is needed to prepare the source text for translation, it should be done by whoever gets paid to do it.


It's a job and it has to be done correctly so - yes!


 