Is this the solution to formatting problems from OCR? Thread poster: Dylan J Hartmann
While I have purchased ABBYY FineReader, I've avoided using it much of the time. After doing (Thai) OCR, fixing spelling errors throughout, getting the source formatting right, and then running the file through my CAT tool, the final MS Word doc ends up with strange formatting problems. I had quizzed ABBYY about why italics, bold and underlining were locked in some paragraphs but not in others, and posted asking for help here, but it was never solved (my workaround was to type the bold heading, for example, in a new doc and then copy it to the file I was working on!).

In addition, many of the agencies I work for now include in their instructions, "DO NOT OCR the source files – they create formatting that is unusable on the back end"! That said, there are certainly situations where OCR can be very helpful.

I'm wondering whether getting the OCR to export as a .txt file and manually inserting formatting is the best workaround? I had previously tried exporting as plain text, as a Word document, and fixing formatting in the source prior to running the CAT tool, but the final translated doc still had locked bold, italics and underlining, as mentioned earlier. I've tested this new .txt method on a couple of PDFs and have noticed no problems in the final files. What does everyone else suggest? Is this the solution to formatting problems from OCR?

telefpro Local time: 22:31 Portuguese to English + ... formatting problems | Jun 14, 2016
There are formatting problems which still persist. OCR can't always solve this issue.

esperantisto Local time: 20:01 Member (2006) English to Russian + ... SITE LOCALIZER OpenOffice Writer | Jun 14, 2016
In my experience, removing unnecessary formatting and fixing other post-OCR problems is easier with Apache OpenOffice / LibreOffice Writer (even when working with MS Word formats), especially with the OOoFBTools add-on. However, I do not work with Thai and have no idea about any implications specific to that language / script. As far as I understand, saving results to plain text is sometimes really the best option for Asian / Far Eastern languages with complex scripts.
[Edited at 2016-06-14 04:48 GMT]

Acrobat or Trados Studio | Jun 14, 2016
I can open PDFs in Trados Studio, but threatening to do so made one recent client send me the InDesign file. The trouble was that there were hard line breaks, often two or three in a sentence, so the source was broken up into tiny segments that could not be merged. Apart from that, I could not create a target file that I could open. The client was happy, but it was sheer guesswork on my part, as I had no WYSIWYG view of the document, graphics and formatting, and had to assume the DTP person could make any adjustments. A file like that is a pain to translate anyway...

Sometimes Adobe Acrobat works well with Danish, my source language, if the settings are correct, but if the scan quality is not good, then nothing helps much. Danish has three extra letters, which are very often garbled. A 'search and replace' may or may not help, since they are not always garbled consistently! Then come all the other spelling errors...

Whichever workaround is best for a given situation, I hope translators are making clients aware of the need for a workaround and charging for the time spent re-creating documents and formatting. It should not be included in the same standard word rate as for a simple document in Word!
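The hard line breaks described above can sometimes be repaired before the file reaches the CAT tool. The following is only a sketch in Python, assuming the text has already been extracted to plain text and that a line which does not end in sentence punctuation was broken mid-sentence. `merge_hard_breaks` and `SENTENCE_END` are names made up for this illustration, not part of Trados Studio or any other tool mentioned here.

```python
# Characters assumed to end a sentence. This is a guess that suits
# European-language text and would need changing for other scripts.
SENTENCE_END = (".", "!", "?", ":", ";")


def merge_hard_breaks(text: str) -> str:
    """Join lines broken mid-sentence; blank lines still end a paragraph."""
    paragraphs = []
    current = ""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # Blank line: close whatever has been collected so far.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if not current:
            current = line
        elif current.endswith(SENTENCE_END):
            # The previous line finished a sentence: keep the break.
            paragraphs.append(current)
            current = line
        else:
            # The previous line stopped mid-sentence: rejoin with one space.
            current = current + " " + line
    if current:
        paragraphs.append(current)
    return "\n".join(paragraphs)


sample = "The client sent\nthe InDesign file\ninstead.\nIt had hard breaks.\n\nNew paragraph\nhere."
cleaned = merge_hard_breaks(sample)
print(cleaned)
```

The punctuation list suits languages such as Danish. Scripts such as Thai, which do not separate words with spaces, would need a different joining rule, and a heading without final punctuation would be merged into the line after it, so the output still needs a read-through.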
Tom in London United Kingdom Local time: 18:01 Member (2008) Italian to English Is this the solution? No- | Jun 14, 2016 |
DJHartmann wrote: .... Is this the solution to formatting problems from OCR?

None of what you describe has anything to do with translating.

Dylan J Hartmann Australia Member (2014) Thai to English + ... MODERATOR TOPIC STARTER No blanket rates | Jun 14, 2016
Christine Andersen wrote: I hope translators are making clients aware of the need for a workaround and charging for the time spent re-creating documents and formatting. It should not be included in the same standard word rate as for a simple document in Word!

I totally agree with this point.

Tom in London wrote: Nothing worth noting

Thanks for your two cents, Tom.

Katerina O. Russian Federation English to Russian + ... Clear All Formatting | Jun 14, 2016
I use the 'Clear All Formatting' function in Word and then apply styles as necessary. It's not that time-consuming after all.

Dylan J Hartmann Australia Member (2014) Thai to English + ... MODERATOR TOPIC STARTER
Katerina O. wrote: I use 'Clear All Formatting' function in Word.

Yes, likewise. However, the bold, italics and underline functions were still locked afterwards. In certain documents the document language was also locked (as either Thai or Arabic), and I couldn't change it to English after translation to run a spellcheck.
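One way to see what 'Clear All Formatting' left behind is to look inside the file itself. A .docx is a zip archive of XML parts, and WordprocessingML stores bold and italic separately for complex scripts (`w:bCs`, `w:iCs`) alongside the usual `w:b`, `w:i` and `w:u`, plus a `w:lang` language tag. The Python sketch below only counts those elements in the document body and the style definitions. `count_formatting` is a hypothetical helper, and whether leftover complex-script or language settings explain the locked Thai formatting is an assumption, not something established in this thread.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by WordprocessingML elements such as w:b and w:lang.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Elements to count: bold, italic, underline, the complex-script
# variants of bold and italic, and the language tag.
WATCHED = ("b", "bCs", "i", "iCs", "u", "lang")


def count_formatting(docx, parts=("word/document.xml", "word/styles.xml")):
    """Return {part name: Counter of watched elements} for a .docx file.

    `docx` may be a file path or a binary file-like object.
    Elements are counted even when they switch a property off.
    """
    report = {}
    with zipfile.ZipFile(docx) as archive:
        available = set(archive.namelist())
        for part in parts:
            if part not in available:
                continue  # some files may lack one of the parts
            root = ET.fromstring(archive.read(part))
            counts = Counter()
            for tag in WATCHED:
                counts[tag] = sum(1 for _ in root.iter(f"{{{W_NS}}}{tag}"))
            report[part] = counts
    return report
```

Calling `count_formatting("translated.docx")` (the file name is a placeholder) returns one counter per part. A non-zero count in `word/styles.xml` after clearing would suggest the formatting comes from a style rather than from the text itself.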
Tina Vonhof (X) Canada Local time: 11:01 Dutch to English + ...
Why would you spend valuable time struggling with formatting in a converted document, which, as Tom points out, has nothing to do with translation? Just open a blank document and start typing! | | |
If the formatting remains locked after being cleared, it is most likely due to Word styles. Open the list of styles used in the document and delete the styles responsible for that formatting.

Tom in London United Kingdom Local time: 18:01 Member (2008) Italian to English Yes, and..... | Jun 14, 2016
Tina Vonhof wrote: Why would you spend valuable time struggling with formatting in a converted document, which, as Tom points out, has nothing to do with translation? Just open a blank document and start typing!

Yes - and translating. If you use dictation software, you can just read out your translation from the PDF and hey presto, it will type itself out in the target language. I too struggled with PDF conversion for a long time until I realised I could do it with dictation.
[Edited at 2016-06-14 15:18 GMT]

Dylan J Hartmann Australia Member (2014) Thai to English + ... MODERATOR TOPIC STARTER
Tina Vonhof wrote: Why would you spend valuable time struggling with formatting in a converted document, which, as Tom points out, has nothing to do with translation? Just open a blank document and start typing!

In plenty of situations this is the best option; however, sometimes OCR can be very useful!
Dylan J Hartmann Australia Member (2014) Thai to English + ... MODERATOR TOPIC STARTER List of styles? | Jun 14, 2016 |
Anton Konashenok wrote: If the formatting remains locked after being cleared, it is most likely due to Word styles. Open the list of styles used in the document and delete the styles responsible for that formatting.

I have never found instructions for this. Most point only to the clear formatting icon. Nevertheless, shouldn't a .txt file be clear of all styles and formatting, and so be safe to use? While I have my own issues with the MS Word docs, something must be leading the agencies to not allow OCR!

Dylan J Hartmann Australia Member (2014) Thai to English + ... MODERATOR TOPIC STARTER Still locked | Jun 14, 2016
Well, even using a .txt has caused issues. It seems to be related to MS Word and Thai fonts, because the Latin fonts can be formatted fine. My process was as follows: Exported the OCR as a plain text .txt file. Opened it with MS Word. Bolded the heading on the first line (worked) and then tried to correct the spelling of the first character of the first word. The new text that I typed couldn't be bolded! However, if I typed new text in English, it could! If anyone can clarify this situation, it'd be very appreciated!

esperantisto Local time: 20:01 Member (2006) English to Russian + ... SITE LOCALIZER
You'd better share a file (a sample page where the problem appears).