Exporting Tables from PDF files
Thread poster: Bharg Shah
Bharg Shah  Identity Verified
India
French to English
Nov 1, 2003

Hi all,

One of my clients has given me a bilingual glossary of about 400 pages as a PDF file. The terms are arranged in a table in 2 distinct columns. I was wondering if I could convert this into a 2-column Excel worksheet which I could then import into Multiterm. I tried saving the PDF file as RTF but it doesn't retain the table format and all terms are just listed one after the other. The PDF is editable text and not a scanned image so I guess there must be some way to extract the table. All help will be appreciated.


 
Natalie  Identity Verified
Poland
Member (2002)
English to Russian

MODERATOR
SITE LOCALIZER
Try using good OCR software Nov 1, 2003

For example, FineReader Pro version 6 or higher. If your file is large, first divide it into smaller parts using the full version of Acrobat; otherwise opening the file in FineReader will take ages.

After opening the file, recognize the text as usual and then choose "Send to Word". 99% of the formatting will be preserved.


 
Harry Bornemann  Identity Verified
Mexico
English to German
Write a macro Nov 1, 2003

I would write a macro in Word-VBA or Perl.
First you could insert a marker such as # after every second end-of-paragraph mark, and then search and replace until you have a tab-separated table.

400 pages might be too much for FineReader and even too much for Word. That's where Perl becomes interesting: it would do the job within a few seconds.
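
As a rough sketch of the same idea in Python rather than Word-VBA or Perl: assuming the text saved from the PDF lists the terms one per line, with source and target terms alternating, consecutive lines can be paired directly into tab-separated rows. The function names `pair_lines` and `to_tsv` and the sample terms are hypothetical, and the sketch assumes blank lines carry no terms and that no term wraps onto a second line.

```python
import csv
import io


def pair_lines(text):
    """Pair consecutive non-blank lines into (source, target) rows.

    Assumes the exported text alternates source term, target term,
    one per line. Raises ValueError if a term is left without a partner.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2:
        raise ValueError("odd number of lines: a term has no partner")
    return list(zip(lines[0::2], lines[1::2]))


def to_tsv(rows):
    """Render rows as tab-separated text that a spreadsheet can open."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample standing in for the text saved from the PDF.
    sample = "maison\nhouse\n\nchat\ncat\n"
    print(to_tsv(pair_lines(sample)), end="")
```

The odd-line check is there because a single wrapped or missing term would otherwise shift every following pair by one, which is hard to spot in 400 pages.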
HTH,
Harry

[Edited at 2003-11-01 12:04]


 
Mónica Machado
United Kingdom
English to Portuguese
FineReader 7 could be useful Nov 1, 2003

Hello,

FineReader 7 could be useful. You can download a 15-day trial version (search for ABBYY). If 400 pages is too much for it, split the document in two. FineReader 7 works fine with 270 pages (I have never tried more than that in a single document).
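
To plan such a split, here is a small sketch in Python that computes which page ranges to extract into each part. The helper name `page_ranges` is hypothetical, and the 270-page figure is only the size reported to work above, not a documented FineReader limit. The actual splitting would still be done in Acrobat or a similar tool.

```python
def page_ranges(total_pages, max_pages):
    """Return 1-based inclusive (first, last) ranges of at most max_pages each."""
    if total_pages < 1 or max_pages < 1:
        raise ValueError("page counts must be positive")
    return [
        (first, min(first + max_pages - 1, total_pages))
        for first in range(1, total_pages + 1, max_pages)
    ]


if __name__ == "__main__":
    # 400 pages in parts of at most 270 pages each.
    print(page_ranges(400, 270))  # prints [(1, 270), (271, 400)]
```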

Hope this helps

Regards,
Mónica


 

