How to extract text from a website
Thread poster: John Detre
John Detre
Canada
French to English
Sep 7, 2011

I have a very basic question about translating websites that has no doubt been asked and answered many times before, but a cursory search of the forums hasn't turned up a solution.

I need to translate a website that is already live, including invisible text (e.g. keywords) and unselectable text, but not including the HTML code. Is there a tool I can use to extract all the relevant text and import it into a word processor?

If anyone can help with this or point me towards a thread in which this has already been discussed, I would be most grateful. Thanks in advance and my apologies for asking such a rudimentary (to some) question.


 
MH TRADUCCIONES
Argentina
Local time: 21:11
English to Spanish
Tool for websites Sep 7, 2011

Software: SharePoint Designer 2007 / Trados 2007 (TagEditor)

 
Joakim Braun
Sweden
Local time: 02:11
German to Swedish
Dynamic HTML ps Sep 7, 2011

Be aware that pages may be modified on the fly by scripting. These days, the initial page as sent from the server is very likely NOT what you see in the browser.

You need to make sure that the HTML you save is the HTML of the page DOM at the moment you save it (not the HTML as initially served).

(But a dynamic website will hardly be translated that way, so on reflection never mind.)
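
If you do want to check whether scripting matters for a given page, here is a rough Python sketch. It assumes you have two strings: the HTML as served (e.g. from "view source") and a DOM snapshot saved from the browser after the page has loaded. The names `text_chunks` and `added_by_scripting` are made up for this example, and the comparison is deliberately crude: it only lists text that appears in the snapshot but not in the served HTML.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self._skip_depth:
            self.chunks.append(text)


def text_chunks(html):
    """Return the text nodes of an HTML string, in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def added_by_scripting(served_html, dom_html):
    """Return text found in the DOM snapshot but not in the served HTML."""
    served = set(text_chunks(served_html))
    return [chunk for chunk in text_chunks(dom_html) if chunk not in served]
```

If `added_by_scripting` returns anything, the served HTML alone is probably not enough to translate from.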

[Edited at 2011-09-07 19:29 GMT]


 
FarkasAndras
Local time: 02:11
English to Hungarian
Tricky Sep 7, 2011

This comes up pretty regularly here, and the only good answer is: arrange a three-way chat between the client, yourself and the webmaster who set the page up. There are a million ways for things to go wrong or get misunderstood.
You could use HTTrack or wget to download the website in question, but there are more than a few ways this can go wrong. Then you could translate the resulting HTML files with a CAT tool, but your client may not be able to use your HTML files directly.


For instance, you seem to be saying that you're expected to extract text and just deliver plain text to your client instead of HTML, but that doesn't sound like a good idea. If they want to use the translation on the website, they would need to manually copy-paste each phrase or paragraph to the right place... There has to be a better way.

If your client is sending you HTML files to translate, you could try translating them with a CAT tool. If you don't use a CAT tool, that's too bad...
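
For what it's worth, if you still want the plain text out of downloaded HTML files (as the original question asks), here is a minimal Python sketch using only the standard library. It collects the text between tags plus some text that is not selectable on screen: meta keywords and descriptions, and `alt`, `title` and `placeholder` attributes. The attribute list and the name `extract_text` are assumptions made for this example, not a complete inventory of where a site may hide translatable text, and the output loses the link back to where each string belongs in the page.

```python
from html.parser import HTMLParser

# Attributes and meta names that often carry text a visitor cannot select.
# These sets are assumptions; extend them for the site at hand.
TEXT_ATTRS = {"alt", "title", "placeholder"}
META_NAMES = {"keywords", "description"}
SKIP_TAGS = {"script", "style"}


class TranslatableText(HTMLParser):
    """Collect (source, text) pairs from an HTML document."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.items = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        attr_map = dict(attrs)
        meta_name = (attr_map.get("name") or "").lower()
        if tag == "meta" and meta_name in META_NAMES:
            content = (attr_map.get("content") or "").strip()
            if content:
                self.items.append((f"meta:{meta_name}", content))
        for name, value in attrs:
            if name in TEXT_ATTRS and value and value.strip():
                self.items.append((f"{tag}@{name}", value.strip()))

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self._skip_depth:
            self.items.append(("text", text))


def extract_text(html):
    """Return (source, text) pairs in document order, without markup."""
    parser = TranslatableText()
    parser.feed(html)
    parser.close()
    return parser.items
```

To use it on a downloaded site, read each HTML file into a string, pass it to `extract_text`, and write the pairs out one per line. Keeping the source label next to each string at least tells whoever pastes the translation back whether it came from body text, a meta tag or an attribute.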


 

