extract text from xml files
Thread starter: barryw
barryw
Local time: 20:33
English to Chinese
Nov 2, 2010

Dear all,
Does anyone know how to extract plain text from XML files? Is there any handy software for this?
Thank you!


 
Brand Localization
Egypt
Local time: 15:33
German to Arabic
There is a workaround :) Nov 2, 2010

Hi barryw,

The following works in most cases:

1- Right-click the XML file
2- Choose Edit (to open it in Notepad, for example)
3- Once the file is open, choose "File" | "Save As"
4- In the File name field, change the file extension to ".html"
5- Save the file (it will be saved as a web page)
6- Open the web page in your browser; you'll see the plain text without the XML tags and, moreover, with formatting
7- You can then copy this text and paste it into a Word file, for example, so that you can work with the text alone

Best regards

Your Arabic Translation Team
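
For anyone who would rather script this than go through Notepad and a browser, here is a minimal sketch in Python using only the standard library. It is not something posted in this thread; the function name `extract_text` is made up for illustration, and the sketch assumes the XML is well-formed and small enough to load into memory.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_source: str) -> str:
    """Return the text content of an XML string, one non-empty chunk per line."""
    root = ET.fromstring(xml_source)
    # itertext() yields every text node under the root, in document order.
    chunks = (chunk.strip() for chunk in root.itertext())
    # Drop chunks that were only whitespace (indentation between tags).
    return "\n".join(chunk for chunk in chunks if chunk)
```

To read from a file instead of a string, `ET.parse(path).getroot()` gives the same kind of root element. Note that in this sketch the text of inline elements (bold, italics and so on) lands on a line of its own, so the result may need a little tidying.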


 
FarkasAndras
Local time: 14:33
English to Hungarian
Here's one Nov 2, 2010

barryw wrote:

Dear all,
Does anyone know how to extract plain text from XML files? Is there any handy software for this?
Thank you!



Here's a script of mine:
http://www.mediafire.com/?kq9yayc1hgt2kj9

Unzip it, move your file into the tag_stripper folder and rename it so that it has an .html extension. Double-click the .bat and follow the instructions. It's a bit crude but it should work; do check the results, of course.
The end result should be pretty much the same as opening the file in a browser and copying the content into a .txt file, but this solution will work with large files as well, whereas your browser will most likely refuse to open a 50+ MB file for you.
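
The internals of the linked script are not shown in this thread, so the following is not that script. It is only a rough Python sketch of the same idea (drop the tags, keep the text) that reads the input as a stream, which is what makes large files manageable. The names `TextWriter` and `strip_tags` are invented for the example, and it assumes the input is well-formed XML.

```python
import xml.sax


class TextWriter(xml.sax.ContentHandler):
    """SAX handler that copies character data to an output stream and ignores tags."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def characters(self, content):
        # Called for each run of text between tags, in document order.
        self.out.write(content)


def strip_tags(xml_path: str, txt_path: str) -> None:
    """Stream xml_path through a SAX parser and write only its text to txt_path."""
    with open(xml_path, "rb") as src, open(txt_path, "w", encoding="utf-8") as out:
        # The parser reads the file piece by piece rather than loading it whole.
        xml.sax.parse(src, TextWriter(out))
```

Because the text is written out exactly as it appears between the tags, whitespace and line breaks from the original file are preserved, and nothing is inserted between adjacent elements.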

Also, I have no idea why Arabic Translation Team posted such a convoluted solution. If you want to open the file in your browser, right click it, choose Open with... and pick the browser from the list. No need to change the extension, especially not by opening the file in another program first. If the browser is not on the "open with" list, choose "other program" and pick the browser from there.

The file extension doesn't change the type of the file (so you aren't really "saving it as a webpage"). It's just a hint to the OS about which program to open the file with by default. You can easily override that default through the right-click context menu.

[Edited at 2010-11-02 20:48 GMT]


 
FarkasAndras
Local time: 14:33
English to Hungarian
OS Nov 2, 2010

Note: the above solution only works on Windows computers. Barryw didn't specify the OS he uses, so I assume it's some flavour of Windows.

 
barryw
Local time: 20:33
English to Chinese
THREAD STARTER
Thanks very much for your suggestions. Nov 3, 2010

Dear Arabic Translation Team and FarkasAndras,

Thanks very much for your suggestions.

Arabic Translation Team's solution works well in my case! Quite a simple solution.

Thanks, FarkasAndras, for the detailed suggestion. I haven't had time to try your link yet, but I believe it will be a good fix for dealing with large files. As for your second suggestion, opening the XML files directly via the "Open with > browser" command, it doesn't seem to work in my case: Firefox just shows all the tags, while IE simply cannot open the XML file. Maybe I messed something up?

Anyway, thank you all for your contributions.
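
On the browser behaviour described above: Firefox showing the tag tree is its normal display for an XML file that has no stylesheet attached. Why IE refused the file is not clear from this thread. One possibility, and it is only a guess, is that the file is not well-formed XML. A quick way to check is sketched below in Python; `check_well_formed` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_path: str) -> str:
    """Return "ok" if the file parses as XML, otherwise a short error description."""
    try:
        ET.parse(xml_path)
    except ET.ParseError as err:
        # The error message includes the line and column of the first problem found.
        return f"not well-formed: {err}"
    return "ok"
```

If this reports an error, a strict XML viewer is likely to reject the file, whereas a browser treating the same bytes as HTML will usually display whatever it can.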


 
Dawid Wietrzyk
Poland
Local time: 14:33
Polish to English
It works.. Jan 23, 2012

FarkasAndras - I know the post is kind of old, but your script worked perfectly for me, just what I needed. Thank you.

 
FarkasAndras
Local time: 14:33
English to Hungarian
You're welcome Jan 24, 2012

Dawid Wietrzyk wrote:

FarkasAndras - I know the post is kind of old, but your script worked perfectly for me, just what I needed. Thank you.


Glad it worked. Now this script (probably a more refined version) and similar random bits and bobs are in the "grab bag" at http://sourceforge.net/projects/aligner/files/

Currently, the grab bag is at version 1.6. You'll always find the most recent version at the sourceforge url above.

[Edited at 2012-01-24 11:29 GMT]


 

