Wanted: context macro Thread poster: Heinrich Pesch
Heinrich Pesch Finland Local time: 21:33 Member (2003) Finnish to German + ...
When you use a CAT tool, there is usually some context search function. You have to highlight a word, part of a word or a group of words and run the search function. But this search is restricted to the active TM. I think it would be useful if there were a macro in Word that would search for every word in a given document within a defined folder (which would include txt files such as TMs, glossaries or plain text files like uncleaned bilingual files) and assign colours to the words in the doc. Black would mean "no match", red "100% match" and blue "partial match". After that search one would know which words are to be found in the reference files and which are not, so one would not search in vain. I notice I spend considerable time running context searches without finding anything. Do you know of such a macro or program, and do you think it would be useful or not? Regards Heinrich
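For the record, the colour-coding idea can be sketched in a few lines of Python. The matching rules below (exact set lookup for "100% match", a shared four-letter prefix for "partial match") are my own crude assumptions, not anything Heinrich specified:

```python
# Crude sketch of the colour-coding idea from the post above.
# "Partial match" is approximated by a shared 4-letter prefix - an assumption.
def classify(word, reference_words):
    w = word.lower()
    if w in reference_words:
        return "red"      # 100% match: the word occurs verbatim in the references
    if len(w) >= 4 and any(r.startswith(w[:4]) for r in reference_words):
        return "blue"     # partial match (by this sketch's prefix heuristic)
    return "black"        # no match at all

refs = {"translation", "memory", "segment"}
print(classify("memory", refs), classify("translated", refs), classify("dog", refs))
# → red blue black
```

A real macro would then apply the returned colour to each word in the Word document; the classification step itself is the cheap part.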
Vito Smolej Germany Local time: 20:33 Member (2004) English to Slovenian + ... SITE LOCALIZER Google desktop? | Jun 26, 2008 |
Heinrich Pesch wrote: Have you any knowledge of such a macro or program, do you think it useful or not? Regards Heinrich
Jaroslaw Michalak Poland Local time: 20:33 Member (2004) English to Polish SITE LOCALIZER Apsic Xbench | Jun 26, 2008 |
It's not exactly what you require, but I think it is close enough... You can load various reference files and search for the terms quite quickly. http://www.apsic.com/en/products_xbench.html The only downside is that it loads all reference files into memory (no indexing), so it can slow down your system, especially if you have little RAM...
Giles Watson Italy Local time: 20:33 Italian to English In memoriam ApSIC XBench | Jun 26, 2008 |
Hi Heinrich, For that sort of thing, I use the freeware ApSIC XBench, which supports a wide range of formats: http://www.apsic.com/en/products_xbench.html You create an XBench project with the files you want to consult and then invoke the program when you need it. Highlight the text and hit Ctrl+Alt+Ins, which brings up source and target in context in the project documents. It doesn't do colours, but it is a very useful extension of CAT concordance functions. HTH Giles
Heinrich Pesch Finland Local time: 21:33 Member (2003) Finnish to German + ... TOPIC STARTER Slowing down | Jun 26, 2008 |
I do not want to play around with the mouse and try each word myself. For that, the normal context or concordance search in Wf or Trados is good enough. But I realise a search for each word in a doc would take a lot of processor power, unless the software is well organised. I had Google Desktop installed for years, but didn't find it useful and it slowed down my machine, so I got rid of it. The macro or software I was thinking of would only work when it is required, and not all the time.
Kristina Kolic Croatia Local time: 20:33 English to Croatian + ... SITE LOCALIZER Copernic Desktop Search | Jun 26, 2008 |
For this purpose, I use Copernic Desktop Search. It allows you to view all occurrences of a term in a specific directory or folder. http://www.copernic.com/en/products/desktop-search/index.html I find it quite useful and have no memory issues. You can search for a word, several words or 100% matches. And when you're done, you just close it (to avoid slowing down your system).
Heinrich Pesch wrote: I do not want to play around with the mouse and try each word myself. For that the normal context or concordance search in Wf or Trados is good enough. What do you want then? Software that automatically does a concordance search for ALL the words of your document? Without slowing down the system? That's not going to happen. I thought this was about basically doing a concordance search in lots of files at the same time, and Xbench sounds like the best solution from the posts so far. BTW Google Desktop doesn't really slow down modern machines. On my comp it uses 30MB of memory at the moment (quite a lot, but when you have 2GB it doesn't matter; just kill GD when you're doing something really memory-intensive). And it's using all of 2 per cent of my processor resources... The bottom line is, if you have a decent computer with at least 1GB of RAM you'll never notice it's there. It uses lots of system resources when it's doing the initial indexing, but that's supposed to happen when you're not at the machine, and only once (unless you dump huge text files on your computer all at once).
Vito Smolej Germany Local time: 20:33 Member (2004) English to Slovenian + ... SITE LOCALIZER It's the "big O" question | Jun 27, 2008 |
a search for each word in a doc would take a lot of processor power, unless the software is well organized. The O(n) notation tells you how the time to do something increases with the size of the problem to be solved. Like, does it increase linearly (i.e. O(n))? Or is it - gasp - a case of O(n^2)? Or hopefully O(n log n) or less? Does the software use hashing (Google certainly does), dictionary structures? ... etc. Note that all this addresses the retrieval time, not adding new material to the existing corpus. If the subject is so important, I would buy myself a bare-bones machine for 200 euros and let all this run on it. Or maybe a NAS machine could do the job too - after all, that is where all the texts should be sitting anyhow. BUT - it takes work and we have no time - we have orders to do (g). Regards Vito
[Edited at 2008-06-27 14:11]
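Vito's point about retrieval complexity can be illustrated with a small sketch (the word lists here are invented examples): a Python set gives O(1) average-case membership tests via hashing, while scanning a plain list is O(n) per query.

```python
# Sketch: hashed lookup vs. linear scan for word membership tests.
# The reference material here is a made-up example.
reference_list = ["translate", "memory", "segment", "fuzzy"] * 1000
reference_set = set(reference_list)   # built once; lookups are O(1) on average

def found_linear(word):
    # Scans the whole list in the worst case: O(n) per query.
    return word in reference_list

def found_hashed(word):
    # Hash-table lookup: O(1) on average - the kind of indexing Vito alludes to.
    return word in reference_set

print(found_linear("fuzzy"), found_hashed("fuzzy"))   # → True True
print(found_linear("zebra"), found_hashed("zebra"))   # → False False
```

This is also why a one-off indexing pass (the expensive part) pays off when the same reference corpus is queried over and over.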
Heinrich Pesch Finland Local time: 21:33 Member (2003) Finnish to German + ... TOPIC STARTER Simple after all? | Jun 28, 2008 |
I started to think differently about this task. I dropped the idea that the original document should be colour-coded. Instead I would go for a list of unknown words. This could be arrived at using only two basic functions.

1. The reference txt-file folder is converted into an alphabetical word list. Each word in the material would appear only once, so the list would be rather short. No reference to the original file is needed. The function would be run only once, and the list would be updated when enough new material has been added to the folder.

1a. The same function is applied to the source folder, which holds the files of a new project in txt format. It would create an alphabetical list of the untranslated material. The size of the list would grow only moderately with increasing word count. A project of 10 thousand words would perhaps deliver a list of 2 thousand individual words, a project of 100 thousand words perhaps 4 thousand - or what do you think? After all, most of the material always consists of the same basic vocabulary.

2. A second function would compare the source list to the reference list and create a list of only the "unknown" words. Viewing this word list, the translator would be able to decide how difficult the project is and how much research is needed. The length of this list, together with the word count of the project, would be a much better basis for comparison than the word count alone.
I believe both functions could already exist in a programming library, e.g. for Linux or Java. If only I knew some hacker who could put them to work. Perhaps these functions are already available in some software tool, who knows? Regards Heinrich
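The two functions Heinrich describes map almost one-to-one onto a few lines of Python. A hedged sketch (the folder layout, the `*.txt` glob and the letters-only tokenizing regex are my assumptions, not from the thread):

```python
import re
from pathlib import Path

# Letters-only tokenizer - a crude assumption; a real tool would need
# language-aware tokenization (hyphens, apostrophes, etc.).
WORD_RE = re.compile(r"[^\W\d_]+")

def folder_wordlist(folder):
    """Functions 1 and 1a: collapse every .txt file in a folder into a set
    of unique lowercased words (each word appears only once)."""
    words = set()
    for path in Path(folder).glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        words.update(w.lower() for w in WORD_RE.findall(text))
    return words

def unknown_words(source_folder, reference_folder):
    """Function 2: the alphabetical list of words that occur in the new
    project but never in the reference material."""
    return sorted(folder_wordlist(source_folder) - folder_wordlist(reference_folder))
```

As step 1 suggests, the reference word list would be built once, cached, and rebuilt only when enough new material has been added to the folder; only the set difference needs to run per project.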
should be easy | Jun 28, 2008 |
This is definitely child's play for anyone with the tiniest bit of programming knowledge (which I don't have). BTW you need a programmer, not a hacker. You'd start by converting/pasting your text into a .txt file, which simple software can handle, then running a tokenizer or simply a word search-and-replace macro that removes all the punctuation. Then comes the trickier bit, which a Perl script or similar software can do easily. Actually, I'm sure even Word+Excel is capable of doing what you need, if you don't mind going the somewhat slower, somewhat manual way. You can use a Word macro to remove punctuation, replace all spaces with line breaks and then remove multiple line breaks. You should get one word per line. Then paste the word list into Excel, order it alphabetically and remove duplicates with a filter. Then all you need is a function that compares two columns and produces a list of the elements found in one but not the other. Excel may have one built in, or it may be fairly easy to make; I'm not sure which. I tried to fool around with filters and functions a bit, but it's not working and I have no more time. The function that should do this returns only partially correct results for me and I can't troubleshoot it now. My Excel is in Hungarian, so I don't know the function's English/German name. I'll start poking at it again next week if nobody else does it before then. Again, the Excel operations should be fairly easy to automate with a macro.
The only problem with such an evaluation is that you don't know how many of the new words are actually specialized terms. But then you could browse the list, of course.
[Edited at 2008-06-28 07:55]
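The tokenize-and-deduplicate pipeline described above (strip punctuation, one word per line, sort, remove duplicates) collapses to a couple of lines in Python. A sketch - the character class in the regex is a crude assumption standing in for the Word macro's punctuation removal:

```python
import re

def word_list(text):
    """Tokenize as the post describes: strip punctuation (here via a
    letters-only regex - an assumption), then sort and deduplicate,
    which is what Excel's unique-records filter does."""
    words = re.findall(r"[A-Za-zÀ-ÿ]+", text)
    return sorted({w.lower() for w in words})

print(word_list("The cat, the CAT and the dog!"))
# → ['and', 'cat', 'dog', 'the']
```

The two-column comparison that was giving trouble in Excel is then just a set difference between two such lists.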
I thought about it a bit and there is one trivial way of getting a "new word count": Copy the one-word-per-line word list into Excel and filter it (roughly: Data / Filter / Advanced filter / Other location, and Show only unique records). Then copy it under the similarly prepared word list of translated material. Filter the whole thing again and see how many words you get. The difference between that and the word count of the "translated words" list is the number of new words. With languages like Hungarian that make extensive use of suffixes, one would ideally run a program that reverts every word to its base form before putting the list in Excel... The filtering shrank my 43,000-word sample down to 13,000 (i.e. there were 13,000 different words in a text of 43,000 words), but I'm sure it's actually well under 10,000 words in various inflected forms. Again, I'll see how you can extract a full word list, not just a number. It must be easy. The Excel functions are killing me with their silly behaviour, retrieving the nearest "value" being searched instead of "NOT FOUND".
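The "trivial way" above amounts to counting the union of the two unique word lists: merging both lists and re-filtering yields the union, and subtracting the translated list's length gives the new-word count. A sketch with invented word lists:

```python
# Sketch of the Excel filtering trick described above.
# The word lists are made-up examples.
translated = {"the", "cat", "sat", "mat"}       # unique words already translated
source = {"the", "cat", "chased", "dog"}        # unique words in the new project

merged_unique = translated | source             # the twice-filtered combined list
new_words = len(merged_unique) - len(translated)

print(new_words)   # → 2  ("chased" and "dog")

# Same arithmetic, computed directly as a set difference:
assert new_words == len(source - translated)
```

Extracting the full list rather than just the number is the `source - translated` expression itself, which sidesteps the near-match lookup behaviour that Excel's search functions were exhibiting.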