Offline tool to compare two word lists
Thread poster: Hans Lenting
I'm looking for an offline tool (script, macro, ...) to compare two word lists, either case-sensitive or case-insensitive, and create a third list containing all words that are present in both compared lists. Both lists contain exactly one word per line. The higher ASCII range (ä, ß etc.) should be supported.

Tony M France Local time: 21:42 Member French to English + ... SITE LOCALIZER
Could you do it via Excel? Something like: IF (value in Column A) = (value in Column B) THEN Column C = (value in Column A), ELSE [0]. Then, when you copy back to Word, it would be easy enough to sort the table on Column C, manually remove all the lines where C is empty, and finally re-sort alphabetically on (say) C if that's important.
I'd look for a diff tool for text files/directories (Meld, Diffuse, Beyond Compare, etc.) or one that is specifically for Excel (ExcelMerge) if you prefer that route.

esperantisto Local time: 22:42 Member (2006) English to Russian + ... SITE LOCALIZER
Samuel Murray Netherlands Local time: 21:42 Member (2006) English to Afrikaans + ...
Try my little glossary comparison scripts (AutoIt) | Oct 26, 2019
Hans Lenting wrote: I'm looking for an offline tool (script, macro, ...) to compare two word lists, either case-sensitive or case-insensitive, and create a third list, containing all words that are present in both compared lists. ... Both lists contain exactly one word per line. The higher ASCII range (ä, ß etc.) should be supported.

Oh, dear. Well, I may have something you can use while you search for the perfect solution: http://www.leuce.com/autoit/WFC%20Glossary%20Comparer.zip

Each of these two scripts attempts to compare two Wordfast Classic glossaries (which are tab-delimited files). I tried to quickly adapt one of them for comparing word lists that contain only one column (i.e. your scenario), but I'm afraid I'm too stoned right now.

So what you need to do is temporarily replace any existing tabs in your files with a marker, e.g. "|||", then add a tab to the end of each line (i.e. replace \n with \t\n, or replace CRLF with TAB & CRLF, or whatever), and then use the "compare column 1" script. Also type "NONE" when prompted. The readme file is your friend.

The script outputs two additional files, named after the two original files. If an entry occurs in both files, it gets the word [BOTH] added in front of it. If an entry occurs in only one file, it simply stays in that file unchanged.

Look, I used these scripts during a large translation project but did not develop them beyond the point where they were useful to me at the time. These scripts are SLOW with large files, though.
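If you would rather script the preprocessing Samuel describes than do it by search-and-replace, here is a minimal Python sketch. It assumes the word lists are UTF-8 text files and that CRLF line endings are acceptable to the Wordfast Classic scripts; the function names `to_two_column` and `convert_file` are hypothetical and not part of Samuel's scripts.

```python
def to_two_column(lines, tab_marker="|||"):
    """Turn one-word-per-line entries into tab-delimited 'glossary' rows.

    Any tab already inside an entry is replaced with tab_marker, and an
    empty second column is created by appending a tab to each entry.
    """
    return [line.replace("\t", tab_marker) + "\t" for line in lines]


def convert_file(src, dst, encoding="utf-8"):
    """Hypothetical helper: convert a word-list file into a two-column file.

    Assumes UTF-8 input; writes CRLF line endings, as on Windows.
    """
    with open(src, encoding=encoding) as f:
        # splitlines() drops the line breaks, whether they are CRLF or LF.
        lines = f.read().splitlines()
    # newline="\r\n" makes every "\n" written below come out as CRLF.
    with open(dst, "w", encoding=encoding, newline="\r\n") as f:
        for row in to_two_column(lines):
            f.write(row + "\n")


if __name__ == "__main__":
    print(to_two_column(["Haus", "a\tb"]))
```

Remember to turn the "|||" markers back into tabs in the output if your entries contained tabs in the first place.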
[Edited at 2019-10-26 13:00 GMT]

Jean Lachaud United States Local time: 15:42 English to French + ...
Off the top of my head: add the content of one list to the other; import/copy it into an Excel column; sort the column if required ([Data Tab | Sort]); remove duplicates ([Data Tab | Remove Duplicates]).

Samuel Murray Netherlands Local time: 21:42 Member (2006) English to Afrikaans + ...
Jean Lachaud wrote: Add the content of one list to the other; import/copy into an Excel column; Remove Duplicates ([Data | Data Tools | Remove Duplicates]).

If you do this, you end up with a column that contains all terms. The way I understand it, Hans wants only the terms that occur in both files; if a term occurs in only one file, he doesn't want that term. In other words (assuming duplicates were already removed from each list individually, keeping one instance of course), step #3 should be something like "remove non-duplicates", i.e. remove all terms that appear only once in the merged list.

Jean Lachaud United States Local time: 15:42 English to French + ...
You are right. Still, I'm pretty sure there is a quick way to do that in Excel, but I don't have time today to research it.
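The "remove non-duplicates" idea (deduplicate each list, merge them, keep only the terms that now occur twice) can also be sketched outside Excel. This is a minimal Python illustration of that logic, not an Excel recipe; the function name `common_via_counts` is hypothetical, and the comparison is case-sensitive.

```python
from collections import Counter


def common_via_counts(list1, list2):
    """Return the terms that occur in both lists, in list 1's order."""
    # Remove duplicates within each list first; dict.fromkeys keeps first-seen order.
    unique1 = list(dict.fromkeys(list1))
    unique2 = list(dict.fromkeys(list2))
    # Merge the two deduplicated lists and count each term.
    counts = Counter(unique1 + unique2)
    # After per-list deduplication, a count of 2 means "present in both lists";
    # terms with a count of 1 are the "non-duplicates" to be removed.
    return [term for term in unique1 if counts[term] == 2]


if __name__ == "__main__":
    print(common_via_counts(["Haus", "Maus", "Haus"], ["Maus", "Baum"]))
```

The order of the steps matters: without deduplicating each list first, a term repeated within a single list would also reach a count of 2 and be reported as common to both.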
Samuel Murray Netherlands Local time: 21:42 Member (2006) English to Afrikaans + ...
@Hans, here's a superfast one | Oct 26, 2019
Hans Lenting wrote: I'm looking for an offline tool [etc.]

I found an AutoIt script that can do this, cannibalized it a bit, and here you go: http://www.leuce.com/autoit/compare_two_lists.zip

It's super, super fast. It doesn't sort the files. It creates three files: one with the terms that occur only in file 1, one with the terms that occur only in file 2, and one with the terms that occur in both files. Note that the script counts all instances of a term within one file as a single term: if a term occurs twice in the same file, it is counted only once. Put differently, the script removes all duplicates from each file's content before comparing the two files. It leaves the original files intact.
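Samuel's script is written in AutoIt and is not reproduced here. For readers who would rather use Python, the following is a minimal sketch of the same behavior as he describes it: deduplicate each file, do not sort, leave the originals untouched, and write three result files. The output file names, the UTF-8 encoding and the function names are assumptions for illustration only.

```python
from pathlib import Path


def read_terms(path, encoding="utf-8"):
    """Read one term per line, skipping blank lines."""
    text = Path(path).read_text(encoding=encoding)
    return [line for line in text.splitlines() if line]


def compare_lists(terms1, terms2):
    """Split two term lists into (only in 1, only in 2, in both).

    Each list is deduplicated first, keeping first-seen order; nothing is sorted.
    """
    unique1 = list(dict.fromkeys(terms1))
    unique2 = list(dict.fromkeys(terms2))
    set1, set2 = set(unique1), set(unique2)
    only1 = [t for t in unique1 if t not in set2]
    only2 = [t for t in unique2 if t not in set1]
    both = [t for t in unique1 if t in set2]
    return only1, only2, both


def write_results(path1, path2, out_dir=".", encoding="utf-8"):
    """Write the three result files; the input files are only read, never modified."""
    only1, only2, both = compare_lists(read_terms(path1, encoding),
                                       read_terms(path2, encoding))
    out = Path(out_dir)
    # Hypothetical output file names.
    for name, terms in (("only_in_file1.txt", only1),
                        ("only_in_file2.txt", only2),
                        ("in_both.txt", both)):
        (out / name).write_text("".join(t + "\n" for t in terms), encoding=encoding)
```

Using sets for the membership tests keeps the comparison roughly linear in the size of the lists, which is what makes this kind of approach fast on large files.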
[Edited at 2019-10-26 15:13 GMT]

Luca Tutino Italy Member (2002) English to Italian + ...
Just add a couple of variations to Jean's solution (case sensitive) | Oct 26, 2019
Before merging the lists, you should eliminate any repetition from each list separately, using Excel's Remove Duplicates command. Then merge them and sort the merged list, as suggested by Jean. Now add a formula like this in cell B2: =EXACT(A2,A1) (=IDENTICO(A2;A1) in the Italian version of Excel). Copy cell B2 to the remaining rows of column B, and you automatically get =EXACT(A3,A2) in cell B3, and so on. The formula returns TRUE for the second occurrence of each term that appears twice, i.e. a term that originally appeared in both lists, and FALSE for all other terms, including the first occurrence of each doubled term. Use the AutoFilter on column B to select the TRUE rows. Copy the filtered column A into a new worksheet, and you have your desired list.
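Luca's steps (deduplicate, merge, sort, then compare each row with the row above) translate directly into code. Here is a minimal Python sketch of that sort-and-compare-neighbours technique; the function name `common_by_sorting` is hypothetical. Python sorts by code point rather than alphabetically the way Excel does, but the method only needs identical entries to end up next to each other, which any sort guarantees.

```python
def common_by_sorting(list1, list2):
    """Find terms present in both lists by sorting the merged list (case-sensitive)."""
    # Step 1: remove repetitions from each list separately, then merge.
    merged = list(set(list1)) + list(set(list2))
    # Step 2: sort the merged list so that identical terms become neighbours.
    merged.sort()
    # Step 3: the equivalent of the comparison formula in column B: a row equal
    # to the row above is the second copy of a term, i.e. a term from both lists.
    return [merged[i] for i in range(1, len(merged)) if merged[i] == merged[i - 1]]


if __name__ == "__main__":
    print(common_by_sorting(["Haus", "Maus", "Haus"], ["Maus", "Baum", "Haus"]))
```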
[Edited at 2019-10-26 16:22 GMT]
[Edited at 2019-10-26 16:24 GMT]

Luca Tutino Italy Member (2002) English to Italian + ...
Additional step for case insensitive | Oct 26, 2019
Just add the formula =UPPER(A1) in B1 and copy cell B1 to the remaining rows of column B. Then proceed as above, but point the comparison formula at column B instead of column A, and place it in column C instead of column B.
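Luca's UPPER trick works by comparing a normalized copy of each term instead of the term itself. The same idea is sketched below in Python, under a few stated assumptions: it uses `str.casefold()` rather than upper-casing (casefold maps "ß" to "ss", so "Straße" matches "STRASSE"), it applies Unicode NFC normalization so that a precomposed "ä" and an "a" followed by a combining diaeresis compare equal, and it reports the spelling found in list 1. The function names are hypothetical.

```python
import unicodedata


def fold(term):
    """Case-insensitive comparison key: NFC-normalize, then case-fold."""
    return unicodedata.normalize("NFC", term).casefold()


def common_case_insensitive(list1, list2):
    """Terms from list 1 that also occur in list 2, ignoring case.

    Each term is reported once, in the spelling of its first occurrence in list 1.
    """
    keys2 = {fold(term) for term in list2}
    seen = set()
    result = []
    for term in list1:
        key = fold(term)
        if key in keys2 and key not in seen:
            seen.add(key)
            result.append(term)
    return result
```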
[Edited at 2019-10-26 16:24 GMT]

Samuel Murray Netherlands Local time: 21:42 Member (2006) English to Afrikaans + ...
The script assumes Windows line breaks (CRLF), so if your files have Unix line breaks (LF), try changing CRLF to LF in the script.
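Line-ending problems like this can be avoided by having the reader accept both conventions. Below is a minimal Python sketch, assuming UTF-8 files (possibly starting with the byte-order mark that some Windows editors add); the function names are hypothetical. `str.splitlines()` treats CRLF, LF and a lone CR as line breaks, so Windows and Unix files are read the same way, and the writer uses whichever line ending you choose.

```python
from pathlib import Path


def read_word_list(path):
    """Read one word per line from a UTF-8 file with CRLF, LF or CR line breaks."""
    # "utf-8-sig" decodes plain UTF-8 too, and drops a leading byte-order mark if present.
    text = Path(path).read_bytes().decode("utf-8-sig")
    # splitlines() splits on CRLF, LF and CR alike; surrounding whitespace
    # is stripped and empty lines are discarded.
    return [line.strip() for line in text.splitlines() if line.strip()]


def write_word_list(path, words, newline="\r\n"):
    """Write one word per line with an explicit line ending (CRLF by default)."""
    # Writing bytes avoids the platform's own newline translation.
    Path(path).write_bytes("".join(word + newline for word in words).encode("utf-8"))
```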
Hans Lenting Netherlands Member (2006) German to Dutch TOPIC STARTER
Samuel Murray wrote: The script assumes Windows line breaks (CRLF), so if your files have Unix line breaks, try changing CRLF to LF in the script.

Thank you all! I've used the second script that Samuel provided. @Samuel, if you can find a case-insensitive solution, I'd be much obliged. @Jean: I'll test the Mac version of Beyond Compare.