https://www.proz.com/forum/general_technical_issues/111619-automatic_process_to_add_tab_paragraph_marker_after_before_english_or_asian_chinese_words.html

Automatic process to add tab/paragraph marker after/before English or Asian (Chinese) words?
Thread poster: 855649 (X)
 855649 (X)
855649 (X)  Identity Verified

Local time: 12:16
English to Chinese
+ ...
Aug 2, 2008

Stupid question I'm pretty sure, I wouldn't have a clue how to do this personally, but I thought maybe some of the pros here would IF it's actually possible.

I want to turn some glossaries I found on the internet into a Trados termbase. In order to do that, I gotta turn...
Thermal efficiency 热效率
Thermal equivalent of work 热功当量
Thermal expansion 热膨胀
...into...
Thermal efficiency --> 热效率
Thermal equivalent of work --> 热功当量
Thermal expansion --> 热膨胀

With tabs (-->) in between (so each term goes into its own Excel column). The only way I can see a "replace all" working, or some sort of macro/template, is if it can somehow detect the space before the Asian characters (or even after the English would be ok) and then, like replace, insert a tab mark in its place. When you have a list of 1000 words, using the mouse to click the space between each pair and then hitting tab becomes really tedious! Does anyone have any suggestions? Thanks so much for any help.


[Edited at 2008-08-02 13:05]


 
Jaroslaw Michalak
Jaroslaw Michalak  Identity Verified
Poland
Local time: 06:16
Member (2004)
English to Polish
SITE LOCALIZER
A little trick... Aug 2, 2008

Yes, there is a little trick that allows that, but only if the language attribute is different for both entries...

In Word open the "Find and replace" dialog, in the "Find" field select language "English" (from the drop-down button below). In the "Replace" field enter ^&^t (caret followed by ampersand, followed by caret followed by a letter "t"). Or you can select "Found text" and "Tab" from the "Special" menu.

If the entries are all-text, without the language attribute (as you say you copied them from the Internet), it would be harder... Basically, you need a text editor that works with regular expressions, and you have to match the character codes of the Asian text. However, this depends on the encoding of the text, so I cannot give you precise instructions.
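
For readers with a scripting language at hand, here is a minimal Python sketch of the "match the character codes" idea. It assumes the glossary has already been read into Unicode strings (for example from a UTF-8 file) and that the Chinese terms use characters from the CJK Unified Ideographs block (U+4E00 to U+9FFF). The function name is made up for illustration.

```python
import re

# Assumption: the Chinese side starts with a character in the CJK Unified
# Ideographs block (U+4E00 to U+9FFF); rarer characters may fall outside it.
SPACE_BEFORE_CJK = re.compile(r" (?=[\u4e00-\u9fff])")


def add_tab_before_chinese(line: str) -> str:
    """Replace the first space directly followed by a CJK character with a tab."""
    return SPACE_BEFORE_CJK.sub("\t", line, count=1)


if __name__ == "__main__":
    for entry in ["Thermal efficiency 热效率", "Thermal equivalent of work 热功当量"]:
        print(repr(add_tab_before_chinese(entry)))
```

Because the pattern only looks at the character after the space, spaces inside the English term are left alone.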


 
Boyan Brezinsky
Boyan Brezinsky  Identity Verified
Bulgaria
Local time: 07:16
English to Bulgarian
+ ...
It can be done with regular expressions Aug 2, 2008

Reading the last suggestion on regular expressions, I realized that it can be done without knowledge of the encoding scheme. The idea is to insert a separator character before the first occurrence of something that is NOT an ASCII symbol or a space (digits and punctuation marks should probably count as ASCII as well).
This can be done with 'find and replace' based on regular expressions, depending on the versatility of the regex engine in the text editor.
Or it can be done with a relatively simple (albeit slow) macro in MS Word or another programmable text editor that looks sequentially at all the characters in the line and inserts a separator before the first non-Latin character.
By the way, for a list of 1000 entries it would be faster to do it manually than to write a macro - using the text editor find and replace function to search for spaces and replace them by tabs at the appropriate positions. One just needs to be careful with answering 'Yes' or 'No'. The time it takes to write and test a macro would pay off only if the macro is going to be used repeatedly.
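
The character-by-character macro described above could look roughly like this in Python. This is only a sketch: it assumes English-first lines, treats any code point above 127 as "non-ASCII", and the function name is hypothetical.

```python
def split_at_first_non_ascii(line: str, sep: str = "\t") -> str:
    """Insert sep before the first non-ASCII character of an English-first line.

    The spaces immediately in front of that character are dropped, so the
    separator takes the place of the original space.
    """
    for i, ch in enumerate(line):
        if ord(ch) > 127:  # first character outside the ASCII range
            return line[:i].rstrip(" ") + sep + line[i:]
    return line  # nothing but ASCII on this line; leave it unchanged


if __name__ == "__main__":
    print(repr(split_at_first_non_ascii("rack link 齿条联接杆")))
```

Since it never inspects the English side, brackets, slashes and uppercase letters in the English term need no special handling.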


 
Jaroslaw Michalak
Jaroslaw Michalak  Identity Verified
Poland
Local time: 06:16
Member (2004)
English to Polish
SITE LOCALIZER
Use the spaces to your advantage... Aug 2, 2008

bsb's answer gave me another idea: replace all the spaces between regular ASCII characters with a special sequence, then replace all the remaining spaces with tabs, then replace the special sequence with spaces again. I.e. in Word:

1.
Search (with regular expressions, i.e. "Use wildcards" ticked): ([a-z]) ([a-z])
Replace: \1###\2

2.
Search: " " (space)
Replace: ^t

3.
Search: ###
Replace: " " (space)

Of course, this assumes that in the English phrases there are no special symbols (like apostrophes, quotes, etc.) at the word boundary. If there are, you have to include them in the first search expression (e.g. [a-z@#$%^&*]).
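
The three steps can also be tried outside Word. Below is a small Python sketch of the same placeholder trick. It assumes "###" never occurs in the glossary, and it uses lookarounds instead of \1 and \2 so that one-letter words are protected too, which is a deliberate deviation from the Word version.

```python
import re

PLACEHOLDER = "###"  # assumed not to occur anywhere in the glossary


def three_step(line: str) -> str:
    # Step 1: protect every space that sits between two lowercase ASCII letters.
    line = re.sub(r"(?<=[a-z]) (?=[a-z])", PLACEHOLDER, line)
    # Step 2: every space that is still left becomes a tab.
    line = line.replace(" ", "\t")
    # Step 3: turn the protected spaces back into ordinary spaces.
    return line.replace(PLACEHOLDER, " ")


if __name__ == "__main__":
    print(repr(three_step("Thermal equivalent of work 热功当量")))
    # Shows the limitation mentioned above: symbols at word boundaries
    # are not protected, so their neighbouring spaces become tabs as well.
    print(repr(three_step("A / C 空调")))
```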


 
 855649 (X)
855649 (X)  Identity Verified

Local time: 12:16
English to Chinese
+ ...
TOPIC STARTER
Got it working for the most part, minus a few errors Aug 3, 2008

Amazing suggestions. Thanks bsb_2, though I'm not quite sure what you mean by "The idea is to insert a separator character before the first occurrence of something that is NOT an ASCII-symbol and space". I don't know how to specify non-ASCII symbols. Can MS Word do it?

Jabberwock, thanks a ton as well, those steps 1, 2, and 3 helped me a ton. Although the problem, like you said, is when there's a lot of () or [], and A / C for example. It's not picking up uppercase letters (simple way around that is convert all uppercase to lowercase though), but the main problems are the () and [] and / . I tried using the [a-z@#$%^&*] code, but one, those characters aren't included in it, two, it gives me an error saying "^& is not a valid character". However, for anything that doesn't contain those special characters and consists of basically just pure English text (in the English translation of course), it's a success, thanks so much!


 
Jaroslaw Michalak
Jaroslaw Michalak  Identity Verified
Poland
Local time: 06:16
Member (2004)
English to Polish
SITE LOCALIZER
Sorry... Aug 3, 2008

The code I gave in the comment was simply an example... To match round brackets and / (and uppercase letters) use: [A-Za-z/)(]

The square brackets cannot be matched (and I think they cannot be escaped?), but there is an easy workaround: before the described operation just find and replace them with a placeholder (that is, with a character, or a sequence of characters, which does not appear in the original text). For example, replace [ with % and ] with $ (provided that % and $ do not appear in the text).
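
Outside Word the bracket problem looks different: in Python's re module, for instance, square brackets can be escaped inside a character class, so no extra placeholder pass for brackets is needed there. A sketch, with the set of "English" boundary characters being an assumption that should be adjusted to the glossary at hand:

```python
import re

# Characters allowed on either side of a space inside an English term.
# \[ and \] are escaped so that they are literal members of the class.
ENGLISH_EDGE = r"[A-Za-z/()\[\]]"
INNER_SPACE = re.compile(rf"(?<={ENGLISH_EDGE}) (?={ENGLISH_EDGE})")


def tab_separate(line: str, placeholder: str = "###") -> str:
    """Protect spaces inside the English term, then turn the rest into tabs."""
    protected = INNER_SPACE.sub(placeholder, line)
    return protected.replace(" ", "\t").replace(placeholder, " ")


if __name__ == "__main__":
    print(repr(tab_separate("A / C [air] (unit) 空调")))
```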


 
 855649 (X)
855649 (X)  Identity Verified

Local time: 12:16
English to Chinese
+ ...
TOPIC STARTER
Ah ha Aug 3, 2008

Ah ha, that's what I was looking for. I'm not familiar with the internal codes of Word, and personally, I had no idea that it even supported codes like [a-z]. I then tried [A-Z] to see if that worked, but it didn't, and when I tried [a-z@#$%^&*] and added a () in there, it gave an error, so I figured guessing wouldn't work and gave up. Thanks again (a ton) for that bit of extra info.

 
 855649 (X)
855649 (X)  Identity Verified

Local time: 12:16
English to Chinese
+ ...
TOPIC STARTER
problems with the 'replace with' code now Aug 4, 2008

Jabberwock wrote:

The code I gave in the comment was simply an example... To match round brackets and / (and uppercase letters) use: [A-Za-z/)(]


Hi Jabberwock, I tried using the ([A-Za-z/)(] [A-Za-z/)(]) code today, and it found them just fine, but when I wanted to replace with \1###\2, this is what happened...

it turned this...

rack link 齿条联接杆
rack nut 齿条螺母
rack shaft 齿条

...into...

rack l###ink 齿条联接杆
rack n###ut 齿条螺母
rack s###haft 齿条

Nothing changed at all as far as I can tell. Should the \1###\2 code be altered too perhaps?


 
Jaroslaw Michalak
Jaroslaw Michalak  Identity Verified
Poland
Local time: 06:16
Member (2004)
English to Polish
SITE LOCALIZER
Sorry again... Aug 4, 2008

When I tested the expression I forgot you need to reuse the matched portions... In that case you cannot match parentheses either, you have to replace them with a placeholder just like in case of square brackets...

 
 855649 (X)
855649 (X)  Identity Verified

Local time: 12:16
English to Chinese
+ ...
TOPIC STARTER
Easy enough Aug 5, 2008

Jabberwock wrote:

When I tested the expression I forgot you need to reuse the matched portions... In that case you cannot match parentheses either, you have to replace them with a placeholder just like in case of square brackets...


Ok, that's easy enough. Thanks again Jabberwock, you've been a great help to me on this.


 
Jaroslaw Michalak
Jaroslaw Michalak  Identity Verified
Poland
Local time: 06:16
Member (2004)
English to Polish
SITE LOCALIZER
No problem... Aug 5, 2008

I'm glad that I could help. Maybe even someone else will benefit from the discussion...

 
 855649 (X)
855649 (X)  Identity Verified

Local time: 12:16
English to Chinese
+ ...
TOPIC STARTER
one more question Aug 5, 2008

Jabberwock, do you think you can help me out once more? I searched on the internet for more of the ms codes so I can stop bothering you and figure them out for myself, but I just don't get it for my next situation. I'm taking Chinese to English now (instead of English to Chinese).

Here's an example.
同步器式变速器 synchromesh transmission
直接档变速器 direct drive transmission
超速档变速器 over drive transmission

I'm searching for: Search: ([!a-z] [a-z])
Which finds the space between the Chinese and the English. But adding the tab is my problem. Replacing with a "^t" cuts off the last Chinese character and the first English letter. If I try something like \1^t\2, it says "The Replace With text contains a group number which is out of range". Any variation of that, like ^t\2, \1###\2, etc give the same error. Do you know what I'm doing wrong?

[Edited at 2008-08-05 15:45]


 
Jaroslaw Michalak
Jaroslaw Michalak  Identity Verified
Poland
Local time: 06:16
Member (2004)
English to Polish
SITE LOCALIZER
Try the same... Aug 5, 2008

Hmm... Actually, you should try the same procedure as in the previous case. This allows you to find spaces within the English term and replace with the placeholder ### (step 1), then the remaining spaces are (or should be) only the spaces between Chinese and English - you replace them with tabs (step 2), finally, you restore the English spaces (step 3).

For your interest, you should not write ([!a-z] [a-z]), as this means "group 1 is a character that is not a lowercase ASCII letter, followed by a space, followed by a lowercase ASCII letter" (a single group, which is why \2 is reported as out of range), but ([!a-z]) ([a-z]), which means "group 1 is a character that is not a lowercase ASCII letter, then follows a space, then group 2 is a lowercase ASCII letter".

I know that regexp (regular expressions) can get confusing, especially in Word, which uses a non-standard flavor of regexp.

If you are really interested, you might look at Perl. It is a programming language which allows doing such conversion with two or three lines of code which can be then reused over and over. Of course, first you have to write a few dozen non-working programs (I'm not there yet myself...).
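
As an illustration of how short such a script can be, here is the two-group pattern from this post in Python (used here instead of Perl purely for the example). Note that Python writes the negated class as [^a-z] where Word uses [!a-z]. The function name is made up, and the sketch assumes the English side starts with a lowercase letter.

```python
import re

# ([^a-z]) is group 1: one character that is not a lowercase ASCII letter.
# ([a-z])  is group 2: one lowercase ASCII letter.
# The single space between the two groups is what gets replaced by the tab.
BOUNDARY = re.compile(r"([^a-z]) ([a-z])")


def chinese_then_english(line: str) -> str:
    """Put a tab at the first Chinese-to-English boundary, keeping both neighbours."""
    return BOUNDARY.sub(r"\1\t\2", line, count=1)


if __name__ == "__main__":
    for entry in ["同步器式变速器 synchromesh transmission", "超速档变速器 over drive transmission"]:
        print(repr(chinese_then_english(entry)))
```

Putting \1 and \2 back into the replacement is what keeps the last Chinese character and the first English letter from being cut off.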


 

