Automatic process to add tab/paragraph marker after/before english or asian (chinese) words? (General technical issues)

Thread poster: 855649 (X)

855649 (X)

Local time: 12:16
English to Chinese
+ ...

Aug 2, 2008

Stupid question I'm pretty sure, I wouldn't have a clue how to do this personally, but I thought maybe some of the pros here would IF it's actually possible.

I want to turn some glossaries I found on the internet into a Trados termbase. In order to do that, I have to turn...
Thermal efficiency 热效率
Thermal equivalent of work 热功当量
Thermal expansion 热膨胀
...into...
Thermal efficiency --> 热效率
Thermal equivalent of work --> 热功当量
Thermal expansion --> 热膨胀

With tabs (-->) in between (so each term goes into its own Excel column). The only way I can see a "replace all" working, or some sort of macro/template, is if it can somehow detect the space before Asian characters (or even English would be ok) and then, like replace, insert a tab character in its place. When you have a list of 1000 words, using the mouse to click the space between each one and then hitting tab becomes really tedious! Does anyone have any suggestions? Thanks so much for any help.

[Edited at 2008-08-02 13:05]

Jaroslaw Michalak

Poland
Local time: 06:16
Member (2004)
English to Polish

SITE LOCALIZER

A little trick...

Aug 2, 2008

Yes, there is a little trick that allows that, but only if the language attribute is different for both entries...

In Word open the "Find and replace" dialog, in the "Find" field select language "English" (from the drop-down button below). In the "Replace" field enter ^&^t (caret followed by ampersand, followed by caret followed by a letter "t"). Or you can select "Found text" and "Tab" from the "Special" menu.

If the entries are plain text, without the language attribute (as you say you copied them from the Internet), it would be harder... Basically, you need a text editor that works with regular expressions, and you have to match the character codes of the Asian text. However, this depends on the encoding of the text, so I cannot give you precise instructions.
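For plain text, one way to sketch the regular-expression approach is in Python, where a decoded string is handled as Unicode, so the match can target code points rather than bytes. The range U+4E00 to U+9FFF (the CJK Unified Ideographs block) and the function name `tab_before_chinese` are assumptions for illustration; a glossary with rarer characters or other scripts would need a wider range.

```python
import re

# Assumption: every Chinese term starts with a character from the
# CJK Unified Ideographs block (U+4E00..U+9FFF).
SPACE_BEFORE_CJK = re.compile(r"[ \t]+(?=[\u4e00-\u9fff])")


def tab_before_chinese(line: str) -> str:
    """Replace the first run of spaces that precedes a Chinese character with a tab."""
    return SPACE_BEFORE_CJK.sub("\t", line, count=1)


if __name__ == "__main__":
    for entry in ("Thermal efficiency 热效率", "Thermal equivalent of work 热功当量"):
        print(tab_before_chinese(entry))
```

The lookahead `(?=...)` checks that a Chinese character follows without consuming it, so only the space is replaced.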

Boyan Brezinsky

Bulgaria
Local time: 07:16
English to Bulgarian
+ ...

It can be done with regular expressions

Aug 2, 2008

Reading the last suggestion on regular expressions, I realized that it can be done without knowledge of the encoding scheme. The idea is to insert a separator character before the first occurrence of something that is NOT an ASCII-symbol and space (and maybe digits and punctuation marks as well).
This can be done with 'find and replace' based on regular expressions, depending on the versatility of the regex engine in the text editor.
Or it can be done with a relatively simple macro (albeit slow) in MS Word or other programmable text editor that looks sequentially at all the characters in the line and inserts a separator before the first non-Latin character.
By the way, for a list of 1000 entries it would be faster to do it manually than to write a macro - using the text editor find and replace function to search for spaces and replace them by tabs at the appropriate positions. One just needs to be careful with answering 'Yes' or 'No'. The time it takes to write and test a macro would pay off only if the macro is going to be used repeatedly.
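As a rough illustration of the "first non-ASCII character" idea, here is a minimal Python sketch. It does not need to know which script follows; it only assumes that the English side is pure 7-bit ASCII. The function names are made up for this example.

```python
import re

# One or more spaces followed by a character outside the 7-bit ASCII range.
BEFORE_NON_ASCII = re.compile(r" +(?=[^\x00-\x7f])")


def split_at_first_non_ascii(line: str) -> str:
    """Put a tab in place of the spaces before the first non-ASCII character."""
    return BEFORE_NON_ASCII.sub("\t", line, count=1)


def convert(text: str) -> str:
    """Apply the replacement to every line of a glossary."""
    return "\n".join(split_at_first_non_ascii(line) for line in text.splitlines())


if __name__ == "__main__":
    print(convert("Thermal efficiency 热效率\nrack nut 齿条螺母"))
```

One caveat: an English term containing a word that starts with an accented letter (for example "vis à vis") would be split at the wrong place, because the accented letter is also non-ASCII.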

Jaroslaw Michalak

Poland
Local time: 06:16
Member (2004)
English to Polish

SITE LOCALIZER

Use the spaces to your advantage...

Aug 2, 2008

bsb's answer gave me another idea: replace all the spaces between regular ASCII characters with a special sequence, then replace all the remaining spaces with tabs, then replace the special sequence with spaces again. I.e. in Word:

1. Search (with regular expressions): ([a-z]) ([a-z])
Replace: \1###\2

2. Search: " " (space)
Replace: ^t

3. Search: ###
Replace: " " (space)

Of course, this assumes that in the English phrases t...
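The same three steps can be sketched outside Word, for instance in Python. This is only an illustration under the assumption that `###` never occurs in the glossary; the function name is hypothetical. Unlike the Word search above, it also accepts uppercase letters and uses a lookahead for the second letter, which Word's wildcard search does not offer.

```python
import re

PLACEHOLDER = "###"  # assumed not to occur anywhere in the glossary


def english_chinese_to_tsv(line: str) -> str:
    # Step 1: protect spaces that sit between two ASCII letters.
    # The second letter is only looked at, not consumed, so that
    # one-letter words such as "a b c" are handled as well.
    protected = re.sub(r"([A-Za-z]) (?=[A-Za-z])", r"\1" + PLACEHOLDER, line)
    # Step 2: every space still left becomes a tab.
    tabbed = protected.replace(" ", "\t")
    # Step 3: restore the protected spaces.
    return tabbed.replace(PLACEHOLDER, " ")


if __name__ == "__main__":
    print(english_chinese_to_tsv("Thermal equivalent of work 热功当量"))
```

As in Word, spaces next to punctuation (for example in "A / C") are not protected by step 1 and would be turned into tabs too.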

855649 (X)

Local time: 12:16
English to Chinese
+ ...

TOPIC STARTER

Got it working for the most part, minus a few errors

Aug 3, 2008

Amazing suggestions. Thanks bsb_2, though I'm not quite sure what you mean by "The idea is to insert a separator character before the first occurrence of something that is NOT an ASCII-symbol and space". I don't know how to specify non-ASCII symbols. Can MS Word do it?

Jabberwock, thanks a ton as well, those steps 1, 2, and 3 helped me a ton. Although the problem, like you said, is when there's a lot of () or [], and A / C for example. It's not picking up uppercase letters (simple way around that is convert all uppercase to lowercase though), but the main problems are the () and [] and / . I tried using the [a-z@#$%^&*] code, but one, those characters aren't included in it, two, it gives me an error saying "^& is not a valid character". However, for anything that doesn't contain those special characters and consists of basically just pure English text (in the English translation of course), it's a success, thanks so much!

Jaroslaw Michalak

Poland
Local time: 06:16
Member (2004)
English to Polish

SITE LOCALIZER

Sorry...

Aug 3, 2008

The code I gave in the comment was simply an example... To match round brackets and / (and uppercase letters) use: [A-Za-z/)(]

The square brackets cannot be matched (and I think they cannot be escaped?), but there is an easy workaround: before the described operation just find and replace them with a placeholder (that is, with a character, or a sequence of characters, which does not appear in the original text). For example, replace [ with % and ] with $ (provided that % and $ do n...
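For comparison, in a standard regex engine outside Word the brackets usually do not need a placeholder. In Python's `re` module, for example, parentheses and the slash are literal inside a character class and square brackets can be escaped with a backslash. A minimal sketch, with made-up names and the assumption that `###` is a safe placeholder:

```python
import re

# Inside a character class "(", ")" and "/" are literal; "[" and "]" are escaped.
TERM_CHAR = r"[A-Za-z/()\[\]]"
INNER_SPACE = re.compile(rf"({TERM_CHAR}) (?={TERM_CHAR})")


def protect_inner_spaces(line: str, placeholder: str = "###") -> str:
    """Replace spaces between two 'English term' characters with a placeholder."""
    return INNER_SPACE.sub(lambda m: m.group(1) + placeholder, line)


if __name__ == "__main__":
    print(protect_inner_spaces("air conditioner (A / C) [unit] 空调"))
```

Only the space before the Chinese term is left untouched, so it can then be replaced with a tab as in steps 2 and 3.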

855649 (X)

Local time: 12:16
English to Chinese
+ ...

TOPIC STARTER

Ah ha

Aug 3, 2008

Ah ha, that's what I was looking for. I'm not familiar with the internal codes of Word, and personally, I had no idea that it could even support codes like [a-z]. I then tried [A-Z] to see if that worked, but it didn't, and when I tried [a-z@#$%^&*] and added a () in there, it gave an error, so I figured guessing wouldn't work and gave up. Thanks again (a ton) for that bit of extra info.

855649 (X)

Local time: 12:16
English to Chinese
+ ...

TOPIC STARTER

problems with the 'replace with' code now

Aug 4, 2008

Jabberwock wrote:

The code I gave in the comment was simply an example... To match round brackets and / (and uppercase letters) use: [A-Za-z/)(]

Hi Jabberwock, I tried using the ([A-Za-z/)(] [A-Za-z/)(]) code today, and it found them just fine, but when I wanted to replace with \1###\2, this is what happened...

it turned this...

rack link 齿条联接杆
rack nut 齿条螺母
rack shaft 齿条

...into...

rack l###ink 齿条联接杆
rack n###ut 齿条螺母
rack s###haft 齿条

Nothing changed at all as far as I can tell. Should the \1###\2 code be altered too perhaps?

Jaroslaw Michalak

Poland
Local time: 06:16
Member (2004)
English to Polish

SITE LOCALIZER

Sorry again...

Aug 4, 2008

When I tested the expression I forgot you need to reuse the matched portions... In that case you cannot match parentheses either, you have to replace them with a placeholder just like in case of square brackets...

855649 (X)

Local time: 12:16
English to Chinese
+ ...

TOPIC STARTER

Easy enough

Aug 5, 2008

Jabberwock wrote:

When I tested the expression I forgot you need to reuse the matched portions... In that case you cannot match parentheses either, you have to replace them with a placeholder just like in case of square brackets...

Ok, that's easy enough. Thanks again Jabberwock, you've been a great help to me on this.

Jaroslaw Michalak

Poland
Local time: 06:16
Member (2004)
English to Polish

SITE LOCALIZER

No problem...

Aug 5, 2008

I'm glad that I could help. Maybe even someone else will benefit from the discussion...

855649 (X)

Local time: 12:16
English to Chinese
+ ...

TOPIC STARTER

one more question

Aug 5, 2008

Jabberwock, do you think you can help me out once more? I searched on the internet for more of the ms codes so I can stop bothering you and figure them out for myself, but I just don't get it for my next situation. I'm taking Chinese to English now (instead of English to Chinese).

Here's an example.
同步器式变速器 synchromesh transmission
直接档变速器 direct drive transmission
超速档变速器 over drive transmission

I'm searching for: Search: ([!a-z] [a-z])
Which finds the space between the Chinese and the English. But adding the tab is my problem. Replacing with a "^t" cuts off the last Chinese character and the first English letter. If I try something like \1^t\2, it says "The Replace With text contains a group number which is out of range". Any variation of that, like ^t\2, \1###\2, etc give the same error. Do you know what I'm doing wrong?

[Edited at 2008-08-05 15:45]

Jaroslaw Michalak

Poland
Local time: 06:16
Member (2004)
English to Polish

SITE LOCALIZER

Try the same...

Aug 5, 2008

Hmm... Actually, you should try the same procedure as in the previous case. This allows you to find spaces within the English term and replace with the placeholder ### (step 1), then the remaining spaces are (or should be) only the spaces between Chinese and English - you replace them with tabs (step 2), finally, you restore the English spaces (step 3).

For your interest, you should not write ([!a-z] [a-z]), as this means "group 1 is a non-lowercase ASCII character followed by a space followed by a lowercase ASCII character", but ([!a-z]) ([a-z]), which means "group 1 is a non-lowercase ASCII character, then follows a space, then group 2 is a lowercase ASCII character". With only one group defined, \2 refers to a group that does not exist, hence the "out of range" error.
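The difference between one group and two groups can be tried out in any regex engine. Here is a small Python sketch (Python writes "not a-z" as `[^a-z]` where Word writes `[!a-z]`); the function names are invented for the example.

```python
import re

TWO_GROUPS = re.compile(r"([^a-z]) ([a-z])")  # group 1, a space, group 2
ONE_GROUP = re.compile(r"([^a-z] [a-z])")     # everything inside group 1


def tab_between(line: str) -> str:
    """Keep both neighbouring characters and swap the space for a tab."""
    return TWO_GROUPS.sub(r"\1\t\2", line, count=1)


def one_group_fails(line: str) -> bool:
    """Return True if referring to group 2 is rejected for the one-group pattern."""
    try:
        ONE_GROUP.sub(r"\1\t\2", line, count=1)
    except re.error:
        return True
    return False


if __name__ == "__main__":
    sample = "同步器式变速器 synchromesh transmission"
    print(tab_between(sample))
    print(one_group_fails(sample))
```

With two groups the characters on either side of the space are put back by `\1` and `\2`, so nothing is cut off; with a single group there is no group 2 to refer to.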

I know that regexp (regular expressions) can get confusing, especially in Word, which uses a non-standard flavor of regexp.

If you are really interested, you might look at Perl. It is a programming language which allows doing such conversion with two or three lines of code which can be then reused over and over. Of course, first you have to write a few dozen non-working programs (I'm not there yet myself...).
