Chinese / Unicode inside SIL file

All announcements, questions and issues related to the TsiLang Components Suite.
Hagmann
Posts: 3
Joined: Tue Jan 31, 2006 7:09 am

Post by Hagmann »

We are trying to translate an application into Chinese.
I picked out one string for examination, which is stored in Unicode (UTF-16, little-endian):
EA 81 03 BC D5 8B 77 8D
That means the Unicode characters $81EA, ...
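The byte sequence can be cross-checked by decoding it as UTF-16 little-endian (the byte order Windows uses for wide strings) and asking the Unicode database for each character's name. A minimal Python sketch; `RAW` and `describe` are hypothetical names used only for illustration:

```python
import unicodedata

# The eight bytes quoted above, as stored in memory (UTF-16 little-endian).
RAW = bytes.fromhex("EA81 03BC D58B 778D")

def describe(raw: bytes) -> list[tuple[str, str]]:
    """Decode UTF-16LE bytes and return (code point, Unicode name) pairs."""
    text = raw.decode("utf-16-le")
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

for code_point, name in describe(RAW):
    print(code_point, name)
```

For these four characters the second entry (U+BC03) is reported as a HANGUL SYLLABLE rather than a CJK UNIFIED IDEOGRAPH, i.e. it is a Korean syllable.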

After the text is entered in the SIL Editor, it is stored as
A6 DB 3F 3F B0 5F

The characters have been converted as follows:
EA 81 => A6 DB, 03 BC => 3F, D5 8B => 3F, 77 8D => B0 5F

The first and last characters seem to be OK on a Chinese Windows, but
the characters in the middle are now ANSI "?" characters and are, of course, displayed as "?" on all Windows versions.
What kind of "translation" takes place?
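A likely explanation is a lossy Unicode-to-ANSI conversion: every character that has no code in the target code page is replaced by a default character, normally "?" (0x3F). The sketch below emulates this offline with Python's codecs, assuming the editor converts through the Traditional Chinese Big5 code page 950 (Python's `cp950`); `to_ansi` is a hypothetical helper, not part of TsiLang:

```python
# The four characters from the post, written as escapes: U+81EA, U+BC03, U+8BD5, U+8D77.
TEXT = "\u81ea\ubc03\u8bd5\u8d77"

def to_ansi(text: str, codepage: str) -> bytes:
    """Emulate a lossy Unicode-to-ANSI conversion: unmappable characters become b'?'."""
    return text.encode(codepage, errors="replace")

ansi = to_ansi(TEXT, "cp950")  # assumption: Big5 / code page 950 is the target
print(ansi.hex(" ").upper())
```

Under that assumption the first and last characters get two-byte Big5 codes and the two middle ones collapse to single 0x3F bytes, the same pattern as in the stored data above.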

The SIL Editor already displays them as "?"; only the Dictionary Manager
can show the correct characters.

Characters $BC03 and $8BD5 are only shown correctly if I assign the string
at runtime to WideString properties of TNT components.

How can I handle this using the SIL file?
isiticov
Site Admin
Posts: 2383
Joined: Thu Nov 21, 2002 3:17 pm

Post by isiticov »

Does the same apply when using SIB files? I would suggest using SIB files, because they are much faster and are not read or written with text routines, whereas SIL is actually an INI-like file format.
Please try to use SIB files and let us know if this helps.
Best regards,
Igor Siticov.

Post by Hagmann »

The problem is getting the characters into the SIB file.
I can't enter them in the SIL Editor, regardless of the
file type.
I also don't know the format of the SIB file
(to manipulate it with a hex editor, as I did with the SIL file).

Post by isiticov »

Have you adjusted the Font and Charset settings for the Chinese language under Default Fonts (see SIL Editor menu Tools->Default Fonts) and in the Fonts and Charsets sections?
Once you adjust these, you will be able either to type directly (if your OS supports this) or to copy and paste from other applications such as MS Word.
Please let me know if this helps.
Best regards,
Igor Siticov.

Post by Hagmann »

Yes, we have done all this. We followed the instructions on your website exactly:
http://www.tsilang.com/press/en/adding_ ... uages.html
We set the font "Tahoma" and the charset "CHINESEBIG5_CHARSET".
We also tried the same under a Chinese Windows.
Still, we are not able to enter all characters (by copy and paste or with the 'Microsoft Pinyin IME').
An example is the character feng (风).
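Each Windows font charset corresponds to an ANSI code page, so the charset chosen for a language determines which characters can survive the conversion. The sketch uses the GDI constant values GB2312_CHARSET = 134, CHINESEBIG5_CHARSET = 136 and HANGEUL_CHARSET = 129 and maps them to Python codec names; the table and `fits_charset` are illustrative assumptions, not TsiLang code. It checks feng in its simplified form 风 (U+98CE) and its traditional form 風 (U+98A8):

```python
# GDI charset constant -> Python codec for the matching Windows ANSI code page.
CHARSET_CODEPAGE = {
    134: "cp936",  # GB2312_CHARSET      -> Simplified Chinese (code page 936)
    136: "cp950",  # CHINESEBIG5_CHARSET -> Traditional Chinese (code page 950, Big5)
    129: "cp949",  # HANGEUL_CHARSET     -> Korean (code page 949)
}

def fits_charset(text: str, charset: int) -> bool:
    """Return True if every character of text exists in the charset's code page."""
    try:
        text.encode(CHARSET_CODEPAGE[charset])
    except UnicodeEncodeError:
        return False
    return True

for charset in (136, 134):
    print(charset, fits_charset("\u98ce", charset))  # simplified feng (U+98CE)
```

Big5 covers only the traditional form, so simplified text would presumably need GB2312_CHARSET instead.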

Post by isiticov »

Are you able to type all Chinese characters in MS Word? If yes, does copy/paste from MS Word to SIL Editor work? If neither works, you could try the following:
1. Export all translations to Dictionary Manager using Add All.
2. Translate them in Dictionary Manager; since it is Unicode-based, it will handle these characters correctly.
3. Apply the translations in SIL Editor with the Auto-Translate function.

Please let me know about the results.

P.S. What OS do you use?
Best regards,
Igor Siticov.
MMSomeware
Posts: 4
Joined: Tue Jan 31, 2006 11:31 am

Post by MMSomeware »

Hello,

I'm working with Mr. Hagmann on the same problem. I tried the steps,
but got the same result: with Auto-Translate the same thing happens. The first and last characters are OK, the two in the middle are "?".
I erased the existing translation, but it was overwritten with the wrong characters again.
I tried the font Arial Unicode MS too, with the same result.
Editing in Word works fine, but copy & paste only works for the 1st and 4th characters.

Post by isiticov »

Using GB2312_CHARSET and Tahoma for Simplified Chinese (I guess this is the better choice, since our Chinese translator set it) I was able to enter almost all Chinese characters, including feng (风). You can check http://www.sicomponents.com/soft/resourcebuilder.sib; it includes the Chinese language as well. The only problem is the 밃 character (03 BC).
Best regards,
Igor Siticov.
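Since such conversion failures are silent, a pre-flight check of the translations against the target code page can show which entries would be damaged before they are stored. A minimal sketch, assuming the translations are available as a plain key-to-text mapping and that GB2312_CHARSET corresponds to code page 936 (`cp936`); `audit` and the sample keys are hypothetical:

```python
def audit(translations: dict[str, str], codec: str = "cp936") -> dict[str, list[str]]:
    """Map each translation key to the characters that `codec` cannot encode."""
    report = {}
    for key, text in translations.items():
        lost = []
        for ch in text:
            try:
                ch.encode(codec)
            except UnicodeEncodeError:
                lost.append(ch)
        if lost:
            report[key] = lost
    return report

# Hypothetical entries: the four-character string from this thread and feng (U+98CE).
sample = {"TestLabel": "\u81ea\ubc03\u8bd5\u8d77", "WindLabel": "\u98ce"}
for key, lost in audit(sample).items():
    print(key, [f"U+{ord(ch):04X}" for ch in lost])
```

With code page 936 only U+BC03 from the first string is reported; with `cp950` the simplified characters U+8BD5 and U+98CE are reported as well.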

Post by MMSomeware »

Do you know the reason for the problems with the 밃 character (03 BC)?
That could lead to an answer about which kinds of characters are not usable.

We want to know which characters are available for translation, and whether a
reasonable translation is possible at all.
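One way to narrow down which characters are at risk is to look at where they sit in Unicode. U+BC03 lies in the Hangul Syllables block (U+AC00 to U+D7AF), so it is a Korean syllable rather than a Chinese ideograph, which would explain why neither Chinese code page has a code for it while the Korean code page 949 does. A rough sketch; the block table is deliberately partial and `classify`/`holders` are hypothetical helpers:

```python
# A few Unicode block ranges relevant to this thread (not a complete table).
BLOCKS = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
]
# Simplified Chinese, Traditional Chinese, Korean ANSI code pages.
CODEPAGES = ["cp936", "cp950", "cp949"]

def classify(ch: str) -> str:
    """Name the (partial-table) Unicode block containing ch."""
    cp = ord(ch)
    for low, high, name in BLOCKS:
        if low <= cp <= high:
            return name
    return "other"

def holders(ch: str) -> list[str]:
    """Return the code pages from CODEPAGES that can encode ch."""
    result = []
    for codec in CODEPAGES:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            continue
        result.append(codec)
    return result

for ch in "\u81ea\ubc03\u8bd5\u8d77":
    print(f"U+{ord(ch):04X}", classify(ch), holders(ch))
```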

Post by isiticov »

MMSomeware wrote:
> Do you know the reason for the problems with the 밃 character (03 BC)?
> That could lead to an answer about which kinds of characters are not usable.
We're researching this now...
Best regards,
Igor Siticov.

Post by MMSomeware »

Are there any research results or theories yet?

Using the other charset, four Chinese characters are visible, but all of them are wrong.

Post by isiticov »

The preliminary result is that the Windows Unicode<->ANSI conversion API converts these characters incorrectly. We're trying to find a better way to handle this.
Best regards,
Igor Siticov.

Post by MMSomeware »

Hello!
It seems to be a bigger problem than expected.
Are there any results yet for a smarter conversion?
Can you explain which kinds of characters are going to be lost?
By the middle of March I urgently need a working solution, or I'll have to switch to a different translation method altogether.

Post by isiticov »

Hello,

Actually, we are still unable to find a solution :( except using an OS with a Chinese default locale and setting DefaultCharset to the charset used for the Chinese language everywhere.
In all other cases, the only problematic character (at least, all others used in our Chinese translations converted fine) is 밃 (03 BC).
Windows is unable to convert this character to an ANSI multi-byte character.
I'm afraid a Delphi application may fail to display such a character as well. :(
Best regards,
Igor Siticov.