SiComponents Home Page SiComponents Forums
Here you will be able to get help and share your experience
 

Chinese / Unicode inside SIL file

 
Hagmann



Joined: 31 Jan 2006
Posts: 3

Posted: Tue Jan 31, 2006 7:46 am    Post subject: Chinese / Unicode inside SIL file

We are trying to translate an application into Chinese.
I picked out one string for examination, which is in Unicode:
EA 81 03 BC D5 8B 77 8D
That means the Unicode characters: $81EA, ...

After entering the data in the SIL Editor, the data is stored as
A6 DB 3F 3F B0 5F

The characters have been translated:
EA 81 => A6 DB, 03 BC => 3F, D5 8B => 3F, 77 8D => B0 5F

The first and the last characters seem to be OK on a Chinese Windows, but
the characters in the middle are now ANSI "?" characters and of course are displayed as "?" on all Windows versions.
What kind of "translation" occurs?

The SIL Editor already displays them as "?"; only the Dictionary Manager
can show the right characters.

Characters $BC03 and $8BD5 are only shown correctly if I assign the string
at run time to WideString properties of TNT components.

How can I handle this using the SIL file?
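
The conversion described in this post can be reproduced outside the SIL Editor. The following is a minimal Python sketch, not the editor's actual code: it assumes the original bytes are UTF-16 little-endian and that the stored bytes come from a conversion to the Big5 ANSI code page (Windows code page 950, Python codec `cp950`), with unmappable characters replaced by "?" ($3F).

```python
# Original bytes from the post, read as UTF-16 little-endian (assumption).
raw = bytes.fromhex("EA 81 03 BC D5 8B 77 8D")
text = raw.decode("utf-16-le")
code_points = [f"${ord(ch):04X}" for ch in text]

# Assumption: the editor converts to Big5 (code page 950);
# characters with no Big5 equivalent are replaced by "?" (0x3F).
stored = text.encode("cp950", errors="replace")

print(code_points)               # ['$81EA', '$BC03', '$8BD5', '$8D77']
print(stored.hex(" ").upper())   # A6 DB 3F 3F B0 5F
```

Under these assumptions the result matches the stored bytes quoted above: $BC03 and $8BD5 have no Big5 equivalent, so the conversion substitutes "?".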
isiticov
Site Admin


Joined: 21 Nov 2002
Posts: 2103

Posted: Tue Jan 31, 2006 7:58 am

Does the same apply when using SIB files? I would suggest using SIB files because they are much faster and are not read or written using text routines, whereas SIL is actually an INI-like file format.
Please try using SIB files and let us know if this helps.
Hagmann



Joined: 31 Jan 2006
Posts: 3

Posted: Tue Jan 31, 2006 3:23 pm

The problem is getting the characters into the SIB file.
I can't enter them in the SIL Editor, regardless of the
file type.
I also don't know the format of the SIB file
(to manipulate it with a hex editor, as I did with the SIL file).
isiticov
Site Admin


Joined: 21 Nov 2002
Posts: 2103

Posted: Tue Jan 31, 2006 5:40 pm

Have you adjusted the Font and Charset settings for the Chinese language under Default Fonts (see the SIL Editor menu Tools->Default Fonts) and under the Fonts and Charsets sections?
Once you adjust these you will be able either to type directly (if your OS supports this) or to copy and paste from other applications such as MS Word.
Please let me know if this helps.
Hagmann



Joined: 31 Jan 2006
Posts: 3

Posted: Wed Feb 01, 2006 11:16 am

Yes, we have done all this. We followed exactly the instructions on your website:
http://www.tsilang.com/press/en/adding_support_for_far_east_or_other_unicode_languages.html
We set the Font "Tahoma" and the Charset "CHINESEBIG5_CHARSET".
We tried the same on a Chinese Windows as well.
Still, we are not able to input all characters (by copy and paste or with the 'Microsoft Pinyin IME').
An example is the character feng (风).
isiticov
Site Admin


Joined: 21 Nov 2002
Posts: 2103

Posted: Wed Feb 01, 2006 11:33 am

Are you able to type all Chinese characters in MS Word? If yes, does copy/paste from MS Word to the SIL Editor work? If none of the above works, you could proceed as follows:
1. Export all translations to Dictionary Manager using Add All.
2. Translate them in Dictionary Manager since it is Unicode and will be able to handle this correctly.
3. Auto-translate in SIL Editor using Auto-Translate function.

Please let me know about the results.

P.S. What OS do you use?
MMSomeware



Joined: 31 Jan 2006
Posts: 4

Posted: Thu Feb 02, 2006 5:43 pm

Hello,

I'm working with Mr. Hagmann on the same problem, and I tried the steps,
but with the same result. With Auto-Translate the same thing happens: the first and last characters are OK, the two in the middle are "?".
I erased the given translation, but it was overwritten with the wrong characters again.
I've tried the font Arial Unicode MS too, with the same result.
Editing in Word works fine; copy and paste only works for the 1st and 4th characters.
isiticov
Site Admin


Joined: 21 Nov 2002
Posts: 2103

Posted: Fri Feb 03, 2006 2:19 am

Using GB2312_CHARSET and Tahoma for Simplified Chinese (I guess it is the better choice, since the Chinese translator set this) I was able to enter almost all Chinese characters, including feng (风). You can check http://www.sicomponents.com/soft/resourcebuilder.sib; it includes the Chinese language as well. The only problem is with the 밃 character (03 BC).
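
A minimal Python sketch of why the charset matters, assuming GB2312_CHARSET corresponds to Windows code page 936 (Python codec `cp936`); the variable names are illustrative. It converts the sample string from the first post and feng (风) to that code page and back, and reports what is lost.

```python
# Sample string from the first post ($81EA, $BC03, $8BD5, $8D77) plus feng.
sample = "\u81EA\uBC03\u8BD5\u8D77"
feng = "\u98CE"

# Assumption: GB2312_CHARSET maps to code page 936 (Simplified Chinese).
stored = sample.encode("cp936", errors="replace")
round_trip = stored.decode("cp936")

# Characters that did not survive the Unicode -> ANSI -> Unicode round trip.
lost = [f"${ord(a):04X}" for a, b in zip(sample, round_trip) if a != b]
feng_ok = feng.encode("cp936").decode("cp936") == feng

print(lost)     # ['$BC03']
print(feng_ok)  # True
```

Under this assumption only $BC03 is replaced by "?", which is consistent with the observation that it is the one remaining problem character.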
MMSomeware



Joined: 31 Jan 2006
Posts: 4

Posted: Fri Feb 03, 2006 8:49 am

Do you know the reason for the problems with the 밃 character (03 BC)?
That could lead to the answer to which kinds of characters are not usable.

We want to know which chars are available for translation, and if a
reasonable translation is possible at all.
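
One way to find out which characters are unusable is to test each character of a translation against the target code page. The helper below is a hypothetical sketch (the name `find_unmappable` is an assumption and not part of TsiLang); it uses Python's `cp936` and `cp950` codecs as stand-ins for the Windows Simplified and Traditional Chinese ANSI code pages.

```python
def find_unmappable(text: str, codec: str) -> list[tuple[int, str]]:
    """Return (position, 'U+XXXX') for every character the codec cannot encode."""
    problems = []
    for pos, ch in enumerate(text):
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            problems.append((pos, f"U+{ord(ch):04X}"))
    return problems


# Sample string from the first post: $81EA, $BC03, $8BD5, $8D77.
sample = "\u81EA\uBC03\u8BD5\u8D77"
print(find_unmappable(sample, "cp936"))  # [(1, 'U+BC03')]
print(find_unmappable(sample, "cp950"))  # [(1, 'U+BC03'), (2, 'U+8BD5')]
```

Running such a check over all translated strings before entering them would list the characters that a given charset cannot hold.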
isiticov
Site Admin


Joined: 21 Nov 2002
Posts: 2103

Posted: Fri Feb 03, 2006 11:01 am

Quote:
Do you know the reason for the problems with the 밃 character (03 BC)?
That could lead to the answer to which kinds of characters are not usable.

We're researching this now...
MMSomeware



Joined: 31 Jan 2006
Posts: 4

Posted: Wed Feb 08, 2006 9:11 am

Are there any research results or theories yet?

Using the other Charset, 4 Chinese characters are visible, but all of them are wrong.
isiticov
Site Admin


Joined: 21 Nov 2002
Posts: 2103

Posted: Wed Feb 08, 2006 11:15 am

The preliminary result is that the Windows Unicode<->ANSI conversion API converts these characters incorrectly. We're trying to find a better way to handle this.
MMSomeware



Joined: 31 Jan 2006
Posts: 4

Posted: Tue Feb 28, 2006 10:56 am

Hello!
It seems to be a bigger problem than expected.
Are there any results yet for a smarter conversion?
Can you explain which kinds of characters are going to be lost?
By mid-March I urgently need a working solution, or I'll have to switch to another translation method altogether.
isiticov
Site Admin


Joined: 21 Nov 2002
Posts: 2103

Posted: Tue Feb 28, 2006 2:26 pm

Hello,

Actually, we are still unable to find a solution, except using an OS with a Chinese default locale and setting DefaultCharset to be used for the Chinese language everywhere.
In other cases the only problematic character is 밃 (03 BC); all the others used in our Chinese translations converted fine.
Windows is unable to convert this character to an ANSI multi-byte character.
I am afraid a Delphi application may fail to display such a character as well.
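
A short Python check suggests why $BC03 cannot be converted: Unicode classifies it as a Hangul (Korean) syllable rather than a Chinese ideograph, so neither Chinese ANSI code page contains it. The sketch assumes Python's `cp936`, `cp950` and `cp949` codecs approximate the Windows code pages of the same numbers; it does not call the Windows conversion API itself.

```python
import unicodedata

ch = "\uBC03"
name = unicodedata.name(ch)  # official Unicode character name
print(name)

# Try the Simplified Chinese, Traditional Chinese and Korean code pages.
representable = {}
for codec in ("cp936", "cp950", "cp949"):
    try:
        ch.encode(codec)
        representable[codec] = True
    except UnicodeEncodeError:
        representable[codec] = False

print(representable)
```

If the character was really intended as part of a Chinese string, the original Unicode data may be worth rechecking, since no Chinese charset setting can represent a Hangul syllable.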
All times are GMT