New Version Needed with Unicode Storage in TsiLang

DInfo
Posts: 24
Joined: Wed Mar 02, 2005 1:38 pm

Post by DInfo »

Hi,

We are using Tnt and need to store and retrieve Unicode strings from the TsiLang component with *no* code page translation. Code page translation and locales are causing problems with on-the-fly language switching, especially in ActiveDocument COM servers.

If your new version is going to store Unicode in TsiLang instead of AnsiStrings, I would *massively* appreciate an early version. Otherwise, I'm going to rewrite the components in the next few days (deadline of Thursday).

Thanks Thanks Thanks

David
isiticov
Site Admin
Posts: 2383
Joined: Thu Nov 21, 2002 3:17 pm

Post by isiticov »

Hi,

The next version (6.0.3), which is almost complete, won't change the storage method and will still use AnsiStrings for storage. Sorry.
Best regards,
Igor Siticov.

Post by DInfo »

Thanks for the update.

I am curious about the reasons for using AnsiString storage. What are the advantages of using AnsiStrings over Unicode? Am I missing something here?

Thanks again,

David

Post by DInfo »

It might also be helpful to ask the question this way:

Can I retrieve the AnsiStrings from storage without a code page conversion?

Thanks,

David

Post by isiticov »

DInfo wrote: I am curious about the reasons for using AnsiString storage. What are the advantages of using AnsiStrings over Unicode? Am I missing something here?
Most UI controls and properties in Delphi are still Ansi, so keeping translations in Unicode would reduce performance because of an unnecessary Unicode -> Ansi conversion each time a translation is applied.
Best regards,
Igor Siticov.

Post by isiticov »

DInfo wrote: It might also be helpful to ask the question this way:

Can I retrieve the AnsiStrings from storage without a code page conversion?
Of course. Depending on which translation you need, you can use the GetTextFrom() method and pass it the respective translations list: siLang1.Captions for captions, siLang1.Hints for hints, and so on. You can also use the GetStringValue() method to retrieve a string for a specified language.
Does this help?
Best regards,
Igor Siticov.

Post by DInfo »

So, how do I convert your MBCS Ansi string to a Unicode string without a code page conversion?

BTW: I am using TNT and ElPack and everything is Unicode. We have to ship 20 to 30 languages in the same exe.

Thanks

David

Post by DInfo »

Maybe a better question: Does it make sense to get the Ansi string with GetTextFrom and then convert it to a WideString with a ..toWideString conversion, using the code page that applies to that particular translation?

Thanks,

David

Post by isiticov »

Could you please describe what you are trying to achieve? If you convert Ansi to Unicode without using a code page, you will get incorrectly converted Unicode strings in most cases (whenever the default locale of your PC differs from the locale of the target language).
Maybe the easiest solution for you would be the following:
1. You let TsiLang translate all your UI controls as it does now, even the Unicode ones.
2. You exclude particular "problematic" components, such as the ActiveDocument OLE server, from automatic translation.
3. You translate them in code in the OnChangeLanguage event.
4. If you need Unicode for the translation, you can use the GetTextOrDefaultW() methods of TsiLang to retrieve Unicode strings.

Please let me know if this helps.
Best regards,
Igor Siticov.

Post by DInfo »

Thanks for the comments.

First, let me say that we *really* like TsiLang and it has saved us lots of work up to this point. We hope that we can continue to use it. :)

Unfortunately, TsiLang's Unicode implementation only works when the translated language can be represented by the system code page. Your implementation of GetTextOrDefaultW is:

function TsiCustomLang.GetTextOrDefaultW(const TextID: Tstring): WideString;
begin
  Result := AnsiStringToWideStringCP(GetTextOrDefault(TextID), CurrentCharset);
end;

and unless I'm missing something, it uses the code page to convert from the AnsiString storage to Unicode. This means that at any given point in time, the translation is dependent on the code page setting for the process/thread. This approach leads to lots of errors that are hard for us to reproduce and that depend on the user's configuration, software, etc.

Even on a US system with both Far East and Complex regional support installed, this fundamentally doesn't work 100% of the time as illustrated by attempting to display a language selection in which Chinese, Japanese, Russian, English, etc. are displayed in the same control (as in your language selection widget).

We are also seeing the issue in situations where the thread locale cannot be safely changed, as with ActiveDocuments (the ActiveDocument is loaded by an unknown thread with its own locale, etc.)

So, ...

1) TsiLang does not actually translate into Unicode correctly unless the code page/charset is correct.

2) It is impossible to exclude certain parts of the application from translation because they are central to our app.

3) Translating from code is very inflexible and would be required for all captions, hints, text, etc. and really negates the need for TsiLang.

4) As noted above, the GetTextOrDefaultW approach, which uses a code page to recreate Unicode, is not always reliable.

We are building (almost done) a solution that draws the translations from the dictionary and stores Unicode text and extended translation info in a binary resource. At runtime, the Unicode text and the extended data are pulled out and pushed into the controls, strings, and dialogs as necessary.

After looking at this, I would urge you to at least make this approach optional. We are not losing speed because no code page translations are done unless we are going from Unicode to AnsiString. I think that you are almost always going through at least one code page translation, and maybe two in the case of Unicode.

Secondly, most of your support messages seem to stem from misunderstandings of this whole code page support issue and of how differing OS localizations support each code page. Going to Unicode seems like it would solve at least half of those problems before people asked.

Anyway, we really like the tools and appreciate your work and support of them. Let me know if we are missing something here and *please* let us know about your solutions in the future.

Best,

David



Post by isiticov »

Hi!
Thank you for your kind feedback.
DInfo wrote: and unless I'm missing something, it uses the code page to convert from the AnsiString storage to Unicode. This means that at any given point in time, the translation is dependent on the code page setting for the process/thread. This approach leads to lots of errors that are hard for us to reproduce and that depend on the user's configuration, software, etc.
I'm sorry, but you have misunderstood how this works.
This code uses the charset to determine the code page needed for the conversion from ANSI to Unicode. BUT, CurrentCharset is not related to the default locale (code page) set in your OS. It is the value you set in the Charsets section of the form for each language. So, if you set the appropriate charset for each language in the Charsets section, TsiLang will automatically determine the code page for the conversion and will properly convert ANSI to Unicode. This was done precisely to avoid the incorrect ANSI-to-Unicode conversion based on the default locale that Delphi itself uses. Did you set the Charsets in your project?
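As a rough illustration of the idea (a Python sketch, not TsiLang's actual code): a charset stored per language selects the code page, so the conversion does not depend on the system default locale. The table `CHARSET_TO_CODEC` and the function `ansi_to_unicode` are hypothetical names made up for this example; the numeric keys are the standard Windows charset constants.

```python
# Hypothetical lookup from a Windows charset constant to a Python codec name.
CHARSET_TO_CODEC = {
    0: "cp1252",    # ANSI_CHARSET
    128: "cp932",   # SHIFTJIS_CHARSET
    134: "cp936",   # GB2312_CHARSET
    136: "cp950",   # CHINESEBIG5_CHARSET
    204: "cp1251",  # RUSSIAN_CHARSET
}


def ansi_to_unicode(raw: bytes, charset: int) -> str:
    """Decode ANSI bytes with the code page that belongs to the given charset."""
    return raw.decode(CHARSET_TO_CODEC[charset])


# Bytes as they would sit in ANSI storage for two different languages.
japanese = "日本語".encode("cp932")
russian = "Русский".encode("cp1251")

# The charset configured for each language picks the code page explicitly.
text_ja = ansi_to_unicode(japanese, 128)
text_ru = ansi_to_unicode(russian, 204)
```

The point of the sketch is only that the code page is an explicit input chosen per language, not something read from the machine's regional settings.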
Best regards,
Igor Siticov.

Post by DInfo »

Thanks for continuing the dialog.

I hadn't dug deep enough to understand that you are using CurrentCharset instead of defaulting to the locale. However, we have been setting the charset correctly for each form and each language.

This approach does not work for multiple languages with different charsets shown at the same time. For example, the language selector cannot include both Chinese and Japanese because their charsets differ. The result is that one of the languages is displayed as garbage chars.
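A small Python sketch of that effect (illustrative only, not TsiLang or Tnt code; the variable names are made up): when every item in a list is interpreted with one charset, items stored in another code page come out wrong, while converting each item with its own code page yields Unicode strings that can sit side by side.

```python
# Each language name as it would be stored: ANSI bytes in its own code page.
stored = [
    ("Japanese", "日本語".encode("cp932"), "cp932"),
    ("Chinese", "中文".encode("cp936"), "cp936"),
]

# An ANSI control applies ONE charset to every item (here the Japanese one),
# so the Chinese bytes are misread.
one_charset = [raw.decode("cp932", errors="replace") for _, raw, _ in stored]

# Converting each item with its own code page gives Unicode strings that
# can all be shown in the same Unicode-enabled control.
per_language = [raw.decode(codec) for _, raw, codec in stored]
```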

More importantly, how are you verifying that Windows can actually make the code page conversions? Our Active Document forms will not work correctly despite the fact that the charset for each form and each language is set correctly.

Please tell us how to make this work/troubleshoot this issue today and we can go back to TsiLang.

Thanks,

David

Post by isiticov »

Once again, I'm very sorry, but I can't see where your problem is. Could you please narrow it down? :)
If you need to display strings in different languages at the same time in the "language selector", then this control must be Unicode-enabled. And since you don't need to translate it, you can exclude it completely from translation and enter the Unicode items by hand.
Windows can always convert from Ansi to Unicode if support for the language is installed on the system. Otherwise it will return a Unicode string converted using the default locale. But this will be a problem in any case, even if you use only Unicode strings: they will be displayed incorrectly if the OS doesn't support the language, unless your application has its own rendering engine like MS Word. :)
Could you please describe in a little more detail what your "Active Forms" are and why (how) they don't work correctly?
Thanks.
Best regards,
Igor Siticov.

Post by DInfo »

Igor,

We are converting the language selection to Tnt Unicode controls and entering the languages by hand. This is not a problem for that one control.

The question for us is, how much of our other language "garbage char" problems are related to code page/charset issues?

An ActiveDocument is an MS-defined COM in-process server that can be loaded by an ActiveDocument container (e.g. IE, Word, Windows Explorer). The container allows the ActiveDocument to own a portion of the application space/interface to display/edit the document type that is managed by the ActiveDocument. For example, when you browse to a web page and load a PDF document, you are using an Acrobat ActiveDocument executable.

Evidently, ActiveDocuments are probably the most complex COM interface that is currently used in Windows. I know that it was a bear to get it running correctly. Something in the process is preventing the correct translation of Ansi to Unicode if the charset changes. For example, any language that uses the Default Charset is fine; any changes for Chinese, Russian, etc. do not work correctly.

Our previous version used AnsiStrings going from TsiLang to ElPack inside a COM object. What I don't yet know is whether it was the ElPack mechanism or TsiLang that was causing the charset not to work. But I know from our QA testers that the combination is causing way too much variation across OSes and localizations.

So, the decision was made to go straight Unicode and eliminate the potential for errors. I would encourage you to make Unicode an option in TsiLang.

Our work over the last several days has been to export a SIL file from TsiLang and then process it, using the Dictionary as a COM server to get the translations. The result is saved in a binary Unicode file and stored as a binary resource in the EXE. At runtime, the controls/properties that have been translated are replaced on demand, and the strings and dialogs are available by calling a function similar to GetText... All results deliver straight Unicode. We are checking the OS to be sure that the appropriate language support has been installed before allowing a user to select the language. The only work that remains is getting the Windows common dialogs to use the same language when possible. In certain circumstances (e.g. printer properties) it may not be possible to use a language that is not supported on the native system locale.
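For readers wondering what such a resource could look like, here is a minimal Python sketch of a length-prefixed UTF-16LE string table. The layout and the names `pack_strings` and `unpack_strings` are assumptions made for illustration; the actual file format is not described in this thread.

```python
import struct


def pack_strings(table: dict) -> bytes:
    """Serialize {text_id: translation} as length-prefixed UTF-16LE records."""
    out = [struct.pack("<I", len(table))]          # record count
    for text_id, text in table.items():
        for field in (text_id, text):
            data = field.encode("utf-16-le")
            out.append(struct.pack("<I", len(data)))  # byte length of the field
            out.append(data)
    return b"".join(out)


def unpack_strings(blob: bytes) -> dict:
    """Read the table back; no code page is involved at any point."""
    (count,) = struct.unpack_from("<I", blob, 0)
    pos = 4
    table = {}
    for _ in range(count):
        fields = []
        for _ in range(2):
            (size,) = struct.unpack_from("<I", blob, pos)
            pos += 4
            fields.append(blob[pos:pos + size].decode("utf-16-le"))
            pos += size
        table[fields[0]] = fields[1]
    return table


# Hypothetical control IDs with translations in two different scripts.
translations = {"btnOK": "确定", "lblName": "Имя"}
blob = pack_strings(translations)
restored = unpack_strings(blob)
```

Because the text is stored as UTF-16 from the start, strings in different scripts can share one table and come back unchanged.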

I hope this helps.

David

Post by isiticov »

Hi David,

If ANY component has properties defined as WideString (Unicode), TsiLang will convert the Ansi string from internal storage to Unicode when the language changes. It has never been reported before that this doesn't work properly. My guess is that if you see "garbage" chars somewhere in ActiveForms after changing the language, then something in the code is converting from Unicode to Ansi and back, so the string gets corrupted.
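A short Python sketch of that kind of corruption (illustrative only; it does not reproduce any TsiLang, ElPack, or COM code): once a Unicode string passes through an Ansi code page that cannot represent it, converting back does not restore the original characters.

```python
original = "日本語"

# Unicode -> Ansi with a code page that cannot represent the text:
# every unmappable character is replaced with "?".
ansi = original.encode("cp1252", errors="replace")

# Ansi -> Unicode cannot bring the lost characters back.
restored = ansi.decode("cp1252")

# The same round trip through the matching code page is lossless.
lossless = original.encode("cp932").decode("cp932")
```

Here `restored` ends up as question marks, while `lossless` equals the original, which is why a single hidden Unicode -> Ansi step with the wrong code page is enough to produce garbage.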
Best regards,
Igor Siticov.