New Version Needed with Unicode Storage in TsiLang

DInfo
Posts: 24
Joined: Wed Mar 02, 2005 1:38 pm

Post by DInfo »

Igor,

Thanks so much for your comments.

After we changed to pure Unicode storage, making no changes to the ElPack, TNT, or standard Delphi controls, the strings display correctly with no garbage characters. So (please forgive me for a bug report) this is an issue with TsiLang or Borland, as described below.

We did have to make a change to TypInfo to eliminate a rather senseless conversion from Unicode to ANSI and back to Unicode. Was this the source of our original problem? Your code calls this function in TypInfo, so maybe this is the root of the problem. Regardless of which code caused the issue, a faulty ANSI to Unicode conversion was at the root. In essence, Borland was goofy here. They implicitly convert the value to an AnsiString with the SetStrProp(Instance, FindPropInfo(Instance, PropName), Value); call. Nothing in the code gives a reason for doing this, so I'm a bit baffled. Our fix follows:


procedure SetWideStrProp(Instance: TObject; const PropName: string; const Value: WideString);
var
  PropInfo: PPropInfo;
begin
  // Original call:
  // This call implicitly converts Value into an AnsiString, which loses WideString chars
  // SetStrProp(Instance, FindPropInfo(Instance, PropName), Value);

  // Instead, call the WideString routine directly to preserve WideString integrity
  PropInfo := FindPropInfo(Instance, PropName);
  if (PropInfo^.PropType^.Kind = tkWString) then
    SetWideStrProp(Instance, PropInfo, Value)
  else
    SetStrProp(Instance, PropInfo, Value);
end;

Is it possible that this Borland code creates a TsiLang ANSI to Wide to ANSI to Wide conversion that causes code page issues for TsiLang?

Would you be interested in incorporating our Unicode-enabled storage into TsiLang?

I hope this helps.

Best,

David
isiticov
Site Admin
Posts: 2383
Joined: Thu Nov 21, 2002 3:17 pm

Post by isiticov »

Hi David,

Thank you very much for your information. Could you please let me know the following: did you use SetWideStrProp() internally in your code, and is this why there were problems with Unicode? I ask because TsiLang uses only the SetWideStrProp(Instance: TObject; PropInfo: PPropInfo; const Value: WideString); implementation.
Also, which version of Delphi do you use? I wasn't able to find your sample code in the Delphi 6+ sources.
Yes, I would like to take a look at your Unicode storage, but I can't promise to integrate it into TsiLang, because the new version is almost ready and finalized.
Best regards,
Igor Siticov.

Post by DInfo »

Igor,

We are calling SetWideStrProp just like TsiLang calls it. In fact, this is how we found the Delphi 6 bug I mentioned. We are using Delphi 6. The function that I pasted is the function after we fixed it. Look for SetWideStrProp in TypInfo.pas. Trace it and you will see the implicit WideString to AnsiString conversion when it calls SetStrProp. SetStrProp *only* takes an AnsiString, so the Delphi compiler implicitly converts the WideString to an AnsiString. Inside SetStrProp, Delphi tests the property's type kind and sends tkWString properties on to a routine that handles WideStrings. However, the WideString data has already been lost by the time the string comes into SetStrProp, so sending it to a WideString routine does not restore the lost characters.

Since TsiLang is calling SetWideStrProp, this bug affects you too.

I am on a super deadline now, but I would be glad to pass on our Unicode handling code for your next version.

Best,

David

Post by isiticov »

David,

Thank you for the details, but now I'm absolutely confused. Which version of TsiLang Components Suite do you use?
In 6.0.2 there is an explicit call to

procedure SetWideStrProp(Instance: TObject; PropInfo: PPropInfo;
  const Value: WideString); overload;

which works correctly with WideStrings. TsiLang uses the

procedure TsiCustomLang.siSetStrProp(const AObject: TObject; const PInfo: PPropInfo; const PropValue: string);

procedure to change all string properties, and this procedure uses

SetWideStrProp(AObject, PInfo, AnsiStringToWideStringCP(PropValue, CurrentCharset));

when the passed property is a WideString.
Do you have the same on your side?
Best regards,
Igor Siticov.

Post by DInfo »

Igor,

We are using TsiLang 6.0.2 as well. Your example is exactly correct, with this exception:

When TsiCustomLang.siSetStrProp makes the following call

SetWideStrProp(AObject, PInfo, AnsiStringToWideStringCP(PropValue, CurrentCharset));

then you are converting your AnsiString from storage to a Widestring using the current CP.

So far, so good (as long as the CP is correct).

Here's the error:

In TypInfo.SetWideStrProp, the Delphi code passes the WideString Value to TypInfo.SetStrProp as follows:

SetStrProp(Instance, FindPropInfo(Instance, PropName), Value);

However, the declaration of SetStrProp is as follows (from memory)

procedure SetStrProp(Instance: TObject; PropInfo: PPropInfo;
const Value: String);

in which case the Value is an AnsiString.

So, Delphi does an implicit conversion that converts the Value param from a Widestring to an AnsiString.

Then SetStrProp tests the property's type kind, and if the kind returned from the control is tkWString, it passes Value (now an AnsiString) to another function that takes a WideString as a parameter. In this process the AnsiString is once again converted to a WideString.

So starting with TsiLang: the Ansistring from storage is converted to a widestring, back to an Ansistring, and then back to a widestring again.

Since your storage mechanism depends on AnsiStrings as the foundation, most strings go through these conversions OK. However, even some AnsiStrings will have trouble with these conversions if the character being mapped by Windows does not have an exact code page mapping. For example, the quote character in Times Roman can be the plain ASCII quote (U+0022) or a typographic quote in the U+2018 to U+201D range. During the mapping process, the character can be mapped to its nearest equivalent (depending on the WideStringToAnsiString options). Then, when it is "re-mapped" back to Unicode, it doesn't get translated back into the original character.
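
To illustrate the round trip described here, the following is a minimal sketch rather than TsiLang or Borland code. It assumes a Windows console program and calls the Win32 routines WideCharToMultiByte and MultiByteToWideChar directly. The helper names WideToAnsiCP and AnsiToWideCP and the choice of code page 1252 are hypothetical, picked only for the demonstration.

```delphi
program RoundTripSketch;
{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical helper: WideString -> AnsiString through an explicit code page.
function WideToAnsiCP(const W: WideString; CodePage: UINT): AnsiString;
var
  Len: Integer;
begin
  Result := '';
  if W = '' then Exit;
  // First call asks for the required buffer size in bytes.
  Len := WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W), nil, 0, nil, nil);
  if Len <= 0 then Exit;
  SetLength(Result, Len);
  WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W),
    PAnsiChar(Result), Len, nil, nil);
end;

// Hypothetical helper: AnsiString -> WideString through the same code page.
function AnsiToWideCP(const A: AnsiString; CodePage: UINT): WideString;
var
  Len: Integer;
begin
  Result := '';
  if A = '' then Exit;
  // First call asks for the required buffer size in wide characters.
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(A), Length(A), nil, 0);
  if Len <= 0 then Exit;
  SetLength(Result, Len);
  MultiByteToWideChar(CodePage, 0, PAnsiChar(A), Length(A),
    PWideChar(Result), Len);
end;

var
  Original, RoundTripped: WideString;
begin
  // U+201C, three Cyrillic letters, U+201D.
  Original := #$201C#$0416#$0443#$043A#$201D;
  // Assumption: 1252 (Western European) stands in for the "wrong" code page.
  RoundTripped := AnsiToWideCP(WideToAnsiCP(Original, 1252), 1252);
  if RoundTripped = Original then
    Writeln('Round trip preserved the string')
  else
    Writeln('Round trip changed the string');
end.
```

Code page 1252 has no Cyrillic letters, so the comparison is expected to fail: characters without a mapping are replaced by the code page's default character during the first conversion, and the second conversion cannot bring them back.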

However, my issue with the whole ANSI to Unicode mapping goes much deeper. In general, I need to fully enable an application for on-the-fly switching of languages. This means that in some cases one form or one Windows common dialog needs to display characters from more than one charset. This requirement is forced on us by implementing on-the-fly language switching on a Windows OS that is designed to fit only a few localizations. If I stick to AnsiStrings, having characters from more than one charset is problematic at best. Multiple charset requirements have to be tracked, and charsets have to be switched correctly for the translations, over and above what TsiLang makes possible. This is especially problematic when we are not in charge of the thread locale (as in an ActiveDocument).
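
As a rough illustration of why one AnsiString cannot hold text from several charsets, here is a minimal sketch that asks Windows whether a mixed-script string fits a given code page. It is an assumption-laden example, not code from the application discussed here: the helper FitsCodePage, the sample text, and the three code pages are all hypothetical choices, and it relies on the lpUsedDefaultChar output of the Win32 WideCharToMultiByte call.

```delphi
program CodePageFitSketch;
{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical helper: True when W converts to CodePage without Windows
// substituting the default character. Returns False if the conversion
// itself fails, for example because the code page is not installed.
// Note: with flags = 0, Windows may still apply best-fit substitutions
// that are not reported through UsedDefault.
function FitsCodePage(const W: WideString; CodePage: UINT): Boolean;
var
  Buffer: AnsiString;
  Len: Integer;
  UsedDefault: BOOL;
begin
  Result := True;
  if W = '' then Exit;
  Result := False;
  // First call asks for the required buffer size in bytes.
  Len := WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W), nil, 0, nil, nil);
  if Len <= 0 then Exit;
  SetLength(Buffer, Len);
  UsedDefault := False;
  Len := WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W),
    PAnsiChar(Buffer), Len, nil, @UsedDefault);
  Result := (Len > 0) and not UsedDefault;
end;

procedure Report(const W: WideString; CodePage: UINT);
begin
  if FitsCodePage(W, CodePage) then
    Writeln('Code page ', CodePage, ': fits')
  else
    Writeln('Code page ', CodePage, ': does not fit');
end;

var
  Mixed: WideString;
begin
  // Three Cyrillic letters followed by three Thai letters.
  Mixed := #$0416#$0443#$043A#$0E01#$0E02#$0E04;
  Report(Mixed, 1251); // Cyrillic code page
  Report(Mixed, 874);  // Thai code page
  Report(Mixed, 932);  // Japanese code page
end.
```

None of these three code pages covers both Cyrillic and Thai, so each check is expected to report that the string does not fit. A WideString holds both scripts without any code page bookkeeping.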

If you have hung on thus far ...

With all storage as Unicode, our problems with character re-mapping and multiple charsets completely go away. But evidently the Delphi designers goofed a bit in the SetWideStrProp call because, as written, it requires our Unicode strings to be converted to AnsiStrings and back to Unicode. This conversion completely obliterates Russian and Asian text that is pushed through a default charset (Windows common dialogs, etc.).

This is a very confusing problem. However, I am happy to report that after about 10 days of hacking, we now have an application that does the following:

If language support is installed for the language group (CONTROL PANEL, REGIONAL SETTINGS), then we can use any language on any version of Windows. For example, we can do Russian on a Chinese OS, Japanese on Korean, Chinese on English, etc., including the common dialogs, and it works!

So, after I sleep for a month, I will pass along code that might be helpful. In the meantime, I hope that I've made a better case for going to straight Unicode for your storage. Doing so will give you many more options in languages and how they are supported.

Best,

David

Post by isiticov »

David,
Our call
SetWideStrProp(AObject, PInfo, AnsiStringToWideStringCP(PropValue, CurrentCharset));
goes directly to
procedure SetWideStrProp(Instance: TObject; PropInfo: PPropInfo;
  const Value: WideString); overload;

which doesn't call SetStrProp() for Unicode strings. As a result, there is no conversion (and no Unicode loss) from wide to ANSI.
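
A minimal sketch of this path, for anyone who wants to check it on their own installation: it sets a published WideString property through the PPropInfo overload of SetWideStrProp and reads it back with GetWideStrProp, both from TypInfo. The class TDemo and its Caption property are hypothetical and exist only for this test; this is not TsiLang code.

```delphi
program OverloadSketch;
{$APPTYPE CONSOLE}

uses
  Classes, TypInfo;

type
  // Hypothetical class with a published WideString property.
  // TPersistent is compiled with RTTI enabled, so published members
  // of descendants are visible to TypInfo.
  TDemo = class(TPersistent)
  private
    FCaption: WideString;
  published
    property Caption: WideString read FCaption write FCaption;
  end;

var
  Demo: TDemo;
  Info: PPropInfo;
  Value: WideString;
begin
  Value := #$0416#$0443#$043A; // three Cyrillic letters
  Demo := TDemo.Create;
  try
    Info := GetPropInfo(Demo, 'Caption');
    if (Info <> nil) and (Info^.PropType^^.Kind = tkWString) then
    begin
      // PPropInfo overload: Value is passed as a WideString parameter.
      SetWideStrProp(Demo, Info, Value);
      if GetWideStrProp(Demo, Info) = Value then
        Writeln('Caption kept all characters')
      else
        Writeln('Caption was altered');
    end
    else
      Writeln('Caption is not a WideString property');
  finally
    Demo.Free;
  end;
end.
```

Because the value is compared as a WideString and never written to the console, the result does not depend on the console code page.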
Best regards,
Igor Siticov.

Post by DInfo »

Igor,

My bad ... of course you are correct on the TsiLang call.

That leaves the question of why the TsiLang conversions truncated the Unicode strings. How could this happen? I don't know. I do know that staying in the Unicode world has worked miracles for our language compatibility.

Here is a huge advantage of using Unicode storage with no code page translations:

I am currently debugging a 13-language application using just US English XP. This is a massive timesaver. Previously, I had to spend a lot of time in the localized OSes to test: Chinese, Japanese, etc. We will of course do a basic "sanity check" installation on those OSes. However, 99% of our testing and debugging can be done in our native OS.

Woo hoo.

Thanks,

David

Post by DInfo »

Igor,

One more follow up to your last message:

When TsiLang takes the Unicode translation that is created in the native language (stored in the Dictionary), saves it to AnsiString storage, and then converts it back to Unicode for display, it *is* subject to the same character re-mapping issues that I discussed above. This is a well-known issue with the Windows multi-byte conversion routines.

Food for thought ...

David

Post by DInfo »

Igor,

Best wishes on your new release. Thanks for engaging me on this whole Unicode issue.

David

Post by isiticov »

Hi David,

Thank you! I hope the next update will include Unicode-enabled storage ;-)
Best regards,
Igor Siticov.