Reading Unicode strings from SIL file (no BOM)

mclosson
Posts: 5
Joined: Mon Aug 04, 2008 1:41 pm


Post by mclosson »

Hello,

I have a program that manually reads and parses an SIL file. We have now added languages that use Unicode character sets, and I'm having trouble reading the SIL file properly: there is no byte order mark (BOM) at the beginning of the file, so I don't know whether the strings in the SIL are stored as UTF-8, UTF-16, etc. Can you tell me how Unicode strings are stored or encoded in an SIL file?

Thanks,

-Matt-
isiticov
Site Admin
Posts: 2385
Joined: Thu Nov 21, 2002 3:17 pm

Post by isiticov »

Hello,

Regular SIL files contain data in ANSI format. To convert that data to Unicode, use the Charset setting of each language to determine the appropriate code page. You can use the AnsiStringToWideStringCP() function from siComp.pas for the conversion. Version 6.3 introduced UTF-8 support for SIL files: if an SIL file is UTF-8 encoded, its [Options] section contains an IsUTF8File key with the value True.
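
For a parser written outside Delphi, the same decision can be sketched roughly as follows. This is only an illustration in Python, not code from the TsiLang sources: the function names is_utf8_sil and decode_sil_text are hypothetical, the case-insensitive matching of the section and key names is an assumption, and the charset table is a partial mapping of standard Windows charset identifiers to Python codec names (ANSI_CHARSET is assumed here to mean code page 1252).

```python
# Partial, illustrative mapping of Windows charset identifiers to Python codecs.
CHARSET_TO_CODEC = {
    0: "cp1252",    # ANSI_CHARSET (assumed Western European here)
    128: "cp932",   # SHIFTJIS_CHARSET
    161: "cp1253",  # GREEK_CHARSET
    162: "cp1254",  # TURKISH_CHARSET
    204: "cp1251",  # RUSSIAN_CHARSET
    238: "cp1250",  # EASTEUROPE_CHARSET
}


def is_utf8_sil(raw: bytes) -> bool:
    """Return True if the [Options] section contains IsUTF8File=True."""
    section = b""
    # Work on raw bytes: section and key names are plain ASCII, so no
    # decoding is needed to find them.
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith(b"[") and line.endswith(b"]"):
            section = line[1:-1].strip().lower()
        elif section == b"options" and b"=" in line:
            key, _, value = line.partition(b"=")
            if key.strip().lower() == b"isutf8file":
                return value.strip().lower() == b"true"
    # Key absent: treat the file as a regular ANSI SIL file.
    return False


def decode_sil_text(data: bytes, utf8: bool, charset: int = 0) -> str:
    """Decode one string as UTF-8, or with the code page for its charset."""
    codec = "utf-8" if utf8 else CHARSET_TO_CODEC.get(charset, "cp1252")
    return data.decode(codec)
```

With this approach the file is read as bytes first, the flag is checked once, and each language's strings are then decoded either as UTF-8 or with the code page that matches that language's Charset value.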
Best regards,
Igor Siticov.