
Reading Unicode strings from SIL file (no BOM)

Posted: Wed Sep 10, 2008 7:22 pm
by mclosson
Hello,

I have a program that manually reads and parses an SIL file. We have now added languages that use Unicode character sets. I'm having trouble reading the SIL file properly because there is no byte order mark (BOM) at the beginning of the file, so I can't tell whether the strings are stored as UTF-8, UTF-16, etc. Can you tell me how Unicode strings are stored or encoded in an SIL file?

Thanks,

-Matt-

Posted: Thu Sep 11, 2008 6:43 am
by isiticov
Hello,

Regular SIL files contain data in ANSI format. To convert it to Unicode, you need to determine the appropriate code page from the Charset setting for the language. You can use the AnsiStringToWideStringCP() function from siComp.pas for this. UTF-8 support for SIL files was introduced in version 6.3. If an SIL file is UTF-8 encoded, the [Options] section will contain an IsUTF8File key with the value True.
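
For a parser written outside Delphi, the check described above might look roughly like the Python sketch below. It is only an illustration under a few assumptions: that the SIL file is laid out like an INI file with ASCII section and key names, that the function name decode_sil and the ansi_codepage parameter are made up for this example, and that a single code page applies to the whole file. A file that mixes languages with different Charset settings would need each language's strings decoded with its own code page.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def decode_sil(raw: bytes, ansi_codepage: str = "cp1252") -> str:
    """Decode raw SIL file bytes to text (hypothetical helper).

    Assumption: section headers and key names are plain ASCII, so they
    can be located in the raw bytes before the encoding is known.
    """
    # Ignore a UTF-8 BOM, if one happens to be present, while scanning.
    scan = raw[len(UTF8_BOM):] if raw.startswith(UTF8_BOM) else raw

    in_options = False
    is_utf8 = False
    for line in scan.splitlines():
        line = line.strip()
        if line.startswith(b"[") and line.endswith(b"]"):
            # Track whether we are currently inside the [Options] section.
            in_options = line.lower() == b"[options]"
        elif in_options:
            key, sep, value = line.partition(b"=")
            if sep and key.strip().lower() == b"isutf8file":
                is_utf8 = value.strip().lower() == b"true"

    if is_utf8:
        # "utf-8-sig" accepts input with or without a BOM.
        return raw.decode("utf-8-sig")
    # ANSI file: the caller supplies the code page matching the Charset.
    return raw.decode(ansi_codepage, errors="replace")
```

For an ANSI file containing Russian text, for example, the call would be decode_sil(data, "cp1251"); when IsUTF8File=True is present, the code page argument is ignored.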