Weird Encodings

--------------------------------------------------------------------------------------

It can be difficult to properly test multi-byte support in Samba if you don't speak a language that is written in a multi-byte character set. To overcome this problem I use "weird" encodings of English that display some of the properties of multi-byte character sets while remaining readable.

My current weird character set encodes Q as ^Q^. This gives a 3-byte character without disturbing constant strings too much (Q is rare in constant strings).

Other useful test character sets include CAP and HEX. These character sets are commonly used in Japan as filesystem encodings of Kanji, especially when the server is also being used for the Columbia AppleTalk Package. Both encode characters as hexadecimal strings, which makes them very good for testing multi-byte support, although quite hard to read.
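The general idea of a hexadecimal filename encoding can be sketched as follows. The details here are assumptions made for illustration: a colon followed by two lower-case hex digits, applied only to bytes of 0x80 and above. The real CAP and HEX rules may differ, and hex_escape and hex_unescape are hypothetical names.

```python
# Sketch of a CAP/HEX-style escape. ASSUMPTIONS: the ":xx" form and the
# 0x80 threshold are illustrative and not checked against Samba or CAP.

def hex_escape(raw: bytes) -> bytes:
    """Replace each byte >= 0x80 with ':' plus two hex digits."""
    out = bytearray()
    for b in raw:
        if b >= 0x80:
            out += b":%02x" % b
        else:
            out.append(b)
    return bytes(out)


def hex_unescape(data: bytes) -> bytes:
    """Reverse hex_escape, assuming the input has no literal ':' bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i:i + 1] == b":" and i + 2 < len(data):
            out.append(int(data[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


if __name__ == "__main__":
    raw = b"doc\x8a\xbf.txt"  # two high bytes, as in a double-byte character
    print(hex_escape(raw))
```

Each escaped byte grows to three bytes, so a two-byte character takes six bytes on disk. That expansion is what makes these encodings a good stress test.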

--------------------------------------------------------------------------------------

CIFS2001 Seattle
tridge@valinux.com