Legacy TSS versus UTF-T

The original TSS encoding is called legacy TSS. The embedding of Unicode in TSS is called UTF-T, by analogy with the names UTF-8, UTF-16, and UTF-32 of the three standard Unicode encoding forms.

TSS consists of two types of characters: single-byte and multibyte. Single-byte characters use the hexadecimal values 00 - FF, except 9B. Multibyte TSS characters are sequences of 4 bytes, the first of which has the hexadecimal value 9B.
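
This byte-level rule is enough to split a TSS string into characters. The following Python sketch (the function name split_tss is hypothetical, not part of any Enterprise Server API) treats 0x9B as the start of a 4-byte character and every other byte as a complete character:

```python
def split_tss(data: bytes) -> list[bytes]:
    """Split a TSS byte string into characters.

    A character is either one byte (any value except 0x9B) or four bytes
    starting with the lead byte 0x9B.
    """
    chars = []
    i = 0
    while i < len(data):
        if data[i] == 0x9B:
            # Lead byte: this character occupies 4 bytes in total.
            if i + 4 > len(data):
                raise ValueError(f"truncated 4-byte TSS character at offset {i}")
            chars.append(data[i:i + 4])
            i += 4
        else:
            # Any other byte is a complete single-byte character.
            chars.append(data[i:i + 1])
            i += 1
    return chars
```

For example, split_tss(b"A\x9b\x21\x30\x21B") yields three characters: b"A", the 4-byte Kanji code b"\x9b\x21\x30\x21", and b"B".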

The following table shows how all supported native character sets are mapped into the available TSS space.


TSS range (hexadecimal)	Meaning
00 - 7F	ASCII character.
80 - 8A	Line drawing character.
8B - 9A	Code feature.
9B	Lead byte for 4-byte TSS characters.
9C - 9E	Reserved for future use.
9F	Represents the euro symbol in a Cyrillic context, because ISO8859-5 has no free position for it in the normally used range A0 - FF.
A0 - FF	Ambiguous. Corresponds to a 'high ASCII' character in one of the ISO8859-n character sets. Maps to the same byte value in the correct ISO8859-n character set (no actual conversion is needed), or (often with some offset) to the corresponding Windows Code Page.
9B 21 pp qq	Japanese (Kanji). Can be converted to Kanji EUC, Shift JIS, or Windows Code Page 932.
9B 23 21 pp	Single-width Japanese. Can be converted to Kanji EUC, Shift JIS, or Windows Code Page 932.
9B 25 pp qq	Simplified Chinese. Can be converted to GB2312-80 or Windows Code Page 936.
9B 27 pp qq	Traditional Chinese. Can be converted to Big 5 or Windows Code Page 950.
9B 31 pp qq	Korean (Wansung). Can be converted to Wansung or Windows Code Page 949.
9B 32 pp qq	Korean (Johab). Can be converted to Johab or Windows Code Page 1361.
9B 9C 9D nn with: 40 ≤ nn ≤ BF	Ambiguous. Corresponds to Microsoft extension character nn + 0x40 in one of the Windows Code Pages that correspond to an ISO8859-n character set. Can be converted to the correct Windows Code Page. Because nn + 0x40 lies in the high-ASCII range 0x80 - 0xFF, nn lies in the range 0x40 - 0xBF. Only 55 of these positions are actually used, and only 3 of them are truly ambiguous.
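
As an illustration of how the table partitions the TSS space, the following Python sketch classifies a single legacy TSS character (1 or 4 bytes). The function classify_legacy_tss and its category strings are hypothetical; the sketch checks only the byte ranges listed in the table and does not validate the pp qq bytes of the Asian character sets.

```python
def classify_legacy_tss(ch: bytes) -> str:
    """Return the legacy TSS category of one character (1 or 4 bytes)."""
    if len(ch) == 1:
        b = ch[0]
        if b <= 0x7F:
            return "ASCII"
        if b <= 0x8A:
            return "line drawing"
        if b <= 0x9A:
            return "code feature"
        if b == 0x9B:
            raise ValueError("0x9B is a lead byte, not a complete character")
        if b <= 0x9E:
            return "reserved"
        if b == 0x9F:
            return "euro symbol (Cyrillic context)"
        return "high ASCII (ambiguous, ISO8859-n)"
    if len(ch) != 4 or ch[0] != 0x9B:
        raise ValueError(f"not a TSS character: {ch!r}")
    second, third, fourth = ch[1], ch[2], ch[3]
    if second == 0x21:
        return "Japanese (Kanji)"
    if second == 0x23 and third == 0x21:
        return "single-width Japanese"
    if second == 0x25:
        return "Simplified Chinese"
    if second == 0x27:
        return "Traditional Chinese"
    if second == 0x31:
        return "Korean (Wansung)"
    if second == 0x32:
        return "Korean (Johab)"
    if second == 0x9C and third == 0x9D and 0x40 <= fourth <= 0xBF:
        return "Windows extension (ambiguous)"
    # Anything else (including UTF-T codes) is outside this legacy table.
    return "not in legacy table"
```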

The ambiguity shown in the table for TSS characters in the ranges A0 - FF and 9B 9C 9D 40 - 9B 9C 9D BF is resolved by interpreting these characters according to the character set of the current locale.
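
A minimal sketch of this context-dependent interpretation in Python, assuming that the current locale's ISO8859-n character set and its matching Windows Code Page are already known and are passed in as Python codec names (for example iso8859-1 with cp1252, or iso8859-5 with cp1251). How Enterprise Server actually determines these from the locale is not shown, the function name resolve_ambiguous is hypothetical, and the offset that the table mentions for converting A0 - FF to a Windows Code Page is not modeled.

```python
import codecs


def resolve_ambiguous(ch: bytes, iso_charset: str, windows_codepage: str) -> str:
    """Interpret an ambiguous legacy TSS character using the locale's character sets.

    iso_charset and windows_codepage are Python codec names supplied by the
    caller (an assumption of this sketch).
    """
    if len(ch) == 1 and ch[0] == 0x9F:
        # Per the table: 0x9F is the euro sign in a Cyrillic (ISO8859-5) context.
        if codecs.lookup(iso_charset).name == "iso8859-5":
            return "\u20ac"
        raise ValueError("0x9F is only defined in a Cyrillic context")
    if len(ch) == 1 and 0xA0 <= ch[0] <= 0xFF:
        # Same byte value as in the locale's ISO8859-n character set.
        return ch.decode(iso_charset)
    if len(ch) == 4 and ch[:3] == b"\x9b\x9c\x9d" and 0x40 <= ch[3] <= 0xBF:
        # Microsoft extension character: the code page byte is nn + 0x40.
        return bytes([ch[3] + 0x40]).decode(windows_codepage)
    raise ValueError(f"not an ambiguous legacy TSS character: {ch!r}")
```

The same byte C0 resolves to À under ISO8859-1 but to the Cyrillic letter Р (U+0420) under ISO8859-5, which is exactly the ambiguity the locale has to settle.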

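The embedding of Unicode in TSS uses the single-byte ASCII range and a large part of the remaining TSS space, as shown in the following table. Note that this embedding does not add to the ambiguity described above: the second byte of a 4-byte UTF-T character (BC - FF) never coincides with the second byte of a legacy 4-byte TSS character (21, 23, 25, 27, 31, 32, or 9C).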


TSS range (hexadecimal)	Meaning
00 - 7F	ASCII character, corresponding to the first 128 Unicode characters U+0000 - U+007F. Has the same value as the single-byte UTF-8 encoding and as the single 16-bit UTF-16 code unit of the character, so no actual conversion is needed.
9B pp qq rr with: BC ≤ pp ≤ BF, 80 ≤ qq ≤ FF, 80 ≤ rr ≤ FF	UTF-T for the first 2^16 (65,536) Unicode characters U+0000 - U+FFFF, the so-called Basic Multilingual Plane (BMP), except for the first 128 characters U+0000 - U+007F (the ASCII character set, which is mapped to single-byte TSS). The ranges give 4 × 128 × 128 = 65,536 combinations, enough for every BMP code point. Can be converted algorithmically (bit shuffling) to 2-byte or 3-byte UTF-8 or to a single 16-bit UTF-16 code unit.
9B pp qq rr with: C0 ≤ pp ≤ FF, 80 ≤ qq ≤ FF, 80 ≤ rr ≤ FF	UTF-T for the 2^20 (1,048,576) so-called supplementary Unicode characters U+010000 - U+10FFFF. The ranges give 64 × 128 × 128 = 1,048,576 combinations, exactly one per supplementary code point. Can be converted algorithmically (bit shuffling) to 4-byte UTF-8 or to a UTF-16 surrogate pair (two 16-bit code units).
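
The table gives the byte ranges but not the exact bit layout of the "bit shuffling". The following Python sketch assumes the most direct layout consistent with those ranges: for a BMP code point, the top 2 bits go into pp (BC - BF) and the remaining 14 bits are split 7 + 7 over qq and rr (80 - FF); for a supplementary code point, the 20-bit offset from U+10000 is split 6 + 7 + 7 over pp (C0 - FF), qq, and rr. This layout is an assumption, not the documented Enterprise Server encoding, and encode_utf_t and decode_utf_t are hypothetical names.

```python
def encode_utf_t(text: str) -> bytes:
    """Encode text as UTF-T using the ASSUMED bit layout described above."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)  # ASCII stays single-byte
        elif cp <= 0xFFFF:
            # BMP: 2 bits into pp (BC-BF), 7 + 7 bits into qq and rr (80-FF).
            out += bytes([0x9B, 0xBC | (cp >> 14),
                          0x80 | ((cp >> 7) & 0x7F), 0x80 | (cp & 0x7F)])
        else:
            s = cp - 0x10000  # 20-bit supplementary offset
            # Supplementary: 6 bits into pp (C0-FF), 7 + 7 bits into qq and rr.
            out += bytes([0x9B, 0xC0 | (s >> 14),
                          0x80 | ((s >> 7) & 0x7F), 0x80 | (s & 0x7F)])
    return bytes(out)


def decode_utf_t(data: bytes) -> str:
    """Decode UTF-T produced by encode_utf_t; rejects legacy non-ASCII TSS."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            chars.append(chr(b))
            i += 1
            continue
        if b != 0x9B or i + 4 > len(data):
            raise ValueError(f"not UTF-T at offset {i}")
        pp, qq, rr = data[i + 1], data[i + 2], data[i + 3]
        if qq < 0x80 or rr < 0x80:
            raise ValueError(f"trail byte out of range at offset {i}")
        low14 = ((qq & 0x7F) << 7) | (rr & 0x7F)
        if 0xBC <= pp <= 0xBF:
            cp = ((pp - 0xBC) << 14) | low14
            if cp < 0x80:
                raise ValueError(f"ASCII must be single-byte (offset {i})")
        elif pp >= 0xC0:
            cp = 0x10000 + (((pp - 0xC0) << 14) | low14)
        else:
            raise ValueError(f"legacy 4-byte TSS character at offset {i}, not UTF-T")
        chars.append(chr(cp))
        i += 4
    return "".join(chars)
```

Under this assumed layout, U+00E9 (é) becomes 9B BC 81 E9, U+FFFF becomes 9B BF FF FF, and U+10FFFF becomes 9B FF FF FF.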

When converting from TSS to another character set, the Enterprise Server porting set can interpret both legacy TSS and UTF-T. In the other direction, when converting from another character set to TSS, the so-called TssMode determines whether legacy TSS or UTF-T is produced. The TssMode is determined by the contents of the file $BSE/lib/tss_mbstore6.2: if the first line of this file consists of exactly the text “UTF-T”, the mode is called ‘UTF-T mode’ and the conversion produces UTF-T. Otherwise the mode is called ‘legacy mode’ and the conversion produces legacy TSS.
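
The mode check itself is simple to express. The following Python sketch reads the first line of $BSE/lib/tss_mbstore6.2 and compares it with the exact text UTF-T, ignoring only the line terminator. The function names are hypothetical, and treating a missing file as legacy mode is an assumption based on the "Otherwise" wording above.

```python
import os
from pathlib import Path
from typing import Optional

# Location of the mode file relative to $BSE, as stated in the text.
MODE_FILE = Path("lib") / "tss_mbstore6.2"


def mode_from_first_line(line: str) -> str:
    """Return 'UTF-T' only if the line is exactly 'UTF-T' (line ending ignored)."""
    return "UTF-T" if line.rstrip("\r\n") == "UTF-T" else "legacy"


def tss_mode(bse_dir: Optional[str] = None) -> str:
    """Determine the TssMode; bse_dir defaults to the BSE environment variable."""
    base = Path(bse_dir if bse_dir is not None else os.environ["BSE"])
    try:
        with open(base / MODE_FILE, encoding="latin-1") as f:
            first_line = f.readline()
    except FileNotFoundError:
        # Assumption: a missing file falls under "otherwise", i.e. legacy mode.
        return "legacy"
    return mode_from_first_line(first_line)
```

Because the comparison is exact, a first line such as "UTF-T " (with a trailing space) or "utf-t" leaves the system in legacy mode.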

The following table lists this and other differences between UTF-T mode and legacy mode. These differences are so fundamental that the mode must not be switched at arbitrary moments. Switching from legacy mode to UTF-T mode is allowed, at the price of a complete database conversion. Switching back from UTF-T mode to legacy mode is not allowed.


Legacy mode	UTF-T mode
Conversion from any character set to TSS produces legacy TSS	Conversion from any character set to TSS produces UTF-T
Conversion from any Unicode encoding (UTF-8 or UTF-16) to TSS uses conversion tables and fails for characters that do not exist in the character set of the current locale.	Conversion from any Unicode encoding (UTF-8 or UTF-16) to TSS needs no conversion tables and cannot fail.
Multibyte data in the database is stored in the native character set of the database.	Multibyte data in the database is stored in Unicode (probably UTF-16, possibly UTF-8); single-byte data is stored in the character set of a native locale.
Each user must use a language and locale whose character set matches the character set used in the database.	Each user can choose a language and locale independently of the character set used in the database.
Single-byte data in the database is sorted using a binary sort, e.g. A Z a z À Ý à ý	Data in the database is sorted according to the Unicode Collation Algorithm, e.g. a A à À ý Ý z Z
For single byte character sets, the non-ASCII characters are mapped to single byte TSS (with some exceptions which are mapped to 9B 9C 9D nn)	For single byte character sets, the non-ASCII characters are mapped to 4-byte UTF-T
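
To make the difference between the two sort orders concrete, the following Python sketch reproduces both example orders from the table. The binary sort assumes ISO8859-1 as the native single-byte character set. Python's standard library does not implement the Unicode Collation Algorithm, so the second key is a deliberately crude three-level approximation (base letter, then accent, then case with lowercase first) that happens to match the example; it is not the UCA, and real data would need a full implementation such as ICU's collator.

```python
import unicodedata


def binary_key(s: str) -> bytes:
    # Legacy-mode style: compare raw bytes of a single-byte encoding
    # (ISO8859-1 is an assumed native character set here).
    return s.encode("iso8859-1")


def approx_collation_key(s: str) -> tuple:
    # Crude three-level key, NOT the Unicode Collation Algorithm:
    # 1) base letters ignoring case, 2) accents, 3) case (lowercase first).
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    accents = "".join(c for c in decomposed if unicodedata.combining(c))
    case = tuple(c.isupper() for c in base)
    return (base.casefold(), accents, case)


letters = ["z", "À", "a", "Ý", "Z", "à", "A", "ý"]
print(" ".join(sorted(letters, key=binary_key)))            # A Z a z À Ý à ý
print(" ".join(sorted(letters, key=approx_collation_key)))  # a A à À ý Ý z Z
```

The binary order groups all uppercase ASCII letters before all lowercase ones and puts every accented letter after z, whereas a multilevel collation keeps variants of the same base letter together.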