Legacy TSS versus UTF-T
The old TSS is called legacy TSS. The embedding of Unicode in TSS is called UTF-T, analogously to the names UTF-8, UTF-16, and UTF-32 for the three standard Unicode Encoding Forms.
TSS consists of two types of characters: single byte and multibyte. Single byte characters use the hexadecimal values 00 - FF, except 9B. Multibyte TSS characters use a sequence of 4 bytes, the first of which has hexadecimal value 9B.
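The rule above (9B introduces a 4-byte character, every other byte is a character on its own) is enough to split a TSS byte string into characters. The following is a minimal sketch of that idea, not the porting set's own implementation; the function name `split_tss` and the choice to raise an error on a truncated sequence are assumptions made for illustration.

```python
TSS_LEAD_BYTE = 0x9B  # lead byte of every 4-byte TSS character


def split_tss(data: bytes) -> list[bytes]:
    """Split a TSS byte string into single-byte and 4-byte characters.

    Hypothetical helper: it only applies the lead-byte rule and does not
    check whether a 4-byte sequence is a defined legacy TSS or UTF-T value.
    """
    chars = []
    i = 0
    while i < len(data):
        width = 4 if data[i] == TSS_LEAD_BYTE else 1
        if i + width > len(data):
            # Assumption: a lead byte without 3 following bytes is an error.
            raise ValueError(f"truncated multibyte character at offset {i}")
        chars.append(data[i:i + width])
        i += width
    return chars
```

For example, `split_tss(b"A\x9b\x21\xb0\xa1B")` yields three characters: `b"A"`, the 4-byte sequence starting with 9B, and `b"B"`.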
This table describes how all supported native character sets are mapped into the available TSS space.
| TSS range (hexadecimal) | Meaning | 
|---|---|
| 00 - 7F | ASCII character. | 
| 80 - 8A | Line drawing character. | 
| 8B - 9A | Code feature. | 
| 9B | Lead byte for 4-byte TSS characters | 
| 9C-9E | Reserved for future use. | 
| 9F | Used to represent the Euro Symbol in a Cyrillic context. In ISO8859-5 there is no room available in the normally used range A0 - FF. | 
| A0 - FF | Ambiguous. Corresponds to a 'high ASCII' character in one of the ISO8859-n character sets. The byte value can be used unchanged in the correct ISO8859-n character set, or converted (often with some offset) to the corresponding Windows Code Page. | 
| 9B 21 pp qq | Japanese (Kanji). Can be converted to Kanji EUC, Shift JIS, or Windows Code Page 932 | 
| 9B 23 21 pp | Single width Japanese. Can be converted to Kanji EUC, Shift JIS, or Windows Code Page 932 | 
| 9B 25 pp qq | Simplified Chinese. Can be converted to GB2312-80 or Windows Code Page 936 | 
| 9B 27 pp qq | Traditional Chinese. Can be converted to Big 5 or Windows Code Page 950 | 
| 9B 31 pp qq | Korean (Wansung). Can be converted to Wansung or Code Page 949. | 
| 9B 32 pp qq | Korean (Johab). Can be converted to Johab or Code Page 1361. | 
| 9B 9C 9D nn with: 40 ≤ nn ≤ BF | Ambiguous. Corresponds to Microsoft extension character nn + 0x40 in one of the Windows Code Pages corresponding to an ISO8859-n character set. Can be converted to the correct Windows Code Page. The value nn + 0x40 is in the high ASCII range 0x80 - 0xFF, so nn is in the range 0x40 - 0xBF. Only 55 positions are really used, and only 3 of them are really ambiguous. | 
The ambiguity that is shown in the table for TSS characters in the ranges A0 – FF and 9B 9C 9D 40 – 9B 9C 9D BF is resolved by interpreting these characters in the context of the character set of the current locale.
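The legacy TSS table can be read as a lookup on the first byte and, for 4-byte characters, on the bytes after the 9B lead byte. The sketch below encodes the table rows as such a lookup. The function name `classify_legacy_tss` and the returned labels are hypothetical, and the sketch does not resolve the locale-dependent ranges; it only reports that a character falls in one of them.

```python
def classify_legacy_tss(ch: bytes) -> str:
    """Return the legacy TSS table category for one TSS character.

    Hypothetical helper: `ch` is a single byte or a 4-byte sequence
    starting with 9B. Labels are illustrative, not official names.
    """
    if len(ch) == 1:
        b = ch[0]
        if b <= 0x7F:
            return "ASCII"
        if b <= 0x8A:
            return "line drawing"
        if b <= 0x9A:
            return "code feature"
        if b == 0x9B:
            raise ValueError("9B is a lead byte, not a character")
        if b <= 0x9E:
            return "reserved"
        if b == 0x9F:
            return "euro symbol (Cyrillic context)"
        return "high ASCII (locale-dependent)"

    if len(ch) != 4 or ch[0] != 0x9B:
        raise ValueError("expected 1 byte or a 4-byte sequence starting with 9B")

    second, third, fourth = ch[1], ch[2], ch[3]
    if second == 0x23 and third == 0x21:
        return "Japanese (single width)"
    if second == 0x9C and third == 0x9D and 0x40 <= fourth <= 0xBF:
        return "Microsoft extension (locale-dependent)"
    by_second_byte = {
        0x21: "Japanese (Kanji)",
        0x25: "Simplified Chinese",
        0x27: "Traditional Chinese",
        0x31: "Korean (Wansung)",
        0x32: "Korean (Johab)",
    }
    # Anything else is outside the legacy table (for example, UTF-T).
    return by_second_byte.get(second, "not in the legacy table")
```

Only the two categories marked as locale-dependent need the character set of the current locale before they can be converted; all other categories identify their target character set directly.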
The embedding of Unicode in TSS uses the single byte ASCII range and a major part of the remaining TSS space, as shown in the following table. Notice that this embedding does not increase the existing ambiguity described earlier.
| TSS range (hexadecimal) | Meaning | 
|---|---|
| 00 - 7F | ASCII character. Corresponds to the first 128 Unicode characters U+0000 - U+007F. Can be converted (without actual conversion) to single byte UTF-8 or to single word UTF-16. | 
| 9B pp qq rr with: BC ≤ pp ≤ BF, 80 ≤ qq ≤ FF, 80 ≤ rr ≤ FF | UTF-T corresponding to the first 2^16 (65,536) Unicode characters U+0000 - U+FFFF, the so-called Basic Multilingual Plane (BMP), except for the first 128 Unicode characters U+0000 - U+007F (which correspond to the ASCII character set and are mapped to single byte TSS). Can be converted algorithmically (bit shuffling) to 2-byte or 3-byte UTF-8 or single word UTF-16. | 
| 9B pp qq rr with: C0 ≤ pp ≤ FF, 80 ≤ qq ≤ FF, 80 ≤ rr ≤ FF | UTF-T corresponding to the 2^20 (1,048,576) so-called Supplementary Unicode characters U+010000 - U+10FFFF. Can be converted algorithmically (bit shuffling) to 4-byte UTF-8 or double word UTF-16. | 
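The byte ranges in the UTF-T table leave room for exactly 16 bits in the BMP range (4 values of pp, 128 of qq, 128 of rr) and 20 bits in the supplementary range (64 values of pp). The text does not spell out the bit layout, so the sketch below assumes the most direct one: the top bits go into pp, the next 7 bits into qq, and the low 7 bits into rr, with 0x10000 subtracted first for supplementary characters. The function names are hypothetical and the layout is an assumption, not a confirmed description of UTF-T.

```python
LEAD = 0x9B  # lead byte of every 4-byte TSS character


def utf_t_encode(cp: int) -> bytes:
    """Encode one Unicode code point under the ASSUMED UTF-T bit layout."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp <= 0x7F:
        return bytes([cp])             # ASCII stays single byte
    if cp <= 0xFFFF:
        value = cp                     # 16 bits: pp in BC..BF
        pp = 0xBC + (value >> 14)
    else:
        value = cp - 0x10000           # 20 bits: pp in C0..FF
        pp = 0xC0 + (value >> 14)
    qq = 0x80 | ((value >> 7) & 0x7F)  # middle 7 bits
    rr = 0x80 | (value & 0x7F)         # low 7 bits
    return bytes([LEAD, pp, qq, rr])


def utf_t_decode(ch: bytes) -> int:
    """Inverse of utf_t_encode, under the same assumed layout."""
    if len(ch) == 1 and ch[0] <= 0x7F:
        return ch[0]
    if (len(ch) != 4 or ch[0] != LEAD or ch[1] < 0xBC
            or ch[2] < 0x80 or ch[3] < 0x80):
        raise ValueError("not a UTF-T character")
    pp, qq, rr = ch[1], ch[2], ch[3]
    low14 = ((qq & 0x7F) << 7) | (rr & 0x7F)
    if pp <= 0xBF:
        return ((pp - 0xBC) << 14) | low14
    return 0x10000 + (((pp - 0xC0) << 14) | low14)
```

Because every UTF-T sequence has a second byte of BC or higher, it cannot collide with the legacy 4-byte sequences, whose second bytes are 21, 23, 25, 27, 31, 32, or 9C. This is consistent with the statement that the embedding does not increase the existing ambiguity.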
When converting from TSS to some other character set, the Enterprise Server porting set can interpret both legacy TSS and UTF-T. In the other direction, when converting from some other character set to TSS, the so-called TssMode determines whether legacy TSS or UTF-T is produced. The TssMode is determined by the content of the $BSE/lib/tss_mbstore6.2 file. If the first line of this file consists of exactly the text “UTF-T”, the mode is called ‘UTF-T mode’ and the conversion produces UTF-T. Otherwise, the mode is called ‘legacy mode’ and the conversion produces legacy TSS.
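The first-line rule can be checked with a few lines of script, for example when auditing an installation. This is a sketch under assumptions, not the porting set's own logic: the function name `detect_tss_mode` is hypothetical, and treating a missing file as legacy mode is an assumption that follows from reading "otherwise" broadly.

```python
def detect_tss_mode(mbstore_path: str) -> str:
    """Return "UTF-T" or "legacy" based on the first line of the file.

    Hypothetical helper. The comparison is exact: trailing spaces or a
    different case on the first line result in legacy mode.
    """
    try:
        with open(mbstore_path, encoding="utf-8", errors="replace") as f:
            first_line = f.readline()
    except FileNotFoundError:
        # Assumption: no file means no "UTF-T" first line, so legacy mode.
        return "legacy"
    # Strip only the line terminator so that "exactly UTF-T" is preserved.
    return "UTF-T" if first_line.rstrip("\r\n") == "UTF-T" else "legacy"
```

A typical call would build the path from the BSE environment variable, for example `detect_tss_mode(os.path.join(os.environ["BSE"], "lib", "tss_mbstore6.2"))`.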
The following table lists this and further differences between UTF-T mode and legacy mode. These differences are so fundamental that the mode cannot be switched at arbitrary moments. Switching from legacy mode to UTF-T mode is allowed, but requires a complete database conversion. Switching back from UTF-T mode to legacy mode is not allowed.
| Legacy mode | UTF-T mode | 
|---|---|
| Conversion from any character set to TSS produces legacy TSS | Conversion from any character set to TSS produces UTF-T | 
| Conversion from any Unicode encoding (UTF-8 or UTF-16) to TSS uses conversion tables and fails for characters which do not exist in the character set of the current locale. | Conversion from any Unicode encoding (UTF-8 or UTF-16) to TSS does not need conversion tables and will not fail. | 
| Multibyte data in the database is stored in the native character set of the database | Data in the database is stored in Unicode (probably UTF-16, possibly UTF-8) for multibyte data, and in a native locale for single byte data | 
| Each user must use a language and corresponding locale whose character set matches the character set used in the database | Each user can choose a language and locale, independent of the character set used in the database. | 
| Single Byte Data in the database is sorted using binary sort, e.g. A Z a z À Ý à ý | Data in the database is sorted according to the Unicode Collation Algorithm, e.g. a A à À ý Ý z Z | 
| For single byte character sets, the non-ASCII characters are mapped to single byte TSS (with some exceptions which are mapped to 9B 9C 9D nn) | For single byte character sets, the non-ASCII characters are mapped to 4-byte UTF-T |
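The two sort orders in the table can be reproduced on the sample characters. The sketch below uses the ISO8859-1 byte value as the binary sort key, and a rough three-level key (base letter, then accent, then lowercase before uppercase) as a stand-in for collation. The second key is not the Unicode Collation Algorithm, only a simplified approximation that happens to give the same order for this sample; both function names are hypothetical.

```python
import unicodedata


def binary_key(s: str) -> bytes:
    """Sort key for binary sort: the raw ISO8859-1 byte values."""
    return s.encode("latin-1")


def rough_collation_key(s: str):
    """Crude multi-level key; NOT the real Unicode Collation Algorithm."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    marks = "".join(c for c in decomposed if unicodedata.combining(c))
    primary = base.casefold()               # letters, ignoring case and accents
    tertiary = [c.isupper() for c in base]  # lowercase sorts before uppercase
    return (primary, marks, tertiary)
```

Sorting the eight sample characters with `sorted(chars, key=binary_key)` gives the order A Z a z À Ý à ý, while `sorted(chars, key=rough_collation_key)` gives a A à À ý Ý z Z. A production system would rely on a full collation implementation rather than this approximation.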