Character sets

A character set is a set of alphabetic or other characters used to construct the words and other elementary units of one or more native languages. During the installation of the LN application you must choose a character set. That single character set applies to the whole LN environment, so only languages that the chosen character set supports can be stored. You can choose the following character set types:
Single-byte character sets

Single-byte character sets need only one byte to store each character. As a consequence, at most 256 characters are available. The ISO 8859 standard defines several character sets, also called locales, that mainly cover the characters of the European languages. Examples of single-byte character sets are:
The lower range, characters 0 to 127, is the same for all ISO 8859 character sets; the upper range, characters 128 to 255, is specific to each locale. The basic Latin alphabet is encoded in the lower range, so English, which does not need any additional characters, is supported by every ISO 8859 locale.

Sorting is binary: characters are sorted in the order in which they are defined in the encoding. For instance, all uppercase letters are sorted before the lowercase letters, so ‘Z’ is sorted before ‘a’.

Multi-byte character sets

Multi-byte character sets are typically required for languages that have more than 256 characters. A typical example is Chinese. In the context of LN, the multi-byte character sets require 4 bytes per character. Examples of multi-byte character sets are:
Sorting is binary based.

Unicode character set

The Unicode character set is a standardized character set that supports all (modern) languages. This removes the limitation of supporting only a small set of languages within one LN environment. When you choose Unicode as the character set, you can have, for example, Chinese, English, and French in one LN environment. Another advantage of the Unicode character set is that it comes with linguistic sorting rules. When data must be shown in sorted form, it is ordered according to the sorting rules defined by the ICU standard.

Note: The ICU standard also defines ‘tailoring’, that is, fine-tuning the sorting rules to a specific language. Tailoring is not supported by the LN tools.

As a consequence of using Unicode, the database of a Unicode-based LN environment is bigger, and the CPU and memory load on the system are higher, than for a multi-byte or single-byte character set. Unicode is typically chosen when multiple languages must be supported or when linguistic sorting is preferred.

High Ascii Tolerance

Caution! The following applies only to LN environments that do not run in Unicode mode.

You must set the high_ascii_tolerance resource to 0 in the following situations:
To set high_ascii_tolerance to 0, add the following line in the $BSE/lib/defaults/all file:
The role of the user locale

This section describes the role of the user locale in the following types of installations:
Caution! It is technically possible to define a different locale for each user (User Data Template (ttams1110m000) session). However, this can cause problems. Therefore, Infor does NOT support the use of multiple user locales. Consequently, all users in an LN environment must have the same user locale.

The role of the user locale in a single-byte installation

In a single-byte installation, the user locale defines the character set that can be used throughout the application.

Caution! Infor strongly advises the following:
The user locale has an impact on the following:
The role of the user locale in a multi-byte installation

In a multi-byte installation, the user locale defines the character set that can be used throughout the application.

Caution! Infor strongly advises the following:
The user locale has an impact on the following:
The role of the user locale in a Unicode installation

Since the introduction of the Unicode character set, the role of the user locale has become less important. In a pure Unicode environment, every character is represented by a unique code point, and every code point has a single interpretation. However, there are still some areas where conversions to and from Unicode occur.

Example: You work in a Unicode environment, but your personal user locale is ISO8859. You want to exchange data between the Unicode environment and another environment. When you perform an export from the Unicode environment, for example through LN Data Director or EDI, the export files are in ISO8859 format.

The user locale has no impact on:
The user locale has a small impact on the dump files created by the bdbpre utility. Data in the bdbpre dump files is in the UTF-8 character set. If the database contains “high ascii” characters, these characters are converted in the context of the current user locale. Note that the high_ascii_tolerance resource has no effect on this process. For details, refer to the section on the conversion of “high ascii” characters below.

The user locale has an impact on:
Conversion of “high ascii” characters

“High ascii” characters pose a problem, because one code point can have different meanings in different character sets.

Example: In the ISO-8859-1 locale, the code point 0xe9 (decimal 233) is interpreted as the “LATIN SMALL LETTER E WITH ACUTE” character (é). In the ISO-8859-7 locale, this code point is interpreted as the “GREEK SMALL LETTER IOTA” character (ι).

To determine the meaning of a “high ascii” character, LN uses the current user locale. If the user locale is an ISO8859n variant, that character set is used to determine the correct meaning; otherwise the ISO88591 character set is used.

Example: The user locale is ISO88597. A string that contains the 0xe9 code point must be converted to UTF-T. The code point is interpreted as the “GREEK SMALL LETTER IOTA” character. The resulting UTF-T code point is 0x9bbc87b9. If the user locale is ISO88591 and the same string must be converted, the code point is interpreted as the “LATIN SMALL LETTER E WITH ACUTE” character. The resulting UTF-T code point is 0x9bbc81e9.

We recommend that you keep the installation clean from “high ascii” characters. To achieve this, set the high_ascii_tolerance resource to 0.
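The locale dependence described above can be reproduced outside LN. The following is a minimal sketch in Python, using only the standard ISO 8859 codecs. The function names interpret and has_high_ascii are hypothetical helpers introduced for illustration and are not part of LN. The sketch shows Unicode code points and UTF-8 bytes; it does not reproduce the LN-specific UTF-T values quoted in the example.

```python
# Illustration only: the same byte value decoded under two ISO 8859 locales.
# interpret() and has_high_ascii() are hypothetical helpers, not LN functions.

HIGH_ASCII_BYTE = bytes([0xE9])  # decimal 233, in the locale-specific upper range


def interpret(raw, codec):
    """Decode raw with codec and return (character, code point, UTF-8 bytes as hex)."""
    char = raw.decode(codec)
    return char, f"U+{ord(char):04X}", char.encode("utf-8").hex()


def has_high_ascii(raw):
    """Return True if raw contains any byte in the range 128 to 255."""
    return any(byte >= 0x80 for byte in raw)


latin = interpret(HIGH_ASCII_BYTE, "iso8859_1")  # LATIN SMALL LETTER E WITH ACUTE
greek = interpret(HIGH_ASCII_BYTE, "iso8859_7")  # GREEK SMALL LETTER IOTA

print(latin)
print(greek)
print(has_high_ascii(b"caf\xe9"), has_high_ascii(b"cafe"))
```

The single byte 0xe9 yields two different Unicode characters depending on the codec, which is why data containing such bytes cannot be converted reliably unless the locale it was written in is known.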