File template properties

When configuring the file template, the properties that are shown in this table must be defined:

File Type Format Type Property Description
All Types All Types Name Name to be shown as the label for the File template configuration. This field is required and must be unique across all templates that are defined.
All Types All Types Description Short description of the template configuration.
All Types All Types File Type Specify type of content in the file. Supported file types: Text & Binary.
All Types All - except Full BOD Use Attributes for File Name Determines whether the original file name and extension are stored in the created documents when reading files. This option is not selected by default. If selected, two properties are enabled to specify the names of the attributes that carry the file name and file extension: File Name Attribute and File Extension Attribute.
All Types All - except Full BOD File Name Attribute This property is available if Use Attributes for File Name is selected. It specifies the name of the attribute that is added to the noun tag of the created BOD when reading files. The attribute holds the name of the file that was read. The name that is specified in this field must follow the naming rules for XML attributes. The File Name Attribute and File Extension Attribute properties must be different; otherwise, BOD generation fails.
All Types All - except Full BOD File Extension Attribute This property is available if Use Attributes for File Name is selected. It specifies the name of the attribute that is added to the noun tag of the created BOD when reading files. The attribute holds the extension of the file that was read. The name that is specified in this field must follow the naming rules for XML attributes. The File Name Attribute and File Extension Attribute properties must be different; otherwise, BOD generation fails.
All Types All - except Full BOD File Path Attribute This property is available if Use File Attributes is switched on. It carries the folder path as defined in Read Location and is added as an attribute to the noun tag of the created BOD when reading files.
Text Only All Types File Encoding The character encoding for the text file content. Supported encodings are UTF-8 and ISO8859-1.
All Types All Types Format Type The type of formatting for the file content. Text files can be one of these types: Delimited, Fixed-Length, Fixed-Length & Delimited, Full BOD or XML.

Binary files are of type Raw Data.

Text Only Delimited / Fixed-Length & Delimited Field Separator

This specifies a separator character between fields when Delimited or Fixed-Length & Delimited is specified in the Format Type property. When rendering text, each element in the input BOD data schema is separated by the Field Separator in the output text string. When parsing text, each field becomes an element in the output BOD data schema. Only a single character, or \t (tab), can be specified.

The Field Separator cannot be the same as or a subset of Line Separator or Optional Value Indication.

Text Only Delimited / Fixed-Length / Fixed-Length & Delimited Line Separator This specifies the character(s) that determine the end of each line. When parsing text, each line is treated as a new record in the output BOD data schema. When rendering text, each data record is separated by the line separator character in the output text string.

Supported escaped characters: \t (tab), \r (Carriage Return) and \n (Line Feed) or a combination of these.

The Line Separator cannot be the same as or a subset of Field Separator or Optional Value Indication.

Limitation: Currently, you cannot generate a BOD for files that use a literal \r (a backslash followed by r) as the Line Separator.

Text Only Delimited Field Enclosing Character This specifies the characters used to enclose a field. Each field can have a start enclosing character and an end enclosing character. The start and end character can be the same or different. When parsing the field, all data within properly enclosed characters is treated as valid content. The enclosing characters are not considered part of the data. See Enclosing Character.
Text Only Fixed-Length / Fixed-Length & Delimited Fill Character This is the character that is used to fill the blank space in a field and between fields. Only a single character, or \t (tab), can be specified.
All Types Delimited / Fixed-Length / Fixed-Length & Delimited / Raw Data Data Fields This specifies the data schema for the input or output text. All data fields that are defined are on the same level; a hierarchical structure is not supported. For each data field, a name must be defined. For Fixed-Length formats, the field length (in characters) must be specified. The field sequence is displayed; to change the order of the fields, use the up and down arrows. Field names must be unique.

Each data field has an Optional flag. Selecting this flag makes the field optional. All cleared fields are treated as required.

See Optional Fields.

For Raw Data, only two data fields are defined: one for the Document ID and the other for the Raw Data content. You can set the labels for these two fields, but you cannot delete these fields or add more fields. Field names must be unique.

Text Only Delimited Optional Value Indication The value which can be substituted to indicate a field is optional. Only one indicator is allowed per file. Any field that is not present in a row must be represented by this value.

These values are allowed for the Optional value indicator:

  • Value can be left blank. In this case an empty value is assumed.
  • A NULL character (*\0*).
  • Any string of alphanumeric characters with no spaces. It is a data type independent value. Applications must treat it as a special token indicating that a field is optional, regardless of the data type of the field.

Note that the onus is still on the application to represent the optional field with the appropriate indicator and its delimiter.

This is the behavior of ION file parser when a field is marked as optional and when the optional value indicator is left blank:

  • When the first field in a row is optional, a line separator or the beginning of the file is expected, followed immediately by a field separator.
  • When an intermediate field is optional, that field is expected to be flanked by a field separator on either side.
  • When the last field is optional, a field separator is expected, followed by the line separator or the end of the file.

In all cases, an appropriate delimiter corresponding to that field position is expected. A record with fewer delimiters than required by the defined number of fields causes the file to be treated as invalid.

The optional value indicator cannot be:

  • A repeated character.
  • The same value as a Field Separator or Line Separator.
  • A subset of a Field Separator or a Line Separator.
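
The separator and optional-value rules above can be illustrated with a short Python sketch. It is not the ION file parser: the function names validate_separators and parse_records are hypothetical, "subset" is assumed to mean "substring", and "a repeated character" is assumed to mean a multi-character string that consists of one character only.

```python
def validate_separators(field_sep, line_sep, indicator=""):
    """Return a list of rule violations; an empty list means the settings pass."""
    problems = []
    # Field and Line Separator must not be equal or contained in each other.
    if field_sep in line_sep or line_sep in field_sep:
        problems.append("Field Separator and Line Separator overlap")
    if indicator:  # a blank indicator is allowed and means "empty value"
        # Assumption: "repeated character" means e.g. "##" or "xxx".
        if len(indicator) > 1 and len(set(indicator)) == 1:
            problems.append("Optional Value Indication is a repeated character")
        for label, sep in (("Field Separator", field_sep),
                           ("Line Separator", line_sep)):
            if indicator in sep or sep in indicator:
                problems.append(f"Optional Value Indication overlaps {label}")
    return problems


def parse_records(text, field_names, optional, field_sep=",",
                  line_sep="\n", indicator=""):
    """Split text into records; optional fields that hold the indicator become None."""
    lines = text.split(line_sep)
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty piece after a trailing line separator
    records = []
    for line_no, line in enumerate(lines, start=1):
        values = line.split(field_sep)
        # Every field position needs its delimiter, even when the value is absent.
        # Rejecting extra values as well is an assumption of this sketch.
        if len(values) != len(field_names):
            raise ValueError(f"line {line_no}: expected {len(field_names)} fields")
        record = {}
        for name, value in zip(field_names, values):
            if value == indicator:
                if name not in optional:
                    raise ValueError(f"line {line_no}: required field {name!r} is missing")
                record[name] = None
            else:
                record[name] = value
        records.append(record)
    return records


if __name__ == "__main__":
    fields = ["id", "color", "name"]
    # Blank indicator: an absent optional value is two adjacent delimiters.
    print(parse_records("1,,Widget\n2,blue,\n", fields, {"color", "name"}))
    print(validate_separators("\n", "\r\n"))
```

With a blank indicator, the row 1,,Widget parses with color set to None, while a row such as 1,blue is rejected because the delimiter for the third field is missing.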
Text Only XML Sample XML

The Sample XML Contents property is used to generate Custom BOD metadata into the Data Catalog. Define XML sample content manually or click UPLOAD to select a sample XML file.

Text Only Full BOD Document Select one of the available BODs in ION Registry. Any BOD can be selected; standard or custom.
All Types All - except Full BOD BOD Noun This specifies the noun of the BOD that is generated - based on the file template configuration specified. The BOD noun cannot be similar to one of the existing ION standard BODs. In addition, if the BOD Noun that is specified matches an existing Custom BOD, a warning message is displayed. The user can choose to overwrite the existing BOD.
Text Only Delimited / Fixed-Length / Fixed-Length & Delimited Generated BOD This specifies how records in text are used to create BOD instances or how BOD data area is written to text:
  • (Read) Generated BOD - Single: All lines in a file are treated as one record that is mapped to one BOD. A file with multiple lines results in one BOD instance with a repeating noun for each line.
  • (Read) Generated BOD - Multiple: Each line in a file is treated as a record that is mapped to one BOD. A file with multiple lines results in multiple BOD instances.
  • (Write) Generated BOD - Single: Each noun in the input BOD data schema is transformed into a line of output text in the same file. A BOD instance with repeated elements is written as a single file with multiple lines.
  • (Write) Generated BOD - Multiple: Each noun in the input BOD data schema is transformed into a line of output text in a separate file. A BOD instance with repeated elements is written as multiple files, and each file contains one line.

For Raw Data files, the default setting for "Generated BOD" is "Single" for both reading and writing binary files.
Text Only XML Include XML header during writing the files XML Header specifies the version number and optionally the character encoding. This is part of a grammar document's XML declaration on the first line of the document. If present, this header must be shown on the first line of all XML documents. Selecting this property enables ION to automatically add the XML header to the XML files when written. Note:
  1. The "version" attribute is added with value '1.0'.
  2. The "encoding" type attribute is added with the value as specified in the file format template for "File Encoding". This is either "utf-8" or "ISO8859-1".
  3. The "standalone" attribute is added with the value "yes".

    For example:

    The property Include XML header While writing files is selected and the 'File Encoding' is specified as UTF-8. A header is added for every XML file that ION writes. The header is in this format: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>

This setting applies only when ION writes an XML file to a destination. When ION reads an XML file, it automatically removes the XML header line if one is encountered. This is necessary because, during the 'read file' operation, the XML file is embedded into the Noun section of the data area of the BOD, and retaining the XML header inside a section of another XML document would make the resulting BOD invalid XML.
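
As an illustration of the header handling described above, the following Python sketch builds the header that is added on write and strips a header on read. It is only a sketch under stated assumptions, not ION code: the function names build_xml_header and strip_xml_header are hypothetical, and the regular expression is one simple way to recognize an XML declaration at the start of a document.

```python
import re

# Encodings named by the File Encoding property.
SUPPORTED_ENCODINGS = ("UTF-8", "ISO8859-1")

# Matches an XML declaration at the very start of the text, plus the
# whitespace that follows it. Requiring whitespace after "<?xml" avoids
# matching other processing instructions such as "<?xml-stylesheet ...?>".
_XML_DECLARATION = re.compile(r"^\s*<\?xml\s.*?\?>\s*", re.DOTALL)


def build_xml_header(encoding="UTF-8"):
    """Return the header line with version, encoding, and standalone attributes."""
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    return f'<?xml version="1.0" encoding="{encoding}" standalone="yes"?>'


def strip_xml_header(xml_text):
    """Remove a leading XML declaration so the content can be embedded in a BOD."""
    return _XML_DECLARATION.sub("", xml_text, count=1)


if __name__ == "__main__":
    header = build_xml_header("UTF-8")
    document = header + "\n<Order><Id>1</Id></Order>"
    print(document)                    # what would be written
    print(strip_xml_header(document))  # what would be embedded on read
```

Text without a declaration passes through strip_xml_header unchanged, so the function can be applied to every file that is read.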

After the configuration of a format template is completed and saved, click Generate BOD to start the BOD generation steps.

BOD generation applies to the Delimited, Fixed-Length, Fixed-Length & Delimited, XML, and Raw Data format types. For Full BOD, no BOD is generated because it already exists in the registry.

  • If all configurations are valid, a window is displayed in which you can browse to and select a sample file whose content matches the defined file template configuration. The sample content is used to validate the configuration. This applies to the Delimited, Fixed-Length and Fixed-Length & Delimited format types.
  • After validation, another window is displayed that lists the BOD schema elements and offers the option to change the data type of the elements. A Document ID element must be specified to complete the BOD generation steps. For Raw Data files, the Document ID is already selected. If Use Data Fields for File Name is selected, the attributes for File Name and File Extension are listed as well. When you click OK, information is displayed about the structure of the generated BODs.

After these steps are completed, a BOD is generated, including its noun and verb schemas. This BOD is stored in the ION Data Catalog as a custom BOD and is linked to the file template. This BOD represents the schema to use for rendering and parsing text content.

The data types defined in the generated schema are stricter than those shown in the generate UI. Define your example file with values in the appropriate range. These are the value ranges:

  • Between -128 and 127. This range is generated as "byte".
  • Between -32768 and 32767. This range is generated as "short".
  • Between -2147483648 and 2147483647. This range is generated as "int".
  • Between -9223372036854775808 and 9223372036854775807. This range is generated as "long".
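
To check which type a column of sample values is likely to produce, the ranges above can be expressed as a small Python lookup. This is a sketch only: the function name narrowest_integer_type is hypothetical, and it assumes that the narrowest range that fits all sample values determines the generated type, which the ranges above suggest but do not state explicitly.

```python
# (type name, minimum, maximum), ordered from narrowest to widest.
INTEGER_TYPES = (
    ("byte", -128, 127),
    ("short", -32768, 32767),
    ("int", -2147483648, 2147483647),
    ("long", -9223372036854775808, 9223372036854775807),
)


def narrowest_integer_type(sample_values):
    """Return the narrowest type name that fits all values, or None if none fits."""
    numbers = [int(value) for value in sample_values]
    if not numbers:
        raise ValueError("at least one sample value is required")
    lowest, highest = min(numbers), max(numbers)
    for name, type_min, type_max in INTEGER_TYPES:
        if type_min <= lowest and highest <= type_max:
            return name
    return None


if __name__ == "__main__":
    # A quantity column whose samples never exceed 127 fits in "byte", so a
    # later file with a value such as 500 would fall outside that range.
    print(narrowest_integer_type(["12", "100"]))
    print(narrowest_integer_type(["12", "500"]))
```

If production files can contain larger values than the sample, include a representative large value in the sample file so that a wider range applies.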

Custom BODs that are generated from File Format Templates can be managed through Custom Documents on the Data Catalog menu in ION Desk.

The contents of the noun in the DataArea section must match the file format template that is defined when the type of files is:

  • Delimited
  • Fixed-Length
  • Fixed-Length & Delimited

Otherwise, a Confirm BOD is generated.