Configuring Data Characteristics

You can configure the data characteristics of a source.

Note: Data Characteristics can also be set using the Command Window with the setdatacharacteristics command. This is necessary for SFTP files.

To configure the data characteristics of a source:

Note: The maximum number of line breaks allowed within a single row is 1,000. Birst cannot parse files that exceed this limit.

Click the Actions icon of the connector, then click Data Characteristics.

Note: Some data characteristics are automatically set based on the file type.
From the Data Characteristics menu, select the characteristics to use:

Only Recognize Quotes at the Start and End of Fields

If this box is checked, quotes embedded within a field are treated as literal characters rather than as the start or end of a quoted string. For example, abc|red"red |green"c would be translated as abc, red"red, green"c rather than abc, red"red, green", c.
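
The following Python sketch illustrates the idea behind this option. It is not Birst's parser; the function name split_row and its parameters are hypothetical. A quote opens a quoted string only when it is the first character of a field, and closes it only when it is immediately followed by the separator or the end of the line. Every other quote is kept as a literal character.

```python
def split_row(line, sep="|", quote='"'):
    """Split one row, recognizing quotes only at the start and end of fields."""
    fields = []
    i, n = 0, len(line)
    while i <= n:
        if i < n and line[i] == quote:
            # Look for a closing quote that sits right before a separator
            # or at the very end of the line.
            j = i + 1
            while j < n:
                if line[j] == quote and (j + 1 == n or line[j + 1] == sep):
                    break
                j += 1
            if j < n:
                fields.append(line[i + 1:j])
                i = j + 2  # skip the closing quote and the separator
                continue
            # No closing quote at a field boundary: treat the field as unquoted.
        j = line.find(sep, i)
        if j == -1:
            j = n
        fields.append(line[i:j])
        i = j + 1
    return fields


print(split_row('abc|red"red |green"c'))
print(split_row('"a|b"|c'))
```

In this sketch the embedded quotes in the first row stay inside their fields, while the second row shows that a field wrapped in quotes can still contain the separator.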

First row contains column names

If this box is checked, the columns in the first row of the source will be treated as column headers.

Force number of rows to match header count

If this box is checked, Birst adjusts the number of columns in the source to match the number of headers.
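
As a rough illustration, the adjustment could look like the Python sketch below. How Birst actually adjusts a row is not described here; the sketch assumes that short rows are padded with empty values and long rows are truncated. The function name fit_to_header is hypothetical.

```python
def fit_to_header(row, header_count, fill=""):
    """Return a copy of row with exactly header_count columns.

    Assumption: missing columns are padded with `fill` and extra
    columns are dropped. Birst's real behavior may differ.
    """
    if len(row) < header_count:
        return row + [fill] * (header_count - len(row))
    return row[:header_count]


header = ["id", "name", "city"]
print(fit_to_header(["1", "Ann"], len(header)))
print(fit_to_header(["2", "Bob", "Oslo", "extra"], len(header)))
```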

Parse Formatted Numbers as Numeric

Enabled by default. This setting parses all formatted numbers as numeric values that can be used in measures. If you disable this setting, all columns in that source that contain formatted numbers are converted to Varchar. This could impact measures derived from those columns in the space.
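
The difference between the two states can be pictured with the Python sketch below. The formats that Birst recognizes are not listed here, so the sketch assumes only thousands separators and a leading dollar sign; the function name parse_formatted_number is hypothetical.

```python
import re

# Assumed format: optional minus sign, optional "$", digits with optional
# thousands separators, optional decimal part.
_FORMATTED_NUMBER = re.compile(r"-?\$?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?")


def parse_formatted_number(text, parse_as_numeric=True):
    """Return a float for formatted numbers, otherwise the original text."""
    value = text.strip()
    if parse_as_numeric and _FORMATTED_NUMBER.fullmatch(value):
        return float(value.replace(",", "").replace("$", ""))
    return text  # stays a string, comparable to a Varchar column


print(parse_formatted_number("$1,234.50"))
print(parse_formatted_number("$1,234.50", parse_as_numeric=False))
```

With parsing turned off the value stays text, which is why measures built on such a column can be affected.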

Column Separator

Enter the character used as the column separator in the data source. By default, the column separator is | (pipe).

Quote character

By default, the quote character is set to ". If strings are quoted with a different character in this data source, enter that quote character, for example, '. Some CSV files are badly formatted, with quotes that are not properly closed throughout the file. This can result in a malformed CSV exception. For such cases, you can parse the CSV file without recognizing any quote character.

Character Encoding

Select the type of encoding to use for this data source. By default, the encoding is set to UTF-8. The encoding must match that of the incoming data.
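
To see how the Column Separator, Quote character, and Character Encoding settings interact, consider the Python sketch below. It uses Python's standard csv module as a stand-in, not Birst's parser, and the function name read_rows is hypothetical. Passing quote_char=None mimics parsing without any quote character.

```python
import csv
import io


def read_rows(raw_bytes, separator="|", quote_char='"', encoding="utf-8"):
    """Decode raw file bytes and split them into rows of fields."""
    text = raw_bytes.decode(encoding)  # fails or garbles if the encoding is wrong
    source = io.StringIO(text, newline="")
    if quote_char is None:
        # No quote handling at all: quote characters are ordinary data.
        reader = csv.reader(source, delimiter=separator, quoting=csv.QUOTE_NONE)
    else:
        reader = csv.reader(source, delimiter=separator, quotechar=quote_char)
    return list(reader)


data = 'id|name\n1|"Smith|Jones"\n'.encode("utf-8")
print(read_rows(data))
print(read_rows(data, quote_char=None))
```

With the quote character active, the quoted separator stays inside one field. Without it, the same row splits into three fields and the quote characters remain in the data.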

Number of rows to skip at beginning of file

The number of rows at the start of a file to exclude from the import.

Number of rows to skip at end of file

The number of rows at the end of a file to exclude from the import.
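
The effect of the two skip settings can be sketched in Python as follows. This is an illustration only; the function name trim_rows and the sample rows are hypothetical.

```python
def trim_rows(lines, skip_start=0, skip_end=0):
    """Drop skip_start rows from the top and skip_end rows from the bottom."""
    # Compute the end index explicitly: a slice such as lines[1:-0]
    # would return an empty list when skip_end is 0.
    end = len(lines) - skip_end
    return lines[skip_start:end] if end > skip_start else []


rows = ["Monthly report", "id|name", "1|Ann", "2|Bob", "Total rows: 2"]
print(trim_rows(rows, skip_start=1, skip_end=1))
```

Here the title row and the trailing summary row are excluded, so the import would start at the header row.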

Parser Type

Birst can parse CSV data with two different parsers: Default and RFC4180. The Default parser is suitable for most files. In some cases, uploading a file into Birst fails with one of the following error messages:

1) Malformed file: Unterminated quoted field at end of CSV line.
2) Too many line breaks within a single data row

This may mean that the file is a valid RFC4180 CSV file that the Default parser cannot read. In that case, try the RFC4180 parser type.

Click Save.
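
To illustrate why the Parser Type setting matters: RFC 4180 allows line breaks inside double-quoted fields and escapes a quote by doubling it. The Python sketch below contrasts a naive line-by-line split with Python's standard csv module, which accepts both constructs. Neither is Birst's Default or RFC4180 parser; the sample data is made up for the example.

```python
import csv
import io

# One header row and two records. The first record contains a line break
# inside a quoted field; the second contains doubled (escaped) quotes.
data = 'id,comment\n1,"first line\nsecond line"\n2,"she said ""hi"""\n'

# Naive approach: treat every physical line as a record.
naive_rows = [line.split(",") for line in data.splitlines()]

# Quote-aware approach: a quoted field may span several physical lines.
rfc_rows = list(csv.reader(io.StringIO(data, newline="")))

print(len(naive_rows))
print(len(rfc_rows))
print(rfc_rows[1])
print(rfc_rows[2])
```

The naive split produces one record too many and leaves an unterminated quoted field on a line, which resembles the situation the error messages above describe.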