Configuring Data Characteristics

You can configure the data characteristics of a source.

Note: Data characteristics can also be set from the Command Window with the setdatacharacteristics command. This is required for SFTP files.

To configure the data characteristics of a source:

Note: A single row can contain at most 1,000 line breaks. Birst cannot parse files that exceed this limit.
  1. Click the Actions icon of the connector, then click Data Characteristics.
    Note: Some data characteristics are automatically set based on the file type.
  2. From the Data Characteristics menu, select the characteristics to use:
    Only Recognize Quotes at the Start and End of Fields
    If this box is checked, quotes embedded in fields are not treated as the start or end of quoted strings. For example, abc|red"red |green"c is parsed as abc, red"red, green"c rather than abc, red"red, green", c.
    First row contains column names
    If this box is checked, the columns in the first row of the source will be treated as column headers.
    Force number of rows to match header count
    If this box is checked, Birst adjusts the number of columns in the source to match the number of headers.
    Parse Formatted Numbers as Numeric
    Enabled by default. This setting parses all numbers as numeric values that can be used in measures. If you disable it, all columns in that source that contain formatted numbers are converted to Varchar, which can affect measures derived from those columns in the space.
    Column Separator
    Enter the character used as the column separator in the data source. By default, the column separator is | (pipe).
    Quote character
    By default, the quote character is set to ". If strings in this data source are quoted with a different character, enter that character, for example, '. Some CSV files are badly formatted, with quotes that are not properly closed throughout the file. This can result in a malformed CSV exception. For such cases, there is a provision to parse a CSV file without considering any quote characters.
    Character Encoding
    Select the encoding to use for this data source. By default, the encoding is set to UTF-8. The encoding must match that of the incoming data.
    Number of rows to skip at beginning of file
    The number of rows at the start of a file to exclude from the import.
    Number of rows to skip at end of file
    The number of rows at the end of a file to exclude from the import.
    Parser Type
    Birst can parse CSV data using two different parsers: Default and RFC4180. Typically, you can use the Default parser type. In some cases, you may see one of the following error messages when uploading a file into Birst: 1) Malformed file: Unterminated quoted field at end of CSV line. 2) Too many line breaks within a single data row. This may be because your file follows the RFC 4180 CSV format in a way that Birst's Default parser is unable to parse. In that case, try the RFC4180 parser type.


  3. Click Save.
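
To see how several of these settings interact, the following minimal sketch uses Python's standard csv module as a stand-in. It is not Birst's parser and may differ from it in edge cases. The function read_rows and its parameter names are hypothetical, chosen only to mirror the Column Separator, Quote character, rows-to-skip, and First row contains column names settings described above.

```python
import csv
import io


def read_rows(text, separator="|", quote_char='"',
              skip_start=0, skip_end=0, has_header=True):
    """Parse delimited text using settings analogous to the ones above.

    Hypothetical helper for illustration; not part of Birst.
    """
    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=separator, quotechar=quote_char)
    rows = list(reader)

    # Drop the requested number of rows from the start and end of the file.
    rows = rows[skip_start:len(rows) - skip_end]

    if has_header and rows:
        # Treat the first remaining row as column names.
        header, data = rows[0], rows[1:]
        return [dict(zip(header, row)) for row in data]
    return rows


# One banner line at the top and one trailer line at the bottom are skipped.
# The quoted field keeps its embedded separator.
sample = (
    "banner line\n"
    "id|name\n"
    '1|"Smith | Sons"\n'
    "2|Jones\n"
    "trailer line\n"
)
records = read_rows(sample, skip_start=1, skip_end=1)
print(records)
```

With the pipe separator and double-quote character left at their defaults, the quoted value Smith | Sons stays in one column even though it contains the separator. Changing separator or quote_char so that they no longer match the file shifts the column boundaries, which is why these settings need to match the incoming data.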