Deep Dive: Transformations

Professional spaces can use the Transformations located on the Prepare page to make additional modifications to the data. For more information on preparing data, see Prepare Data. For more information on adding transformations, see Add Transformation.

The table below lists all available transformations and describes what they do.

Available Transformations

Transform Description Menu Icon
Add Column Adds a new column by writing an expression for customized data. See Deep Dive: The Add Column Transform. Important: Not available in the Trial. Upgrade to Birst Professional Edition!
Append Table Appends a table with another table. Append is a common operation used by analysts for data workflows. Common use cases include appending historical data with current data, appending data from various retail outlets, or multiple profiles in the case of Google Analytics. Append is inspired by SQL based Union and Union All statements. Union appends rows by removing duplicate rows, whereas Union All appends all rows that may include duplicates.
Combine Merges two columns into one, with or without a delimiter. By default, the delimiter is a comma.
Data Type Changes the data type of the column or columns. It does not change the data type in the source data, only for the column in Birst. In addition to the standard date casting, users can now suggest date formats for parsing and casting Numeric and Text columns into Dates.
Date

Allows a variety of functions related to dates. See Deep Dive: Date Transforms.

Works only with columns of type date or datetime. If you try to apply this to other types of columns there will be an error.

Filter Filters the rows based on the contents of one or more cells in the column. A filter transformation cannot be edited once you create it.
Ignore Excludes the column from the Birst data store. Does not change the column in the source.
Index Creates a new column of ascending sequential numbers titled "Index", most often used as an index on a table.
Lookup Creates a new column based on a lookup value in another table.
Merge Merges columns from two tables into a single table. See Deep Dive: Merge Tables Transform.
Null Handling Replaces empty cells in a column or columns with text of your choice.
Pivot Pivots the table based on column.
R Transform The R transform lets users leverage R code when applying transformations to a prepared a source.
Remove Duplicates Deletes records based on the contents of one or more columns. When employing this transformation, the first matching record is kept while the other matching records are removed from the prepared data source.
Remove Records Deletes records based on the contents of one or more columns. You can edit or delete this transformation.
Rename Renames the title of the selected column or columns.
Replace Finds and replaces a value in the selected column. Can replace multiple values at the same time.
Split Splits one column into two or more columns, based on either a delimiter or a number of characters.
Split Row

Duplicates a column based on a selected delimiter.

Summarize Summarizes both groups one or more rows based on matching values, and also shows a calculation based on data another column. A Summarize transformation cannot be edited once you create it.
Text Changes case or trims out empty spaces of the selected column.

Add Column

Important: Add Column is not available in the Trial. Upgrade to Birst Professional Edition!

The Add Column transformation adds a column to the table. You give it a name and a formula for the contents, or just some text. The formula uses the Birst Query Language, BQL. BQL is similar to macro or other query languages. Then you assign a data type for the column. See Deep Dive: The Add Column Transform.

Append Table

The Append Table transformation lets you append a table with another table similar to SQL based Union and Union ALL operators. This generally works best for tables with the same structure and data types. Note: You cannot include columns with different data types as part of the union condition.

Append Tables with Matching Columns

Click the image to run an animation:

Append Tables with non-Matching Columns and Data Types

You can create Unions with tables with similar data from different systems but you may need to create some default values for columns that do not map to each other. For example, another system may capture city but not region and you want to create a Union with a table that captures both City and Region. You could enter a default value for the table that lacks the Region data so it can map to the Region column as desired. You cannot create Unions with columns that contain similar data but different data types. In this example, one table has Transaction Amt in Decimal data type, while the other table contains Sale in numeric data type. See how Birst prevents these values from creating a Union.

Click the image to run an animation:

Combine

You can combine the data from two columns into one column, and optionally indicate a separation, using a delimiter or a space, between the two. By default there is no space between the two.

Click the image to run an animation:

This transformation can combine operation is to put a first name and last name together in a single column for full name. , or to put address or location columns together. For example you could combine a City column with a State column and separate them with a comma.

Data Type

The Data Type transformation changes the type of data in a column or columns. Sometimes a column from the original source is one type, and you want to relate it to a column from another source that is a different type. You would need to change one of them so that the data types match.

Data type conversions include varchar, integer, float, date, and datetime. When you change a data type, Modeler changes the selected column, it does not create a new column.

Click the image to run an animation:

When changing date or datetime columns, you may see the following results:

  • When you change a date to a datetime, Modeler updates the data in the column and adds the timestamp 12:00:00.000AM.
  • When you change a datetime to a date, Modeler drops the timestamp.
  • When you change a datetime to a varchar, Modeler reformats the date, from the 08/29/2016 5:39:00.000 AM format as a datetime to the 2016-09-02 06:01:00.000 format as a varchar.

When changing varchar columns to integer or float, change or remove any non-numeric characters first. By default if a varchar with non-numeric characters is converted to integer or float, the conversion results in 0.0.

Parse Dates Data Type Enhancement

In addition to the standard date casting, users can now suggest date formats for parsing and casting Numeric and Text columns into Dates.

Date

The Date transformations are a variety of functions that modify columns of type date or datetime. See Deep Dive: Date Transforms.

Filter

Filter changes the rows in the table based on the values in the cells. Filters use operators including as equal to, not equal to, greater than, greater than or equal to, less than, less than or equal to. For example you can filter data out of a table based on the value of a cell.

Click the image to run an animation:

Wild cards and partial entries are not supported. The filter value text must exactly match the text in the cell. For text columns, the greater than and less than operators use alphabetical order.

Ignore

The Ignore transformation makes the column invisible in the table. The column is still part of the prepared source, but you won't see it.

When you select a column and click Ignore, the screen refreshes and the column is gone. To get it back, click on the Ignore icon in the pipeline and select delete.

Click the image to run an animation:

FAQ: What is the difference between using Ignore in the Prepare page and just excluding it when you define the connection in the Connect page?

If you won't need a column anywhere in your dashboard or reports, then don't put it in the connection and it won't be extracted from the original source at all. Sometimes you need a column in some reports and not others;. In this case, extract it and use it in a prepared source for a report that needs it. Then use the Ignore transform in the prepared source for a report that doesn't need the column.

Index

The Index transformation sets up a new column with numbered rows. You can set the starting number of the index. If you create more than one index column, the column name is appended with a number.

Click the image to run an animation:

You can set the starting number of the index. If you create more than one index column, the column name is appended with a number.

Index can be used as a way to uniquely identify a row, or as a simple counting mechanism.

Lookup

Lookup is a common operation that analyst use for data cleansing. Lookups in BI tools are similar to Vlookup and Hlookup in excel where the Vlookup provides a matching a key value to a column and returns the corresponding value to a different column. The first match is considered if there are duplicates and any unmatched columns can return a specified default value.

Click the image to run an animation:

Merge

The Merge transformation takes the columns from two tables and puts them together into one new table. You can show only some columns from each original table, or take them all. See Deep Dive: Merge Tables Transform.

Null Handling

Sometimes there is nothing in a cell, and you don't want to leave it blank. You can put text into the empty cells of a column or columns.

Click the image to run an animation:

Pivot

The pivot transform pivots the table based on the selected column. For more information, see Pivot Transform.

R Transform

The R transform lets users leverage R code when applying transformations to a prepared a source.

Similar to the Index column, the R transform is only active for the entire prepared source and not when you select a column in the source.

Note: The R Transform will not be active if you did not set up your R server Connectivity in Modeler Connect.

  1. Select a prepared source, and then select the R Transformation to run R code against a prepared source:
  2. Enter R code in the editor box to complete your R transformation.
  3. You are Provided with guidance for your R code script entry:
  4. Enter your R code and validate:
  5. After the code executes successfully, click Done.
  6. View the results of the R code transformed prepared source.

Saving the R transform updates the prepared source with sample data based on the R code returned results from the R server. You can then add any subsequent steps to your prepared source. Publishing the prepared source runs the R code on the entire data set. The prepared source can be schedule for import.

Remove Records

Deletes records based on the contents of one or more columns. Select a column and enter the condition(s) for which you want to remove records from your prepared source. Note: Be careful when entering your criteria as the Remove Records transformation will remove all records that meet the criteria. For instance, when entering "Fran" in the below condition, the Remove Records transformation will remove all of the records that start with "Fran".

If you need to edit or delete the Remove Records transformation, simply select the transformation from the Transformation pipeline and select the desired action.

Remove Duplicates

Remove duplicates deletes records based on the contents of one or more columns. When employing this transformation, the first matching record is kept while the other matching records are removed from the prepared data source.

Rename

Rename controls the labels of column headers. Just select a column or columns and type in new names.

Click the image to run an animation:

Replace

Replace is a standard search-and-replace in a single column. Select the column and provide one or more text or number replacements.

Click the image to run an animation:

Split

Split will divide the contents of a cell into multiple cells by adding a column. You can split on a delimiter, or on a count of characters. The new column is next to the original one, and you can use the Rename transform to give it a different name.

Click the image to run an animation:

Split Row

Split row is a way to duplicate rows based on a delimiter. You can use one of the delimiters in the dropdown list or enter your own. For this example the rows that have an ampersand (&) in the selected column are duplicated.

Click the image to run an animation:

Summarize (Group By and Show Calculations)

Summarize both groups one or more rows based on matching values, and also shows a calculation based on data another column.

A Summarize transformation cannot be edited once you create it.

Text

Text provides uppercase and lowercase transforms, plus it will trim extra blank spaces.

Click the image to run an animation: