ETL Services Script Functions

An ETL script provides a set of operations that will execute for each line of data returned by the input query (SELECT statement). See ETL Services Input Query Reference.

ETL provides the standard elements of a procedural scripting language to define how data is transformed. In addition, you can write your own custom functions. See Custom Functions in ETL Scripts.



Tip: Click the Wizard button to use the Script Wizard to help you write your script.

ETL Script Functions

The following built-in functions are supported in the Script section of ETL Services:

To learn about fuzzy matching see Fuzzy Matching.

To create a custom function see Custom Functions in ETL Scripts.

Halt

The Halt function allows the execution of a script to be aborted. The function requires a string parameter that is used to report the error. When run using the Execute button in ETL Services the error message will appear in the Script Results message box. When encountered during normal processing an error will appear in the Process Data window and data will not be loaded into the warehouse.

Halt ‘MyErrorMessage’

Variable Definition

Variables may be defined as part of a script. Variables are initialized once at the beginning of script execution, before the first line of input data is processed, and maintain their values throughout the entire process. This means that variables can store information that can be used across sets of input records. Variables are declared using the DIM statement:

Dim [Count] As Integer

Variables can be defined as Integer, Float, Varchar, Date or DateTime:

Dim [TestFloat] as Float = 3.14159
Dim [TestString] as Varchar(30)
Dim [TestDate] as Date = #10/1/2005#
Dim [TestDateTime] as DateTime = #11/1/2006 1:35 PM#

Optionally, a variable can be initialized as part of its declaration as above. This initialize can be either a constant or a scalar (single result value) query. Only the first column of the first row of the query is used:

Dim [Count] As Integer = Select Count([Orders.OrderID]) from [Orders]

Initialization queries can also have where clauses that include other variables:

Dim [C] as Integer =Select Max([Order Details.OrderID]) from [Order Details]
Dim [D] as Varchar(50) = Select [Orders.ShipName],[Orders.OrderID] from [Orders] where [Orders.OrderID]=[C]

Lists

One particularly powerful type of variable is a list. Lists allow you to accumulate data elements over many records and create calculations based on them:

Dim [Values] As List(Float)

Lists support operations to add, remove and count elements:

Add([Values],10) // Adds the value 10 to the list

Length([Values]) // Returns the length of the list

RemoveAt([Values],0) // Removes the element at position 0 from the list

RemoveAll([Values]) // Removes all elements from the list

Individual list elements can be referenced using an array-syntax:

[Sum] = [Sum]+[Values](10) // Add the 11th element in the list to the variable Sum

Lists can also be assigned using a query that returns 1 value. This allows for convenient use of lists for iteration purposes. For example for the following query that returns a list of product IDs:

Dim [ProductList] As List(Integer) = Select [Products.ProductID] from [Products]

Maps

Maps are another useful structure. A map allows the designer to create mappings from one data element to another. This is useful for storing lookup values that can be accessed quickly and easily.

Dim [Mappings] As Map(Varchar(30),Integer)

Maps support operations to add, remove and count elements:

[Mappings](‘See Jane run’) = 10 // Associates the value 10 with the string ‘See Jane run’

Length([Mappings]) // Returns the length of the list

RemoveItem([Mappings],’See Jane run’) // Removes the mapping for ‘See Jane run’

RemoveAll([Mappings]) // Removes all elements from the list

Individual list elements can be referenced using an array-syntax:

[RowIndex] = [Mappings](‘See Jane run’)

If an element does not exist in map, a null value is returned.

Maps can also be assigned using a query that returns 2 values. This allows for convenient use of lookups. For example for the following query that returns a product ID for each order:

Select [Order Details.OrderID],[Order Details.ProductID] from [Order Details]

You can populate a map with a product name for each product ID and do a lookup in the script:

Dim [TestMap] as Map(Integer,Varchar(30)) = Select [Products.ProductID],[Products.ProductName] from [Products]
[ColA]=[Order Details.OrderID]
[ColB]=[TestMap]([Order Details.ProductID])
WriteRecord

Assignment

The results of an expression can be assigned either to a variable of the appropriate type or an output column of the appropriate type:

[Average Price] = [Sum]/[Count] // Calculate an average price

// Assign the value of the input column [Orders.ProductID] to the output column [ProductID]
[ProductID]=[Orders.ProductID]

// Assign an aggregated value
[Total Quantity]=Sum([Order Details.Quantity])

Literals

ETL allows you to specify values of each data type directly as part of an expression. Integers and Floats are represented as standard numbers:

[IntegerVariable] = 2

[FloatVariable] = 3.14159

Strings must be quoted using single quotes:

[StringVariable] = ‘See Jane run’

Date and DateTime values must be enclosed in ‘#’ to designate them as dates:

[DateVariable] = #10/1/2009#

[DateTimeVariable]= #10/1/2009 4:30 PM#

Keywords

ETL Services has keywords that can be used in conjunction with variable assignment.

“NowDate” and “NowDateTime” can be used to represent the current date/datetime:

[DateVariable] = NowDate

Null Values

ETL provides for Null (not set) values. They can be assigned or used in expressions:

[Variable] = Null
IF [Variable] = Null
       …
End If

IF [Variable] Is Null
       …
End If

IF [Variable] Is Not Null
       …
End If

WriteRecord

Data is not automatically written to the new data source. Only when the command WriteRecord is executed will this happen. When a script starts, a variable is initialized for each output column. These variables maintain their values throughout the execution of the script. It is up to you to change or update their values as input records are processed.

Filtering Records

When WriteRecord is executed, the current state of each of these variables is written to the data source. This means that a designer could conditionally execute WriteRecord to filter records.

If [Variable] <> “bad value” Then
       WriteRecord
End If

Pivoting Records

Additionally, one could take multiple input records, store the values of a given column in several variables and accumulate them. When all records for a given set of keys are present, one could write the record out with each variable as a different column – effectively pivoting the data.

For example, to pivot a single “value” column into 2 different output columns depending on a flag, you could use the following:

If [Flag]=1 Then
        [Output1]=[Value]
ELSEIF [Flag]=2 Then
        [Output2]=[Value]
End If
WriteRecord

De-Pivoting Records

One could, for example, only write records that meet certain data quality constraints. Additionally, one could execute MORE WriteRecord statements than there are input records. In this case, for example, one could take a given input row that many include multiple facts and “de-pivot” the row by writing one output row for each measure column in the input.

[Value] = [Output1]
[Flag] = 1
WriteRecord
[Value] = [Output2]
[Flag] = 2
WriteRecord

If/Then/Else

ETL Services supports full conditional logic through its IF/THEN/ELSE syntax.

If [Orders.Freight]>75 Then

       [Average Freight] = [Orders.Freight]/SUM([Order Details.Quantity])

ElseIf [Orders.Freight]>10 Then

       [ROWNUM] = [ROWNUM]+1

       [Average Freight] = 10

Else

       [Average Freight] = 0

End If

For/Next Loop

ETL Services supports fixed count looping through the For/Next statements.

For [Test] As Integer = 1 to 10 Step 2

       [Val] = [Val] * 2

Next [Test]

This structure allows for constructing loops with a fixed number of iterations. Notice that the variable used for looping (in this case [Test]) can be declared directly in the For/Next statement. Also, the “Step” construct allows incrementing by more than one for each loop. The Exit For statement will allow you to exit a for loop during execution.

While Loop

Conditional looping is supported using the While statement. An expression can be used that, if true, will continue to cause the loop to be executed. The Exit While statement can be used to break out of a loop in the middle of execution.

While [Test] < 5

       [Test] = [Test] + 1

       If [ROWNUM] > 50 AND [Test] > 2 Then

              Exit While

       End If

End While

Completion Block

Sometimes cleanup is necessary before finishing processing. This is usually true when variables are used to store data across many rows and either a row value change or end of file is used as trigger for writing a new output record. In this case, the Completion Block allows a designer to finish up an unfinished record processing after all input records are processed. The Completion Block executes only once, after all other script code is executed and there can only be one declared per script. Typically this block is used to write the final output record(s).

Complete

       [Value] = [LastValue]

       WriteRecord

End Complete

Deleting Records

Deleting records may be required under rare circumstances where cleanup is required. Either old dimension records need to be removed or prior facts need to be re-stated. In either case, a simple delete statement may be used for this purpose. It is similar to the select statement in a variable definition, except that it only includes the target table name and a where clause.

Records may be deleted from:

  • A data source:

    Delete From [Order Details] Where [Order Details.ProductID]=10

  • A level (dimension table):

    Delete From [Level(Order Details.Order)] Where [Level(Order Details.Order).ProductID]=10

  • A grain (fact table):

    Delete From [Grain(Order Details)] Where [Grain(Order Details).ProductID]=10

It is not possible to delete all records from a table by leaving out the where clause; instead, the where clause must be crafted in such a way as to guarantee that it will delete all records.

Type Casting

Where appropriate, you can turn a variable or expression of a given type into another type. To convert to an Integer, Float, DateTime or Varchar, you can cast an expression to the desired output. For example:

[IntegerVariable] = INTEGER([StringVariable])

Variables (GETVARIABLE)

The GetVariable function retrieves the value of a variable. For example, the following returns the current load number:

[LoadNumber] = GETVARIABLE('LoadNumber')

Tip: Session variables are not initialized in an ETL script because a user session is not available. Do not use session variables in ETL scripts as they could affect processing performance. If session variables are detected in an ETL script, the metadata validation displays an error and the processing logs contain a warning.

Writeback to Salesforce.com (UPDATESFDC)

The UPDATESFDC function allows you to write directly to existing Salesforce.com objects. See Writing Back to Salesforce.com.

Parent-Child Hierarchy Flattening

Many operational systems store hierarchies in recursive parent-child structures called trees. In these structures the ID of an object can be associated with an ID of its parent. Examples include organizational hierarchies and product hierarchies. Given the nature of these hierarchies, children may be any number of levels away from the top element in the tree. Birst ETL Services provides a way to flatten one of these hierarchies. This ensures that all elements at the same level are evaluated at the same level and provides for convenient drilling down a hierarchy. This also allows for processing of information at a element in a hierarchy based on information at higher levels or lower levels.

 For example, you may want to examine children of a particular element to calculate offsets. You might do this to prevent double counting as you aggregate across a hierarchy (for example, so that a parent’s value is the incremental difference over the sum of all of its children). You might also want to walk from the top down the tree to a node and calculate a running total. You might do this walking down a bill of materials to calculate total number of items used.

Each script allows for one parent-child structure in memory. As rows are processed, parent-child combinations are added one at a time. Birst maintains this list of parent-child connections until it is time to flatten the hierarchy. This is typically done in the complete block when all input parent-child records are processed and now one can output the results. Flattening happens when you attempt to list each row in the flattened hierarchy table using the NextChild method. Also, ETL Services will provide the depth of each element, which is accessible by the GetLevelAttribute method.

There are two ways that Birst can flatten a tree: bottom-up and top-down.

Bottom-Up Flattening

With bottom-up flattening, Birst assumes each element in the tree has one and only one parent. This assumption is valid for many hierarchies, for example, in most organization hierarchies, each employee has one and only one boss or each subsidiary is part of one and only one parent organization. In this case it is possible then to start at each element and “walk up” the tree to the parent. Each time the NextChild method is called, Birst moves to the next element. Birst calculates the level of that element and you can then access the elements at all fixed levels in the hierarchy using the GetLevelValue method.

For example if an element is 3 levels down in the hierarchy, calling GetLevelAttribute('CurrentDepth') will return 3. If the hierarchy is 10 levels deep, calling GetLevelAttribute('NumLevels') will return 10. Calling GetLevelValue(0) will return the top-most parent in the hierarchy, while GetLevelValue(2) will return the current element (numbering is zero-based). Calling the GetLevelValue method on any levels below the current one will result in the current element being returned.

Top-Down Flattening

With top-down flattening, Birst starts at a given element and attempts to walk down, producing a single tree below that element. This is necessary for hierarchies where the same elements can have multiple parents. For example, in a bill of materials, the same part may be used in multiple assemblies. The only difference in this case is that you need to add a top level element to the NextChild method as a parameter. In the case of a hierarchy where there are multiple top-level elements, you can use the ResetChild method, which resets the current tree and allows the NextChild method to be called again with a different top-level element. Using this method, you can iterate through all top-level elements, flattening each of their trees separately. Note that NumLevels (the depth of the tree), is calculated for each top-level element and can be different from top-level element to top-level element.

AddParentChild

The AddParentChild method adds a single parent-child pair to the current list of pairs:

AddParentChild([ParentID],[ChildID])

NextChild

Once a set of parent-child pairs have been added, the list can then be iterated over one by one to flatten. The NextChild method positions the current record to the next available pair. If there are no more pairs left, it returns false. In the case of top-down processing, you must supply the top level element:

NextChild([TopLevelNode])

Reset Child

The ResetChild method clears the current top-level element so that the NextChild method can be called with a new top-level element.

GetLevelValue

For the current pair, GetLevelValue(n) returns the ID "n" levels deep into the hierarchy. If there are fewer than "n" levels, it returns the ID of the lowest level available. Hence, the IDs will repeat for levels lower than their depth.

GetLevelAttribute

The GetLevelAttribute method provides additional data either on the entire tree or on a given record that is being processed. There are two options that are supported:

GetLevelAttribute('NumLevels') // Returns the maximum depth of items in the tree

GetLevelAttribute('CurrentDepth') // Returns the depth of the current item in the tree

ToTime

ToTime allows integer values to be converted into a time format, for example: HH:mm:ss. ToTime is supported in both ETL scripts and in column expressions.

Years and months are not supported.

ToTime(1000000000, 'hours,minutes,seconds', '%d:%02d:%02d')

For example:

ToTime(Integer([Order_Details.OrderID]+[Order_Details.ProductID]*1000),'HH:mm:ss:SSS')

RWRITE

Use RWRITE to write back to R during an ETL process. RWRITE has two parameters: the path to write to on the R Server and the value to write. A file is created during script execution the first time a path is referenced. That same connection is used until the end of the script. This way, data can be written to the R file with precise control.

The following example shows how to write columns back to file on the R Server using the iris data source.

RWRITE('/home/rstudio/etltest.txt',[iris.row] + '|' + [iris.Sea;Length] + '|' + [iris.SepalWidth] + '\n']

See Moving Data From Birst to R.

See Also
ETL Services Overview
Birst ETL Services Input Query Reference
Custom Functions in ETL Scripts
Creating a Scripted (ETL) Data Source