ETL Services Input Query (SELECT) Reference

This topic provides information about how to write the SELECT statement, the input query, for an ETL script. ETL Services provide significant flexibility when you specify an input query. Aggregation, filtering, sorting and joining can all be leveraged when specifying the data that ETL Services stream through when producing a new data source.

Sources

ETL Services can query three different types of data:

  • Data sources
  • Dimension levels
  • Measure grains

In addition to processing input data, Birst has the advantage of querying data that already processed the Birst data set. This allows you to produce custom calculations on top of already processed data or to use data that had been loaded previously as part of a transformation.

ETL Services provides a convenient syntax for specifying these sources. To reference a data source, use its name:

SELECT [Order Details.OrderID], [Order Details.ProductID], [Order Details.UnitPrice], [Order Details.Quantity] FROM [Order Details]

Birst supports aliases for table names in scripts. Provide the alias in single quotes ('aliasname') after the table name. For example:

SELECT [Emp.EmployeeID], [Emp.LastName] FROM [Employees] 'Emp'
Note: Birst does not support alias names with Level() and Grain() operators in a scripted source. Spaces in alias names are not supported.

To reference dimension data from a given level, use the Level() operator. The level operator allows you to reference a hierarchy and a level as if it were a table:

SELECT [Level(Order Details.Orders).OrderID], [Level(Order Details.Orders).ProductID], [Level(Order Details.Orders).UnitPrice], [Level(Order Details.Orders).Quantity] FROM [Level(Order Details.Orders)]

To reference a measure grain, use the Grain() operator. The Grain operator allows you to refer to the grain of a data source rather than having to type all the levels that compose a grain:

SELECT [Grain(Order Details).OrderID], [Grain(Order Details).ProductID], [Grain(Order Details).UnitPrice], [Grain(Order Details).Quantity] FROM [Grain(Order Details)]

Aggregation

In addition to querying columns directly, numerical columns can be aggregated as well. ETL Services supports Sum, Count, CountDistinct, Avg, Min, and Max aggregation rules.

In the case below, to aggregate unit price and quantity by product, just add the aggregation rule operators:

Select [Order Details.ProductID], Avg([Order Details.UnitPrice]), Sum([Order Details.Quantity]) From [Order Details]

DISTINCT/TOP Operators

In addition to querying columns directly and using aggregations in the SELECT portion of the input query, Birst supports using the DISTINCT keyword for de-duping data and the TOP operator for returning the TOP N rows in the result set.

SELECT DISTINCT [Categories.CategoryID], [Categories.CategoryName], [Categories.Description], [Categories.Picture] FROM [Categories]

SELECT TOP 1 [Categories.CategoryID], [Categories.CategoryName], [Categories.Description], [Categories.Picture] FROM [Categories]

Sorting

To sort the incoming data by elements in the select list, use Order By.

Note: Everything in the Order By must also be in the SELECT list.
Select [Order Details.OrderID], [Order Details.ProductID], [Order Details.UnitPrice], [Order Details.Quantity] From [Order Details] Order By [Order Details.ProductID]
Note: We recommend not using Order By when using a large amounts of incoming data.

Joining

Often, data from two sources must be combined to create a new transformed data set. ETL Services supports joining any two basic queries. The syntax is a little different from standard SQL syntax as ETL Services joins after any aggregation has occurred.

Select [Orders.OrderID],[Orders.ShipName],[Orders.Freight] From [Orders] Inner Join Select [Order Details.OrderID],SUM([Order Details.Quantity]) From [Order Details] On [Orders.OrderID]=[Order Details.OrderID]

ETL Services supports outer joins as well as inner joins. To use an outer join, use LEFT OUTER JOIN, RIGHT OUTER JOIN or FULL OUTER JOIN.

Note: A maximum of 10 joins can be used in one ETL script.
Select [Orders.OrderID],[Orders.ShipName],[Orders.Freight] From [Orders] Inner Join Select [Order Details.OrderID],SUM([Order Details.Quantity]) From [Order Details] On ISNULL([Orders.OrderID])=ISNULL([Order Details.OrderID])

Union/UnionAll

The ability to Union and UnionAll is supported in ETL Services input queries. Union and UnionAll are not considered joins.

Note: The column data types match in order to merge data.
Select [Orders.OrderID] from [Orders]
UnionAll
Select [Order Details.OrderID] from [Order Details]

NULL Support

For sources with column mismatches, Birst supports using NULL as a placeholder for missing columns.

Select [Orders.OrderID],NULL from [Orders]
UnionAll
Select [Order Details.OrderID],[Order Details.Product ID] from [Order Details]

Referencing Union Columns in Script Portion

To reference a Union input query column in the script portion, you can use either an explicit reference or a column index value reference.

Explicit reference
Explicitly reference the data source and column name such as:
[Order ID]=[Orders.OrderID]
Only columns from the data source mentioned within the first SELECT statement can be explicitly referenced. If columns from other data sources in the SELECT statement are explicitly referenced, they will populate NULL values.
Column index value reference
Use the GETCOLUMNVALUE() function to reference a column index value. To assign the Product ID to the output column use the index value to grab the Product ID column from the input query.

SELECT statement:

SELECT [Orders.OrderID],NULL from [Orders]
UnionAll
Select [Order Details.OrderID],[Order Details.Product ID] from [Order Details]

Script:

[Product ID]=GETCOLUMNVALUE(1)
   WRITERECORD

Filtering

Filters can be applied to each SELECT statement by adding a WHERE clause.

The supported operators in a WHERE clause are:

=, <>,>, <, >=, <=, AND, OR, &&, IN/NOTIN, ISNULL/ISNOTNULL, LIKE/NOTLIKE (% wildcard is supported)

The supported functions in a WHERE clause are:

DATEDIFF(), DATEADD(), DATEPART(), IFNULL()

For example:

SELECT [Orders.OrderID], [Orders.OrderDate], [Orders.RequiredDate], [Orders.ShippedDate] FROM [Orders] WHERE DATEDIFF(DAY,[Orders.OrderDate],[Orders.ShippedDate])>100 AND IFNULL([Order Details.UnitPrice],0.0) <> 0.0

Example using a variable in the WHERE clause:

SELECT [Orders.OrderID], [Orders.OrderDate], [Orders.RequiredDate], [Orders.ShippedDate] FROM [Orders] WHERE [Orders.OrderDate]<GetVariable('MyDateVariable')