ETL Services Input Query (SELECT) Reference
This topic provides information about how to write the SELECT statement, the input query, for an ETL script.
Tip: Click the Wizard button to use the Script Wizard to help you write your script.
ETL Services provide significant flexibility when you specify an input query. Aggregation, filtering, sorting and joining can all be leveraged when specifying the data that ETL Services will stream through when producing a new data source.
Sources
ETL Services can query three different types of data:
- Data sources
- Dimension levels
- Measure grains
In addition to being able to process input data, Birst has the advantage of being able to query data that has already been processed and is part of the Birst data set. This allows you to produce custom calculations on top of already processed data or to use data that had been loaded previously as part of a transformation.
ETL Services provides a convenient syntax in specifying these sources. To reference a data source, simply use its name:
SELECT [Order Details.OrderID], [Order Details.ProductID], [Order Details.UnitPrice], [Order Details.Quantity] FROM [Order Details]
Birst supports aliases for table names in scripts. Provide the alias in single quotes ('aliasname') after the name of the table. For example:
SELECT [Emp.EmployeeID], [Emp.LastName] FROM [Employees] 'Emp'
Tips:
- Birst does not support alias names with Level() and Grain() operators in a scripted source.
- Spaces in alias names are not supported.
To reference dimension data from a given level, use the Level() operator. The level operator allows you to reference a hierarchy and a level as if it were a table:
SELECT [Level(Order Details.Orders).OrderID], [Level(Order Details.Orders).ProductID], [Level(Order Details.Orders).UnitPrice], [Level(Order Details.Orders).Quantity] FROM [Level(Order Details.Orders)]
To reference a measure grain, use the Grain() operator. The Grain operator allows you to refer to the grain of a data source rather than having to type all of the levels that compose a grain:
SELECT [Grain(Order Details).OrderID], [Grain(Order Details).ProductID], [Grain(Order Details).UnitPrice], [Grain(Order Details).Quantity] FROM [Grain(Order Details)]
Aggregation
In addition to querying columns directly, numerical columns can be aggregated as well. In the case below, to aggregate unit price and quantity by product, just add the aggregation rule operators:
Select [Order Details.ProductID], Avg([Order Details.UnitPrice]), Sum([Order Details.Quantity]) From [Order Details]
ETL Services supports Sum, Count, CountDistinct, Avg, Min, and Max aggregation rules.
DISTINCT/TOP Operators
In addition to querying columns directly and using aggregations in the SELECT portion of the input query, Birst supports using the DISTINCT keyword for de-duping data and the TOP operator for returning the TOP N rows in the result set.
SELECT DISTINCT [Categories.CategoryID], [Categories.CategoryName], [Categories.Description], [Categories.Picture] FROM [Categories]
SELECT TOP 1 [Categories.CategoryID], [Categories.CategoryName], [Categories.Description], [Categories.Picture] FROM [Categories]
Sorting
To sort the incoming data by elements in the select list, use Order By.
Important: Everything in the Order By must also be in the SELECT list.
Select [Order Details.OrderID], [Order Details.ProductID], [Order Details.UnitPrice], [Order Details.Quantity] From [Order Details] Order By [Order Details.ProductID]
Joining
Often, data from two sources must be combined to create a new transformed set of data. ETL Services supports joining any two basic queries. The syntax is a little different from standard SQL syntax as ETL Services joins after any aggregation has occurred.
Select [Orders.OrderID],[Orders.ShipName],[Orders.Freight] From [Orders] Inner Join Select [Order Details.OrderID],SUM([Order Details.Quantity]) From [Order Details] On [Orders.OrderID]=[Order Details.OrderID] Order By [Orders.ShipName],[Orders.OrderID]
ETL Services supports outer joins as well as inner joins. To use an outer join, use LEFT OUTER JOIN, RIGHT OUTER JOIN or FULL OUTER JOIN.
Tip: A maximum of 10 joins can be used in one ETL script.
Note: Because of limitations in Redshift databases, the use of ETL functions is not supported on the join keys when used on both sides of the join of a SELECT statement in a scripted source.
-
Example: Select [Orders.OrderID],[Orders.ShipName],[Orders.Freight] From [Orders] Inner Join Select [Order Details.OrderID],SUM([Order Details.Quantity]) From [Order Details] On ISNULL([Orders.OrderID])=ISNULL([Order Details.OrderID]) Order By [Orders.ShipName],[Orders.OrderID]
This is not supported in Redshift. We recommend creating intermediate scripts to apply the desired functions on the columns in the intermediate scripts, and then use the resulting columns in the new scripts to achieve the join.
Union/UnionAll
The ability to Union and UnionAll is supported in ETL Services input queries.
Tip: Union and UnionAll are not considered joins.
Important: The column data types match in order to merge data.
Select [Orders.OrderID] from [Orders]
UnionAll
Select [Order Details.OrderID] from [Order Details]
NULL Support
For sources with column mismatches, Birst supports using NULL as a placeholder for missing columns.
Select [Orders.OrderID],NULL from [Orders]
UnionAll
Select [Order Details.OrderID],[Order Details.Product ID] from [Order Details]
Referencing Union Columns in Script Portion
To reference a Union input query column in the script portion, you can use either an explicit reference or a column index value reference.
Explicit reference
Explicitly reference the data source and
column name such as:
[Order ID]=[Orders.OrderID]
Only columns from the data source mentioned first
in the SELECT statement can be explicitly referenced. If columns from
other data sources in the SELECT statement are explicitly referenced,
they will populate NULL values.
Column index value reference
Use the column index value to reference using the GETCOLUMNVALUE() function. To assign the Product ID to the output column use the index value to grab the Product ID column from the input query.
SELECT statement:
SELECT [Orders.OrderID],NULL from [Orders]
UnionAll
Select [Order Details.OrderID],[Order Details.Product ID] from [Order Details]
Script:
[Product ID]=GETCOLUMNVALUE(1)
WRITERECORD
Filtering
Filters can be applied to each SELECT statement by adding a WHERE clause.
The supported operators in a WHERE clause are:
=, <>,>, <, >=, <=, AND, OR, &&, IN/NOTIN, ISNULL/ISNOTNULL, LIKE/NOTLIKE (% wildcard is supported)
The supported functions in a WHERE clause are:
DATEDIFF(), DATEADD(), DATEPART(), IFNULL()
For example:
SELECT [Orders.OrderID], [Orders.OrderDate], [Orders.RequiredDate], [Orders.ShippedDate] FROM [Orders] WHERE DATEDIFF(DAY,[Orders.OrderDate],[Orders.ShippedDate])>100 AND IFNULL([Order Details.UnitPrice],0.0) <> 0.0
Example using a variable in the WHERE clause:
SELECT [Orders.OrderID], [Orders.OrderDate], [Orders.RequiredDate], [Orders.ShippedDate] FROM [Orders] WHERE [Orders.OrderDate]<GetVariable('MyDateVariable')
ETL Services Overview
Birst ETL Services Script
Functions
Creating
a Scripted (ETL) Data Source