ETL Services Input Query (SELECT) Reference
This topic provides information about how to write the SELECT statement, the input query, for an ETL script. ETL Services provide significant flexibility when you specify an input query. Aggregation, filtering, sorting and joining can all be leveraged when specifying the data that ETL Services stream through when producing a new data source.
Sources
ETL Services can query three different types of data:
- Data sources
- Dimension levels
- Measure grains
In addition to processing input data, Birst has the advantage of querying data that already processed the Birst data set. This allows you to produce custom calculations on top of already processed data or to use data that had been loaded previously as part of a transformation.
ETL Services provides a convenient syntax for specifying these sources. To reference a data source, use its name:
SELECT [Order Details.OrderID], [Order Details.ProductID], [Order Details.UnitPrice], [Order Details.Quantity] FROM [Order Details]
Birst supports aliases for table names in scripts. Provide the alias in single quotes ('aliasname') after the table name. For example:
SELECT [Emp.EmployeeID], [Emp.LastName] FROM [Employees] 'Emp'
To reference dimension data from a given level, use the Level() operator. The level operator allows you to reference a hierarchy and a level as if it were a table:
SELECT [Level(Order Details.Orders).OrderID], [Level(Order Details.Orders).ProductID], [Level(Order Details.Orders).UnitPrice], [Level(Order Details.Orders).Quantity] FROM [Level(Order Details.Orders)]
To reference a measure grain, use the Grain() operator. The Grain operator allows you to refer to the grain of a data source rather than having to type all the levels that compose a grain:
SELECT [Grain(Order Details).OrderID], [Grain(Order Details).ProductID], [Grain(Order Details).UnitPrice], [Grain(Order Details).Quantity] FROM [Grain(Order Details)]
Aggregation
In addition to querying columns directly, numerical columns can be aggregated as well. ETL Services supports Sum, Count, CountDistinct, Avg, Min, and Max aggregation rules.
In the case below, to aggregate unit price and quantity by product, just add the aggregation rule operators:
Select [Order Details.ProductID], Avg([Order Details.UnitPrice]), Sum([Order Details.Quantity]) From [Order Details]
DISTINCT/TOP Operators
In addition to querying columns directly and using aggregations in the SELECT portion of the input query, Birst supports using the DISTINCT keyword for de-duping data and the TOP operator for returning the TOP N rows in the result set.
SELECT DISTINCT [Categories.CategoryID], [Categories.CategoryName], [Categories.Description], [Categories.Picture] FROM [Categories]
SELECT TOP 1 [Categories.CategoryID], [Categories.CategoryName], [Categories.Description], [Categories.Picture] FROM [Categories]
Sorting
To sort the incoming data by elements in the select list, use Order By.
Select [Order Details.OrderID], [Order Details.ProductID], [Order Details.UnitPrice], [Order Details.Quantity] From [Order Details] Order By [Order Details.ProductID]
Joining
Often, data from two sources must be combined to create a new transformed data set. ETL Services supports joining any two basic queries. The syntax is a little different from standard SQL syntax as ETL Services joins after any aggregation has occurred.
Select [Orders.OrderID],[Orders.ShipName],[Orders.Freight] From [Orders] Inner Join Select [Order Details.OrderID],SUM([Order Details.Quantity]) From [Order Details] On [Orders.OrderID]=[Order Details.OrderID]
ETL Services supports outer joins as well as inner joins. To use an outer join, use LEFT OUTER JOIN, RIGHT OUTER JOIN or FULL OUTER JOIN.
Select [Orders.OrderID],[Orders.ShipName],[Orders.Freight] From [Orders] Inner Join Select [Order Details.OrderID],SUM([Order Details.Quantity]) From [Order Details] On ISNULL([Orders.OrderID])=ISNULL([Order Details.OrderID])
Union/UnionAll
The ability to Union and UnionAll is supported in ETL Services input queries. Union and UnionAll are not considered joins.
Select [Orders.OrderID] from [Orders]
UnionAll
Select [Order Details.OrderID] from [Order Details]
NULL Support
For sources with column mismatches, Birst supports using NULL as a placeholder for missing columns.
Select [Orders.OrderID],NULL from [Orders]
UnionAll
Select [Order Details.OrderID],[Order Details.Product ID] from [Order Details]
Referencing Union Columns in Script Portion
To reference a Union input query column in the script portion, you can use either an explicit reference or a column index value reference.
- Explicit reference
- Explicitly reference the data source and column name such
as:
Only columns from the data source mentioned within the first SELECT statement can be explicitly referenced. If columns from other data sources in the SELECT statement are explicitly referenced, they will populate NULL values.[Order ID]=[Orders.OrderID]
- Column index value reference
- Use the GETCOLUMNVALUE() function to reference a column index value. To assign the
Product ID to the output column use the index value to grab the Product ID column from
the input query.
SELECT statement:
SELECT [Orders.OrderID],NULL from [Orders] UnionAll Select [Order Details.OrderID],[Order Details.Product ID] from [Order Details]
Script:
[Product ID]=GETCOLUMNVALUE(1) WRITERECORD
Filtering
Filters can be applied to each SELECT statement by adding a WHERE clause.
The supported operators in a WHERE clause are:
=, <>,>, <, >=, <=, AND, OR, &&, IN/NOTIN, ISNULL/ISNOTNULL, LIKE/NOTLIKE (% wildcard is supported)
The supported functions in a WHERE clause are:
DATEDIFF(), DATEADD(), DATEPART(), IFNULL()
For example:
SELECT [Orders.OrderID], [Orders.OrderDate], [Orders.RequiredDate], [Orders.ShippedDate] FROM [Orders] WHERE DATEDIFF(DAY,[Orders.OrderDate],[Orders.ShippedDate])>100 AND IFNULL([Order Details.UnitPrice],0.0) <> 0.0
Example using a variable in the WHERE clause:
SELECT [Orders.OrderID], [Orders.OrderDate], [Orders.RequiredDate], [Orders.ShippedDate] FROM [Orders] WHERE [Orders.OrderDate]<GetVariable('MyDateVariable')