Joins

Joins in Birst
Bridge Tables
Snowflake Joins
Redundant Joins

Joins in Birst

A join specifies a relationship between a dimension table definition and a measure table definition or between two dimension table definitions. The join relationship is defined in terms of columns from each of those table definitions. Each column needs to be qualified such that Birst knows which underlying table it comes from. For example, when joining the OrderDetails table definition to the Products table definition, there is a key/foreign-key relationship on ProductID that can be specified as below:

"products".ProductID="OrderDetails".productID

Each logical table definition name is surrounded by quotes. Logical names are used because table definitions may refer not only to single physical tables, but to compound tables and opaque views. With the join condition above added between these two tables, an adhoc query selecting “Product Name” and “Revenue” would result in the following physical query:

SELECT Products2_.ProductName AS 'F1_ProductName',SUM(OrderDetails0_.Quantity*OrderDetails0_.UnitPrice) AS 'F3_Revenue' FROM dbo.OrderDetails OrderDetails0_ INNER JOIN dbo.Orders Orders1_ ON Orders1_.OrderID=OrderDetails0_.OrderID INNER JOIN dbo.Products Products2_ ON Products2_.ProductID=OrderDetails0_.ProductID GROUP BY Products2_.ProductName

Each column must be qualified by the name of the table definition -- not the physical table name. (In this case they are the same, but in practice this is not always the case.) This tells the analysis engine which table definition and table source to refer to for this join.

If a column in the join relationship is only qualified by the table definition name, it is assumed to belong to the primary (first) table defined in the table source for that table definition. If, however, the table definition uses a compound table source (multiple physical tables) and the join relationship requires a reference to a column that is not in the primary table, the table name that it comes from must be qualified as well. This can be seen in the case where the Order Details table definition is joined to the Time dimension. Since the date information is contained in the Orders table, it must be qualified. For example, if there were a Month table definition that had a date column MO_END_DATE:

Month("OrderDetails".Orders.OrderDate)=Month("month".MO_END_DATE) AND Year("OrderDetails".Orders.OrderDate)=Year("month".MO_END_DATE)

Bridge Tables

Some joins are a little more complicated and require joining through one or more other tables to complete. The Categories table definition is an example of this. The physical table OrderDetails contains a foreign key to Products, but does not contain a key to the Categories table. It is therefore necessary to join OrderDetails to Products first. Then we must join Products to Categories to establish the join relationship. The Birst join relationship syntax allows for this. Tables that are not present in either of the dimension table definition or the measure table definition can be added to the join condition. Simply fully qualify columns used from those tables with the physical name. For example, the join condition between Categories and OrderDetails would be:

"OrderDetails".ProductID=products.productID AND products.CategoryID="categories".CategoryID

Notice that the table Products is referred to directly in the join condition. As long as columns from bridge tables are fully qualified, they will be appropriately treated in the queries sent to the database.

Snowflake Joins

If you wanted to understand Unit Sales by both Product Category and Product Name together, a bridge table approach may not be the best method. If the columns in the categories table are the only ones present in its table definition, then a query that includes both Category Name and Product name will have two joins, one to Products to get Product Name and the other through Products again to Categories to get the Category Name.

This approach has a drawback, however. It requires all useful joins by measure tables to Products to be duplicated through to Categories – making maintenance more difficult. Ideally, it is better to state the join relationship between Products and Categories, and let the engine figure out when to use it. Indeed, that is what snowflake joins do. (A snowflake refers to a table that is joined to the main dimension table in a star-schema structure in data warehousing – in effect turning a star into a snowflake). The engine allows you to create a single join between Products and Categories. The engine will determine if either the Products table or the Categories table or both is needed based on the query.

Important: A snowflake join is only valid between tables within a single dimension.

Redundant Joins

Sometimes a physical table or table source can serve as both a source for measures and for dimension columns. In such a case it must be mapped as two table definitions: one for a measure table definition and one for a dimension table definition. However, while it is possible and perfectly legal to do a self-join, it may not be as efficient. Infor provides a mechanism for creating a join between two table definitions that are, in fact, the same physical source: a redundant join. If a join is labeled as such (as shown in the Complex Join Definition dialog box below) the implied self-join is removed and a simpler query is sent to the database with only one reference to that table source.

See Performing a Complex Join for information on how to set a redundant join.