Star schema
In the data warehouse database, data is arranged into hierarchal groups called dimensions and into facts and aggregate facts. The combination of facts and dimensions is called a star schema.
A star schema contains a fact table and one or more dimension tables. Fact and dimension tables can be graphically represented in a star-like formation. A simple star schema is displayed:
Fact table
A fact table contains at least two types of columns:
- Keys relate rows in the fact table to a row in each keyed dimension table.
- Measures are the business metrics required for analysis by your users. Measures can be facts or degenerate dimensions.
For example, in the illustration, orderQty and unitPrice are facts, while salesOrderNo is a degenerate dimension (though it is still considered a measure).
The fact table provides the core information about a metric or set of metrics.
For example, using the column names in the illustration above, a user can analyze specific products (product_ID), sold to specific customers (customer_ID), by specific sales employees (salesemp_ID), on specific dates (calendar_ID).
The measures in the illustration provide sales order number, order quantity and unit price for all valid intersections of the four keys.
Dimension table
A dimension table contains two types of columns:
- Keys relate a row in the dimension table to zero or more rows in the fact table.
- Dimensions provide context or meaning to each keyed fact. Dimensions are similar to attributes in standard RDBMs.
The dimensions of the dimension table add context to the facts. For example, in the illustration, the dimensions of the product table add context to the related product_ID in the fact table.
The star schema is a core set of metrics or facts that pull context from related dimension tables.
You can use a dimension table in more than one DWD solution.
For this reason, fact tables can present significant storage issues. Infor recommends you keep the number of keys to the minimum needed to meet user requirements.
Surrogate keys
Typically, data is loaded into the dimension tables from different data sources. Because of the possibility of identical key values being loaded from different OTLP tables, we recommend that you create surrogate keys for each dimension table and use the surrogate key to make the relationship to the fact table.
The dimension table will then have two keys:
- A key from the data source (the original OLTP key)
- A surrogate key, generated during the ETL process
The surrogate key enables you to include additional data sources later without having to consider the possibility of duplicate OLTP key values. In addition, these keys are transparent and easy to structure for reporting tools.
Dimensions and de-normalization
If the process of denormalizing OLTP data was uncontrolled, the number of columns in the dimension table would equal the number of columns in all the joined OLTP tables. To control the number of dimensions loaded, you manually select the columns in the OLTP sources to include in the dimension table when you build a DWD solution.
This column selection feature allows you to load only columns that your users need. Additional columns can be included in the load at a later date.
Multiple fact tables
It is possible to have more than one fact table in a star schema. Multiple fact tables are used when some, but not all, dimensions relate to some, but not all, facts.
Rapidly and slowly changing dimensions
Data warehouse analysis is ideally suited to monitoring trends in data. To monitor time-changing data, you include dimensions to track rapid changes and slow changes in data over time.
For example, sales figures are typically rapidly changing data. However, an employee's job title is typically slowly changing data. To maintain continuity of data in the warehouse, the star schema must include dimensions that track rapidly and slowly changing data. You can then build OLAP cubes to monitor rapidly changing data before, after, or during changes in slowly changing data.
For example, if a product is repackaged, you can monitor sales (rapid change) before, after, and across the repackaging date (slow change).