Best practices to load data from the source system to Infor d/EPM OLAP
This section describes the best practices to adopt for loading structured data from the local database to Data Lake and then to the Infor d/EPM custom OLAP cubes.
This diagram shows an example of a custom integration data flow:
Data from an on-premises database is sent to Data Lake. Then the data is imported to the Infor d/EPM Staging database and fed to the OLAP cubes.
The best practices topics contain this terminology:
Term | Description |
---|---|
Dimension | Structure that categorizes fact data and measures. In financial terminology, a dimension can be accounts, entities or calendar settings. |
Fact data | Values that are related to dimensions (metadata). For example, fact data can be financial amounts for specific entities and accounts for the specific period. |
Identifier | Identifies a record. |
Variation | Determines which version of a record with a certain identifier is the latest. The latest is the variation with the highest value. |
We recommend that, to use the described practices, you store metadata (dimensions) and fact data in separate relational tables. Optionally, you can define relationships between those tables in a mapping table.
Example
In this example, some adventurers drink coffee every day. The example describes how to collect data about their coffee consumption and then analyze the data in Infor d/EPM application.
For example, you can create these relational tables in your on-premises database:
- Dimensions:
- A Calendar dimension that stores calendar information:
Date_NK Date WeekOfYear WeekOfYearNr MonthOfYear QuarterOfYear QuarterOfYearNr YearISO 20210101 01.01.2021 w01 1 Jan Q1 1 2020 20210102 02.01.2021 w01 1 Jan Q1 1 2020 20210103 03.01.2021 w02 2 Jan Q1 1 2020 20210104 04.01.2021 w02 2 Jan Q1 1 2021 20210105 05.01.2021 w02 2 Jan Q1 1 2021 Date_NK is a calendar numeric key, which is later used in the relational table with fact data.
- An Adventurer dimension with a list of adventurers who drink coffee:
ID Name ParentID Weight Order LangEN 100 All adventurers 100 All adventurers 101 Mark Bean 100 1 101 Mark Bean 102 Tony Roast 100 1 102 Tony Roast 103 Stephanie Aroma 100 1 103 Stephanie Aroma - A Coffee dimension with a list coffee types that are drunk by adventurers:
ID Name ParentID Weight Order LangEN 10 All coffee 10 All coffee 100 Mocha pot 10 100 Mocha pot 101 Ethiopia 100 1 101 Ethiopia 102 Kenya 100 1 102 Kenya 103 Colombia 100 1 103 Colombia 104 Costa Rica 100 1 104 Costa Rica 105 Other Mocha 100 1 105 Other Mocha 300 Espresso 10 300 Espresso - A Measure dimension with measures:
Measure MeasureDesc Price Average price per cup in Czech crowns (CZK), for example Quantity Number of cups Spending In CZK, for example Measures can be included in a fact data table in which they determine the meaning of values, such as a quantity or price.
- A Calendar dimension that stores calendar information:
- Fact data:
- An Adventurer table that shows coffee consumption by adventurers per day. This table
shows only a sample of data for the beginning of a year:
Calendar Adventurer Coffee Measure Value 20210101 101 101 Quantity 1 20210102 101 102 Quantity 2 20210101 102 103 Quantity 2 20210103 102 104 Quantity 3 20210102 103 105 Quantity 5 20210103 103 300 Quantity 3
- An Adventurer table that shows coffee consumption by adventurers per day. This table
shows only a sample of data for the beginning of a year:
To streamline data loads from the source system to Data Lake, load only new or changed data instead all data records. To do that, you can either prepare tables with only new or changed data, or configure the existing source tables and set incremental keys for them.
If you set up an incremental key, the queries to the source system search only for new records and records that have been updated since the last import. The data loads are faster and take less Data Lake storage space.
The table from which you load data incrementally must have an identity column. For example, RecordID. That column contains unique numbers that are generated automatically when new records are inserted into a table.
For example, you can create an export table (CA_FactCoffeeAdventure_export) with this structure:
RecordID | ID | Calendar | Adventurer | Coffee | Measure | Value | ModifyDate | ToDelete | InsertDate |
---|---|---|---|---|---|---|---|---|---|
1 | 20210101 101 101 Quantity | 20210101 | 101 | 101 | Quantity | 1 | 01 02 2021 22:53 | 0 | 16 02 2021 17:47 |
2 | 20210102 101 102 Quantity | 20210102 | 101 | 102 | Quantity | 2 | 01 02 2021 22:53 | 0 | 16 02 2021 17:47 |
3 | 20210101 102 103 Quantity | 20210101 | 102 | 103 | Quantity | 2 | 01 02 2021 22:53 | 0 | 16 02 2021 17:47 |
4 | 20210103 102 104 Quantity | 20210103 | 102 | 104 | Quantity | 3 | 01 02 2021 22:53 | 0 | 16 02 2021 17:47 |
5 | 20210102 103 105 Quantity | 20210102 | 103 | 104 | Quantity | 5 | 01 02 2021 22:53 | 0 | 16 02 2021 17:47 |
6 | 20210103 103 300 Quantity | 20210103 | 103 | 300 | Quantity | 3 | 01 02 2021 22:53 | 0 | 16 02 2021 17:47 |
To take all records with a modified date that is later than the last export time stamp and to insert those records into export tables, you can write this procedure, for example:
CREATE PROCEDURE [dbo].[pr_CA_FactCoffeeAdventure_export]
AS
BEGIN
SET NOCOUNT ON;
-- Define last export date and time of the start of the procedure
DECLARE @PreviousExportDate datetime
DECLARE @CurrentExportDate datetime
SET @PreviousExportDate = COALESCE((SELECT MAX([ExportDate]) FROM [dbo].[table_exports_timestemps] WHERE [TableName] = 'CA_FactCoffeeAdventure'),'01/01/1900')
SET @CurrentExportDate = GETDATE();
-- Inserting data modified since the last insert.
BEGIN
INSERT INTO [dbo].[CA_FactCoffeeAdventure_export]
(ID
,[Calendar]
,[Adventurer]
,[Coffee]
,[Measure]
,[Value]
,[ModifyDate]
,[ToDelete]
)
SELECT CONVERT(varchar(8),Calendar) + '_' + CONVERT(varchar(8),Adventurer) + '_' + CONVERT(varchar(8),Coffee) + '_' + Measure
,[Calendar]
,[Adventurer]
,[Coffee]
,[Measure]
,[Value]
,[ModifyDate]
,[ToDelete]
FROM [dbo].[CA_FactCoffeeAdventure]
WHERE [ModifyDate] > @PreviousExportDate
AND [ModifyDate] <= @CurrentExportDate
END
-- Logging the current export
BEGIN
INSERT INTO [dbo].[table_exports_timestemps]
([TableName]
,[ExportDate])
VALUES
('CA_FactCoffeeAdventure'
,@CurrentExportDate)
END
END
GO
See Knowledge Base Article 2235440 on the Infor Support Portal. The article contains files that support this example. That is, the entire script to create tables and procedures and an Excel sheet with the data set on which the example is based.