Best practices to load data from the source system to Infor EPM OLAP

This section describes the best practices to adopt for loading structured data from the local database to Data Lake and then to the Infor EPM custom OLAP cubes.

This diagram shows an example of a custom integration data flow:

Custom integration data flow diagram

Data from an on-premises database is sent to Data Lake. Then the data is imported to the Infor EPM Staging database and fed to the OLAP cubes.

The best practices topics contain this terminology:

Term | Description
Dimension | A structure that categorizes fact data and measures. In financial terminology, a dimension can represent accounts, entities, or calendar settings.
Fact data | Values that are related to dimensions (metadata). For example, fact data can be financial amounts for specific entities and accounts in a specific period.
Identifier | A value that uniquely identifies a record.
Variation | A value that determines which version of a record with a given identifier is the latest. The latest version is the variation with the highest value.
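When several variations of a record share the same identifier, the latest one can be selected with a window function. This is a minimal sketch, assuming a hypothetical [StagingRecords] table with [Identifier], [Variation], and [Value] columns; the table and column names are illustrative, not part of the product:

```sql
-- Hypothetical staging table: several variations can exist per identifier.
-- The row with the highest [Variation] is the current version of the record.
SELECT [Identifier], [Value]
FROM (
    SELECT [Identifier]
          ,[Value]
          ,ROW_NUMBER() OVER (PARTITION BY [Identifier]
                              ORDER BY [Variation] DESC) AS rn
    FROM [dbo].[StagingRecords]
) AS latest
WHERE rn = 1
```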

To use the described practices, we recommend that you store metadata (dimensions) and fact data in separate relational tables. Optionally, you can define the relationships between those tables in a mapping table.

Example

In this example, some adventurers drink coffee every day. The example describes how to collect data about their coffee consumption and then analyze the data in the Infor EPM application.

For example, you can create these relational tables in your on-premises database:

  • Dimensions:
    • A Calendar dimension that stores calendar information:
      Date_NK | Date | WeekOfYear | WeekOfYearNr | MonthOfYear | QuarterOfYear | QuarterOfYearNr | YearISO
      20210101 | 01.01.2021 | w01 | 1 | Jan | Q1 | 1 | 2020
      20210102 | 02.01.2021 | w01 | 1 | Jan | Q1 | 1 | 2020
      20210103 | 03.01.2021 | w02 | 2 | Jan | Q1 | 1 | 2020
      20210104 | 04.01.2021 | w02 | 2 | Jan | Q1 | 1 | 2021
      20210105 | 05.01.2021 | w02 | 2 | Jan | Q1 | 1 | 2021

      Date_NK is a calendar numeric key, which is later used in the relational table with fact data.

    • An Adventurer dimension with a list of adventurers who drink coffee:
      ID | Name | ParentID | Weight | Order | LangEN
      100 | All adventurers | | | 100 | All adventurers
      101 | Mark Bean | 100 | 1 | 101 | Mark Bean
      102 | Tony Roast | 100 | 1 | 102 | Tony Roast
      103 | Stephanie Aroma | 100 | 1 | 103 | Stephanie Aroma
    • A Coffee dimension with a list of coffee types that are drunk by the adventurers:
      ID | Name | ParentID | Weight | Order | LangEN
      10 | All coffee | | | 10 | All coffee
      100 | Mocha pot | 10 | | 100 | Mocha pot
      101 | Ethiopia | 100 | 1 | 101 | Ethiopia
      102 | Kenya | 100 | 1 | 102 | Kenya
      103 | Colombia | 100 | 1 | 103 | Colombia
      104 | Costa Rica | 100 | 1 | 104 | Costa Rica
      105 | Other Mocha | 100 | 1 | 105 | Other Mocha
      300 | Espresso | 10 | | 300 | Espresso
    • A Measure dimension with measures:
      Measure | MeasureDesc
      Price | Average price per cup, in Czech crowns (CZK), for example
      Quantity | Number of cups
      Spending | In CZK, for example

      Measures can be included in a fact data table in which they determine the meaning of values, such as a quantity or price.

  • Fact data:
    • An Adventurer table that shows coffee consumption by adventurers per day. This table shows only a sample of data for the beginning of a year:
      Calendar | Adventurer | Coffee | Measure | Value
      20210101 | 101 | 101 | Quantity | 1
      20210102 | 101 | 102 | Quantity | 2
      20210101 | 102 | 103 | Quantity | 2
      20210103 | 102 | 104 | Quantity | 3
      20210102 | 103 | 105 | Quantity | 5
      20210103 | 103 | 300 | Quantity | 3

To streamline data loads from the source system to Data Lake, load only new or changed data instead of all data records. To do that, you can either prepare tables that contain only new or changed data, or configure the existing source tables and set incremental keys for them.

Note: For incremental keys, we recommend that you use whole numbers that grow incrementally.

If you set up an incremental key, the queries to the source system search only for new records and for records that have been updated since the last import. The data loads are faster and use less Data Lake storage space.
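With an incremental key in place, the import query reduces to a simple range filter on the key. This is a sketch only; the variable name and the assumption that the last imported key is tracked in a bookkeeping table are illustrative:

```sql
-- @LastImportedRecordID is assumed to be read from a bookkeeping table
-- that stores the highest key loaded in the previous run.
DECLARE @LastImportedRecordID int = 0

-- Only rows added or updated after the previous run are returned.
SELECT *
FROM [dbo].[CA_FactCoffeeAdventure_export]
WHERE [RecordID] > @LastImportedRecordID
```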

The table from which you load data incrementally must have an identity column, for example, RecordID. That column contains unique numbers that are generated automatically when new records are inserted into the table.

For example, you can create an export table (CA_FactCoffeeAdventure_export) with this structure:

RecordID | ID | Calendar | Adventurer | Coffee | Measure | Value | ModifyDate | ToDelete | InsertDate
1 | 20210101_101_101_Quantity | 20210101 | 101 | 101 | Quantity | 1 | 01 02 2021 22:53 | 0 | 16 02 2021 17:47
2 | 20210102_101_102_Quantity | 20210102 | 101 | 102 | Quantity | 2 | 01 02 2021 22:53 | 0 | 16 02 2021 17:47
3 | 20210101_102_103_Quantity | 20210101 | 102 | 103 | Quantity | 2 | 01 02 2021 22:53 | 0 | 16 02 2021 17:47
4 | 20210103_102_104_Quantity | 20210103 | 102 | 104 | Quantity | 3 | 01 02 2021 22:53 | 0 | 16 02 2021 17:47
5 | 20210102_103_105_Quantity | 20210102 | 103 | 105 | Quantity | 5 | 01 02 2021 22:53 | 0 | 16 02 2021 17:47
6 | 20210103_103_300_Quantity | 20210103 | 103 | 300 | Quantity | 3 | 01 02 2021 22:53 | 0 | 16 02 2021 17:47
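A possible definition of the export table is sketched below. The data types are assumptions made for illustration; the authoritative creation script is in the Knowledge Base article referenced at the end of this topic:

```sql
CREATE TABLE [dbo].[CA_FactCoffeeAdventure_export] (
    [RecordID]   int IDENTITY(1,1) NOT NULL PRIMARY KEY, -- generated automatically on insert
    [ID]         varchar(50)   NOT NULL, -- composite identifier: Calendar_Adventurer_Coffee_Measure
    [Calendar]   int           NOT NULL, -- Date_NK from the Calendar dimension
    [Adventurer] int           NOT NULL,
    [Coffee]     int           NOT NULL,
    [Measure]    varchar(20)   NOT NULL,
    [Value]      decimal(18,2) NOT NULL,
    [ModifyDate] datetime      NOT NULL,
    [ToDelete]   bit           NOT NULL,
    [InsertDate] datetime      NOT NULL DEFAULT GETDATE() -- set when the record lands in the export table
)
```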

For example, you can write this procedure, which takes all records whose modified date is later than the last export time stamp and inserts them into the export table:


CREATE PROCEDURE [dbo].[pr_CA_FactCoffeeAdventure_export]
AS
BEGIN
    SET NOCOUNT ON;

    -- Determine the date of the last export and the start time of this run
    DECLARE @PreviousExportDate datetime
    DECLARE @CurrentExportDate datetime

    SET @PreviousExportDate = COALESCE((SELECT MAX([ExportDate]) FROM [dbo].[table_exports_timestemps] WHERE [TableName] = 'CA_FactCoffeeAdventure'), '01/01/1900')
    SET @CurrentExportDate = GETDATE();

    -- Insert the data that was modified since the last export
    BEGIN
        INSERT INTO [dbo].[CA_FactCoffeeAdventure_export]
            ([ID]
            ,[Calendar]
            ,[Adventurer]
            ,[Coffee]
            ,[Measure]
            ,[Value]
            ,[ModifyDate]
            ,[ToDelete]
            )
        SELECT CONVERT(varchar(8), [Calendar]) + '_' + CONVERT(varchar(8), [Adventurer]) + '_' + CONVERT(varchar(8), [Coffee]) + '_' + [Measure]
            ,[Calendar]
            ,[Adventurer]
            ,[Coffee]
            ,[Measure]
            ,[Value]
            ,[ModifyDate]
            ,[ToDelete]
        FROM [dbo].[CA_FactCoffeeAdventure]
        WHERE [ModifyDate] > @PreviousExportDate
            AND [ModifyDate] <= @CurrentExportDate
    END

    -- Log the current export
    BEGIN
        INSERT INTO [dbo].[table_exports_timestemps]
            ([TableName]
            ,[ExportDate])
        VALUES
            ('CA_FactCoffeeAdventure'
            ,@CurrentExportDate)
    END
END
GO
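You can run the procedure manually or from a scheduled job. A quick way to check the result, for illustration, is to run the export and then inspect the log and the export table:

```sql
-- Run the export; only rows modified since the previous run are copied
EXEC [dbo].[pr_CA_FactCoffeeAdventure_export]

-- Verify the logged time stamp and the number of exported rows
SELECT MAX([ExportDate]) AS LastExport
FROM [dbo].[table_exports_timestemps]
WHERE [TableName] = 'CA_FactCoffeeAdventure'

SELECT COUNT(*) AS ExportedRows
FROM [dbo].[CA_FactCoffeeAdventure_export]
```

Because the procedure filters on ModifyDate and logs @CurrentExportDate after each run, running it twice in a row without source changes inserts no duplicate rows.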

See Knowledge Base Article 2235440, which contains files that support this example: the complete script to create the tables and procedures, and an Excel sheet with the data set on which the example is based.