Setting Transaction and Snapshot Policies for a Data Source

You can set various transaction and snapshot policy options for data sources.  The different policy options have different effects on the physical queries. These policies apply to both raw data sources and scripted sources.

There are seven commonly used policies. The implications of each one are described in this topic, to help you choose the right transaction and snapshot policy for each of your data sources.

To set the transaction and snapshot policy for a data source

1. Go to Admin - Define Sources - Manage Sources.
2. Select a data source.
3. Click the Properties tab. The snapshot and transaction policies are in the Processing Directives section.

Option 1: Both Rows in Data Source are Transactions (vs. Snapshot) and Keep Data from Last Upload Only Options are Unchecked

Option 2: Both Rows in Data Source are Transactions (vs. Snapshot) and Keep Data from Last Upload Only Options are Unchecked and Show All Snapshots is Checked

Option 3: Only Keep Data from Last Upload Only is Checked

Option 4: Both Keep Data From Last Upload Only and Rows in Data Source are Transactions (vs. Snapshot) are Checked

Option 5: Only Rows in Data Source are Transactions (vs. Snapshot) is Checked and No Delete Keys are Specified

Option 6: Only Rows in Data Source are Transactions (vs. Snapshot) is Checked and Delete Keys are Specified

Option 7: Snapshot Policy is Defined under Advanced Options

Option 1: Both Rows in Data Source are Transactions (vs. Snapshot) and Keep Data from Last Upload Only Options are Unchecked

Option 1 is the default setting. In this case, no records are deleted or updated in the corresponding fact table in the warehouse during data processing. By default, the corresponding report query filters the data for the last load number only.

Logical Query

SELECT [Time.Year] 'F1', [ShippedDate: Sum: Freight] 'F3' FROM [ALL]

Corresponding Physical Query

Notice the filter on Load ID in the physical query.

SELECT DW_DM_TIME_DAY1_.Year$ AS 'F1', SUM(DW_SF_DAY_EMPLOYEES_ORDERS_SHIPPERS0_.Freight$) AS 'F3'

FROM S_N22a52811_6c49_4775_9b12_5fa1cf95494b.DW_SF_DAY_EMPLOYEES_ORDERS_SHIPPERS DW_SF_DAY_EMPLOYEES_ORDERS_SHIPPERS0_

INNER JOIN dbo.DW_DM_TIME_DAY DW_DM_TIME_DAY1_ ON DW_SF_DAY_EMPLOYEES_ORDERS_SHIPPERS0_.Time$ShippedDate_Day_ID$=DW_DM_TIME_DAY1_.Day_ID$

WHERE DW_SF_DAY_EMPLOYEES_ORDERS_SHIPPERS0_.LOAD_ID=2

GROUP BY DW_DM_TIME_DAY1_.Year$

Use Case for Option 1

You would use this option if you want to take snapshots of your data each time the data is loaded and processed. Since snapshots are being taken, you do have the ability to query all loads for desired dimensions by using the Show All Snapshots option as explained in Option 2.

The reason this is the default setting in Birst is in case you are loading all the data each time, versus loading only incremental data. Birst reports on the latest data by default so that you don’t get duplicate data in your reports.

If the data source contains a large amount of data, this option can create large data volumes if this source is loaded daily. This in turn can make queries run slower. See Option 7 for information about setting a snapshot policy which lets you specify how often to take snapshots of your data.

Option 2: Both Rows in Data Source are Transactions (vs. Snapshot) and Keep Data from Last Upload Only Options are Unchecked and Show All Snapshots is Checked

When the Rows in Data Source are Transactions (vs. Snapshot) and Keep Data from Last Upload Only options are not checked (as in Option 1), you can enforce the query to run on all the snapshots for certain desired dimensions by checking the Show All Snapshots checkbox for the corresponding hierarchy under Admin, Define Sources, Hierarchies as shown below.

Logical Query with a Snapshot Dimension

SELECT [Time.Year] 'F1', [Snapshot Date.Snapshot Date] 'F3', [ShippedDate: Sum: Quantity] 'F5' FROM [ALL]

Corresponding Physical Query

Notice that there is no filter on Load ID in the physical query.

SELECT DW_DM_TIME_DAY1_.Year$ AS 'F1', DW_DM_SNAPSHOT_DATE_SNAPSHOT_DATE2_.Snapshot_Date$ AS 'F3',SUM(CAST(DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS_SHIPPERS_SNAPSHOT_DATE_SUPPLIERS0_.Quantity$ AS BIGINT)) AS 'F5'

FROM S_ece38cd6_d99f_4a73_b247_a5286aef86e6.DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS_SHIPPERS_SNAPSHOT_DATE_SUPPLIERS DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS_SHIPPERS_SNAPSHOT_DATE_SUPPLIERS0_

INNER JOIN dbo.DW_DM_TIME_DAY DW_DM_TIME_DAY1_ ON DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS_SHIPPERS_SNAPSHOT_DATE_SUPPLIERS0_.Time$ShippedDate_Day_ID$=DW_DM_TIME_DAY1_.Day_ID$

INNER JOIN S_ece38cd6_d99f_4a73_b247_a5286aef86e6.DW_DM_SNAPSHOT_DATE_SNAPSHOT_DATE DW_DM_SNAPSHOT_DATE_SNAPSHOT_DATE2_ ON DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS_SHIPPERS_SNAPSHOT_DATE_SUPPLIERS0_.Snapshot_Date$Snapshot_Date1461374132$=DW_DM_SNAPSHOT_DATE_SNAPSHOT_DATE2_.Snapshot_Date1461374132$

GROUP BY DW_DM_TIME_DAY1_.Year$, DW_DM_SNAPSHOT_DATE_SNAPSHOT_DATE2_.Snapshot_Date$

Use Case for Option 2

You would use this option if you have chosen to load data as snapshots and want to query historical data (data that was loaded in the past) for certain dimensions in your reports.

Option 3: Only Keep Data from Last Upload Only is Checked

Selecting this option truncates and loads the corresponding fact table in the warehouse during data processing. Note that this has no impact on loading the dimension tables from this source. Dimension tables will never be truncated (to ensure reports do not get broken since dimension tables can be joined to other dimension tables). The report displays fact data from the last load only as all previous loaded data has been truncated.

Use Case for Option 3

You would use this option if you only want to query the most recent data from this source and do not want to maintain historical data. No snapshots are taken when Keep Data from Last Upload Only is selected.

Option 4: Both Keep Data From Last Upload Only and Rows in Data Source are Transactions (vs. Snapshot) are Checked

This behaves the same as Option 3 when only Keep Data from Last Upload Only is checked.

Use Case for Option 4

There is no reason to check Rows in Data Source are Transactions (vs. Snapshot) if Keep Data from Last Upload Only is checked. Checking Rows in Data Source are Transactions (vs. Snapshot) does not make any difference in this case. Either way, snapshots are not kept when Keep Data from Last Upload Only is checked.

Option 5: Only Rows in Data Source are Transactions (vs. Snapshot) is Checked and No Delete Keys are Specified

Important: Use caution when selecting this option. You should only use this option if you can ensure that unique information is being uploaded into Birst. Specifying delete keys, as described in Option 6, is the more common practice.

Delete keys can be set in Advanced Options by selecting the Bulk Insert and Delete Measure Records option. (See Specifying Delete Keys for more information about delete keys.) If no delete keys are specified, there is no truncate/delete/update on the corresponding fact table during data processing. The report will display data from all load numbers. When using this option (without specifying the delete keys), it must be ensured that only incremental data is uploaded to Birst, with no overlap between two different uploads, as this will cause duplication of data in the corresponding fact table.

Notice that there is no Load ID filter in the physical query.

Logical Query

SELECT [Time.Year] 'F1',[ShippedDate: Sum: Freight] 'F3' FROM [ALL]

Corresponding Physical Query

SELECT DW_DM_TIME_DAY1_.Year$ AS 'F1',SUM(CAST(DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS_SHIPPERS_SUPPLIERS0_.Quantity$ AS BIGINT)) AS 'F3'

FROM S_N22a52811_6c49_4775_9b12_5fa1cf95494b.DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS_SHIPPERS_SUPPLIERS DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS_SHIPPERS_SUPPLIERS0_

INNER JOIN dbo.DW_DM_TIME_DAY DW_DM_TIME_DAY1_ ON DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS_SHIPPERS_SUPPLIERS0_.Time$ShippedDate_Day_ID$=DW_DM_TIME_DAY1_.Day_ID$

GROUP BY DW_DM_TIME_DAY1_.Year$

Use Case for Option 5

You would use this option if:

The source contains transactional data (for example, data from an ordering system).

You want your reports to display data from all load numbers.

Only incremental data is being uploaded into Birst.

You can ensure that unique information is being uploaded into Birst.

Option 6: Only Rows in Data Source are Transactions (vs. Snapshot) is Checked and Delete Keys are Specified

Delete keys can be set in Advanced Options by clicking the Advanced Options button at the bottom of the Properties tab and selecting the Bulk Insert and Delete Measure Records option as shown below.

When delete keys are specified, fact data is treated as transactions where the delete key combination is unique in the corresponding fact table in the warehouse. If a record with the same delete keys is uploaded again, then the existing record in the fact table is deleted and the new one is inserted. Report queries run on all the load numbers.

Notice that there is no Load ID filter in the physical query.

Logical Query

SELECT [Time.Year] 'F1',[ShippedDate: Sum: Freight] 'F3' FROM [ALL]

Corresponding Physical Query

SELECT DW_DM_TIME_DAY1_.Year$ AS 'F1',SUM(CAST(DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS_SHIPPERS_SUPPLIERS0_.Quantity$ AS BIGINT)) AS 'F3'

FROM S_N22a52811_6c49_4775_9b12_5fa1cf95494b.DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS_SHIPPERS_SUPPLIERS DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS_SHIPPERS_SUPPLIERS0_

INNER JOIN dbo.DW_DM_TIME_DAY DW_DM_TIME_DAY1_ ON DW_SF_DAY_EMPLOYEES_ORDER_DETAILS_PRODUCTS_SHIPPERS_SUPPLIERS0_.Time$ShippedDate_Day_ID$=DW_DM_TIME_DAY1_.Day_ID$

GROUP BY DW_DM_TIME_DAY1_.Year$

Use Case for Option 6

You would use this option if:

  • The source contains transactional data (for example, data from an ordering system).

  • You want your reports to display data from all load numbers.

  • You want to be able to update records in your fact tables. (For example, if an order is placed then modified the next day, when loading that order again you want to delete the previous record and insert a new one.)

See Specifying Delete Keys for more information about delete keys.

Option 7: Snapshot Policy is Defined under Advanced Options

In this scenario, neither the Rows in Data Source are Transactions (vs. Snapshot) or Keep Data from Last Upload Only boxes are checked (as in Option 1). Instead a snapshot policy is defined in the Advanced Options as shown below.

When a snapshot policy is defined, that source is processed only when data processing runs on the snapshot day or date selected. Otherwise it works the same as described in Option 1.

Use Case for Option 7

You would use this option if you want to control when snapshots are taken. With a snapshot policy, you can specify snapshots of the data to be taken on a daily, weekly, or monthly basis, rather than every time the data is uploaded. With Option 1 (the Rows in Data Source are Transactions and Keep Data from Last Upload Only boxes are not checked and no snapshot policy is defined), snapshots are taken every time the data source is processed. Daily snapshots can create large data volumes. You can specify how frequently to take snapshots based on your reporting needs and the volume of data. Snapshots are commonly set on a weekly or monthly (e.g., last day of month) basis.

See Also
Setting Data Source Processing Properties
Time Zone Shifting for Data Processing
Creating Hierarchies
Specifying Delete Keys
Processing Data