Staging Upsert

The Staging Upsert step is used to map the Data Lake data to the staging or target database. It sets the identifier path and variation management properties from the data catalog. The staging upsert step evaluates each record against the identifier path in the staging or target database for existence. If the record does not exist, the record that is coming from Data Lake is inserted into the staging or target database. If the record does exist, the variation property is evaluated. If the incoming Data Lake record has a higher variation than the existing record, then the record in the on-premises database is updated with the newer Data Lake record. Otherwise the Data Lake record is ignored.

Adding the output type StagingUpsert to your transformation:

  1. Click Design > Output.
  2. Double-click or drag the StagingUpsert step to the Transformation tab.
    1. Link the DataLakeInput (Read_CSYCAL) step to the StagingUpsert step.
    2. Hover over the DataLakeInput step, click the output connector icon.
    3. Drag the arrow over to the StagingUpsert step and select Main output of step.
  3. Configure the StagingUpsert step. Double-click the step to open the StagingUpsert dialog box.
  4. Specify this information:
    Step name
    Update the step name - Write_{object}, for example: Write_CSYCAL.
    Connection
    Use the drop-down list to select the previous connection created during the Table Input step. If that step was not completed, complete the connection setup with the procedure in Table input.
    Target schema
    If a target database schema is used, specify the schema name in this field. If dbo is used, this field can be left blank.
    Target table
    Specify the target table name where the DataLake object data is written. Convenient is to utilize the same name as the DataLake object.
  5. Click Get default.
    The identifier path properties are retrieved and placed into the Key(s) to look up section. The VariationNumber field is updated with the variation path from the data catalog. If any name changes were made during the DataLakeInput step in the flow, the table field and stream field columns reflect these changes.
  6. Click Get update fields.
    This action retrieves the streamed field(s) from the DataLakeInput step and map them one by one into the target field names.
  7. Click OK, the StagingUpsert step is now configured.
  8. Return to the StagingUpsert (Write_CSYCAL) step.
  9. Click SQL, the Simple SQL editor dialog box is displayed that shows a Create Table and Create Index script.
    1. Click Execute to create the staging / target table. A Results dialog box is displayed with the success or failure of the execution.
    2. Click OK to close the Results dialog box.
    3. Click Close to close the Simple SQL editor dialog box.
    4. Click OK to close the Staging Upsert dialog box
  10. Link the Table Input (Get_DLDocDt_CSYCAL) step to the DataLakeInput (Read_CSYCAL) step.
    1. Hover over the DataLakeInput step, click the output connector icon.
    2. Drag the arrow over to the DataLakeInput (Read_CSYCAL) step and click the step to set the link.
  11. Test the Table Input (Get_DLDocDt_CSYCAL) step.
    1. Double-click the Table Input step to open the Table Input dialog box.
    2. Click Preview in the Table Input dialog box.
    3. Click OK to verify the number of returned rows. The Examine preview data dialog box shows the maximum DL Document Date from the specified table or the default date from the kettle.properites file if there is no data in the table.
  12. Click Close and click OK.