Merge Columns from multiple spreadsheets to a prepared source, transformation tutorial
Note: The steps in this section are an example.
-
In Modeler Prepare, locate the CorpData
Customers table.
This was the Customers tab in the original spreadsheet. Rest your cursor over the list items and use the popup to find the file, then click on it.Verify the title turns blue to indicate that it is selected.
- Rest your cursor to the right of the item in the list to see, and then click Create a New Prepared Source.
-
Specify the Name of the source, then
click Confirm.
Modeler creates the new prepared source and it is ready for you to edit it.In our tutorial scenario, the manager wants to put additional customer info into the same table as the corporate customer table. This means that the columns in the local spreadsheet need to be appended and the rows matched up with the corporate information. You can use a transform in Modeler Prepare to do this.
- In the new prepared source, click Add Step in the pipeline.
- Select the Merge transform. The Merge window is displayed.
- Select the MyPromos_xlsx MyCustomers table. You may have to scroll to see it.
-
To create a merge, and to make sure the rows and columns end up in the right
organization, a shared or key column needs to be identified. This is called a
"join column". The column titles do not have to match but the data should be
pretty much the same.
In this example, both spreadsheets used the same CustomerID keys. Keys are meant to be unique, so if two spreadsheets use the same keys, they can be merged together in a logical way. It is important that at least one of the two columns has only one row for each unique key, otherwise the data may not match up correctly.Note: Part of deciding how to use the join depends on what rows the final prepared source should contain. This is controlled by the type of join. For example, the default inner join will match the rows between the left and right tables, and return only the ones that contain the same CustomerID in both columns. An outer join would return all the rows from each table, a right join would result in all the rows from the right and any rows that match the same CustomerID from the left, and a left joint would result in all the rows from the left table and any rows that match the same CustomerID from the right table.In this scenario, the manager does not care about all the rows that are not from the local customers, and does care about getting all the columns from the corporate data. So an inner join is the best choice.
-
Click the Join, then select
Inner Join.
The merge is defined.
-
Click Apply.
Modeler compares the two tables and merges them based on the join, and hides the deselected columns. It returns the results, in this case the 5 local customers with all the data from corporate, augmented by the email and Twitter columns provided by the manager. Modeler also shows the merge step in the pipeline below the data.