Query processing
Compass queries are processed in a series of steps before results are returned. The two primary steps are converting (transforming) the Data Lake data and running the query.
The data conversion steps read the data of Data Lake objects, also called payloads, and convert it into a form that is stored more efficiently for querying. This process retrieves the DSV and newline-delimited JSON object data from Data Lake, converts it, and partitions it. Partitioning divides the data so that a query can read it more efficiently. The data is grouped by its Data Lake addition date, referenced by the lastmodified property, which is the date that the data object was added to Data Lake. This partitioning scheme suits incremental data loads, because objects added to Data Lake later fall into newer date partitions.
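As an illustration of partitioning by addition date, the following Python sketch groups payloads by the calendar date of their lastmodified value and then selects only the partitions newer than a previous load. The Payload class, the function names, and the sample data are hypothetical; this is not how Compass stores partitions internally.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Payload:
    """Hypothetical stand-in for a Data Lake object (payload)."""
    object_id: str
    lastmodified: datetime  # when the object was added to Data Lake


def partition_by_addition_date(payloads):
    """Group payload IDs into partitions keyed by the date part of lastmodified."""
    partitions = defaultdict(list)
    for payload in payloads:
        partitions[payload.lastmodified.date()].append(payload.object_id)
    return dict(partitions)


def partitions_after(partitions, last_loaded):
    """Incremental load: keep only partitions dated after the last load."""
    return {day: ids for day, ids in partitions.items() if day > last_loaded}


payloads = [
    Payload("a", datetime(2024, 5, 1, 9, 30)),
    Payload("b", datetime(2024, 5, 1, 17, 0)),
    Payload("c", datetime(2024, 5, 2, 8, 0)),
]
partitions = partition_by_addition_date(payloads)
# {date(2024, 5, 1): ['a', 'b'], date(2024, 5, 2): ['c']}
new_data = partitions_after(partitions, date(2024, 5, 1))
# {date(2024, 5, 2): ['c']}
```

Because each partition covers a single addition date, an incremental load in this sketch only has to read the partitions created since the last load.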
The Data Catalog object metadata is vital to the data conversion process. It provides the instructions used to map the data object schema into the table and column structure that queries use. Several of the metadata components used in this process are described in the query considerations and best practices.
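To show how metadata can drive the mapping from payload to columns, here is a minimal Python sketch. It assumes a hypothetical, much-simplified metadata layout (a format, a delimiter, and a list of typed properties), hypothetical type names, and a DSV header row; a real Data Catalog definition contains more detail than this.

```python
import csv
import io
import json

# Hypothetical, simplified object metadata; a real Data Catalog definition is richer.
ORDER_METADATA = {
    "format": "DSV",
    "delimiter": ",",
    "properties": [
        {"name": "OrderId", "type": "integer"},
        {"name": "Customer", "type": "string"},
        {"name": "Amount", "type": "number"},
    ],
}

# Maps the hypothetical property types to Python casts.
CASTS = {"integer": int, "number": float, "string": str}


def read_records(text, metadata):
    """Parse a DSV (header row assumed) or NDJSON payload into dict records."""
    if metadata["format"] == "DSV":
        return list(csv.DictReader(io.StringIO(text), delimiter=metadata["delimiter"]))
    if metadata["format"] == "NDJSON":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {metadata['format']}")


def convert_payload(text, metadata):
    """Convert a payload into typed rows with one column per metadata property."""
    rows = []
    for record in read_records(text, metadata):
        row = {}
        for prop in metadata["properties"]:
            value = record.get(prop["name"])
            # Missing or empty values become None; others are cast to the declared type.
            row[prop["name"]] = None if value in (None, "") else CASTS[prop["type"]](value)
        rows.append(row)
    return rows


dsv_rows = convert_payload("OrderId,Customer,Amount\n1,Acme,19.5\n2,Globex,\n", ORDER_METADATA)
# [{'OrderId': 1, 'Customer': 'Acme', 'Amount': 19.5},
#  {'OrderId': 2, 'Customer': 'Globex', 'Amount': None}]
ndjson_rows = convert_payload('{"OrderId": 3, "Customer": "Initech", "Amount": 7}\n',
                              {**ORDER_METADATA, "format": "NDJSON"})
# [{'OrderId': 3, 'Customer': 'Initech', 'Amount': 7.0}]
```

The same property list produces the same column structure for both payload formats, which is the role the object metadata plays in conversion.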
When Data Catalog data object definitions are updated, it might be necessary to clear the current data storage definition, the Compass data storage, or both.
See Data administration stored procedures.
Data objects are converted on demand when a query is run. The data objects referenced in the query are compared to the latest data in Data Lake, and any objects stored in Data Lake that have not yet been converted are converted when the query executes. This ensures that the query retrieves all data available to it. If a data conversion error occurs, the query fails.
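The on-demand check can be sketched in Python as follows, assuming that already-converted payloads are tracked as a set of payload IDs. The names ensure_converted, DataConversionError, and the convert callback are hypothetical and are not Compass APIs.

```python
class DataConversionError(Exception):
    """Raised when a payload cannot be converted; the query then fails."""


def ensure_converted(referenced_objects, lake_payloads, converted, convert):
    """Convert any not-yet-converted payloads of the objects a query references.

    referenced_objects: object names used in the query
    lake_payloads: dict mapping object name -> payload IDs currently in Data Lake
    converted: set of payload IDs already converted (updated in place)
    convert: callable(payload_id) that converts one payload or raises
    """
    newly_converted = []
    for obj in referenced_objects:
        for payload_id in lake_payloads.get(obj, []):
            if payload_id in converted:
                continue  # already converted by an earlier query
            try:
                convert(payload_id)
            except Exception as exc:
                # A single conversion error fails the whole query.
                raise DataConversionError(f"{obj}/{payload_id}: {exc}") from exc
            converted.add(payload_id)
            newly_converted.append(payload_id)
    return newly_converted


lake = {"CustomerOrder": ["p1", "p2", "p3"], "Item": ["p9"]}
done = {"p1"}
converted_now = ensure_converted(["CustomerOrder"], lake, done, lambda pid: None)
# converted_now == ['p2', 'p3']; done == {'p1', 'p2', 'p3'}
```

Only objects referenced by the query are checked, so unreferenced objects (Item in this example) stay unconverted until a query needs them.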
The query execution steps involve preparing the query, executing the query, and preparing the results.
To prepare the query, the data objects and properties referenced in the query are validated against the Data Catalog object metadata, and the query hints and syntax are validated and converted. The query is then executed, and the results are prepared.
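The three execution steps can be outlined in a short Python sketch. It assumes a hypothetical catalog that maps each object name to its property names, and it uses a stand-in execute callback in place of the query engine; hint and syntax handling is left out. None of these names are Compass APIs.

```python
# Hypothetical catalog: object name -> property names from the Data Catalog metadata.
CATALOG = {
    "CustomerOrder": {"OrderId", "Customer", "Amount"},
    "Item": {"ItemId", "Description"},
}


class QueryValidationError(Exception):
    """Raised when a query references an object or property not in the metadata."""


def validate_references(references, catalog):
    """Check every (object, property) pair referenced by the query against the catalog."""
    problems = []
    for obj, prop in references:
        if obj not in catalog:
            problems.append(f"unknown object: {obj}")
        elif prop not in catalog[obj]:
            problems.append(f"unknown property: {obj}.{prop}")
    if problems:
        raise QueryValidationError("; ".join(problems))


def run_query(references, catalog, execute, columns):
    """Prepare (validate), execute, then prepare results as column-keyed rows."""
    validate_references(references, catalog)
    # Hint and syntax validation/conversion is omitted from this sketch.
    raw_rows = execute()
    return [dict(zip(columns, row)) for row in raw_rows]


result = run_query(
    [("CustomerOrder", "OrderId"), ("CustomerOrder", "Amount")],
    CATALOG,
    execute=lambda: [(1, 19.5), (2, 7.0)],  # stand-in for the query engine
    columns=["OrderId", "Amount"],
)
# [{'OrderId': 1, 'Amount': 19.5}, {'OrderId': 2, 'Amount': 7.0}]
```

Validation runs before execution, so in this sketch a query that references an unknown object or property fails without reaching the execute step.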
For information on how to handle data conversion failures and errors in the query execution steps, see Data Lake query error handling.