Query processing
Compass queries are processed through a series of steps before the query returns results. The primary query steps are converting, or transforming, the Data Lake data and executing the query.
The data conversion steps read Data Lake data objects, also called “payloads”, and convert the data to store it more efficiently for querying. This process retrieves the DSV and newline-delimited JSON object data from the Data Lake and converts it. The process also segments the data into partitions, which divide the data so that a query can read it more efficiently. The partitioning method groups the data by the date that the data object was added to the Data Lake, referenced by the lastmodified property. This partitioning scheme is relevant for incremental data loads.
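The grouping by addition date can be illustrated with a minimal Python sketch. It is not the Compass implementation: the payload shape (a dictionary with a lastmodified ISO 8601 timestamp and a body of newline-delimited JSON) and the function name partition_by_added_date are assumptions made for illustration only.

```python
import json
from collections import defaultdict
from datetime import datetime


def partition_by_added_date(payloads):
    """Group parsed records by the date their data object was added."""
    partitions = defaultdict(list)
    for payload in payloads:
        # Assumed payload shape: 'lastmodified' is an ISO 8601 timestamp,
        # 'body' holds newline-delimited JSON text.
        added = datetime.fromisoformat(payload["lastmodified"]).date().isoformat()
        for line in payload["body"].splitlines():
            if line.strip():  # skip blank lines between records
                partitions[added].append(json.loads(line))
    return dict(partitions)


# Hypothetical sample payloads, two added on the same day and one the next day.
payloads = [
    {"lastmodified": "2024-03-01T08:15:00", "body": '{"id": 1}\n{"id": 2}'},
    {"lastmodified": "2024-03-01T17:40:00", "body": '{"id": 3}'},
    {"lastmodified": "2024-03-02T09:00:00", "body": '{"id": 4}'},
]
partitions = partition_by_added_date(payloads)
```

In this sketch, a query that filters on a single addition date would only need to read one partition, which is why the scheme suits incremental loads.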
The Data Catalog object metadata is critical to the data conversion process. The object metadata provides the instructions used to convert the data object schema into the table and column structure used by queries. The components of the metadata definition that affect this process are noted in the query considerations and best practices.
When Data Catalog data object definitions are updated, it may be necessary to clear the current data storage definition, the Compass data storage or both.
See Data administration stored procedures.
Data objects are converted on demand when a query is run. The data objects referenced in the query are compared to the latest data in the Data Lake. Any objects stored in the Data Lake that have not yet been converted are converted when the query executes, so that the query retrieves all available data. If a data conversion error occurs, the query fails.
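The on-demand behavior can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual Compass logic: the names ConversionError, convert_pending, and the object IDs are hypothetical, and the real comparison against the Data Lake is not described in this topic.

```python
class ConversionError(Exception):
    """Hypothetical error raised when a data object cannot be converted."""


def convert_pending(lake_object_ids, converted_ids, convert):
    """Convert every Data Lake object that has not been converted yet.

    Returns the IDs converted by this call. An exception raised by
    `convert` propagates to the caller, so the query fails.
    """
    pending = [oid for oid in lake_object_ids if oid not in converted_ids]
    for oid in pending:
        convert(oid)
        converted_ids.add(oid)  # recorded only after a successful conversion
    return pending


def convert(oid):
    # Stand-in for the real conversion step (assumption for this sketch).
    if oid.startswith("bad"):
        raise ConversionError(f"cannot convert {oid}")


converted = {"obj-1"}  # objects converted by an earlier query
newly_converted = convert_pending(["obj-1", "obj-2", "obj-3"], converted, convert)
```

Because already converted objects are skipped, repeated queries over the same data pay the conversion cost only for objects added since the previous run.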
The query execution steps involve preparing the query, executing the query, and preparing the results.
To prepare the query, the data objects and properties referenced in the query are validated against the Data Catalog object metadata, and the query hints and syntax are validated and converted. The query is then executed, and the results are prepared.
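The validation of referenced objects and properties can be pictured with a short Python sketch. The catalog structure (a dictionary mapping each data object name to its set of property names), the function validate_references, and the sample object names are all assumptions for illustration; the real Data Catalog metadata is richer than this.

```python
def validate_references(references, catalog):
    """Return error messages for references that the catalog does not define."""
    errors = []
    for obj, props in references.items():
        if obj not in catalog:
            errors.append(f"unknown data object: {obj}")
            continue  # properties cannot be checked without the object
        for prop in props:
            if prop not in catalog[obj]:
                errors.append(f"unknown property: {obj}.{prop}")
    return errors


# Hypothetical catalog metadata and query references.
catalog = {"SalesOrder": {"OrderId", "Customer", "Total"}}
references = {"SalesOrder": ["OrderId", "Region"], "Invoice": ["InvoiceId"]}
errors = validate_references(references, catalog)
```

Collecting all errors before reporting, rather than stopping at the first, is a design choice of this sketch and not a statement about how Compass reports validation failures.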
For information on how to handle data conversion failures and errors in the query execution steps, see Data Lake query error handling.