Query processing
During data conversion, the data of Data Lake objects, also called payloads, is read and converted into a format that is optimized for query processing. This process involves retrieving delimiter-separated values (DSV) and newline-delimited JSON (NDJSON) object data from Data Lake and converting it into Compass-formatted tables. Part of this process is partitioning the data to improve query efficiency.
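The first step of conversion is to read two payload formats into one common row structure. This Python sketch models that step only. The function names `parse_dsv`, `parse_ndjson`, and `payload_to_rows` are hypothetical, and a header row in DSV payloads is assumed. It is not how Compass reads payloads internally.

```python
import csv
import io
import json


def parse_dsv(payload: str, delimiter: str = ",") -> list:
    """Parse a DSV payload whose first line is a header row (assumed).

    Values stay as strings; assigning column types is a separate step.
    """
    reader = csv.DictReader(io.StringIO(payload), delimiter=delimiter)
    return [dict(row) for row in reader]


def parse_ndjson(payload: str) -> list:
    """Parse an NDJSON payload: one JSON object per non-empty line."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


def payload_to_rows(payload: str, payload_format: str) -> list:
    """Normalize either payload format into a list of row dictionaries."""
    if payload_format == "dsv":
        return parse_dsv(payload)
    if payload_format == "ndjson":
        return parse_ndjson(payload)
    raise ValueError(f"unsupported payload format: {payload_format}")
```

For example, `payload_to_rows("id,name\n1,a\n", "dsv")` returns `[{"id": "1", "name": "a"}]`. Note that the DSV value `"1"` is still a string, while the same value read from NDJSON would already be a number.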
Compass supports these query processing modes:
- Transactional mode
- Analytical mode
The modes define how incoming Data Lake objects are processed during conversion, including record-level deduplication and evaluation of existing converted data during new data ingestion.
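This section does not state which deduplication rule each mode applies. As one illustration of record-level deduplication, the following sketch combines newly ingested records with previously converted records and keeps only the latest version of each record. It assumes that every record carries an identifier field and a variation number that increases with each change. The field names `id` and `variation` and the function `merge_latest` are hypothetical.

```python
def merge_latest(existing: list, incoming: list,
                 key: str = "id", variation: str = "variation") -> list:
    """Merge incoming records into already-converted records, keeping one
    record per identifier: the one with the highest variation number.

    Hypothetical sketch: the field names are assumptions, not Compass metadata.
    """
    latest = {}
    # Existing records are evaluated first, so on a variation tie the
    # previously converted record is kept.
    for record in existing + incoming:
        current = latest.get(record[key])
        if current is None or record[variation] > current[variation]:
            latest[record[key]] = record
    return sorted(latest.values(), key=lambda r: r[key])
```

Suppose the existing data holds identifiers 1 and 2, and the incoming batch holds a newer variation of identifier 1 plus a new identifier 3. The merge then returns three records, with identifier 1 at its newer variation. A processing approach without deduplication would instead keep all four records (`existing + incoming`).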
The modes also define how data is grouped and referenced by the lastmodified property, which you can access with the infor.lastmodified() function. The lastmodified property indicates the date and time when an object becomes visible in Data Lake after it is stored. This partitioning approach is relevant for incremental loads because it ensures that only newly ingested data is processed in subsequent queries.
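The incremental-load idea can be modeled outside Compass by grouping objects on their lastmodified timestamp and remembering a watermark between loads. This Python sketch is not the Compass partitioning scheme. The day-level granularity, the `lastmodified` key on each object dictionary, and the watermark bookkeeping are all assumptions. In a Compass query, the value comes from infor.lastmodified().

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_by_day(objects: list) -> dict:
    """Group objects by the UTC calendar date of their lastmodified value
    (day granularity is an assumption made for this sketch)."""
    partitions = defaultdict(list)
    for obj in objects:
        day = obj["lastmodified"].astimezone(timezone.utc).date()
        partitions[day].append(obj)
    return dict(partitions)


def incremental_batch(objects: list, watermark: datetime):
    """Return objects that became visible after the watermark, plus the new
    watermark to use for the next incremental load."""
    new_objects = [obj for obj in objects if obj["lastmodified"] > watermark]
    next_watermark = max((obj["lastmodified"] for obj in new_objects),
                         default=watermark)
    return new_objects, next_watermark
```

Each load passes in the watermark returned by the previous call, so objects that were already processed are skipped. If nothing new has arrived, the call returns an empty batch and the same watermark.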
Data Catalog object metadata is vital to the data conversion process. It provides the instructions for converting the data object schema into the table and column structure that queries use. Several of the metadata components used in this process are described in the query considerations and best practices.
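To show how metadata can drive the conversion from raw records to typed table columns, here is a minimal sketch. The `ORDER_METADATA` dictionary is a hypothetical, heavily simplified stand-in for a Data Catalog object definition, and the type names and the `apply_metadata` function are assumptions. Real Data Catalog metadata has its own format and is richer than this.

```python
from datetime import date

# Hypothetical, simplified stand-in for Data Catalog object metadata.
# Real metadata definitions are richer and use their own format.
ORDER_METADATA = {
    "name": "CustomerOrder",
    "properties": {
        "OrderId": "integer",
        "Amount": "number",
        "OrderDate": "date",
        "Status": "string",
    },
}

# Maps each simplified type name to a function that converts a raw value.
CASTS = {
    "integer": int,
    "number": float,
    "date": date.fromisoformat,
    "string": str,
}


def apply_metadata(rows: list, metadata: dict) -> list:
    """Project raw rows onto the columns defined in the metadata and cast
    each value to its column type. Properties not in the metadata are dropped;
    missing or empty values become None (NULL)."""
    columns = metadata["properties"]
    table = []
    for row in rows:
        typed_row = {}
        for column, column_type in columns.items():
            raw = row.get(column)
            typed_row[column] = None if raw in (None, "") else CASTS[column_type](raw)
        table.append(typed_row)
    return table
```

In this sketch the metadata alone determines which columns exist and what type each column has. A property in the payload that the metadata does not describe never reaches the table. This is one reason metadata changes can require the stored, converted data to be rebuilt.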
When Data Catalog data object definitions are updated, you may have to clear the current data storage definition or the Compass data storage, or both.
For more information, see Data administration stored procedures.
Data objects are converted on demand when a query is run. The data objects that are referenced in the query are compared against the latest available data in Data Lake. Any Data Lake objects that haven't yet been converted are converted at query time to ensure that all available data is processed. If a data conversion error occurs, the query fails.
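The on-demand conversion described above can be modeled as a set difference between the payloads available in Data Lake and the payloads already converted, computed only for the objects the query references. In this hypothetical sketch, payload IDs stand in for Data Lake objects, and `convert` stands in for converting one payload. `ConversionError`, `QueryFailed`, and `ensure_converted` are invented names, not part of Compass.

```python
class ConversionError(Exception):
    """Hypothetical: raised when a single payload cannot be converted."""


class QueryFailed(Exception):
    """Hypothetical: raised when a query cannot run because conversion failed."""


def ensure_converted(referenced, available, converted, convert):
    """Convert payloads that exist in the data lake but not yet in storage.

    referenced: names of the data objects used by the query
    available:  object name -> IDs of payloads currently in the data lake
    converted:  object name -> set of already converted IDs (updated in place)
    convert:    callable for one payload ID; raises ConversionError on failure
    Returns the IDs converted during this call.
    """
    newly_converted = []
    for name in referenced:
        done = converted.setdefault(name, set())
        pending = sorted(set(available.get(name, ())) - done)
        for payload_id in pending:
            try:
                convert(payload_id)
            except ConversionError as exc:
                # A single conversion error fails the whole query.
                raise QueryFailed(f"conversion of {payload_id} failed") from exc
            done.add(payload_id)
            newly_converted.append(payload_id)
    return newly_converted
```

Objects that the query does not reference are left untouched, and a second run with no new payloads converts nothing.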
After data conversion is complete, query processing starts. The processing includes preparing the query, running the query, and preparing the query results.
During query preparation, the data objects and properties that are referenced in the query are validated against the Data Catalog object metadata. Query hints and syntax are validated and translated as required. The query is then run and the results are prepared for return.
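The three stages of query processing (preparing the query, running it, and preparing the results) can be sketched as a small pipeline whose first stage validates references against catalog metadata. This is a hypothetical model. It assumes that a parser has already extracted the referenced objects and properties, and it passes in `execute` and `format_results` as plain callables. Hint handling and syntax translation are omitted.

```python
class QueryValidationError(Exception):
    """Hypothetical: raised when a query references unknown objects or properties."""


def validate_query(references: dict, catalog: dict) -> None:
    """Check every referenced object and property against catalog metadata.

    references: object name -> list of property names used by the query
    catalog:    object name -> set of property names defined in metadata
    """
    problems = []
    for obj, props in references.items():
        if obj not in catalog:
            problems.append(f"unknown object: {obj}")
            continue
        for prop in props:
            if prop not in catalog[obj]:
                problems.append(f"unknown property: {obj}.{prop}")
    if problems:
        raise QueryValidationError("; ".join(problems))


def process_query(references, catalog, execute, format_results):
    validate_query(references, catalog)  # 1. prepare: validate against metadata
    raw_results = execute()              # 2. run the query
    return format_results(raw_results)   # 3. prepare the results for return
```

Because validation runs first, a query that references an undefined property fails before any execution work is done.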
For information on how to handle data conversion failures and errors in query processing, see Data Lake query error handling.