Using the data object retrieval APIs
Data Lake is a scalable, elastic object store for capturing raw data in its original and native format. Data Lake provides interfaces for these tasks:
- Retrieving a list of data objects stored in Data Lake
- Retrieving a single data object by a data object ID
- Marking an object as corrupt
- Retrieving statistical information about a stored object
When you retrieve data from Data Lake with the Retrieval APIs, we recommend that you follow these best practices:
- Use index timestamps, such as dl_document_indexed_date, for incremental retrieval patterns.
- Apply a 5-second lag interval from the highest indexed timestamp that is referenced during a call. This helps ensure that all data objects are retrieved and reduces the chance that a data object is overlooked.
- In an API call, sort data objects in ascending order of an indexed timestamp so that the last object's timestamp can be referenced.
- Whenever possible, refrain from using wildcard searches.
- Whenever possible, include dl_document_name as part of your filter for the data objects to retrieve from Data Lake.
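The timestamp-related practices above can be sketched in Python. This is a minimal sketch under stated assumptions, not the Data Lake client itself: it takes each retrieved data object to be a dictionary whose dl_document_indexed_date is an ISO 8601 string, and it uses dl_id as a hypothetical name for the data object ID field. It also follows one reading of the lag recommendation, in which the next call starts 5 seconds before the highest indexed timestamp already seen. The function names are illustrative only.

```python
from datetime import datetime, timedelta

# Lag interval recommended in the best practices above.
LAG = timedelta(seconds=5)


def parse_ts(value: str) -> datetime:
    # Assumption: timestamps are ISO 8601 strings; a trailing "Z" is
    # rewritten so that older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def next_window_start(objects, previous_start):
    """Return the lower timestamp bound to use for the next retrieval call."""
    if not objects:
        # Nothing new was returned, so keep the same starting point.
        return previous_start
    # Sort ascending so that the last element carries the highest timestamp.
    ordered = sorted(objects, key=lambda o: parse_ts(o["dl_document_indexed_date"]))
    highest = parse_ts(ordered[-1]["dl_document_indexed_date"])
    # Step back by the lag interval, but never move earlier than before.
    return max(previous_start, highest - LAG)


def drop_seen(objects, seen_ids):
    """Filter out objects already processed in an earlier, overlapping call."""
    # "dl_id" is a hypothetical ID field name used for illustration.
    fresh = [o for o in objects if o["dl_id"] not in seen_ids]
    seen_ids.update(o["dl_id"] for o in fresh)
    return fresh
```

Because the lag makes consecutive windows overlap by up to 5 seconds, the same object can be returned twice, which is why the sketch tracks IDs that were already processed.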
Interface and consumption methods are exposed through the Data Lake API Service, which is registered within the Data Fabric Suite in API Gateway. For more information on how to use API Gateway and how to interact with the Swagger documentation for the API methods, see the ION documentation.
By default, the content in Data Lake is stored, and streamed to clients, in a compressed state. For exceptionally large content retrievals, especially through the /dataobjects/byfilter API, streaming the content in its compressed state ensures that performance of the gateway and requesting clients remains nominal.
Authorized API applications and RESTful API clients that are used for API testing can advertise supported content encoding to the server. To stream and persist data in a compressed format, the requesting party can configure the request with this request header: Accept-Encoding: deflate. With the identity value in a request HTTP header, clients can stream their requested content with no encoding in place. This setting is typically configured with this request format: Accept-Encoding: identity. Not all clients support the identity value. See your API application or client's documentation to determine whether these request HTTP header values are supported.
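As an illustration of the two header values, the following Python sketch builds a request with a chosen Accept-Encoding value and decodes a response body according to its Content-Encoding. It uses only the standard library and sends nothing over the network. The gateway URL is a placeholder, the function names are hypothetical, and authorization headers are omitted. The fallback to raw deflate is an assumption about server behavior, not something this documentation states.

```python
import urllib.request
import zlib

# Placeholder URL; replace with the endpoint registered in your API Gateway.
EXAMPLE_URL = "https://gateway.example.com/datalake/dataobjects/byfilter"


def build_request(url: str, compressed: bool = True) -> urllib.request.Request:
    """Build a GET request that advertises the desired content encoding."""
    encoding = "deflate" if compressed else "identity"
    return urllib.request.Request(url, headers={"Accept-Encoding": encoding})


def decode_body(body: bytes, content_encoding: str) -> bytes:
    """Return the plain payload for a body received with the given encoding."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "identity":
        # No encoding in place; the body is already the payload.
        return body
    if encoding == "deflate":
        try:
            # HTTP "deflate" is normally zlib-wrapped data.
            return zlib.decompress(body)
        except zlib.error:
            # Assumption: some servers send raw deflate without the wrapper.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    raise ValueError(f"Unsupported Content-Encoding: {content_encoding}")
```

A client would pass the request from build_request to its HTTP library of choice, then call decode_body with the response bytes and the Content-Encoding response header before parsing the payload.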