Deleting data from Data Lake

You can remove unwanted data objects that are stored in Data Lake.

Purged objects cannot be retrieved from Data Lake. You can purge data objects in these ways:

  • In Atlas, select one or more data objects and click the Purge button or icon.
  • In Purge, use the advanced filters to search for data objects to purge.
  • In Purge, use the unique data object ID(s) to purge.
  • Use the Delete APIs available in the Data Lake endpoint.
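
As a rough illustration of the API option, the sketch below builds an HTTP DELETE request for a single data object without sending it. The base URL, the route in OBJECT_PATH, and the bearer-token header are hypothetical placeholders, not the documented Data Lake routes; take the real values from the API documentation for your environment.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical placeholders: replace with the values from your API documentation.
BASE_URL = "https://example.invalid/datalake"
OBJECT_PATH = "/dataobjects/{object_id}"  # assumed route, not the documented one


def build_delete_request(object_id: str, token: str) -> Request:
    """Build (but do not send) an HTTP DELETE request for one data object."""
    # Percent-encode the ID so that characters such as "/" stay inside one path segment.
    url = BASE_URL + OBJECT_PATH.format(object_id=quote(object_id, safe=""))
    # Bearer authentication is an assumption made for this sketch.
    return Request(url, method="DELETE", headers={"Authorization": f"Bearer {token}"})


req = build_delete_request("1-abc/42", "TOKEN")
print(req.get_method(), req.full_url)
```

The request is only constructed here, so the sketch can be run safely. Because purged objects cannot be retrieved, confirm the object ID before you send a real request.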

After a purge event starts, it can take some time to complete.

When a purge process completes, the matching reformatted Compass data is cleared. For each affected object name, the Compass data is cleared from the oldest store date of the purged data objects onward. The next Compass query over the same object names can take some time, because the existing data objects must be reformatted and made available in Compass again.

If a purge process fails, an error is displayed. You must clean the Compass cache by running the clear_data stored procedure and then perform the purge again.

Note: The clearing of Compass data is not applicable for AWS GovCloud.
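
To make the "oldest store date" rule concrete, the following sketch computes, for each affected object name, the date from which Compass data would be cleared. The function name, the tuple layout, and the sample object names are assumptions made for illustration; the actual clearing is done by the service, not by user code.

```python
from datetime import date


def compass_clear_cutoffs(purged):
    """Map each affected object name to the oldest store date among its purged objects."""
    cutoffs = {}
    for name, stored_on in purged:
        # Keep the earliest store date seen so far for this object name.
        if name not in cutoffs or stored_on < cutoffs[name]:
            cutoffs[name] = stored_on
    return cutoffs


# Hypothetical purged objects: (object name, store date).
purged = [
    ("SalesOrder", date(2023, 5, 10)),
    ("SalesOrder", date(2023, 3, 2)),
    ("Customer", date(2024, 1, 15)),
]
cutoffs = compass_clear_cutoffs(purged)
for name, cutoff in cutoffs.items():
    print(name, cutoff.isoformat())
```

In this example, even though only two SalesOrder objects were purged, all Compass data for SalesOrder stored on or after 2023-03-02 would have to be reformatted again, which is why the next query can be slow.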

You cannot revert an active purge event. However, a user can stop a purge process to prevent any further data objects from being purged. Objects that were purged before the process was stopped are permanently removed from Data Lake and cannot be restored. If the event is not stopped, the purge continues until all objects that match the purge parameters are removed from Data Lake.

Stopping a purge event results in a partially completed purge. You can verify the status of a purge event in the purge logs.
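
The stop behavior can be pictured with a small simulation. This is a simplified model, not the service implementation: the function run_purge, the should_stop callback, and the status strings are hypothetical names chosen for this sketch.

```python
def run_purge(store, matches, should_stop):
    """Remove matching objects one at a time until finished or stopped.

    Returns the IDs that were purged and a final status string.
    """
    purged = []
    for object_id in matches:
        # A stop request only prevents further removals; it does not undo earlier ones.
        if should_stop(purged):
            return purged, "STOPPED"
        store.discard(object_id)
        purged.append(object_id)
    return purged, "COMPLETED"


store = {"a", "b", "c", "d"}
# Simulate a user who stops the purge after two objects have been removed.
purged, status = run_purge(store, ["a", "b", "c"], lambda done: len(done) >= 2)
print(purged, status, sorted(store))
```

After the simulated stop, objects "a" and "b" are gone for good, while "c" still exists even though it matched the purge parameters. This is the partially completed state that the purge logs report.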