Recovery Workflow
The ORCA recovery workflow diagram below visualizes the process of recovering missing data in Cumulus. Because each user organization has unique needs when recovering data, the diagram indicates which components can be modified and which should not be modified; refer to the legend on the diagram for details.
A developer starts the recovery process manually via the AWS UI, which triggers an API call that kicks off a recovery workflow. Recovery is an asynchronous operation: data requested from the GLACIER storage class can take 4 hours or more to reconstitute, and data in DEEP_ARCHIVE can take 12 hours. Because the operation is asynchronous, the recovery container relies on a database to maintain the status of each request and on event-driven triggers to restore the data once it has been reconstituted from archive into an S3 bucket. Currently, data is copied back to the Cumulus S3 primary data bucket, which is the default; the developer has the option to override the default with another restore bucket if desired. The status of a recovery job can be determined manually by querying the database directly, by checking the status on the UI, or by calling the ORCA API.
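The asynchronous reconstitution described above follows the standard S3 restore flow. The sketch below is a minimal illustration of that flow using boto3, not ORCA's internal code; the bucket and key names are placeholders, and it assumes AWS credentials are already configured.

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "example-orca-archive-bucket"   # placeholder: the ORCA archive bucket
KEY = "example-collection/granule-file.hdf"  # placeholder: an archived granule file

# Ask S3 to reconstitute the archived object. With the Standard retrieval
# tier, GLACIER restores typically take several hours and DEEP_ARCHIVE
# restores can take up to 12 hours.
s3.restore_object(
    Bucket=BUCKET,
    Key=KEY,
    RestoreRequest={
        "Days": 5,  # how long the restored copy remains readable
        "GlacierJobParameters": {"Tier": "Standard"},
    },
)

# Because the restore is asynchronous, poll the object's metadata to see
# whether reconstitution has finished. The Restore header reads
# 'ongoing-request="true"' while in progress and 'ongoing-request="false"'
# once the temporary copy is available.
response = s3.head_object(Bucket=BUCKET, Key=KEY)
print(response.get("Restore", "no restore in progress"))
```

Once the restore completes, the file can be copied to the target bucket with an ordinary S3 copy; in ORCA that copy to the primary data bucket (or the override restore bucket) is performed by the event-driven triggers rather than by the developer.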
See the recovery workflow Step Function module for additional details on recovering files belonging to a granule.
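For checking job status via the ORCA API, a call shaped roughly like the following could be used. This is a hedged sketch: the endpoint path, payload shape, and IDs shown are assumptions for illustration only; consult the ORCA API documentation for the actual contract of your deployment.

```python
import requests

# Placeholder values: the API Gateway URL comes from your ORCA deployment,
# and the async operation ID is returned when the recovery workflow starts.
API_URL = "https://example.execute-api.us-west-2.amazonaws.com/orca"
ASYNC_OPERATION_ID = "00000000-0000-0000-0000-000000000000"  # hypothetical ID

# Hypothetical status request: endpoint name and body are assumptions,
# not a confirmed ORCA API signature.
response = requests.post(
    f"{API_URL}/recovery/jobs",
    json={"asyncOperationId": ASYNC_OPERATION_ID},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```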
For additional details on the recovery workflow's input and output, see this documentation.