Below is a diagram of the components that comprise an instance of Cumulus. It details all of the major architectural components of a Cumulus deployment. While the diagram can feel complex, it breaks down into several major pieces:
End users can access data via Cumulus's
distribution submodule, which includes ASF's Thin Egress App (TEA). This provides authenticated data egress, temporary S3 links, and other statistics features.
End-user exposure of Cumulus's holdings is expected to be provided by an external service;
for NASA use, this diagram assumes that service is the CMR.
The core of Cumulus's ingest and processing capabilities is built into the deployed AWS Step Functions workflows. Cumulus rules trigger workflows via CloudWatch rules, Kinesis streams, SNS topics, or SQS queues. The workflows then run with a configured Cumulus message, using built-in processes to report the status of granules, PDRs, executions, etc. to the Data Persistence components.
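To make the rule-to-workflow relationship concrete, the sketch below builds a Kinesis-type rule payload of the general shape accepted by the Cumulus API. All specific names here (the workflow, provider, collection, and stream ARN) are hypothetical placeholders, and the exact schema should be checked against the Cumulus API documentation.

```python
# Sketch of a Cumulus rule payload (hypothetical names throughout).
# A rule of type "kinesis" triggers the named workflow whenever a record
# arrives on the referenced stream; SNS-, SQS-, and schedule-based rules
# follow the same general shape with a different "rule" block.
def build_kinesis_rule(name, workflow, stream_arn):
    return {
        "name": name,
        "workflow": workflow,               # Step Functions workflow to run
        "provider": "example_provider",     # hypothetical provider id
        "collection": {"name": "EXAMPLE_COLLECTION", "version": "001"},
        "rule": {"type": "kinesis", "value": stream_arn},
        "state": "ENABLED",
    }

rule = build_kinesis_rule(
    "example_ingest_rule",
    "IngestGranuleWorkflow",
    "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream",
)
```

Disabling a rule (setting `"state"` to `"DISABLED"`) leaves the trigger configured but stops it from starting new workflow executions.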
Workflows can optionally report granule metadata to CMR, and workflow steps can report metrics to a shared SNS topic, which can be subscribed to for near-real-time granule, execution, and PDR status. This could be used for metrics reporting via an external ELK stack, for example.
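As a sense of what consuming that reporting topic might look like, here is a minimal sketch of a Lambda-style handler that extracts granule statuses from an SNS-delivered event. The message layout shown (a JSON body carrying `granuleId` and `status` fields) is an assumption for illustration only; consult the actual topic's message schema before relying on specific fields.

```python
import json

# Minimal sketch of a handler subscribed to the shared reporting SNS
# topic. The "Records"/"Sns"/"Message" nesting matches how SNS delivers
# events to Lambda subscribers; the inner message fields are assumed.
def handle_report(event):
    statuses = {}
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        statuses[message["granuleId"]] = message["status"]
    return statuses

# Example SNS event of the kind a Lambda subscriber would receive:
sample_event = {
    "Records": [
        {"Sns": {"Message": json.dumps(
            {"granuleId": "EXAMPLE.granule.001", "status": "completed"}
        )}}
    ]
}
result = handle_report(sample_event)
```

A subscriber like this could forward the parsed records to an external metrics pipeline (e.g. an ELK stack) rather than returning them.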
Cumulus entity state data is stored in a set of PostgreSQL-compatible databases and is exported to an Elasticsearch instance, which provides non-authoritative query/state data for the API and other applications that require more complex queries. Currently the entity state data is also replicated in DynamoDB; this replication will be removed in a future release.
Discovering data for ingest is handled via workflow step components using Cumulus
collection configurations and various triggers. Data can be ingested from AWS S3, FTP, HTTPS and more.
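By way of example, a collection configuration pairs a granule-matching pattern with per-file definitions that discovery steps use to group incoming files into granules. The sketch below is a minimal, hypothetical configuration of that shape; the field names (`granuleIdExtraction`, `files`, `regex`) follow common Cumulus conventions but should be verified against the collection schema.

```python
import re

# Hypothetical collection configuration sketch. Treat the exact schema
# as an assumption; only the general pattern-matching idea is the point.
collection = {
    "name": "EXAMPLE_COLLECTION",
    "version": "001",
    "granuleIdExtraction": r"^(EXAMPLE\.\d{3})\.hdf$",
    "files": [
        {"regex": r"^EXAMPLE\.\d{3}\.hdf$", "bucket": "protected"},
        {"regex": r"^EXAMPLE\.\d{3}\.hdf\.met$", "bucket": "private"},
    ],
}

# A discovery step would match incoming file names against these
# patterns to derive granule IDs and route files to buckets:
match = re.match(collection["granuleIdExtraction"], "EXAMPLE.042.hdf")
granule_id = match.group(1) if match else None
```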
Cumulus utilizes a user-provided PostgreSQL database backend. For improved API search query efficiency, Cumulus replicates data to an Elasticsearch instance. For legacy reasons, Cumulus currently also deploys a DynamoDB datastore, with writes replicated in parallel with the PostgreSQL database writes. The replicated DynamoDB tables and parallel writes will be removed in future releases.
PostgreSQL Database Schema Diagram
Cumulus is deployed via Terraform and is organized internally into two separate top-level modules, as well as several external modules.
The Cumulus module, which contains multiple internal submodules, deploys all of the Cumulus components that are not part of the
Data Persistence portion of this diagram.
The data persistence module provides the
Data Persistence portion of the diagram.
Other modules, containing extracted subcomponents of the cumulus module, are provided as artifacts on the release page for users configuring their own deployments. For more on these components, see the components documentation.