RDS: Choosing and Configuring Your Database Type
Background
Cumulus uses a PostgreSQL database as its primary data store for operational and archive records (e.g. collections, granules, etc). The Cumulus core deployment code expects this database to be provided by the AWS RDS service; however, it is agnostic about the type of the RDS database.
RDS databases are broadly divided into two types:
- Provisioned: Databases with a fixed capacity in terms of CPU and memory capacity. e.g. Memory optimized, Burstable, and Optimized Reads. You can find a complete list of the available database instance class types in this AWS documentation.
- Serverless v2: Databases that can scale their CPU and memory capacity up and down in response to database load. Amazon Aurora is the service which provides serverless RDS databases.
Provisioned vs. Serverless
Generally speaking, the advantage of provisioned databases is that they don't have to scale. As soon as they are deployed, they have the full capacity of your chosen instance size and are ready for ingest operations. Of course, this advantage is also a downside: if you ever have a sudden spike in database traffic, your database can't scale to accommodate that increased load.
On the other hand, serverless databases are designed precisely to scale in response to load. While the ability of serverless databases to scale is quite useful, there can be complexity in setting the scaling configuration to achieve the best results. Recommendations for Aurora serverless database scaling configuration are provided in the section below.
To decide whether a provisioned or a serverless database is appropriate for your ingest operations, you should consider the pattern of your data ingests.
If you have a fairly steady, continuous rate of data ingest, then a provisioned database may be appropriate because your database capacity needs should be consistent and the lack of scaling shouldn't be an issue.
If you have occasional, bursty ingests of data where you go from ingesting very little data to suddenly ingesting quite a lot then a serverless database may be a better choice because it will be able to handle the spikes in your database load.
General Configuration Guidelines
Aurora Serverless V2 Capacity Range
Serverless V2 allows users to configure the minumum and maximum number of ACUs the custer should use.
The Aurora Serverless V2 Docs provide guidance for determining your capacity requirements and modifying the capacity range of your RDS cluster.
Cumulus Core Login Configuration
Cumulus Core uses a admin_db_login_secret_arn
and (optionally) user_credentials_secret_arn
as inputs that allow various Cumulus components to act as a database administrator and/or read/write user. Those secrets should conform to the following format:
{
"database": "postgres",
"dbClusterIdentifier": "clusterName",
"engine": "postgres",
"host": "xxx",
"password": "defaultPassword",
"port": 5432,
"username": "xxx",
"disableSSL": false,
"rejectUnauthorized": false,
}
database
-- the PostgreSQL database used by the configured userdbClusterIdentifier
-- the value set by thecluster_identifier
variable in the terraform moduleengine
-- the Aurora/RDS database enginehost
-- the RDS service host for the database in the form (dbClusterIdentifier)-(AWS ID string).(region).rds.amazonaws.compassword
-- the database passwordusername
-- the account usernameport
-- The database connection port, should always be 5432rejectUnauthorized
-- If disableSSL is not set, set to false to allow self-signed certificates or non-supported CAs. Defaults to false.disableSSL
- If set to true, disable use of SSL with Core database connections. Defaults to false.
SSL encryption
Current security policy/best practices require use of a SSL enabled configuration. Cumulus by default expects a configuration that includes a datastore that requires an SSL connection with a recognized certificate authority (RDS managed databases configured to use SSL will automatically work as AWS provides AWS CA bundles in the Lambda runtime environment). If deployed in an environment not making use of SSL, set disableSSL
to true
to disable this behavior.
Self-Signed Certs
Cumulus can accommodate a self-signed/unrecognized cert by setting rejectUnauthorized
as false
in the connection secret. This will result Core allowing use of certs without a valid CA.
Cumulus Serverless RDS Cluster Module
Cumulus provides a Terraform module that will deploy an Aurora Serverless RDS cluster. If you are using this module to create your RDS cluster, you can configure the cluster minimum and maximum capacity, and more as seen in the supported variables for the module.
Optional: Manage RDS Database with pgAdmin
Setup SSM Port Forwarding
In order to perform this action you will need to deploy it within a VPC and have the credentials to access via NGAP protocols.
For a walkthrough guide on how to utilize AWS's Session Manager for port forwarding to access the Cumulus RDS database go to the Accessing Cumulus RDS database via SSM Port Forwarding article.