Upgrade execution table to CUMULUS-3320
Background
As part of the performance/feature evaluation work in completing the update requested as part of CUMULUS-3320, several updates were required to make executions
table deletes/reads more performant.
These changes required creating new indexes and modifying a table constraint - operations which can take some time and require manual upgrade steps for database deployments where the updates will exceed the bootstrap lambda's 15 minute timeout.
Please note that testing in a production-similar environment is strongly advised if you are concerned about precise query times and/or downtime.
The following procedures detail how this upgrade may be performed:
Adding the executions_parent_cumulus_id_index
index
Reasoning
This index is required to support effective deletion of records. Our query analysis implies the same table foreign-key constraint results in a slow table scan to enforce the constraint for each deletion.
Procedure
Users may opt to either automatically migrate the database, or manually create the index:
Utilize the normal cumulus deployment
The cumulus module will run the migrations on the database when it's deployed. For databases with smaller holdings and not under heavy load such that the sum of the migrations to be run will complete in < 15 minutes, the migration will run the following SQL query:
CREATE INDEX CONCURRENTLY IF NOT EXISTS executions_parent_cumulus_id_index ON executions(parent_cumulus_id)'
Manually create the index
Create index (executions_parent_cumulus_id_index) with ingest halted
The recommended approach to create the index is to halt ingest activity that requires write access to the execution table, then run the query to create the index. This will result in a much faster index operation than attempting to create the index concurrently, however the table will be locked until the index is complete.
For reference, in testing, using a 4ACU Aurora Serverless V1 cluster on a table with 15 million rows this migration took roughly 2 minutes to complete.
To do this, use the following query:
CREATE INDEX executions_parent_cumulus_id_index ON executions(parent_cumulus_id)'
Create index concurrently/with ongoing ingest
The required index can also be created while the database is in use prior to installing the upgrade containing CUMULUS-3320 by running the following query:
CREATE INDEX CONCURRENTLY executions_parent_cumulus_id_index ON executions(parent_cumulus_id)'
Please note this may take significantly longer than creating the index non-concurrently, especially if the table is under heavy use.
Indexing Failure
If the concurrent index query fails for any reason, you may have an invalid
index - if this occurs, make sure to drop and/or recreate the index to avoid resources being used for the invalid index:
DROP INDEX CONCURRENTLY executions_parent_cumulus_id_index
The index operation can then be re-attempted.
Adding the executions_collection_cumulus_id_index
index
Reasoning
This index is required to support effective execution searches by cumulus_collection_id
Procedure
Users may opt to either automatically migrate the database, or manually create the index:
Utilize the normal cumulus deployment
The cumulus module will run the migrations on the database when it's deployed. For databases with smaller holdings and not under heavy load such that the sum of the migrations to be run will complete in < 15 minutes, the migration will run the following SQL query:
CREATE INDEX CONCURRENTLY IF NOT EXISTS executions_collection_cumulus_id_index ON executions(collection_cumulus_id)
Manually create the index
Create index with ingest halted
The recommended approach to create the index is to halt ingest activity that requires write access to the execution table, then run the query to create the index. This will result in a much faster index operation than attempting to create the index concurrently, however the table will be locked until the index is complete.
For reference, in testing, using a 4ACU Aurora Serverless V1 cluster on a table with 15 million rows this migration took roughly 3 minutes to complete.
To do this, use the following query:
CREATE INDEX executions_collection_cumulus_id_index ON executions(collection_cumulus_id)
The required index can also be created while the database is in use prior to installing the upgrade containing CUMULUS-3320 by running the following query:
CREATE INDEX CONCURRENTLY executions_collection_cumulus_id_index ON executions(collection_cumulus_id)
Please note this may take significantly longer than creating the index non-concurrently, especially if the table is under heavy use.
Indexing Failure
If the concurrent index query fails for any reason, you may have an invalid
index - if this occurs, make sure to drop and/or recreate the index to avoid resources being used for the invalid index:
DROP INDEX CONCURRENTLY executions_collection_cumulus_id_index
The index operation can then be re-attempted.
Updating the executions_parent_cumulus_id_foreign
constraint
Notes
- This update may require ingest downtime as updates to the table require some time. See instructions below.
- This update should be performed after adding the
executions_parent_cumulus_id_foreign
constraint
Reasoning
This constraint as exists provides no action on deletion, this results in deletion of parent-child execution trees being both onerous and non-performant. During work/evaluation for the CUMULUS-3320 feature implementation it was determined that adding ON DELETE SET NULL
to this key makes wholesale/bulk deletes relatively performant (when combined with an index on the field) for large numbers of records while also being acceptable for 3rd party/smaller RESTful queries.
Procedure
Users may opt to either:
Utilize the normal cumulus deployment
The cumulus module will run the migrations on the database when it's deployed. For databases with smaller holdings and not under heavy load such that the sum of the migrations to be run will complete in < 15 minutes, the migration will run the following SQL query inside a transaction. This query will write-lock the table, which may cause write failures due to timeouts if ingest is ongoing:
ALTER TABLE executions DROP CONSTRAINT IF EXISTS executions_parent_cumulus_id_foreign
ALTER TABLE executions ADD CONSTRAINT executions_parent_cumulus_id_foreign FOREIGN KEY (parent_cumulus_id) REFERENCES executions(cumulus_id) ON DELETE SET NULL';
Manually update the constraint
To manually update the constraint, do the following, in order:
Modify the existing constraint without validation
The following query will, in a transaction, remove the existing constraint and re-enable the updated constraint. Adding NOT VALID
to the creation will cause postgres not to validate values in existing rows, but will enforce them for new rows. This greatly reduces downtime requirements as the constraint can be validated while the table is in use, and this transaction should be extremely quick:
BEGIN;
ALTER TABLE executions DROP CONSTRAINT IF EXISTS executions_parent_cumulus_id_foreign;
ALTER TABLE executions ADD CONSTRAINT executions_parent_cumulus_id_foreign FOREIGN KEY (parent_cumulus_id) REFERENCES executions(cumulus_id) ON DELETE SET NULL NOT VALID;
END;
Re-validate the constraint
This query can be run at any time following the above step and will not lock the full table for new row inserts/selects:
ALTER TABLE executions VALIDATE CONSTRAINT executions_parent_cumulus_id_foreign
For reference, in a table of roughly 18 million records without active ingest running on a Serverless V1 cluster set at 4ACUs, this validation took roughly 2 minutes in repeated testing, however running the validation during simulated active heavy writes it took significantly longer at around 20 minutes.
Verify the constraint/indexes exist on the table
To validate the upgrades have completed, use one of the following options:
PSQL terminal
If you're using a PSQL connection to the DB, you can get a full output of table structures and constraints using \d
:
\d executions
resulting in:
example
some_db=> \d executions
Table "public.executions"
Column | Type | Collation | Nullable | Default
----------------------------+--------------------------+-----------+----------+------------------------------------------------
cumulus_id | bigint | | not null | nextval('executions_cumulus_id_seq'::regclass)
arn | text | | not null |
async_operation_cumulus_id | integer | | |
collection_cumulus_id | integer | | |
parent_cumulus_id | bigint | | |
cumulus_version | text | | |
url | text | | |
status | text | | not null |
tasks | jsonb | | |
error | jsonb | | |
workflow_name | text | | |
duration | real | | |
original_payload | jsonb | | |
final_payload | jsonb | | |
timestamp | timestamp with time zone | | |
created_at | timestamp with time zone | | not null | CURRENT_TIMESTAMP
updated_at | timestamp with time zone | | not null | CURRENT_TIMESTAMP
Indexes:
"executions_pkey" PRIMARY KEY, btree (cumulus_id)
"executions_arn_unique" UNIQUE CONSTRAINT, btree (arn)
"executions_collection_cumulus_id_index" btree (collection_cumulus_id)
"executions_parent_cumulus_id_index" btree (parent_cumulus_id)
"executions_url_unique" UNIQUE CONSTRAINT, btree (url)
Check constraints:
"executions_status_check" CHECK (status = ANY (ARRAY['running'::text, 'completed'::text, 'failed'::text, 'unknown'::text]))
Foreign-key constraints:
"executions_async_operation_cumulus_id_foreign" FOREIGN KEY (async_operation_cumulus_id) REFERENCES async_operations(cumulus_id)
"executions_collection_cumulus_id_foreign" FOREIGN KEY (collection_cumulus_id) REFERENCES collections(cumulus_id)
"executions_parent_cumulus_id_foreign" FOREIGN KEY (parent_cumulus_id) REFERENCES executions(cumulus_id) ON DELETE SET NULL
Referenced by:
TABLE "executions" CONSTRAINT "executions_parent_cumulus_id_foreign" FOREIGN KEY (parent_cumulus_id) REFERENCES executions(cumulus_id) ON DELETE SET NULL
Check that the executions_parent_cumulus_id_foreign
constraint is present with 'ON DELETE SET_NULL' and does not have 'NOT VALID' appended to the end
Check that the two added indexes executions_parent_cumulus_id_index
and executions_collection_cumulus_id_index
exist and do not show as INVALID
.
Standard Postgres queries
Verify executions_parent_cumulus_id_foreign
Run the following query:
SELECT
conname AS constraint_name,
pg_get_constraintdef(pg_constraint.oid) AS definition
FROM
pg_constraint
WHERE
conrelid = 'executions'::regclass
AND contype = 'f';
You should get a result like:
constraint_name | definition
-----------------------------------------------+--------------------------------------------------------------------------------------
executions_async_operation_cumulus_id_foreign | FOREIGN KEY (async_operation_cumulus_id) REFERENCES async_operations(cumulus_id)
executions_collection_cumulus_id_foreign | FOREIGN KEY (collection_cumulus_id) REFERENCES collections(cumulus_id)
executions_parent_cumulus_id_foreign | FOREIGN KEY (parent_cumulus_id) REFERENCES executions(cumulus_id) ON DELETE SET NULL
(3 rows)
Check that the executions_parent_cumulus_id_foreign
constraint is present with 'ON DELETE SET_NULL' and does not have 'NOT VALID' appended to the end
Verify indexes were created
Run the following query:
SELECT
indexname AS index_name,
indexdef AS definition
FROM
pg_indexes
WHERE
tablename = '<tablename>';
You should see a result like:
index_name | definition
----------------------------------------+--------------------------------------------------------------------------------------------------------------
executions_url_unique | CREATE UNIQUE INDEX executions_url_unique ON public.executions USING btree (url)
executions_arn_unique | CREATE UNIQUE INDEX executions_arn_unique ON public.executions USING btree (arn)
executions_pkey | CREATE UNIQUE INDEX executions_pkey ON public.executions USING btree (cumulus_id)
executions_collection_cumulus_id_index | CREATE INDEX executions_collection_cumulus_id_index ON public.executions USING btree (collection_cumulus_id)
executions_parent_cumulus_id_index | CREATE INDEX executions_parent_cumulus_id_index ON public.executions USING btree (parent_cumulus_id)
(5 rows)
Check that the two added indexes executions_parent_cumulus_id_index
and executions_collection_cumulus_id_index
exist and do not show as INVALID
.