Best Practices
Unit Testing
All code should reach a minimum of 80% coverage through unit tests.
Code Style
We use the Google Style Guide for style elements such as documentation, titling, and structure. We also recommend reviewing Clean Architecture.
Stop on Failure
Failures within ORCA propagate to the Cumulus workflow they are a part of. For this reason, raising an error is preferred over catching the error and returning a null value or an error message. The code examples below show how to raise an error in Python in different contexts.
try:
    value = function(param)
except requests_db.DatabaseError as err:
    # Log the error, then re-raise it unchanged so the workflow sees the failure.
    logging.error(err)
    raise

if not success:
    logging.error(f"You may log additional information if desired. "
                  f"param: {param}")
    raise DescriptiveErrorType('Error message to be raised into the Cumulus workflow.')
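The fragments above rely on placeholder names. As a self-contained illustration of the same idea, the following sketch raises a descriptive error instead of returning a null value. `MissingKeyError` and `get_required` are hypothetical names invented for this example and are not part of ORCA.

```python
import logging

LOGGER = logging.getLogger(__name__)


class MissingKeyError(Exception):
    """Hypothetical descriptive error type used only in this sketch."""


def get_required(event, key):
    """Return event[key]; raise instead of returning None when the key is absent."""
    try:
        return event[key]
    except KeyError as err:
        # Log extra context for debugging, then raise so the failure is not hidden.
        LOGGER.error("Key '%s' not found. Available keys: %s", key, sorted(event))
        raise MissingKeyError(f"Required key '{key}' is missing from the event.") from err
```

A caller that does not catch `MissingKeyError` lets it propagate, which is the behavior this section recommends.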
Retries can then be configured in the workflow JSON if desired. See the documentation and tutorials for more information.
The following snippet from the copy_to_archive lambda demonstrates usage of retries for a lambda in an ingest workflow. MaxAttempts is set to 6, meaning that the function will be retried up to 6 times, and therefore run a maximum of 7 times, before transitioning to the WorkflowFailed state. IntervalSeconds determines how many seconds the workflow will sleep before the first retry. A BackoffRate of 2 means that the interval will be doubled on each failure beyond the first.
"CopyToArchive": {
  ...
  "Type": "Task",
  "Resource": "${copy_to_archive_task_arn}",
  "Retry": [
    {
      "ErrorEquals": [
        "Lambda.ServiceException",
        "Lambda.AWSLambdaException",
        "Lambda.SdkClientException"
      ],
      "IntervalSeconds": 2,
      "MaxAttempts": 6,
      "BackoffRate": 2
    }
  ],
  "Catch": [
    {
      "ErrorEquals": [
        "States.ALL"
      ],
      "ResultPath": "$.exception",
      "Next": "WorkflowFailed"
    }
  ],
  "Next": "WorkflowSucceeded"
},
If the retries are exceeded and the error is caught, then the workflow will show that it jumped to the WorkflowFailed state. If the WorkflowFailed state was not triggered, then the workflow will move on to the step defined in Next.
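To make the retry timing concrete, the sketch below computes the wait before each retry for the values used in the snippet (IntervalSeconds 2, MaxAttempts 6, BackoffRate 2). The helper name `retry_waits` is hypothetical, and the calculation assumes no maximum delay or jitter is configured and ignores the time the function itself takes to run.

```python
def retry_waits(interval_seconds, max_attempts, backoff_rate):
    """Return the wait in seconds before each retry attempt.

    The first retry waits interval_seconds; each later retry multiplies
    the previous wait by backoff_rate.
    """
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


waits = retry_waits(interval_seconds=2, max_attempts=6, backoff_rate=2)
print(waits)       # [2, 4, 8, 16, 32, 64]
print(sum(waits))  # 126
```

Under these assumptions, the workflow would spend about two minutes sleeping between attempts before giving up.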
In the event that an error may be transient, and failing would cause a large amount of redundant work for other objects, retrying a failing operation in code is acceptable with a strictly limited number of retries. You will likely want to log each individual error for analytics and debugging.
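One possible shape for such a bounded in-code retry is sketched below. `TransientError` and `call_with_retries` are hypothetical names chosen for illustration, and the retry count and delay are arbitrary defaults rather than ORCA settings. Each failure is logged, and the final failure is re-raised so that it still reaches the workflow.

```python
import logging
import time

LOGGER = logging.getLogger(__name__)


class TransientError(Exception):
    """Hypothetical error type standing in for a retryable failure."""


def call_with_retries(operation, max_retries=3, delay_seconds=1.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError at most max_retries times."""
    attempt = 0
    while True:
        try:
            return operation()
        except TransientError as err:
            attempt += 1
            # Log every individual failure for analytics and debugging.
            LOGGER.error("Attempt %d of %d failed: %s", attempt, max_retries + 1, err)
            if attempt > max_retries:
                # Retries exhausted: re-raise instead of swallowing the error.
                raise
            sleep(delay_seconds)
```

The `sleep` parameter exists only so the delay can be replaced in tests. Errors of any other type are not retried and propagate immediately.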