Updates to task granule file schemas
Background
Most Cumulus workflow tasks expect as input a payload of one or more granules, each of which contains the files for that granule. Most tasks also return this same granule structure as output.
Until now, however, the schemas for the granule files objects expected by each task were inconsistent. Furthermore, there was no guarantee of consistency between granule files objects as stored in the database and the expectations of any given workflow task.
Thus, when performing bulk granule operations which pass granules from the database into a Cumulus workflow, schema validation could fail depending on which task was used to start the workflow and that task's particular schema.
To rectify this situation, CUMULUS-2388 was filed and addressed to create a common granule files schema shared by nearly all of the Cumulus tasks (exceptions discussed below) and the Cumulus database. The following documentation explains the manual changes you need to make to your deployment in order to be compatible with the updated files schema.
Updated files schema
The updated granule files schema can be found here.
These former properties were deprecated (with notes about how to derive the same information from the updated schema, if possible):
- filename - concatenate the bucket and key values with a directory separator (/)
- name - use fileName property
- etag - ETags are no longer provided as an individual file property. Instead, a separate etags object mapping S3 URIs to ETag values is provided as output from the following workflow tasks (guidance on how to integrate this output with your workflows is provided in the Upgrading your workflows section below):
  - update-granules-cmr-metadata-file-links
  - hyrax-metadata-updates
- fileStagingDir - no longer supported
- url_path - no longer supported
- duplicate_found - This property is no longer supported, however sync-granule and move-granules now produce a separate granuleDuplicates object as part of their output. The granuleDuplicates object is a map of granules by granule ID which includes the files that encountered duplicates during processing. Guidance on how to integrate granuleDuplicates information into your workflow configuration is provided below.
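
If you have custom task code that still relies on the deprecated properties, the derivations described above might be sketched as follows. This is a minimal illustration, not part of Cumulus itself: the helper names (legacy_filename, s3_uri, lookup_etag, had_duplicate) are hypothetical, the s3://bucket/key form of the etags keys is an assumption based on the description "S3 URIs", and the assumed shape of granuleDuplicates (granule ID mapped to an object with a files list) should be checked against the actual task output.

```python
from typing import Optional


def legacy_filename(file: dict) -> str:
    """Rebuild the deprecated `filename` value by joining `bucket` and `key` with "/"."""
    return f"{file['bucket']}/{file['key']}"


def s3_uri(file: dict) -> str:
    """Build an S3 URI for a file.

    Assumption: the keys of the `etags` object have the form "s3://<bucket>/<key>".
    """
    return f"s3://{file['bucket']}/{file['key']}"


def lookup_etag(file: dict, etags: dict) -> Optional[str]:
    """Return the ETag for a file from the separate `etags` map, or None if absent."""
    return etags.get(s3_uri(file))


def had_duplicate(granule_id: str, file: dict, granule_duplicates: dict) -> bool:
    """Report whether a file is listed under its granule in `granuleDuplicates`.

    Assumption: `granule_duplicates` looks like {granuleId: {"files": [file, ...]}}.
    """
    duplicate_files = granule_duplicates.get(granule_id, {}).get("files", [])
    return any(
        dup.get("bucket") == file["bucket"] and dup.get("key") == file["key"]
        for dup in duplicate_files
    )
```

The deprecated name property needs no helper, since the same value is read directly from fileName on the updated file object.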