Parse PDR
This task utilizes the Cumulus Message Adapter to interpret and construct incoming and outgoing messages.
Links to the npm package, task input, output and configuration schema definitions and more can be found on the auto-generated Cumulus Tasks page.
Summary
The purpose of this task is to do the following with the incoming PDR object:
Stage it to an internal S3 bucket
Parse the PDR
Archive the PDR and remove the staged file if successful
Outputs a payload object containing metadata about the parsed PDR (e.g. total size of all files, files counts, etc) and a granules object
The constructed granules object is created using PDR metadata to determine values like data type and version, collection definitions to determine a file storage location based on the extracted data type and version number.
Granule file types are converted from the PDR spec types to CNM types according to the following translation table:
HDF: 'data',
HDF-EOS: 'data',
SCIENCE: 'data',
BROWSE: 'browse',
METADATA: 'metadata',
BROWSE_METADATA: 'metadata',
QA_METADATA: 'metadata',
PRODHIST: 'qa',
QA: 'metadata',
TGZ: 'data',
LINKAGE: 'data'
Files missing file types will have none assigned, files with invalid types will result in a PDR parse failure.
Task Inputs
Input
This task expects an incoming input that contains name and path information about the PDR to be parsed. For the specifics, see the Cumulus Tasks page entry for the schema.
Configuration
This task does expect values to be set in the workflow_config
CMA parameters for the workflows. A schema exists that defines the requirements for the task.
For the most recent config.json schema, please see the Cumulus Tasks page entry for the schema.
Below are expanded descriptions of selected config keys:
Provider
A Cumulus provider object. Used to define connection information for retrieving the PDR.
Bucket
Defines the bucket where the 'pdrs' folder for parsed PDRs will be stored.
Collection
A Cumulus collection object. Used to define granule file groupings and granule metadata for discovered files.
Task Outputs
This task outputs a single payload output object containing metadata about the parsed PDR (e.g. filesCount, totalSize, etc), a pdr object with information for later steps and a the generated array of granule objects.
Examples
See the SIPS workflow cookbook for an example of this task in a workflow.