===========
Quick Start
===========

In general, there are two ways to run the uPSP processing applications:

-  :ref:`Processing one test condition <sec-single-processing>`: Useful for initial checkout of raw
   data from a wind tunnel test setup. The user manually runs each step
   in the pipeline.
-  :ref:`Batch processing multiple test conditions <sec-batch-processing>`: Useful after initial
   checkout. A framework auto-generates scripts that will invoke each
   pipeline step for each wind tunnel test condition. The user controls
   the pipeline via top-level “launcher” scripts.

.. _sec-single-processing:

Producing surface pressure time history for one test condition
==============================================================

The following instructions allow a user to process a single test
condition with the uPSP software. The steps will use bracketed
shorthand like ``<name>`` to refer to the path to input files on
disk (for more details concerning file formats, see :doc:`file-formats`).

#. Acquire the following input files:

   -  ``<video-file>``: High-speed video of your test subject (:ref:`sec-video-file`)
   -  ``<grid-file>``: Test subject 3D model (:ref:`sec-grid-file`)
   -  ``<tgts-file>``: Registration targets and fiducials (:ref:`sec-tgts-file`)
   -  ``<steady-file>``: Steady-state surface pressure (:ref:`sec-steady-file`)
   -  ``<wtd-file>``: Time-averaged wind tunnel conditions (:ref:`sec-wtd-file`)
   -  ``<paint-file>``: Unsteady gain paint calibration coefficients (:ref:`sec-upsp-gain-file`)
   -  ``<cal-file>``: Camera calibration parameters (one file per camera) (:ref:`sec-cam-cal-file`)
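
   As an optional pre-flight check before proceeding, the short sketch below
   verifies that each input file exists and is readable. The bracketed names
   are placeholders; substitute the actual paths on your system.

   .. code:: bash

      #!/bin/bash
      # Placeholder paths; replace each with the actual file location.
      inputs=("<video-file>" "<grid-file>" "<tgts-file>" "<steady-file>"
              "<wtd-file>" "<paint-file>" "<cal-file>")
      for f in "${inputs[@]}"; do
        if [ -r "$f" ]; then
          echo "OK: $f"
        else
          echo "MISSING OR UNREADABLE: $f"
        fi
      done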

#. Create the following processing configuration files:

   -  ``<excal-cfg-file>``: Configuration for ``upsp-external-calibration`` (:ref:`sec-excal-file`)
   -  ``<inp-file>``: Input deck for ``psp_process`` (:ref:`sec-input-deck`)

#. Run ``upsp-extract-frames`` to extract the first frame from the camera video file as a PNG.

   .. code:: bash

      #!/bin/bash
      upsp-extract-frames \
        -input=<video-file> \
        -output=first-frame.png \
        -start=1 \
        -count=1

   Run ``upsp-extract-frames -h`` for more usage details.

   The output file ``first-frame.00001.png`` contains the first frame
   from the input ``<video-file>``, scaled to 8-bit depth.
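
   As a quick optional check (a sketch using standard command-line tools, not
   part of the uPSP toolset), confirm that the frame was written and scaled to
   8-bit depth:

   .. code:: bash

      #!/bin/bash
      # Confirm the extracted frame exists and report its format/bit depth.
      ls -lh first-frame.00001.png
      file first-frame.00001.png  # expect output similar to "PNG image data, ..., 8-bit ..."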

#. Run ``upsp-external-calibration`` to compute the external camera
   calibration relative to the model position in the first frame (this
   step accounts for “wind-on” deflection of the model position that
   is not accounted for in the time-averaged model position reported by
   the wind tunnel data systems):

   .. code:: bash

      #!/bin/bash
      upsp-external-calibration \
        --tgts <tgts-file> \
        --grd <grid-file> \
        --wtd <wtd-file> \
        --cfg <excal-cfg-file> \
        --cal_dir <cal-dir> \
        --out_dir . \
        --img first-frame.00001.png

   ``<cal-dir>`` refers to the name of the directory containing
   ``<cal-file>``. Run ``upsp-external-calibration -h`` for more usage
   details.

   The output file ``cam01-to-model.json`` contains the external
   calibration parameters that will be fed to ``psp_process`` in the next
   step.
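
   Because the calibration is written as plain JSON, it can be inspected with
   standard tools before moving on; for example (assuming Python is available
   on the system):

   .. code:: bash

      #!/bin/bash
      # Validate and pretty-print the external calibration output.
      python -m json.tool cam01-to-model.json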

#. Run ``psp_process`` to produce a (usually quite large) time history of
   the surface pressure at each point on the model grid. This can take
   significant time and memory if run on a personal computer; we recommend
   instead that the application be run in parallel on a compute cluster.
   The application is best run so that one MPI rank runs on each compute node,
   and then thread-level parallelism is leveraged by each MPI rank to scale
   across cores on a given node.

   An example PBS job script for running ``psp_process`` on the Pleiades cluster at the
   NASA Advanced Supercomputing (NAS) Division is shown below. For more details
   about PBS syntax and uPSP-specific rules of thumb for sizing the job allocation,
   see :ref:`sec-nas-parameters`.

   .. code:: bash

      #!/bin/bash
      #PBS -q normal
      #PBS -l select=40:model=ivy
      #PBS -l walltime=00:20:00
      export MPI_DSM_DISTRIBUTE=off
      export OMP_STACKSIZE=250M
      export OMP_NUM_THREADS=16
      source /usr/local/lib/global.profile
      module purge
      module load mpi-hpe/mpt.2.25
      mpiexec psp_process \
        -input_file=<inp-file> \
        -h5_out=<out-dir>/output.h5 \
        -paint_cal=<paint-file> \
        -steady_p3d=<steady-file>

   ``<out-dir>`` refers to the value of the ``@output/dir`` variable
   specified in the ``<inp-file>``. Run ``psp_process -h`` for more
   usage details.
   
   The output ``pressure_transpose`` file contains the surface pressure
   time history for each node on the model grid (see :ref:`sec-pressure-transpose-file`).
   Several diagnostic images are also written so the user can verify that
   the external calibration and (optional) fiducial patches align well with
   the position of the model in the first video frame.
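
   If the HDF5 command-line tools are available on the system, the structure of
   the new output file can be listed as a quick sanity check (a sketch; the
   exact group and dataset names may vary):

   .. code:: bash

      #!/bin/bash
      # Recursively list the groups and datasets written by psp_process.
      h5ls -r <out-dir>/output.h5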

#. (Optional) post-processing steps

   -  Run ``add_field`` to add the ``pressure_transpose`` data into the
      HDF5 file produced by ``psp_process``. For some of its command
      line arguments, ``add_field`` must be provided with the number of
      vertices in the 3D model and the number of frames that were
      processed. The example below, written as a BASH script, shows how to
      obtain these values by inspecting files output by ``psp_process``.
      ``<out-dir>`` should be replaced with the same directory
      as ``<out-dir>`` in the previous step (the directory containing
      outputs from ``psp_process``).

      .. code:: bash

         #!/bin/bash

         # Set output_dir to the folder containing outputs from `psp_process`
         # in previous step.
         output_dir=<out-dir>
         trans_h5_file=$output_dir/output.h5

         # Inspect the 'X' file, which is a flat binary dump of the
         # X-coordinates of the input wind tunnel model grid vertices.
         # The number of coordinates in the file gives the size of the model.
         data_item_size=4 # coordinates stored as 4-byte floats
         model_size="$(expr $(stat --printf="%s" $output_dir/X) '/' $data_item_size)"
         trans_flat_file="$(find $output_dir -type f -name 'pressure_transpose')"
         trans_flat_file_size="$(stat --printf="%s" $trans_flat_file)"
         trans_flat_file_number_frames="$(expr $trans_flat_file_size '/' '(' $model_size '*' $data_item_size ')')"

         echo ">>> number of model nodes: $model_size"
         echo ">>> data item size: $data_item_size"
         echo ">>> time history flat file: '$trans_flat_file'"
         echo ">>> time history flat file size: $trans_flat_file_size"
         echo ">>> time history flat file number of frames: $trans_flat_file_number_frames"
         echo ">>> adding flat-file data to '$trans_h5_file'"

         ex="add_field $trans_h5_file frames $trans_flat_file $trans_flat_file_number_frames"
         echo ">>> running: '$ex'"
         t_start="$(date +%s.%N)"
         echo ">>> started at: $(date)"
         $ex
         t_end="$(date +%s.%N)"
         echo ">>> elapsed time: $(python -c "print('%4.1f' % ($t_end - $t_start))") seconds"

         echo ">>> run-add-field DONE."

.. _sec-batch-processing:

Batch processing multiple test conditions
=========================================

The following instructions allow a user to batch process one or more
test conditions from a wind tunnel test with the uPSP software.

Batch processing is configured by ``upsp-make-processing-tree``, a tool
that auto-generates a file tree and associated command-line scripts that
the user can then run to execute each step in the uPSP pipeline for one
or more datapoints. The configuration process is illustrated in
:numref:`flowchart-batch-processing` and consists of the following steps:

#. The user locates raw data files from a wind tunnel test on disk
#. The user prepares four Javascript Object Notation (JSON) configuration files:

   -  A **datapoint index**, listing the path to each raw input file for
      each datapoint

      -  This often involves writing test-specific scripts or tools to
         locate and interpret each input file on disk

   -  A **processing parameters file**, containing parameter settings
      for each step in the pipeline
   -  A **PBS job parameters file**, containing PBS scheduler
      settings (group ID, reservation wall time, number of nodes, etc.)
   -  A **plotting parameters file**, containing parameters for plotting
      steps in the pipeline

#. The user runs ``upsp-make-processing-tree`` and provides it with each
   configuration file. The script will autogenerate a file tree on disk
   to store all artifacts for batch processing

Once the processing tree is generated and saved to disk, the user
can navigate to the ``03_launchers`` subfolder and trigger each step
in the pipeline as follows:

#. Each step in the pipeline is launched using a script named
   ``step+<step-name>[+<subtask-name>]``, where the subtask suffix is optional.

   -  They should be run in the order given here (some steps use outputs
      from previous steps):

      1. ``step+extract-first-frame``: extract the first frame from each
         camera video file.
      2. ``step+external-calibration``: run the wind-on, first-frame
         external calibration for each camera.
      3. ``step+psp_process+psp-process``: run ``psp_process``
         (image-to-grid projection and calibration to units of pressure).
      4. ``step+psp_process+add-field``: post-process ``psp_process``
         outputs; add largest pressure-time history dataset into the
         HDF5 output file.

   -  Each step launcher script can be invoked as follows (a concrete
      example is shown after this list):

      -  ``./<step-launcher-script> <datapoint-id-1> <datapoint-id-2> ...``
         to process a specific subset of datapoints; if no datapoint IDs
         are given, all datapoints are processed.
      -  ``./qsub-step <step-launcher-script> <datapoint-id-1> <datapoint-id-2> ...``
         to launch the step on the cluster as one or more jobs (uses
         ``qsub``). The jobs can then be monitored using
         ``qstat -u $USER``. The job reservations are configured using
         the settings supplied in the PBS job parameters JSON file.

#. Once complete, data products for each datapoint will be available
   under ``04_products/00_data/<step-name>/<datapoint-id>``.
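
For example, the full pipeline could be run for two datapoints by launching the
first two steps locally and submitting the ``psp_process`` steps to the cluster.
The datapoint IDs below (``3001`` and ``3002``) are hypothetical placeholders;
use IDs from your datapoint index.

.. code:: bash

   #!/bin/bash
   # Run from within the 03_launchers subfolder of the generated processing tree.
   ./step+extract-first-frame 3001 3002
   ./step+external-calibration 3001 3002
   ./qsub-step step+psp_process+psp-process 3001 3002
   # After the psp_process jobs finish (monitor with `qstat -u $USER`), run:
   ./qsub-step step+psp_process+add-field 3001 3002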

The JSON file format was chosen for batch processing configuration files
due to its ubiquitous usage in industry and broad
cross-platform/cross-language support. Users should be familiar with
plain-text editing of JSON files and can reference the official JSON
syntax `here <https://www.json.org/json-en.html>`__.
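
Before running ``upsp-make-processing-tree``, a quick syntax check of each
configuration file can catch malformed JSON early. The sketch below uses
Python's built-in ``json.tool`` module; the file names are placeholders for
your actual configuration files.

.. code:: bash

   #!/bin/bash
   # Placeholder file names; replace with your actual configuration files.
   for cfg in datapoint-index.json processing-params.json pbs-params.json plot-params.json; do
     if python -m json.tool "$cfg" > /dev/null; then
       echo "OK: $cfg"
     else
       echo "INVALID JSON: $cfg"
     fi
   done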

.. _flowchart-batch-processing:
.. figure:: _static/flowchart-batch-processing.png
   :alt: uPSP NAS batch processing flowchart.
   :name: fig:flowchart
   :width: 100.0%

   uPSP NAS batch processing flowchart.

.. _sec-nas-parameters:

Guidelines for setting NAS PBS job parameters
=============================================

For complete details and tutorials, see the HECC wiki, `“Running Jobs with
PBS” <https://www.nas.nasa.gov/hecc/support/kb/running-jobs-with-pbs-121/>`__.

Specific to the current implementation of the uPSP processing code, the following
rationale provides practical rules of thumb for scaling the size of the PBS job to
your input data size. Some trial and error may still be required to refine these
parameters after an initial best guess.

Given the following variable definitions:

- :math:`N_R`: Number of MPI ranks
- :math:`N_C`: Number of cameras
- :math:`F_R`: Camera resolution (pixels)
- :math:`N_M`: Number of 3D model grid nodes
- :math:`N_T`: Number of time samples

In short, the rule of thumb is to ensure that each MPI rank has access to at least
:math:`M_T = M_C + M_1 + M_2` bytes of local memory, where

- :math:`M_C = O\left(K_C (N_T N_C F_R)/N_R\right)` accounts for storage of camera frames
- :math:`M_1 = O\left(K_1 (N_T N_M)/N_R\right)` accounts for storage of the 3D-projected data (pixel counts)
- :math:`M_2 = O\left(K_2 (N_T N_M)/N_R\right)` accounts for storage of the calibrated, 3D-projected data (physical pressure units)

and :math:`K_C = 2`, :math:`K_1 = 8`, and :math:`K_2 = 8` are reasonable constant "fudge factors" accounting for
variability in camera bit depth and intermediate storage of elements of the solution in memory.

A practical application of this rule of thumb is as follows:

- At NASA, for one example test condition, we collected 60K frames from each of four cameras, each with approximately 1 MP resolution
- The wind tunnel 3D model had approximately 1M vertices
- So:

  - :math:`M_C \approx 2 \cdot (6 \times 10^4) \cdot 4 \cdot 10^6 / N_R \approx 5 \times 10^{11} / N_R`
  - :math:`M_1 \approx 8 \cdot (6 \times 10^4) \cdot 10^6 / N_R \approx 5 \times 10^{11} / N_R`
  - :math:`M_2 \approx 8 \cdot (6 \times 10^4) \cdot 10^6 / N_R \approx 5 \times 10^{11} / N_R`
  - :math:`M_T \approx 1.5 \times 10^{12} / N_R` (bytes)

- We can use 40 MPI ranks, one per compute node, so the memory requirement per
  compute node is approximately :math:`1.5 \times 10^{12} / 40 = 3.75 \times 10^{10}` bytes, or about 37.5 GB. The NAS
  Ivy nodes each have `64 GB of memory available <https://www.nas.nasa.gov/hecc/support/kb/preparing-to-run-on-pleiades-ivy-bridge-nodes_446.html>`_,
  so we can fit our job into a PBS session with 40 Ivy nodes and 40 MPI ranks.
  From practical experience, we know this job takes less than 10 minutes of wall clock time,
  so we can use the following PBS directives to allocate the job:

  .. code:: bash

    #PBS -l select=40:model=ivy
    #PBS -l walltime=00:10:00
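
The arithmetic above can also be scripted so the estimate is easy to redo for a
different test condition. The following is a sketch of the rule of thumb with
the example numbers filled in (all values in bytes):

.. code:: bash

   #!/bin/bash
   # Rule-of-thumb per-rank memory estimate, using the example values above.
   N_T=60000        # number of time samples (frames per camera)
   N_C=4            # number of cameras
   F_R=1000000      # camera resolution (pixels)
   N_M=1000000      # number of 3D model grid nodes
   N_R=40           # number of MPI ranks
   K_C=2; K_1=8; K_2=8

   M_C=$(( K_C * N_T * N_C * F_R / N_R ))   # camera frame storage
   M_1=$(( K_1 * N_T * N_M / N_R ))         # 3D-projected data (pixel counts)
   M_2=$(( K_2 * N_T * N_M / N_R ))         # calibrated, 3D-projected data
   M_T=$(( M_C + M_1 + M_2 ))

   echo "Per-rank memory estimate: $M_T bytes (~$(( M_T / 1000000000 )) GB)"

The script reports roughly 36 GB per rank, consistent with the rounded
37.5 GB estimate above.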


The rationale for this rule of thumb is based on a complexity analysis of the current algorithm
implementation, described in more detail in :doc:`swdd`.