Process Configuration Manager (PCM) Interface¶

The Process Configuration Manager (PCM) refers to two interfaces through which users describe the processes that produce data products (the PCM Process GUI) and the data products that are produced. The descriptions of the output products are referred to as Data Object Designs (DODs), so the second interface is referred to as the PCM DOD GUI. Because the PCM is intended to expedite the transfer of a scientific algorithm into the ARM production processing system, it is expected that the desired output product is well defined.

PCM Description¶

The Process Configuration Manager (PCM) is the master interface from which a user can access the Process Definition and other tools used to view, edit, and define an ADI application’s process, input, and output configurations.

Currently loaded interfaces are displayed on the right panel, while access to ARM’s datastreams and processes is maintained on the left panel via the Processes and Datastreams tabs.

A user can have multiple tools open and be simultaneously viewing or editing datastreams, processes, and DODs.

Each instance of an active tool is maintained in a set of tabs located along the top of the PCM’s right panel, as shown in the following figure, The Process Configuration Manager (PCM). By default, the PCM enables the Datastreams tab of the left panel and displays the Intro tab on the right panel.

Note that in the left frame, below the Filter the List heading, the Production and Development tabs are grayed out. This indicates that the list of datastreams displayed is a combined list of both production and development datastreams. To filter out the development datastreams, select the Production button. To filter datastreams by a string, enter the string into the blank cell below the Production button.

_images/pcm_annotated_UI.png — The Process Configuration Manager (PCM)¶

From the PCM:Datastreams panel a user can view existing datastreams and their associated DODs, create new datastreams, and define DODs for a datastream. From the PCM:Processes panel a user can view an existing process or create a new process.

Process Definition¶

Defining an ARM process consists of defining its inputs and outputs, and documenting where it will run through the Process Definition Tool component of the ADI PCM.

A Process Definition includes definitions of inputs, outputs, and operating parameters that relate to the process as a whole rather than to an individual input, output, or transformation applied to an input. A summary of the information needed to define an ADI process, along with information helpful in completing the process definition, is presented below.

Process Name

required

Name of the process. The executable will be the name followed by ‘_vap.’

Process Locations and Other Options Form¶

ARM Facilities for which this Process Should Run:

required

Each ARM site/facility pairing for which the process is valid to run. A process can only run at facilities that are documented in the DSDB.

Automated Email:

required

Email address to receive error and warning messages produced by the process.

Processing Interval

required

The number of seconds of data processed in each iteration through the ADI modules that run after initialization and before finishing (retrieve, merge, transform, create output datasets, store data). Defaults to a single day (86400 seconds). If set to 0, the chunk of data processed through the retriever, merge, and process modules spans the entire period between the begin and end dates, plus any time offsets.
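The effect of the processing interval can be illustrated with a minimal Python sketch. It is not ADI code: the function name `processing_intervals` is hypothetical, time offsets are ignored, and truncating the last chunk at the end date is an assumption made for the illustration.

```python
from datetime import datetime, timedelta


def processing_intervals(begin, end, interval_seconds=86400):
    """Yield (start, stop) pairs covering the period from begin to end.

    Hypothetical helper for illustration only; offsets are not applied.
    """
    if interval_seconds == 0:
        # An interval of 0 means the whole requested period is one chunk.
        yield begin, end
        return
    step = timedelta(seconds=interval_seconds)
    start = begin
    while start < end:
        # Assumption: a final partial chunk is cut off at the end date.
        stop = min(start + step, end)
        yield start, stop
        start = stop


# Three days of data with the default one-day interval gives three chunks.
chunks = list(processing_intervals(datetime(2020, 1, 1), datetime(2020, 1, 4)))
for start, stop in chunks:
    print(start.isoformat(), "->", stop.isoformat())
```

Passing `interval_seconds=0` to the same call yields a single chunk spanning all three days.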

Inputs/Outputs Form¶

Process Type:

required

If the process will be retrieving data from netCDF data files, select ADI. Basic type is for non-netCDF input files. Load input netCDF files into the DOD as needed as described in <KLG need updated link to Creating a New DOD by Importing from a NetCDF File in other pcm_dod.rst>

Retrieval Definition:

required

Input datastreams for an ADI type process are specified as part of the retrieval definition process. Select the Edit Retrieval button and complete the Retrieval Definition Form as described in <KLG need updated link to Defining a Datastream Class>

Process Output Datastream Classes:

required

Select an existing datastream class and data level from the drop-down list of available values, or enter a new datastream base platform name and data level that conform to ARM naming standards. If a DOD for that datastream is in the DSDB, an expandable reference that is also a link to the DOD interface page is displayed.

Defining a New Process¶

To define a new process, perform the following steps.

Open the Process Configuration Manager (PCM) and log in from https://engineering.arm.gov/pcm/Main.html. Use your ARM wiki user name and password.

Select the Processes tab in the upper left panel of the PCM as shown:

Define a new VAP.

Enter a name for the process. For this tutorial, we will name the process example_vap.
Select the process type as VAP.
Enter the facilities at which the VAP should run by selecting site and facility pairs from the provided drop list. To display the drop list, start typing the name of the facility. You can then select the facility from the list of candidates. For this tutorial, select NSA C1 to start. To delete a selection, click the X beside the item name. If a site is not listed it needs to be loaded into the database. Contact Sherman Beus for further information.
Enter the output datastream name, which for this tutorial is examplevap.c1 (note that site and facility are not included in this definition). As in the previous step, a drop list will display candidate datastream names.
Select the ‘Save’ button to save the entries to the DSDB.

In the example shown below, the VAP process created is ‘example_vap’. It runs at the sgp C1, sgp B1, and nsa C1 facilities and produces the output datastream examplevap.c1. If the examplevap.c1 datastream has not been previously defined, saving the process information to the DSDB will add examplevap.c1 to the list of datastreams available from the PCM:Datastreams view. Note that the Process Definition form is labeled as an ‘example_vap Process Form’ tab at the top of the right panel, next to the ‘Intro’ tab.
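Because the output datastream is entered without a site or facility, one process definition covers every location at which it runs. The following Python sketch shows how per-location datastream names could be derived. It assumes the common ARM pattern of site, class, and facility followed by a dot and the data level; the helper `full_datastream_names` is hypothetical and not part of the PCM.

```python
def full_datastream_names(datastream_class, level, locations):
    """Expand a class and level into one name per (site, facility) pair.

    Assumes the pattern <site><class><facility>.<level>.
    """
    return [
        f"{site}{datastream_class}{facility}.{level}"
        for site, facility in locations
    ]


# Locations and output datastream taken from the example_vap process.
locations = [("sgp", "C1"), ("sgp", "B1"), ("nsa", "C1")]
names = full_datastream_names("examplevap", "c1", locations)
print(names)
# ['sgpexamplevapC1.c1', 'sgpexamplevapB1.c1', 'nsaexamplevapC1.c1']
```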

_images/pcm_process_definition_tool_edit_retrieval.png

Updating an Existing Process¶

To rename an existing process, perform steps one and two from Defining a New Process, edit the name of the process, and save the change.

It is not possible to duplicate a process. However, the Text Export/Import button displayed in the lower right hand corner of the Retriever Editor form can be used to copy all the Retriever Table database entries into another process. To fully duplicate the process, the attributes associated with the process will need to be reentered into the new process’s Locations and Other Options form and Inputs/Outputs form.

Specifying Variables to Retrieve and Conversions and Transforms to Apply¶

At the most basic level, defining the inputs to a VAP consists of documenting the name of a variable and the datastream from which it should be retrieved. Historically, a significant effort was expended performing pre-analysis data consolidation and transformations to prepare input data for scientific analysis. To minimize, if not eliminate, the need for VAP developers to perform such tasks, ADI allows a user to

Define preferred and alternative input datastream sources
Assign a generic name to retrieved variables that will be referred to in the auto-generated source code
Use a simple check box to retrieve companion QC variables
Apply unit and data type changes to the data as part of the retrieval process.

Additional control is also provided to define input data source preference by site/facility pairing or time range dependencies.

The inputs of all VAPs must be specified using a retriever process. As such, by default the ‘This process uses a retrieval for its input configuration’ box will be checked and an ‘Edit Retrieval’ button should be evident in the lower left side of the right panel. Selecting this button will bring up the Retrieval Definition form, as shown in the following example screen capture. Note that the Retrieval Definition form has replaced the Process Definition form, but it is still organized under the ‘example_vap VAP Process’ tab.

To return to the main Process Definition form select the ‘Done’ button at the bottom left of the right panel.

Note

Retriever data will not be stored in the DSDB until the ‘Save’ button in the Process Definition form has been selected.

_images/pcm_retrieval_definition_form.png

Retrieval Definition Table Overview¶

The Retrieval Definition form allows the user to not only specify the variable and data source from which to retrieve a variable, but also to perform some basic transformations of units and data type. These options are checkboxes in the bar above the table. Selecting an item adds the data entry column to the table. In addition to the transformations, the bar also allows the user to retrieve data for a particular variable for some extra time before and/or after the process period specified in the command line, and to automatically retrieve the companion QC variable. A description of each of the columns in the Retrieval Definition table is given below.

ADI Retrieval Definition Form Parameters
Process Element	Required	Description and Comments
Source(s)	Yes	Datastream Source(s) is the datastream(s) from which the value(s) for the variable should be retrieved. Populated via the Data Sources Definition form. A single value can be retrieved from a prioritized list of preferred and alternate datastreams (in Figure 3.2, the first_cbh variable is retrieved from either vceil25k.a1 or vceil25k.b1 based on the indicated conditions and correlated to a user defined variable ‘cloud_base_height’).
Variable Name	Yes	The Variable Name consists of the user defined name of the variable to be retrieved, and an indication of whether finding the variable in one of the specified input data sources is a requirement that must be met for the VAP to successfully run. Variable names in the ‘Variable Name’ column must be unique. If the ‘Required’ check box is marked, the VAP process will fail to run for a given observation (i.e., input data file) unless the variable specified is successfully retrieved. If the ‘Required’ check box is marked, an asterisk will follow the Variable Name. This is the name by which the retrieved data will be referred to in the DSDB and in auto-generated code. It is not necessary for this name to match the name in the datastream(s) from which the variable is retrieved. Coordinate dimension variables (i.e., time, height, range, etc.) should not be included in the Retrieval Definition table, as all coordinate dimensions of retrieved variables are automatically retrieved. This automatic retrieval is only successful when the dimension name and variable name in the input datastream file are identical.
Coord System	No	Coord System is the name assigned by the developer to the coordinate system for a given variable. The parameters associated with a coordinate system are assigned via the Coordinate System Definition Form. A transformation method must be defined for each dimension of a variable’s coordinate system. ADI supports two methods of assigning a coordinate system to a given dimension: (1) a uniform system (i.e., a coordinate system characterized by a constant interval between all samples of the dimension), and (2) a mapping (a coordinate system not explicitly defined, but indicated by selecting a coordinate variable from another datastream to which a retrieved variable’s dimension will be transformed). These are more fully discussed in Coordinate System Definition Form Overview. It is recommended that all retrieved variables passed through to an output datastream, even when the input and output coordinate systems are identical, have an explicitly named coordinate system that is defined using a mapping or static values. For cases where the output coordinate system is the same as that of the input datastream, it should be defined as a mapping onto itself. This will fill gaps in data to create a more complete file.
Outputs	Yes	The name of the output datastream(s) and level(s) to which a retrieved variable will be propagated as part of the data consolidation process, and the name of the variable as it will be found in the output datastream(s). Populated via the Output Field Mapping Form. The output datastream(s) are prepopulated with all possible output datastreams documented in the Inputs / Outputs section of the Process Definition Form. For a retrieved variable to exist in an output datastream, the name must be entered into the empty cell adjacent to the datastream name and level in the Output Field Mapping Form.
Units	No	Specifies the units into which the retrieved data will be converted. Units are converted using Unidata’s UDUNITS library. The DEFAULT value results in units staying the same as found in the input file from which the variable is retrieved. Units are entered free form. Please reference Unidata’s web page for further information: http://www.unidata.ucar.edu/software/udunits/udunits-2/udunits2.html.
Data Type	No	A drop list of possible data types into which the retrieved data can be converted. If a value is not provided, the data type will default to type float. If the data type remains as a default value through the population of the Data Sources Definition form, and a field is selected from the drop list of available values, the data type will be updated to the type of the selected field as found in the specified datastream. If the default value is overridden in the Retrieval Definition table, the data type will not be updated as a result of field selections in the Data Sources Definition form.
QC	Yes	Indicates whether the companion QC variable will be retrieved in addition to the variable noted in the Variable Name, and whether successfully finding the companion QC variable is a requirement for the VAP to run. It is assumed that the name of the companion QC variable is the name of the variable in the input datastream file preceded by ‘qc_’. If the ‘Required’ check box is marked, the VAP process will fail to run for a given observation (i.e., input data file) unless both the variable and its QC variable are successfully retrieved.
Offsets(Seconds)	No	Allows a user to retrieve additional data for each processing interval, either before the interval or after it. This includes data before the begin date or after the end date entered at the command line at run time. Typically used to provide a buffer of data to a type of analysis that needs to see a larger window of data than the processing interval of the ADI process. If the input and output bins do not both line up with the processing interval boundaries, then to be absolutely sure you get all the input data you need outside the edge of a processing interval, set the offsets to [size of input bin] + [size of transformed bin]. This will retrieve enough data even in the worst case of diametrically opposed alignments (alignment of 0.0 in one, and 1.0 in the other). The begin_date and end_date values are for the “current processing interval” and are not adjusted by the offsets. All records with times before begin_date or after end_date are records within the specified offsets (for normal daily processing these would be from the previous day or the next day). Note that begin_date and end_date are input parameters to all user hooks. For example, if an offset of 60 seconds is defined at both the start and the end, and the sample interval is 60 seconds, the retrieved samples will run from 0 to 1441, but the output files created will still contain 1440 samples, consisting of samples 1 to 1440. Although data is retrieved over the entire extended period, the output file will only cover the processing interval.
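The offset arithmetic in the table can be checked with a short Python sketch. The helper names are hypothetical and times are plain seconds since the start of the day; treating the end of the retrieval window as exclusive is an assumption made here so that the numbers match the 60-second example.

```python
def retrieval_window(begin, end, start_offset=0, end_offset=0):
    """Widen the processing interval by the offsets (all values in seconds)."""
    return begin - start_offset, end + end_offset


def sample_times(window_start, window_end, sample_interval):
    """Sample times in the window; the end is assumed to be exclusive."""
    return list(range(window_start, window_end, sample_interval))


# One day of 60-second samples with 60-second offsets on both sides.
begin, end = 0, 86400
win_start, win_end = retrieval_window(begin, end, start_offset=60, end_offset=60)
retrieved = sample_times(win_start, win_end, 60)

# begin and end are not adjusted by the offsets, so the output
# keeps only the samples inside the processing interval.
kept = [t for t in retrieved if begin <= t < end]

print(len(retrieved))            # 1442 samples, indexed 0 to 1441
print(len(kept))                 # 1440 samples written to the output
print(retrieved.index(kept[0]))  # 1
print(retrieved.index(kept[-1])) # 1440
```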

Data Sources Definition Form Overview¶

The Data Sources Definition form allows a user to define the source(s) of the data to retrieve and assign to the user defined variable. It allows for lists of preferred and alternate data sources, multiple possible variable names, and location and time dependencies. A description of each of the columns in the Data Sources Definition form is given below.

ADI Data Sources Definition Form Parameters
Process Element	Required	Description and Comments
Priority	No	Integer representation of priority when alternative Datastream Sources are specified. When priority is not populated, the first row is the highest priority and the last is the lowest. Dragging and dropping the rows into the desired order is another way to adjust priority.
Datastream Class	Yes	Datastream from which the variable with the name noted in the ‘Field(s)’ column will be retrieved. Must be populated first before any of the other elements in the Data Sources Definition form can be populated.
Field(s)	Yes	Name of the variable to retrieve as found in the datastream defined as the Datastream Class. Initially populated with a default value, noted by brackets, equal to the user defined Variable Name from the Retrieval Definition form. If the datastream is loaded into the history database, clicking on the Fields cell will bring up a drop list populated with all possible variable names. If not, the user should enter the desired variable name followed by a <return>. If more than one variable is entered into the Fields column, the retriever searches the input datastream file for each of the variables in the order listed, until one is found. The variable names shown in the drop list reflect all the variables that have existed for that datastream over all time, not just the variables in the datastream’s latest DOD.
Location	No	The variable names shown in the drop list reflect all the variables that have existed for that datastream over all time, not just the variables in the datastream’s latest DOD.
Location Dependency	No	Used when the datastream from which to retrieve data is a function of the site/facility at which the VAP is being run.
Time Dependency	No	Used when the datastream from which to retrieve data is a function of what period the VAP process is running. If a begin or end time dependency is not selected, the time dependency defaults to the beginning of the datastream or end of the datastream respectively.
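The search order described in the table (rows in priority order, then field names in the order listed) can be sketched in Python as follows. This is an illustration only, not the retriever's implementation: the function `pick_source`, the `available` dictionary, and the alternate field name `cbh` are hypothetical, and falling through to the next row when no listed field is found is an assumption.

```python
def pick_source(rows, available):
    """Return the first (datastream, field) pair that can be satisfied.

    rows:      list of (datastream, [field names]) in priority order,
               first row highest priority.
    available: dict mapping a datastream to the set of variable names
               present in its input file (hypothetical stand-in for
               the files found at run time).
    """
    for datastream, fields in rows:
        present = available.get(datastream)
        if present is None:
            continue  # no data for this datastream, try the next row
        for field in fields:
            if field in present:
                return datastream, field
    return None


rows = [
    ("vceil25k.b1", ["first_cbh"]),
    ("vceil25k.a1", ["first_cbh", "cbh"]),  # "cbh" is a made-up alternate name
]
available = {"vceil25k.a1": {"cbh", "backscatter"}}

print(pick_source(rows, available))  # ('vceil25k.a1', 'cbh')
```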

An example of both a location and time dependency is illustrated in the preceding figure. In this example, when the VAP is run for sgpB4, the user defined variable ‘cloud_base_height’ will be correlated to the first_cbh variable in the vceil25k.a1 datastream. If it is not running at sgpB4 and the date being processed falls before April 1, 2001, the user defined variable ‘cloud_base_height’ will be correlated to the variable ‘first_cbh’ in the vceil25k.a1 datastream. For process times April 1, 2001 or greater, and when processing at sites other than sgpB4, the user defined variable ‘cloud_base_height’ will be correlated to the first_cbh variable in the vceil25k.b1 datastream.
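The dependency logic in this example can be written out as an ordered rule list, most restrictive rule first. The Python sketch below is a hypothetical representation (the `RULES` layout and `select_datastream` are not PCM or ADI names), and it assumes an end date is exclusive so that April 1, 2001 itself selects the b1 datastream.

```python
from datetime import date

# Each rule: (location or None, begin or None, end or None, datastream).
# None means "no restriction"; a missing begin or end extends to the
# beginning or end of the datastream.
RULES = [
    ("sgpB4", None, None, "vceil25k.a1"),
    (None, None, date(2001, 4, 1), "vceil25k.a1"),
    (None, date(2001, 4, 1), None, "vceil25k.b1"),
]


def select_datastream(location, day, rules=RULES):
    """Return the datastream of the first rule matching location and day."""
    for loc, begin, end, datastream in rules:
        if loc is not None and loc != location:
            continue
        if begin is not None and day < begin:
            continue
        if end is not None and day >= end:  # assumption: end is exclusive
            continue
        return datastream
    return None


print(select_datastream("sgpB4", date(2010, 1, 1)))  # vceil25k.a1
print(select_datastream("nsaC1", date(2000, 6, 1)))  # vceil25k.a1
print(select_datastream("nsaC1", date(2001, 4, 1)))  # vceil25k.b1
```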

Output Field Mapping Form Overview¶

This form is accessed by double clicking a cell in the Output(s) column of the Retriever Editor. It consists of a row for each of the possible output datastreams, with a drop box containing all the variables in that output DOD. To associate a variable in the Retriever Editor with a specific output variable, simply select the desired variable from the drop box next to the datastream. This will result in the values associated with the retrieved variable being mapped to the selected variable in the output datastream.

Coordinate System Definition Form Overview¶

In most cases, a new coordinate system can be fully defined via the coordinate system definition form. To transform a variable to a new coordinate system means to define new values for one or more of the variable’s dimensions, and to update the variable’s values to reflect the new ‘grid’. The coordinate system of the retrieved variable will be referred to as the ‘source’; the coordinate system of the new grid will be referred to as the ‘target’. The form supports three transformation types: (1) averaging, (2) interpolation, and (3) nearest subsample. The parameters that can be specified via the form are documented in the following tables for each transformation type.

General Coordinate System Definition Form Parameters
Process Element	Required	Description and Comments
Variable(x,y, …n) where x,y, …n represent the dimensions that make up the target coordinate system.	Yes	Name of each dimension for the retrieved variable. The order of the dimensions must match the order of the dimensions of the retrieved variable. If the name of a dimension is to be changed, the new name should be entered.
Coordinate system name	Yes	The name of the coordinate system as stored in the CDSTrans structure and named in ADI templater generated header files. If no transformation is performed on a retrieved variable’s dimensions, then a CDSin structure is used to store the information and a coordinate system name is not needed.
Units	No	If set, the dimension will be converted to the indicated units prior to the transformation.
Data type	No	If set, the data type will be converted to the type indicated prior to the transformation.
Use mapping	No	Control button. If selected, it updates the form to display a table from which the user can select the datastream’s grid, onto which the indicated dimension will be mapped. If not selected, drop boxes and cells necessary to define a uniform grid are displayed.

Uniform Grid Coordinate System Definition Form Parameters
Process Element	Required	Description and Comments
Transform type	No	Allows the user to select the type of transform applied, such as average, interpolation, or subsample. By default, if output bins are larger than input bins the data is averaged, if output bins are smaller the data is interpolated, and if the bin size is the same no transformation is applied.
Bin alignment	No	Tells you where in the bin the coordinate variable for the dimension is located in the context of ‘beginning, middle, and end’ values. Default value is middle.
Interval	Yes *	Specifies the difference between two values of the given coordinate variable to generate a uniform grid.
Start	Yes *	The value of the coordinate dimension for the first element in the output grid.
End	Yes *	The value of the coordinate dimension for the last element in the output grid.
Length	Yes *	The number of bins, or distinct values for the coordinate dimension. For the dimension time this equals the number of samples in the file.

For the interval, start, end, and length parameters, the user sets three of the four and the last is calculated and automatically set.
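The relationship between the four parameters follows from the table: on a uniform grid, the last value equals the first value plus the interval times one less than the number of values. A minimal Python sketch of filling in the fourth parameter is shown below. The function `complete_uniform_grid` is hypothetical and only illustrates the arithmetic; it is not how the form itself is implemented, and the input values are example numbers.

```python
def complete_uniform_grid(interval=None, start=None, end=None, length=None):
    """Compute the one missing parameter from end = start + interval * (length - 1)."""
    given = {"interval": interval, "start": start, "end": end, "length": length}
    missing = [name for name, value in given.items() if value is None]
    if len(missing) != 1:
        raise ValueError("exactly three of the four parameters must be set")

    if length is None:
        length = round((end - start) / interval) + 1
    elif end is None:
        end = start + interval * (length - 1)
    elif start is None:
        start = end - interval * (length - 1)
    else:
        # Requires length > 1; a single-value grid has no interval.
        interval = (end - start) / (length - 1)
    return {"interval": interval, "start": start, "end": end, "length": length}


# A 30-second time grid over one day, with the length left blank.
grid = complete_uniform_grid(interval=30, start=0, end=86370)
print(grid["length"])  # 2880
```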

Mapped Grid Coordinate System Definition Form Parameters
Process Element	Required	Description and Comments
Datastream group	Yes	The datastream to map to is determined by the user entering the name of the datastream group for which the target datastream is the highest priority datastream.

In addition to the parameters provided in the form, additional parameters can be defined in a configuration file to further refine the transformation. Each of the transformation types, and the flat file that can be used to define them are discussed in detail in Transforming or Regridding Retrieved Variables onto a New Coordinate System.

The entries on the Coordinate System Definition form support the two most common types of transformations, averaging and interpolation. Through this form, the target grid can be defined in one of two ways:

By specifying a constant interval between values, a start value, an end value, and the total number of samples.
By selecting an existing grid on which to map a variable.

The former is referred to as a uniform transformation, the latter, a mapped transformation. Unless the transform type is explicitly defined in the transform configuration file, the libraries determine whether an averaging or interpolation transformation is needed. If the target grid bins are larger than the source grid bins, the data will be averaged to match the new grid. If the target grid bin size is smaller, then interpolation will be applied. If either or both grids are irregular, then ADI will attempt to guess which default transformation should be used based on the average interval over the whole span of the grid.
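The default choice between averaging and interpolation can be sketched in Python as a comparison of bin sizes. This is an approximation of the behavior described above, not the ADI library logic: the function names are hypothetical, and the average interval over the whole grid is used for both regular and irregular grids.

```python
from statistics import mean


def average_interval(grid):
    """Mean spacing between consecutive coordinate values."""
    return mean(b - a for a, b in zip(grid, grid[1:]))


def default_transform(source_grid, target_grid):
    """Guess the default transformation from the relative bin sizes."""
    src = average_interval(source_grid)
    tgt = average_interval(target_grid)
    if tgt > src:
        return "average"      # larger target bins: average the source data
    if tgt < src:
        return "interpolate"  # smaller target bins: interpolate
    return "none"             # same bin size: no transformation


print(default_transform([0, 30, 60, 90], [0, 60, 120]))   # average
print(default_transform([0, 60, 120, 180], [0, 30, 60]))  # interpolate
print(default_transform([0, 10, 50, 60], [0, 30, 60]))    # average (irregular source)
```

An explicit transform type in the transform configuration file would take precedence over a guess like this one.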

_images/pcm_uniform_transform_view_coordinate_system.png

The coordinate system in the figure above is an example of a uniform transformation of the time dimension. It has been assigned the name “thirty_second” and transforms the time dimension onto a uniform grid that starts at 0 seconds and increases in increments of 30 to 86370, for a total of 2880 values.

_images/pcm_datastream_mapping_transform_coordinate.png

Populating the Retrieval Definition Table¶

Retrieval Definition table variables and data sources can be populated by either:

Specifying the variable names and datastream sources by typing in the fields in the Retrieval Definition Table.
Accessing the DOD of an existing datastream using the Datastreams tab on the left panel of the Process Configuration Manager and dragging and dropping variables to retrieve into the table.

Manual entry is more efficient where there is more than one datastream from which to retrieve the variable. When a variable’s source is a single datastream (no alternate sources if that datastream is unavailable), it is more efficient to access the DOD of the input datastream and drag and drop variables onto the Retrieval Definition table.

To enter the Retrieval Definition form:
1. From the Process Definition Tool (Figure 2.4) select the ‘This process uses a retrieval for its input configuration’ button.
2. Select the ‘Edit Retrieval’ button.
3. Populate the Retrieval Definition Table.

Note

Do not retrieve netCDF standard (lat, lon, alt) or coordinate dimension variables (time, height, range) for a retrieved variable as these will be automatically retrieved.

Manual Entry of Input Data Variables and Sources¶

1. Select the green plus symbol located to the left of the table form.
2. Select the ‘custom_field_1’ variable and enter the name of the variable to retrieve.
3. Indicate whether the variable must be found for the VAP to run via the ‘Required’ check box.
4. Select ‘Source(s)’ [NONE] in the Datastream column.
5. Select the pencil icon to bring up the Data Sources Definition form.
6. Select the Datastream column corresponding to the Field with the value of the variable you just defined to bring up a drop list of possible datastreams.

If the data source is a single datastream with no alternative sources:
- Select the datastream from which the variable should be retrieved, then proceed to step 7.
If the data source is a single datastream with alternative sources based on datastream availability:
- Select the most preferred datastream from which the variable should be retrieved, then proceed to substep 2 of the following case.
If the data source is a single datastream with alternative sources based on location or time dependencies:
   1. Select the datastream from which the variable should be retrieved and define the most restrictive dependencies.
   2. Create a new row in the Data Sources Definition table by either selecting the green plus symbol to the left of the table, or by duplicating an existing row in the table by selecting the paper symbol on the left.
   3. Update the Datastream Class column of the new row to reflect the next most preferred (or next most restrictive) data source.
   4. Repeat the addition of new rows until options are exhausted.
   5. Review data source priority and populate the ‘Priority’ column as necessary.

7. If the name of the variable found in the Datastream Class datastream does not match the default value, update the entry in the ‘Field(s)’ column and select the desired variable name.
8. If a second value is to be retrieved and correlated to the user defined Variable Name in the Retrieval Definition form (meaning the user defined variable in the Retrieval Definition form will be an array of more than one value), specify the data sources and associated values of the additional values as follows:
   1. Select the ‘Show Advanced Controls’ button in the upper left corner of the ‘Data Sources Definition’ window. This will bring up additional icons along the top left of the Query window to add, close, delete, and adjust the order of additional queries.
   2. Define a new query to retrieve the additional data value by selecting either the green plus or the paper sheet icon to add a new or duplicate query.
   3. For the new query, define the data source(s) and update the field name.

9. Close the Data Sources Definition window by selecting the ‘x’ in the upper right corner of the window. Return to the Retrieval Definition form and specify additional variables to retrieve by adding new rows to the Retrieval Definition table.

Dragging and Dropping Input Data Variables and Sources from Existing DODs¶

You can populate the Retrieval Definition Table by dragging and dropping from input datastream DODs. For example, after expanding a datastream and its variables, you can drag and drop the desired variable into the Retrieval Definition table. The Source(s), Variable Name, and QC retrieval status will be populated. Update these as required.

Select the Datastream tab and locate the datastream from which to retrieve the variable.
Select the triangle next to the highest DOD version of the desired input datastream (for example the DOD) to list the dimensions, variables, and global attributes associated with the DOD.
Select the triangle next to the Variables to expand the variables.
Select the variable to retrieve with a single click, then drag it from the left frame and drop it into the Retrieval Definition frame on the right (Figure 3.3).
Update the variable’s Source(s), Variable Name, Units, Data Type, QC, and Offset values as necessary.

For the example VAP we will build in this tutorial, we will retrieve first_cbh, qc_first_cbh, and backscatter variables from the vceil25k.b1 datastream. The first_cbh will be saved into a user defined variable name of ‘cloud_base_height’ and written to the output netCDF file with that name. If that datastream is unavailable the variables will be retrieved from the vceil25k.a1. The units of the first_cbh will be converted to centimeters and the successful retrieval of the QC variable will be required for the example_vap process to run. The Retrieval Definition table for example_vap, with the Data Sources Definition form open for the first_cbh variable, is shown in the following figure.

_images/pcm_data_source_def_form_VAP.png

The duplicate entry icon (paper sheets icon) in the Retrieval Definition form was used to add the backscatter variable since it is retrieved using the same Data Sources Definition query (i.e., sets of possible input datastreams). The duplicate entry was updated as appropriate for the backscatter variable (update the Variable Name, Field(s), and Units). The coordinate dimensions of the retrieved variables (time and range) and lat, lon, and alt are not included in the Retrieval Definition table as they will be automatically retrieved. Note that the Location Dependency and Time Dependency check boxes in the Data Sources Definition have been deselected as they are not applicable to this example.
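For readers who find a data structure easier to follow than the form, the example_vap retrieval can be summarized in a short Python sketch. The `RetrievedVariable` class is purely illustrative and does not reflect how the DSDB stores retrievals. The `cm` unit string and the QC settings for backscatter are assumptions, since the text does not state them.

```python
from dataclasses import dataclass


@dataclass
class RetrievedVariable:
    """Illustrative stand-in for one row of the Retrieval Definition table."""
    name: str                  # user defined Variable Name
    fields: list               # candidate names in the input datastream
    sources: list              # datastreams in priority order
    units: str = "DEFAULT"     # DEFAULT keeps the units of the input file
    required: bool = True
    retrieve_qc: bool = False
    qc_required: bool = False

    def qc_field_names(self):
        # Companion QC variables are the field names prefixed with "qc_".
        return [f"qc_{f}" for f in self.fields] if self.retrieve_qc else []


SOURCES = ["vceil25k.b1", "vceil25k.a1"]  # preferred first, alternate second

retrieval = [
    RetrievedVariable("cloud_base_height", ["first_cbh"], SOURCES,
                      units="cm", retrieve_qc=True, qc_required=True),
    RetrievedVariable("backscatter", ["backscatter"], SOURCES),
]

# Variable names in the Retrieval Definition table must be unique.
variable_names = [v.name for v in retrieval]
assert len(variable_names) == len(set(variable_names))

print(retrieval[0].qc_field_names())  # ['qc_first_cbh']
```

Coordinate variables (time, range) and lat, lon, and alt are deliberately absent from the list, mirroring the table in the figure.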

Transforming or Regridding Retrieved Variables onto a New Coordinate System¶

This section will have documentation on details of transformation.

Saving the Retrieval Definition to DSDB¶

The input data retrieval specifications are saved to the DSDB, from which they are accessed by the ADI templater application, create_adi_project, to create the project source code files used at run time by the VAP. Note in Figure 3.5 that the user-defined names of the variables retrieved from the input datastreams are summarized above the ‘Edit Retrieval’ button.

To save retrieval data to the DSDB, perform the following steps.

Select the ‘Done’ button in the lower left corner of the ‘Retrieval Definition’ form to return to the ‘Process Definition’ form.
Select the ‘Save’ button in the lower left corner of the ‘Process Definition’ form.

_images/pcm_completed_proc_def_form_VAP.png

Running Processes Defined in the PCM¶

Processes defined in the PCM can be run through the core ADI modules (retrieve, merge, transform, and store) by running the data_consolidator application, either through the PCM Process GUI or in a terminal. These core ADI modules are also executed when a VAP process runs, but a VAP process consists of the core libraries plus code specific to the VAP’s algorithm, which is defined in that VAP’s user hooks. Running PCM processes independently of VAP-specific code is useful either to validate that the PCM Process and PCM DOD are set up correctly before development of the source code for a new VAP begins, or to create a consolidated dataset for use in scientific analysis.

Data Consolidation Tool¶

The ‘data_consolidator’ is an application that performs the transformations and mappings from retrieved variables to output variables for any process defined in the PCM. As such it allows users to consolidate data from diverse datastreams without the need to create or compile any source code. It takes as input the name of the retriever process whose retrievals, transformations, and input to output mappings are to be applied and the typical ARM process arguments. The data_consolidator can be run from either the UI or from a terminal.

Running Data Consolidation from UI¶

To run the data_consolidator, open the PCM process you want to run and select ‘Run Data Consolidator’ from the left panel.

Select the ‘Run Now’ button and enter the requested information.

Running Data Consolidation from Terminal¶

The data_consolidator command line arguments include the typical arguments for any ADI process with the addition of “-n <process>” to specify the process. The frequently used arguments include:

-n <process>
-s <site>
-f <facility>
-b <begin date>
-e <end date>
-a <database> (possible values are "dsdb_ref" and "devws")
-D <debug level> (level 2 will dump retrieved, transformed, and output structs)
-P (to log provenance)
-R (reprocessing flag to allow the overwrite of previously created netCDF files)
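The frequently used arguments above can be assembled programmatically. The following is a minimal sketch that only builds and prints the command line; it does not run data_consolidator. The helper name `build_consolidator_command`, the site and facility values, and the YYYYMMDD date format are assumptions made for illustration.

```python
import shlex


def build_consolidator_command(process, site, facility, begin, end,
                               database=None, debug_level=None,
                               provenance=False, reprocess=False):
    """Assemble a data_consolidator argument list from the documented flags."""
    argv = ["data_consolidator",
            "-n", process,
            "-s", site,
            "-f", facility,
            "-b", begin,
            "-e", end]
    if database is not None:
        argv += ["-a", database]          # "dsdb_ref" or "devws"
    if debug_level is not None:
        argv += ["-D", str(debug_level)]  # level 2 dumps the data structures
    if provenance:
        argv.append("-P")
    if reprocess:
        argv.append("-R")
    return argv


# Illustrative values only; the date format is assumed to be YYYYMMDD.
command = build_consolidator_command(
    "example_vap", "sgp", "C1", "20200101", "20200102",
    debug_level=2, reprocess=True)
print(shlex.join(command))
```

Building the arguments as a list rather than a single string avoids quoting problems if the list is later passed to something like `subprocess.run`.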

Additional arguments include:

--log-dir <path>  (path to the log file directory)
--log-file <file> (name of the log file)
--log-id <id> (replaces the timestamp in log file name with the specified id)
--max-runtime <seconds> (sets the max runtime for the process, 0 disables max runtime check)
--files <file1,file2,…> (for ingests only, specifies comma delimited list of files to process)
--asynchronous (disables the process lock file, disables check for chronological data processing, disables overlap checks with previously processed data, forces a new file to be created for every output dataset).
--dynamic-dods (creates a DOD on the fly when the process does not have one assigned to it in the PCM). This requires the following:
   - A datastream name that does not have a DOD associated with it
   must be entered into the PCM Processes Inputs and Outputs form.
   The output file will use this datastream name.
   - The PCM Retriever Editor must have entries in the Outputs column
   that map the retrieved variable to the output datastream.
   The names provided in the mapping will be the names used in the
   output file produced.

With respect to (-e), end date, please note that the process will run for the date specified as the begin date up to the end date (i.e., NOT through the end date).
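Because the end date is exclusive, the begin and end dates behave like a half-open range. The sketch below illustrates this under the assumption that the process is run over whole calendar days; the function names are hypothetical.

```python
from datetime import date, timedelta


def days_processed(begin, end):
    """List the calendar days in the half-open range [begin, end)."""
    return [begin + timedelta(days=i) for i in range((end - begin).days)]


def end_date_argument(last_day):
    """Return the end date needed so that last_day itself is processed."""
    return last_day + timedelta(days=1)


# A begin date of Jan 1 and an end date of Jan 3 covers Jan 1 and Jan 2 only.
print(days_processed(date(2020, 1, 1), date(2020, 1, 3)))

# To include Jan 31, the end date must be Feb 1.
print(end_date_argument(date(2020, 1, 31)))
```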

If the debug level is set to two (-D 2), the data_consolidator app will dump the contents of the retrieval, transform, and output structures to a subdirectory ‘debug_dumps’. The dump files created and the structure they contain are listed below.

<site><process_name><facility>.YYYYMMDD.HHMMSS.post_retrieval.debug
<site><process_name><facility>.YYYYMMDD.HHMMSS.pre_transform.debug
<site><process_name><facility>.YYYYMMDD.HHMMSS.post_transform.debug
<site><output_datastream_name><facility>.<output_datastream_level>.YYYYMMDD.HHMMSS.process_data.debug
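The naming patterns above can be expanded for a concrete run, for example to locate the dump files afterwards. This is a sketch that follows the patterns literally; the function name, the example site, facility, and output datastream values, and the timestamp passed in are assumptions, since the text does not say which time the tool embeds in the name.

```python
from datetime import datetime


def debug_dump_names(site, process, facility, out_datastream, out_level, when):
    """Expand the four documented debug dump file name patterns."""
    stamp = when.strftime("%Y%m%d.%H%M%S")
    prefix = f"{site}{process}{facility}"
    stages = ("post_retrieval", "pre_transform", "post_transform")
    names = [f"{prefix}.{stamp}.{stage}.debug" for stage in stages]
    # The output structure dump is named after the output datastream instead.
    names.append(
        f"{site}{out_datastream}{facility}.{out_level}.{stamp}.process_data.debug")
    return names


# Illustrative values only.
for name in debug_dump_names("sgp", "example_vap", "C1",
                             "examplevap", "c1", datetime(2020, 1, 1)):
    print(name)
```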

FAQ¶

Q: If I do not intend to alter any of the coordinate variables of a retrieved variable in any way, do I have to assign a name to the coordinate system in the PCM Coord System column?

A: No. If no name is provided, the coordinate system will be assigned a name equal to auto_<datastream_name>_<datastream_level>, for example auto_mfr10m_b1. If any change is made to the coordinate system, a name of coords_<x> is assigned, where x increases incrementally. It is recommended that users apply a more meaningful name to their coordinate system definitions.
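The default naming rule can be written out as a one-line function. This is only an illustration of the pattern described in the answer; the function name is hypothetical.

```python
def default_coord_system_name(datastream_name, datastream_level):
    """Build the default coordinate system name for an unmodified retrieval."""
    return f"auto_{datastream_name}_{datastream_level}"


print(default_coord_system_name("mfr10m", "b1"))
```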

Q: What time does base time reflect?

A: The default value of base_time will always be the time of midnight prior to the first sample time. You can change this to be the time of the first sample in the file by calling dsproc_set_base_time just prior to setting the times in the output dataset.
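The difference between the two choices of base_time can be shown with a short sketch. It does not call the ADI library; it only reproduces the arithmetic, assuming that sample times are timezone-aware UTC values and that times are stored as offsets in seconds from base_time. The function names are hypothetical.

```python
from datetime import datetime, timezone


def default_base_time(first_sample):
    """Midnight at the start of the day containing the first sample."""
    return first_sample.replace(hour=0, minute=0, second=0, microsecond=0)


def time_offsets(samples, base_time):
    """Offsets of each sample from base_time, in seconds."""
    return [(sample - base_time).total_seconds() for sample in samples]


samples = [
    datetime(2020, 1, 1, 0, 5, tzinfo=timezone.utc),
    datetime(2020, 1, 1, 0, 6, tzinfo=timezone.utc),
]

# Default: offsets are measured from midnight.
print(time_offsets(samples, default_base_time(samples[0])))

# Alternative: base_time set to the first sample, so offsets start at zero.
print(time_offsets(samples, samples[0]))
```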
