Algorithm Development

The main modules of the ARM Data Integrator (ADI) workflow are executed using the Data System Process Library (DSPL), libdsproc3. This library forms the basis of the projects created by the create_adi_project template generator.

The data consolidator is a special instance of a transform template that does not utilize any hooks.

The Data System Process Library is a suite of callable C functions that simplify the access to, manipulation of, and addition of data to ADI’s internal data structures.

Algorithm Overview

This section describes VAP algorithms created by the create_adi_project application in the context of the ADI main processing modules, hooks, and ADI’s internal data structures. Beginning with Algorithm Development, we provide an overview of the libdsproc3 library and offer example code for tasks commonly performed within an ADI process.

An ADI VAP source file consists of a main module and function definitions for the hooks utilized by the algorithm. The main module header file consists of prototypes of the hooks and data constants for process-related information. Three additional header files define data constants for accessing variables in the input, transformed, and output internal data structures.

Main Module

The main function, located in the <process_name>_vap.c file and shown in the following figure, ADI PCM Main Flowchart, sets the hooks that are to be used in the process and calls dsproc_main, which serves as the entry point to ADI’s data processing pipeline. The data_consolidator main function consists of a call to dsproc_main with no hooks set. The functions used to set the hooks are described in section Dsproc_main.

ADI main flowchart

ADI PCM Main Flowchart

Dsproc_main

The process diagram of the dsproc_main function is represented in the following figure, ADI dsproc_main Flowchart. The dsproc_main function calls ADI’s data processing modules (functions prefixed with dsproc_) and the hook functions that were set in main(). The functions shown in gray are implemented in both transform and retriever process models (i.e., templates run with the -t option equal to either transform or retriever). The functions shown in blue are only relevant to a transform process model. The orange functions represent hooks available to both retriever and transform templates, but for which no hook is automatically set and no function is defined in main(). If one of these hooks is needed, the developer will need to update the main() function to set the hook and add the function and its prototype to the project.

Dsproc main flowchart

ADI dsproc_main Flowchart

The process interval indicated represents the size of the chunks of data that will be processed and the size of the output file produced. Note that the data processing is performed within these process interval chunks. The figure above, ADI dsproc_main Flowchart, also illustrates the flow of data within a given process interval in the context of the internal data structures, which are noted in italicized text between the function calls. These structures include a retrieved data structure (ret_data) that stores the information in the format in which it was pulled from the input files, a transformed data structure (trans_data) that stores the data after any coordinate dimension transformations defined in the PCM have been applied, and the output data structure (out_data), which is the structure from which output files are created. While only the most relevant of these structures is passed into a particular hook, once a structure has been created the variables within that structure are available to hooks downstream of its creation point via variable access functions provided by the libdsproc3 library. These functions are discussed in more detail in the section ADI Data System Process Library.

Supporting Data

A main header file is created which contains prototypes of the hooks used by the process and data constants related to the process such as the process name and output datastream names and levels. Header files for each of the internal data structures (ret_data, trans_data, and out_data) are also created. These files provide data constants that allow the developer to access variables within the respective structures without defining their own environment variables.

The following list documents the data constants that developers should make use of for each of the auto-generated header files. Variables in the retrieved data structure are accessed by name, transformed variables are described in context of the coordinate systems to which they belong, and output variables are described in context of the output datastreams to which they are written.

<process>_vap.h

<PROCESS>_VAP_NAME - VAP name, use to get the dsid of output datastream

<OUTPUT_DS_NAME>_DS_NAME - output datastream name(s), use to get dsid of output datastream

<OUTPUT_DS_NAME>_DS_LEVEL - output datastream level(s), use to get dsid of output datastream

MISSING_VALUE - use for -9999 values

<process>_input_fields.h

invarNameList[] = {NULL terminated list of all retrieved variables } - list of all retrieved variables using user-defined names (not names as found in the file). Use to identify the number of retrieved variables. Use with a specific index or in a loop as input to a Dataset Variables libdsproc3 function.

*<VAR_NAME> - indexes to list of var names in invarNameList, use with invarNameList to access specific var name in list (invarNameList[TEMP], where #define TEMP 0).
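The pairing of a NULL-terminated name list with index constants can be sketched as follows. This is a minimal, self-contained illustration and not the generated header itself: the variable names "wspd" and "wdir" and the helper functions count_var_names and find_var_index are hypothetical, and only the invarNameList[TEMP] with #define TEMP 0 pattern comes from the text above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical index constants, following the "#define TEMP 0" pattern. */
#define TEMP 0
#define WSPD 1
#define WDIR 2

/* Hypothetical stand-in for a generated NULL-terminated name list. */
static const char *invarNameList[] = { "temp", "wspd", "wdir", NULL };

/* Count the names that precede the NULL terminator. */
static size_t count_var_names(const char **names)
{
    size_t count = 0;

    while (names[count] != NULL) {
        count++;
    }
    return count;
}

/* Return the index of a name in the list, or -1 if it is not present. */
static int find_var_index(const char **names, const char *name)
{
    int index;

    for (index = 0; names[index] != NULL; index++) {
        if (strcmp(names[index], name) == 0) {
            return index;
        }
    }
    return -1;
}
```

Because the list is NULL terminated, a loop can walk it without a separate length constant, while the index constants give direct access to a single name such as invarNameList[TEMP].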

<process>_trans_fields.h

transvarNameList_<coord_name>_<ds_group>_<coord>[] = {NULL terminated list of all variables from the indicated data source’s datastream group being transformed to the referenced coordinate system } - list of variables undergoing a coordinate system transformation, organized into subgroups based on the data source group they belong to and the coordinate system to which they are being gridded.

  • Use to identify the number of transformed variables from a particular datastream group.

  • Use with a specific index or in a loop as input to a Dataset Variables libdsproc3 function.

<COORD_NAME>_<DS_GROUP>_<VAR_NAME> - indexes to list of vars for the specified coordinate system and datastream source, use with transvarNameList_<coord_name>_<ds_group>[].

<process>_out_fields.h

outvarNameList_<out_ds_name>_<out_ds_level>[] = {NULL terminated list of all variables from the indicated output datastream } - list of the variables written to the indicated output datastream.

  • Use to identify the number of variables in a particular output datastream.

  • Use with a specific index or in a loop as input to a Dataset Variables libdsproc3 function.

<OUT_DS_NAME>_<VAR_NAME> - index to list of vars for the specified output datastream. Use with outvarNameList_<out_ds_name>_<out_ds_level>.

Supporting Functions

!!Description of Supporting Functions here.!!

Algorithm Development Tutorial

This section describes the process of developing an algorithm by providing examples that illustrate the various elements of a VAP.

ADI_example1

This algorithm example retrieves variables from the aosmet.a1 and aoscpc.a1 datastreams. Variables retrieved include windspeed and wind direction.

  • A pre_transform_hook is used to convert the windspeed and wind direction into windspeed u and v components.

  • A transform is applied to the windspeed u and v components to convert them to a 1 minute sampling interval.

  • A post_transform_hook is then used to convert the windspeed u and v components back to wspd and wdir.

  • The process_data hook illustrates working with variable data on an individual and group basis.

  • Process_data creates new variables that were not retrieved, but were defined in the output datastream, and assigns them values.

The transformed wspd, wdir, and new variables are presented in the output datastream aosmetexample1.a1. Below are detailed descriptions of the steps performed in this VAP. The test data is run for the ‘sbs’ site, at facility ‘S2’, for 20110401.

C Source Code

The C source code files and test data can be viewed here:

https://engineering.arm.gov/~gaustad/docs/vap/adi_example1/C_source/

IDL Source Code

The IDL source code files and test data can be viewed here: https://engineering.arm.gov/~gaustad/docs/vap/adi_example1/idl_source

Python Source Code

The Python source code files and test data can be viewed here: https://engineering.arm.gov/~gaustad/docs/vap/adi_example1/py_source

Output Datastreams

adicpcexample1.a1
aosmetexample1.a1

Hook Descriptions

The following hooks were implemented in adi_example1.

init_process_hook

Not used.

pre_transform_hook

The pre_transform_hook() function calls the adi_example1_pre_transform_hook() function in adi_example1_pre_transform_hook.c, which was written for the adi_example1 VAP. The pre_transform_hook function performs the following actions:

  • Get pointers to the retrieved windspeed and wind direction variables using dsproc_get_retrieved_var.

  • Determine the sample count from the CDSVar sample count attribute.

  • Get the arrays of data values from the variable pointers using dsproc_get_var_data_index.

  • Create two new variables in the retrieved dataset for the windspeed u and v components by cloning the existing retrieved windspeed and wind direction variables using dsproc_clone_var.

  • Allocate memory for the newly created variables using dsproc_alloc_var_data_index.

  • Find the missing value of each variable using dsproc_get_var_missing_values.

  • Assign the transform to the new u and v windspeed variables by copying the transformation information from the retrieved windspeed variable, which has the desired transform defined, using dsproc_copy_var_tag.

  • The retrieved windspeed and wind direction variables had the transform defined for them only so that the windspeed u and v component variables could use the transform definition. Update the flags of the retrieved windspeed and wind direction variables using dsproc_set_var_flags so that the transform defined in the PCM is not applied to them.

  • Print retrieved dataset to a flat file for debugging using cds_print. This is created only when VAP runs with the -D 2 option.

post_transform_hook

The post_transform_hook() function calls the adi_example1_post_transform_hook() function in the file adi_example1_post_transform_hook.c written by the developer. The post_transform_hook function performs the following:

  • Get pointers to retrieved windspeed and wind direction variables using dsproc_get_retrieved_var.

  • Get pointers to transformed windspeed u and v component variables by name using dsproc_get_transformed_var.

  • Get pointers to the transformed windspeed u and v component variable data using dsproc_get_var_data_index.

  • Create new windspeed and wind direction variables in the transformed dataset by cloning variables in the retrieved dataset using dsproc_clone_var (recall that the retrieved windspeed and wind direction variables were not transformed).

  • Allocate memory for the newly created windspeed and wind direction variables using dsproc_alloc_var_data_index.

  • Find the missing value of each variable using dsproc_get_var_missing_values.

  • Calculate the newly created windspeed and wind direction variables from the transformed windspeed u and v components.

  • Copy the variable tag information from the retrieved windspeed and wind direction variables to the new transformed windspeed and wind direction variables, created from the transformed windspeed u and v components, using dsproc_copy_var_tag. The variable information needed is the specification of the output datastream to which the new variables should be mapped.

  • Remove the var tags from the windspeed u and v component variables using dsproc_delete_var_tag so that they are not mapped to the output dataset.

Note

If these variables were to be preserved and included in the output, this step should not be performed.

  • Print transformed dataset to flat file for debugging using cds_print. This is created only when the VAP is run with the -D 2 option.

process_data

This function illustrates two ways of working with variables: individually, or grouped by the output datastream to which they belong. The specific steps applied in the process_data hook are listed below:

  • Define and allocate memory for the CDSVars and data that will be used in this function.

  • Get pointer to output CDSVar variable structures using dsproc_get_output_datastream_id then dsproc_get_output_var

  • Get number of samples for each output datastream by getting the time coordinate variable using datastream dsproc_get_time_var, and then accessing its sample_count attribute

  • Get pointers to the data of the CDSVars. For variables in the output that have been prepopulated with retrieved and then transformed variable data, call dsproc_get_var_data_index. For variables in the output dataset not based on retrieved variables, call dsproc_init_var_data_index (to allocate memory and set initial values to either MISSING or zero) or dsproc_alloc_var_data_index (to allocate memory without initializing values, for cases where a fill value is preferred as the initial value).

  • Find missing value of variable using dsproc_get_var_missing_values.

  • Perform analysis that includes creating two new variables, rh_ambient_n and temperature_ambient_n, as multiples of the existing variables rh_ambient and temperature_ambient. If the original values were missing, the qc of the new variables is also set to missing. The new variables are documented in the aosmetexample1.a1 datastream.

  • Dump both the transformed (input_data) and output (out_met_ds) dataset to flat files for debugging using dsproc_dump_transformed_datasets and dsproc_dump_output_datasets. These are created only when the VAP runs with the -D 2 option.
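The analysis step, deriving a new variable as a multiple of an existing one while carrying missing values through, can be sketched as a simple loop. This is a minimal sketch, not the adi_example1 source: the function name scale_with_missing and the scale factor are hypothetical, and the data arrays are assumed to have already been obtained and allocated through the variable access functions listed above.

```c
#include <stddef.h>

/*
 * Fill out[] with in[] multiplied by factor. Samples equal to the missing
 * value are copied through as missing instead of being scaled.
 */
static void scale_with_missing(const float *in, size_t nsamples, float factor,
                               float missing, float *out)
{
    size_t i;

    for (i = 0; i < nsamples; i++) {
        out[i] = (in[i] == missing) ? missing : in[i] * factor;
    }
}
```

Checking for the missing value before scaling avoids turning -9999 into a number that would no longer be recognized as missing downstream.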

ADI_example2

This example is a ‘retriever’ VAP, meaning a VAP for which the transformation module is skipped. Skipping it relieves the algorithm of the requirements needed to apply transformations, which can allow developers to process data in ways not otherwise possible. This example uses the ‘retriever’ template so that it can illustrate observation based processing, meaning the processing of each individual input file through the ADI data processing pipeline by skipping over the merge and process data modules. Skipping merging is key to retaining individual input files, and it is necessary to skip the process data module to prevent ADI from attempting to automatically map the retrieved variables to the output datastream. The ADI modules invoked for this example consist of initialize, retrieve, post_retrieval, and finish.

  • Uses the ‘retriever’ template which means the transformation module is skipped

  • Invokes observation based processing in the post_retrieval_hook, which is available to retriever type VAPs

  • Illustrates use of user data structure to pass information between the ADI modules

The test for this example is run for site ‘sgp’, facility ‘C1’ for 20000303.
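The idea of a user data structure that is created once and then handed to later hooks can be sketched with a toy driver. Everything in this block is hypothetical and stands in for the real pipeline: ExampleUserData, the example_*_hook functions, toy_pipeline, and the return convention (1 for success, 0 for failure) are illustrative assumptions, not the libdsproc3 API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user data carried from one hook to the next. */
typedef struct {
    int  obs_count;      /* observations handled so far */
    long total_samples;  /* samples seen across all observations */
} ExampleUserData;

/* Init hook: allocate and zero the user data once per process run. */
static void *example_init_hook(void)
{
    return calloc(1, sizeof(ExampleUserData));
}

/* Post-retrieval hook: called once per observation with the shared data. */
static int example_post_retrieval_hook(void *user_data, int nsamples)
{
    ExampleUserData *data = (ExampleUserData *)user_data;

    data->obs_count     += 1;
    data->total_samples += nsamples;
    return 1;
}

/* Finish hook: release the user data. */
static void example_finish_hook(void *user_data)
{
    free(user_data);
}

/*
 * Toy driver standing in for the pipeline: one init call, one
 * post-retrieval call per observation, one finish call.
 * Returns 1 on success and 0 on failure.
 */
static int toy_pipeline(const int *obs_sample_counts, size_t nobs,
                        int *obs_count_out, long *total_samples_out)
{
    ExampleUserData *data = (ExampleUserData *)example_init_hook();
    size_t i;

    if (data == NULL) {
        return 0;
    }

    for (i = 0; i < nobs; i++) {
        if (!example_post_retrieval_hook(data, obs_sample_counts[i])) {
            example_finish_hook(data);
            return 0;
        }
    }

    *obs_count_out     = data->obs_count;
    *total_samples_out = data->total_samples;
    example_finish_hook(data);
    return 1;
}
```

Passing the structure as an opaque pointer keeps the driver unaware of its contents, so each process can define whatever fields it needs to share between modules.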

Source Code

C Source Code

The C source code and data files can be viewed here: https://engineering.arm.gov/~gaustad/docs/vap/adi_example2/c_source

IDL Source Code

The IDL source code and data files can be viewed here: https://engineering.arm.gov/~gaustad/docs/vap/adi_example2/idl_source

Python Source Code

The Python source code and data files can be viewed here: https://engineering.arm.gov/~gaustad/docs/vap/adi_example2/py_source

Output Datastreams


adiexample2.c1

Hook Descriptions

init_process_hook - initializes user data structure

pre_retrieval_hook - not used

post_retrieval_hook

Invokes observation processing. Populates the user data structure. Stores data observation by observation.

pre_transform_hook - not used

post_transform_hook - not used

process_data_hook - not used

ADI_example3

This is an example of a process that updates an input datastream so that it conforms to ARM DOD standards. Specifically this process maps 30qcecor.c1 data to a 30qcecor.s1 output DOD that includes additional global attributes to capture the original input file meta data and adds new global attributes not yet automatically populated by the ADI libraries. Specific changes to the retrieved DOD include:

  • renames ‘h’ variable to ‘latent_head_flux’ through the PCM Output Field Mapping Form

  • creates user data structure to pass input and output datastream ID values across hooks

  • loops over all global attributes, filtering for attributes with a suffix equal to the input datastream name and data level (30qcecor_c1). Once these are identified, global attributes in the input datastream with the same name excluding the suffix are copied to capture the metadata associated with the original input file. For example, the adiexample3.s1 output DOD includes a global attribute, command_line__30qcecor_c1, that contains the same information as the command_line global attribute found in the input datastream, 30qcecor.c1.

  • updates the facility_id, location_description and platform_id for new standards

Currently this example is only implemented in IDL
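Although the example itself is written in IDL, the attribute name filtering it describes can be sketched in C. This is a minimal sketch under stated assumptions: the function name strip_datastream_suffix is hypothetical, and the two-underscore separator between the base name and the suffix is inferred from the single command_line__30qcecor_c1 example above.

```c
#include <stddef.h>
#include <string.h>

/*
 * If att_name has the form "<base>__<ds_suffix>", copy <base> into base
 * and return 1. Return 0 if the name does not match that form or if the
 * buffer is too small. The "__" separator is an assumption.
 */
static int strip_datastream_suffix(const char *att_name, const char *ds_suffix,
                                   char *base, size_t base_size)
{
    const size_t sep_len    = 2;
    size_t       name_len   = strlen(att_name);
    size_t       suffix_len = strlen(ds_suffix);
    size_t       base_len;

    /* The name must hold a non-empty base, the separator, and the suffix. */
    if (name_len <= suffix_len + sep_len) {
        return 0;
    }

    base_len = name_len - suffix_len - sep_len;

    if (strncmp(att_name + base_len, "__", sep_len) != 0 ||
        strcmp(att_name + base_len + sep_len, ds_suffix) != 0) {
        return 0;
    }

    if (base_len + 1 > base_size) {
        return 0;
    }

    memcpy(base, att_name, base_len);
    base[base_len] = '\0';
    return 1;
}
```

For the attribute named in the text, the base name recovered is command_line, which is the input attribute whose value would be copied.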

Source Code

IDL Source Code

The IDL source code and data files can be viewed here: https://engineering.arm.gov/~gaustad/docs/vap/adi_example3/idl_source

Output Datastreams


adiexample3.c1

Hook Descriptions

init_process_hook - initializes user data structure

pre_transform_hook - not used

post_transform_hook - not used

process_data_hook

Invokes observation processing. Populates user data structure.

finish_process_hook - not used

ADI_example4

This example takes the 6 narrowband upwelling irradiance variables for each filter and consolidates these variables into a single two-dimensional variable. It was developed to test the IDL and Python bindings for dimension and attribute dsproc functions. The example performs the following actions:

  • Sets the dimension length of a new two-dimensional variable.

  • Populates coordinate dimension variables.

  • Changes valid_max attribute value of a variable (up_hemisp_broadband) from what was read from a file to 1000.

  • Defines comment attributes of hemisp_broadband.

  • The process_data hook creates two new variables and assigns values based on multiples of existing values.

The test for this example is run for site ‘sgp’, facility ‘C1’ for 20000303.

Source Code

C Source Code

The C source code and data files can be viewed here: https://engineering.arm.gov/~gaustad/docs/vap/adi_example4/c_source

IDL Source Code

The IDL source code and data files can be viewed here: https://engineering.arm.gov/~gaustad/docs/vap/adi_example4/idl_source

Python Source Code

The Python source code and data files can be viewed here: https://engineering.arm.gov/~gaustad/docs/vap/adi_example4/py_source

Output Datastreams


adiexample4.c1

Hook Descriptions

init_process_hook - not used

pre_transform_hook - not used

post_transform_hook - not used

process_data

This function tests dimension and attribute functions. Specific steps applied in process_data hook are listed below:

  • Get pointer to output CDSVar variable structures using dsproc_get_output_datastream_id then dsproc_get_output_var

  • Set the dimension length of the second dimension of the two-dimensional upwelling narrowband variable by first getting the CDSout data structure using dsproc_get_output_dataset and then calling dsproc_set_dim_length.

    Note

    Because this example sets the dimension length of the ‘filter’ coordinate variable, the dimension length cannot be defined in the DOD. Therefore this example cannot be run using the data consolidator, because a null value for ‘filter’ would result in two unlimited dimensions.

  • Validates dsproc_get_dim by verifying the length was set correctly.
    • Gets pointer to new filter coordinate variable using dsproc_get_coord_var and populates it with values by first allocating memory dsproc_alloc_var_data_index (this allocates memory without initializing values) which returns a pointer to the data.

  • Groups the six narrowband variables into a single 2-dimensional variable.
    • Get qc for destination variable, dsproc_get_qc_var

    • Get source and qc variables dsproc_get_transformed_var and dsproc_get_qc_var

    • Verify source and destination variables are compatible: dsproc_get_input_datastream_id, dsproc_get_retrieved_dataset, dsproc_get_dim_length, dsproc_var_sample_count

    • Assign data values to new 2 dimensional variable and its companion qc variable using dsproc_get_var_missing_values, dsproc_var_sample_count , dsproc_alloc_var_data_index, and dsproc_get_var_data_index

  • Test attribute functions.
    • Change scalar att value using dsproc_get_att, dsproc_get_att_value, and dsproc_set_att_value

    • Change text att value using dsproc_set_att_text

    • Change text att value using dsproc_set_att_value

  • Dump both the transformed (input_data) and output (out_met_ds) dataset to flat files for debugging using dsproc_dump_output_datasets. This is only created when the VAP is run with the -D 2 option.
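The grouping step, copying several one-dimensional variables into one two-dimensional variable, comes down to index arithmetic on a flat array. This is a minimal sketch, not the adi_example4 source: the function name pack_narrowband is hypothetical, and it assumes the destination is stored in row-major order with time as the first dimension and filter as the second.

```c
#include <stddef.h>

/*
 * Copy nfilters one-dimensional arrays, each nsamples long, into a flat
 * two-dimensional array laid out as dest[time][filter] in row-major order.
 */
static void pack_narrowband(const float *const src[], size_t nfilters,
                            size_t nsamples, float *dest)
{
    size_t t;
    size_t f;

    for (t = 0; t < nsamples; t++) {
        for (f = 0; f < nfilters; f++) {
            dest[t * nfilters + f] = src[f][t];
        }
    }
}
```

With this layout, all filter values for one time step sit next to each other in memory, which matches a variable whose second dimension is the filter coordinate. The same loop can be applied to the companion qc arrays.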

FAQ

!!Frequently Asked Questions here.!!