Cookbook

The “cookbook” contains example code for frequently performed operations.

Datasets

Examples of retrieved dataset ids and datasets are provided.

Get Output Datastream ID

C Implementation:

int dsproc_get_output_datastream_id(
    const char *dsc_name,
    const char *dsc_level);

C Example:

dsid_out = dsproc_get_output_datastream_id("surfspecalb1mlawer", "c1");

IDL implementation:

result = dsproc::get_output_datastream_id(dsc_name=string, dsc_level=string)

IDL Example:

dsid_out = ds.get_output_datastream_id(dsc_name='surfspecalb1mlawer', dsc_level='c1')
if dsid_out lt 0 then return, -1

Get Input Datastream ID

C Implementation:

int dsproc_get_input_datastream_id(
    const char *dsc_name,
    const char *dsc_level);

C Example:

dsid_in = dsproc_get_input_datastream_id("sirs", "b1");

Get Retrieved Dataset

C Implementation:

CDSGroup *dsproc_get_retrieved_dataset (
  int dsid,
  int obs_index);

The example assumes dsid_in has been obtained using Get Input Datastream ID and that the call is made after the merge routine (so obs_index is zero).

Example:

CDSGroup *in_cds;
if(!(in_cds = dsproc_get_retrieved_dataset(dsid_in, 0))) return(-1);

Get Output Dataset

C Implementation:

CDSGroup *dsproc_get_output_dataset (
  int dsid,
  int obs_index);

The example assumes dsid_out has been obtained using Get Output Datastream ID and that the call is made after the merge routine (so obs_index is zero).

Example:

CDSGroup *out_cds;
if(!(out_cds = dsproc_get_output_dataset(dsid_out, 0))) return(-1);

Get Transformed Dataset

C Implementation:

CDSGroup *dsproc_get_transformed_dataset (
  const char *coordsys_name,
  int         dsid,
  int         obs_index);

The example assumes dsid_in has been obtained using Get Input Datastream ID, that the call is made after the merge routine (so obs_index is zero), and that the code applies to a process that has a coordinate system named “coordsys_in” defined in the PCM.

Example:

CDSGroup *trans_cds;
if(!(trans_cds = dsproc_get_transformed_dataset("coordsys_in", dsid_in, 0))) return(-1);

Get Number of Samples

The number of samples for a dataset should be determined from the time variable since the time variable will be unaffected by multiple dimensions.

C Implementation:

Get the time variable used by a dataset:

CDSVar *dsproc_get_time_var (void *cds_object)
  • cds_object can be a CDSGroup or a CDSVar; the function traverses up the CDS objects until it finds the first occurrence of either the “time” or “time_offset” variable.

  • The number of samples is determined from the sample_count member of the time variable structure. This member is of type size_t.

The example assumes a variable with values has been accessed and stored in any_var using the appropriate function (dsproc_get_output_var, dsproc_get_retrieved_var, or dsproc_get_transformed_var).

C Example:

CDSVar *time_var;
if( !(time_var = dsproc_get_time_var(any_var)) ) return(-1);
nsamples = time_var->sample_count;

IDL Implementation

In the example the in_var must be a CDSVar from the dataset from which the sample count is desired:

time_var = in_var.time_var
if ~obj_valid(time_var) then return, -1
nsamples = time_var.sample_count

Variables

The following variable examples can be used.

Get Pointer to an Output Variable Structure

Get output datastream ID using:

int dsproc_get_output_datastream_id (
  const char *dsc_name,
  const char *dsc_level)

Get pointer to output variable using:

CDSVar  * dsproc_get_output_var(
  int        dsid,
  const char *var_name,
  int        obs_index)
  • Use the dsid returned by dsproc_get_output_datastream_id() in the previous step.

  • For var_name, use the const char array created by the templater and the associated indexes in <vap_name>_output_fields.h.

  • The observation index will usually be zero because the files have been merged before the process_data loop.

  • Trap for errors via a NULL return.

Example:

int      dsid_out;
CDSVar   *wspd;
dsid_out = dsproc_get_output_datastream_id(UNCERT_MET_DS_NAME,
           UNCERT_MET_DS_LEVEL);
if(dsid_out < 0) return(-1);
if (!(wspd = dsproc_get_output_var(dsid_out,
    outvarNameList_metuncert_c1[METUNCERT_C1_U_WSPD_X_COMP],
        0)) ) return(-1);

Get Pointer to an Output QC Variable Structure

Get a pointer to an output QC variable using:

CDSVar * dsproc_get_qc_var(CDSVar *var)
  • The input variable is the pointer returned by dsproc_get_output_var() in Get Pointer to an Output Variable Structure.

  • Trap for a NULL return value.

Example:

CDSVar *qc_wspd;
if (!(qc_wspd = dsproc_get_qc_var(wspd))) return(-1);

Creating a New Variable

This refers to creating a variable object (CDSVar). Creating a new variable involves creating the CDSVar structure, either by cloning an existing variable or by defining it from scratch. Determine the metadata that needs to be associated with the new variable and set it accordingly, as shown in the following steps:

  1. Create a new variable from scratch using dsproc_define_var ():

    CDSVar *dsproc_define_var (
    CDSGroup *        dataset,
    const char *      name,
    CDSDataType       type,
    int               ndims,
    const char  **    dim_names,
    const char *      long_name,
    const char *      standard_name,
    const char *      units,
    void *            valid_min,
    void *            valid_max,
    void *            missing_value,
    void *            fill_value )
    

    Or by cloning it from an existing variable using dsproc_clone_var():

    CDSVar *dsproc_clone_var (
    CDSVar *          src_var,
    CDSGroup *        dataset,
    const char *      var_name,
    CDSDataType       data_type,
    const char **     dim_names,
    int               copy_data)
    

The clone function allows you to pick and choose what information from the original variable is retained and thus can give you a jump start on the definition.

  2. If the variable needs to be transformed or if it is associated with the input data source of another variable, then copy that variable’s VarTag information to the newly defined variable using dsproc_copy_var_tag(). VarTag information includes the following:

    1. input datastream

    2. name of variable in input datastream

    3. source datastream group from PCM retriever

    4. min, max, and delta of a variable

    5. coordinate system

    6. output datastream and the variable it is mapped to

    7. DQR information

  3. If the data source information of another variable is needed, but the coordinate system will be different, then copy the VarTag using dsproc_copy_var_tag() and afterwards overwrite the coordinate system using dsproc_set_var_coordsys_name().

  4. If the new variable is going to be in an output datastream, then map the variable to that datastream using dsproc_set_var_output_target(). Otherwise the copying of the VarTag information results in both the original and the new variable being mapped to the same output variable, which is not acceptable.

Example:

/**
*  This function consolidates the steps involved in creating a new variable.
*
*  @param src_var       - the variable to copy
*  @param dest_dataset  - the dataset into which the var will be copied.
*                         NULL to use the same dataset as the source var
*  @param dest_var_name - the name assigned to the variable.
*                         if dest_dataset was NULL, then dest_var_name cannot be the same
*                         as the src_var->name
*  @param coordsys_name - coordinate system of the variable
*                         NULL to use default coordinate system of the source var
*  @param out_dsid      - id of the output datastream the variable will be written
*                         enter -1 if the variable will not be written to any file.
*  @param out_var_name  - name variable is assigned in output file
*                         NULL to use the first out_var_name for src_var
*                         (the source var may be mapped to one or more output data
*                         files, and be assigned one or more names in these output
*                         datastreams. If NULL and there is such a mapping, the
*                         copied var will be assigned the same name in its output file
*                         as the name found in the source vars output file.  If the
*                         src_var is written to two files and given different names in
*                         each, there will be no way for this function to know which one to use)
*  @param copy_data     - 1 to copy the variable's data
*                         0 to exclude data
*
*  @return
*     - pointer to the variable cloned if successful
*     - NULL if an error occurred
*
*/
CDSVar *my_clone_var(
    CDSVar     *src_var,
    CDSGroup   *dest_dataset,  /* NULL to use same dataset as src_var */
    const char *dest_var_name,
    const char *coordsys_name, /* NULL to use default coordinate system,
                                  which is coord system of input datastream */
    int         out_dsid,      /* -1 to not write var to any output datastream */
    const char *out_var_name,  /* NULL to use first out_var_name from src_var */
    int         copy_data)     /* 1 to copy data, 0 to exclude data */
{
    CDSVar     *clone;
    int         ntargets;
    VarTarget **targets;

    /* Clone variable */
    clone = dsproc_clone_var(
        src_var, dest_dataset, dest_var_name, CDS_NAT, NULL, copy_data);

    if (!clone) {
        return((CDSVar *)NULL);
    }

    /* Copy the variable tag to preserve input source information. */
    /* Variable tag information documents the retrieved variable   */
    /* from which the variable is derived, the datastream it was   */
    /* retrieved from, and its name as found in the input          */
    /* datastream.                                                 */
    if (!dsproc_copy_var_tag(src_var, clone)) {
        return((CDSVar *)NULL);
    }

    /* Set the coordinate system of the variable */
    if (!dsproc_set_var_coordsys_name(clone, coordsys_name)) {
        return((CDSVar *)NULL);
    }

    /* Check if we need to set the output variable target */
    if (out_dsid < 0) {
        return(clone);
    }

    /* Check if the out_var_name was specified */
    if (!out_var_name) {

        /* if the user didn't provide an out_var_name, then use the one */
        /* from the source var */
        ntargets = dsproc_get_var_output_targets(src_var, &targets);
        if (ntargets > 0) {
            out_var_name = targets[0]->var_name;
        }
    }

    /* Set the output variable target */
    if (out_var_name) {

        if (!dsproc_set_var_output_target(
            clone, out_dsid, out_var_name)) {

            return((CDSVar *)NULL);
        }
    }

    return(clone);
}

Getting an Attribute Value from the DOD

When possible, attribute values are set in a VAP’s output DOD. Therefore, a VAP algorithm would ideally access the attribute value directly from the DOD and not duplicate its value. To access an attribute value you need the output DOD’s datastream ID (from dsproc_get_output_datastream_id) and, if the value is a field-level attribute, the name of the variable as it is presented in the output DOD, as described in Getting a Variable’s Name as Presented in DOD. With this information the attribute value can be pulled from the DOD using dsproc_get_dsdod_att_value().

An example of pulling the valid_min limit is provided in the function below:

/*********************************************************************************/
/**
 *  This function gets the min value of a variable
 *  @param dsid      - the datastream id of the dod in which min is defined
 *  @param var_name  - name of variable whose min value to get
 *  @param min_value - min value of variable
 *
 *  @return
 *    - 1 if successful
 *    - 0 if no value found
 *    - -1 if an error occurred
 *
 */
int get_min_value(
    int      dsid,
    char     *var_name,
    float    *min_value)
{
    size_t   length;
    float    *tmp_min;

    tmp_min = (float *)dsproc_get_dsdod_att_value(dsid, var_name,
                "valid_min",CDS_FLOAT, &length, NULL);

    if(!tmp_min && length == (size_t)-1) return(-1);
    else if(!tmp_min) return(0);

    *min_value  = *tmp_min;
    free(tmp_min);

    return(1);
}

Get Pointer to Variable Data

If memory for a variable’s array has already been allocated, a pointer to the data can be accessed using:

void *dsproc_get_var_data_index(
CDSVar *var)

Example:

float *wspd_data;
if (!(wspd_data = (float *)dsproc_get_var_data_index(wspd))) return(-1);

Allocating Memory and Assigning Values to Variable Data

Unless a variable’s data has been populated through either a retrieval or by mapping a retrieved variable to an output variable, it is necessary to allocate memory for the variable’s data before assigning it values. There are two ways to do this.

(1) Allocate memory for CDSVar data directly

The C function shown below allocates the memory for data associated with a CDSVar and returns a pointer to this memory. Values can be assigned through this pointer.

The example assumes nsamples has been determined as described in Get Number of Samples and new_var has been obtained as described in Get Pointer to an Output Variable Structure.

C implementation:

void *dsproc_alloc_var_data_index(
CDSVar *var,
size_t     sample_start,
size_t      sample_count)

C Example:

float  *new_data;
CDSVar *new_var;
int     i;

/* new_var and nsamples obtained prior to call */
if(!(new_data = (float *)dsproc_alloc_var_data_index(new_var, 0, nsamples)) )return(-1);
for (i = 0; i < nsamples; i++) {
    new_data[i] = 0.0f; /* replace with the desired value */
}

The equivalent function in IDL is CDSVar::alloc_var_data. The syntax, and an example in which it is used to set an output variable that has not been mapped to a retrieved variable, are shown below.

IDL implementation:

result = CDSVar::alloc_var_data(sample_start, sample_count)

IDL Example:

out_var = ds.get_output_var(dsid, var_name, obs_index)
new_data = out_var.alloc_var_data(0, sample_count)
new_data = <array of desired values>
out_var.data = new_data

(2) Allocate memory for a local array and set the CDSVar data equal to it

The second approach is to allocate memory for a local array and then set the data associated with a CDSVar structure/object equal to it using dsproc_set_var_data.

C implementation:

void *dsproc_set_var_data(
CDSVar      *var,
CDSDataType type,
size_t      sample_start,
size_t      sample_count,
void *      missing_value,
void *      data)

C Example:

/* new_data (float *), new_var, i, and nsamples declared prior to call */
if(!(new_data = (float *)calloc(nsamples, sizeof(float))) )return(-1);
for (i = 0; i < nsamples; i++) {
    new_data[i] = 0.0f; /* replace with the desired value */
}
if(!(dsproc_set_var_data(new_var, CDS_FLOAT, 0, (size_t)nsamples, NULL,
     (void *)new_data))) return(-1);

In IDL make_array is used to allocate memory for a local variable and the data property invokes the dsproc_set_var_data() function. See IDL documentation for the syntax of make_array.

IDL Example:

new_data = make_array(sample_count, type=out_var.type)
new_data = <desired values>
out_var.data = new_data

Getting a Variable’s Name as Presented in DOD

It is sometimes necessary to know the name a CDSVar variable is given in an output datastream. This information is commonly used to access variable attribute values in the DOD (Getting an Attribute Value from the DOD):

/*********************************************************************************/
/**
 *  This function returns the name a var is mapped to in a particular output datastream
 *  @param dsid         - the datastream id of the dod in which var is defined
 *  @param var          - the var for which to find an output name
 *  @param var_out_name - the name of the var in the output datastream
 *
 *  @return
 *    - 1 if successful
 *    - 0 if an error occurred
 *
 */
/*********************************************************************************/
int get_var_out_name(
    int    dsid,
    CDSVar *var,
    char   **var_out_name)
{
    int         status, i;
    VarTarget **out_targets;

    /* Get var name as it is in the output datastream */
    status = dsproc_get_var_output_targets(var, &out_targets);
    if(status == 0) {
        DSPROC_DEBUG_LV1("the variable %s is not mapped to any "
           "output datastreams, so we can't determine the "
           "value of its time_variability_min attribute  \n", var->name);
        return(0);
    }

    /* loop over the output datastreams that contain the variable   */
    /* and identify its name in the output datastream of interest, */
    for (i = 0; i < status; i++) {
       if(dsid == out_targets[i]->ds_id) break;
    }
    if(i == status) {
        DSPROC_DEBUG_LV1("the variable %s is not mapped to the output "
           "datastream of interest, therefore we can't determine the value "
           "of its time_variability_min attribute in that datastream  \n", var->name);
        return(0);
    }

    /* The caller must supply a buffer in *var_out_name large enough for the name */
    strcpy(*var_out_name, out_targets[i]->var_name);

    return(1);

}

Setting a Static Variable in IDL

The IDL data property is defined in the bindings as an array of data, so the value assigned to it must be an array. Because of the way the bindings handle the pointers returned by dsproc_alloc_var_data and dsproc_init_var_data, when these are used to create arrays of length one IDL considers the returned value to be a scalar and does not subsequently set the CDSVar.data property to that value. To assign a value to a static variable in IDL ADI processes, the value must therefore be stored in an array of length one, and the memory for that array must be created using the IDL make_array() function. Below is sample code that sets a static variable called surface_status equal to 10:

surface_st_var = ds.get_output_var(dsid_out, 'surface_status', 0)
if ~obj_valid(surface_st_var) then return, -1
surface_st_data = make_array(1, type=surface_st_var.type)
surface_st_data[0] = 10
surface_st_var.data = surface_st_data

Time

Time is a special dimension in ARM data; it is always defined in every netCDF file, and we have special tools for accessing, modifying, and creating those values. In our shared libraries, time is stored internally in the standard UNIX format: seconds since January 1, 1970, at midnight. Most time manipulations can be done using either shared library routines or standard C time functions like gmtime().

Everything ADI does is time based, and all datasets have to have a time variable. When the data is retrieved, ADI retrieves the time from the input netCDF file, stores it in a single time variable, and then calculates a new base_time and time_offset. The base_time value it calculates and stores is typically the start of the processing interval, and the time_offset is the offset from base_time.

The time in retrieved ADI datasets is ALWAYS stored as time offsets in a variable named “time” where the units attribute specifies the “base time”, regardless of how the time was stored in the input datastream. The variable methods are just convenience functions to get to this information in a way that the user will most likely need it.

Getting Input Sample Times

To read time values from a netCDF file use the function dsproc_get_sample_times().

The primary type of time variable you will use is time_t, which is (essentially) a typedef of long int, and is therefore simply a count of seconds since 1/1/1970 00:00:00. To get an array of type time_t from a dataset, do this:

CDSGroup *in_cds;
int nsamples;
...
time_t *sample_times;
nsamples = dsproc_get_sample_times(in_cds,0,0,&sample_times);

Note that the sample_times array is one-dimensional, but you actually pass in a ** pointer to dsproc_get_sample_times(). The reason for this is that the function allocates the space for the time array internally, so you must pass in a reference to that pointer in order to change its value inside the function. The value of sample_times is modified to hold the address of the allocated memory; this means it is your job to free sample_times when it is no longer needed.

The calls above grab all available data; you would use the two 0 value arguments to set start and end indices if you wanted just a subset of sample times. See the documentation for these functions for more information.

Setting Output Sample Times

To set the sample times in your output netCDF file use dsproc_set_sample_times() as shown in the following example:

CDSGroup *out_cds;
int status;
...
status =
dsproc_set_sample_times (out_cds, 0, nsamples, sample_times);

Obviously, you need to construct the sample_times array first, and you have to give the number of samples you are writing. Once again, you can write a subset of the output sample times by setting the middle two arguments (sample_start and sample_count); see the dsproc_set_sample_times() documentation for more details.
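
As a minimal sketch of constructing such an array, the helper below builds evenly spaced sample times from a start time. The function name make_regular_times and its parameters are hypothetical and are not part of dsproc; only standard C is used.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper (not part of dsproc): build an array of nsamples
 * times that starts at start and is spaced interval seconds apart.
 * Returns NULL on bad arguments or allocation failure.
 * The caller is responsible for freeing the returned array. */
time_t *make_regular_times(time_t start, int interval, int nsamples)
{
    time_t *times;
    int     i;

    if (nsamples <= 0 || interval <= 0) return NULL;

    times = (time_t *)malloc((size_t)nsamples * sizeof(time_t));
    if (!times) return NULL;

    for (i = 0; i < nsamples; i++) {
        times[i] = start + (time_t)i * interval;
    }

    return times;
}
```

For example, make_regular_times(tmid, 60, 1440) would give one sample per minute for a full day starting at tmid. The result can then be passed as the sample_times argument, and should be freed once it is no longer needed.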

Note that to set output times to match those of an input datastream, you simply have to use the same sample_times and nsamples for both calls:

CDSGroup *in_cds, *out_cds;
int nsamples;
time_t *sample_times;
...
nsamples = dsproc_get_sample_times(in_cds, 0, 0, &sample_times);
dsproc_set_sample_times (out_cds, 0, nsamples, sample_times);

Creating Solar Day Output

The start and end time of the output data can be shifted to span a different period of the same length. This is typically done to create output files that span the solar day. To do this use the dsproc_set_processing_interval_offset function. An example implemented in C is provided below. This code should typically be placed in the init_process user hook:

/* -----------------------------------------------------
* We want complete solar days, so adjust the processing
* interval for the time zone of the new site.
* -----------------------------------------------------*/
if      (strcmp(site, "sgp") == 0) { site_tz = -6.0; }
else if (strcmp(site, "nsa") == 0) { site_tz = -8.0; }
else if (strcmp(site, "pvc") == 0) { site_tz = -5.0; }
else {
    DSPROC_ERROR(
        "Unsupported Site",
        "Could not find timezone for site: %s\n",
        site);
    return((void *)NULL);
}
DSPROC_DEBUG_LV1(
    "Adjusting processing interval in init_smooth_process for %s timezone: %.1g\n",
    site, site_tz);
tz_offset = (time_t)(site_tz * -3600.0);
dsproc_set_processing_interval_offset(tz_offset);
return((void *)mydata);
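
If more sites need to be supported, the if/else chain above could be replaced by a lookup table. The following is only a sketch under that assumption: the table and the function get_solar_day_offset are hypothetical names, and the UTC offsets are simply the ones used in the example above.

```c
#include <string.h>
#include <time.h>

/* Hypothetical lookup table; the UTC offsets (in hours) are the ones
 * used in the example above. */
static const struct {
    const char *site;
    double      tz_hours;
} site_tz_table[] = {
    { "sgp", -6.0 },
    { "nsa", -8.0 },
    { "pvc", -5.0 },
};

/* Hypothetical helper: store the processing interval offset, in seconds,
 * for site in *offset. Returns 1 if the site is known, 0 otherwise. */
int get_solar_day_offset(const char *site, time_t *offset)
{
    size_t i;
    size_t n = sizeof(site_tz_table) / sizeof(site_tz_table[0]);

    for (i = 0; i < n; i++) {
        if (strcmp(site, site_tz_table[i].site) == 0) {
            /* Same sign convention as above: UTC-6 gives +21600 seconds */
            *offset = (time_t)(site_tz_table[i].tz_hours * -3600.0);
            return 1;
        }
    }

    return 0;
}
```

The value stored in offset would then be passed to dsproc_set_processing_interval_offset(), with a return value of 0 handled the same way as the "Unsupported Site" branch above.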

Using Fractional Seconds

Use a timeval structure and associated functions rather than time_t for your time array to read, store, or otherwise work with fractional seconds in your sample times.

If you need fractional seconds, you will have to use the type timeval, which is a structure with elements for epoch seconds and a microsecond offset:

struct timeval {
time_t tv_sec; /* seconds */
suseconds_t tv_usec; /* microseconds */
};

Don’t worry about the types of the elements in this structure; they both act like (long) integers.

The set and get sample time functions have corresponding timeval versions:

CDSGroup *in_cds, *out_cds;
int nsamples;
struct timeval *sample_timeval;
...
nsamples =
dsproc_get_sample_timeval(in_cds, 0, 0, &sample_timeval);
...
dsproc_set_sample_timeval (out_cds, 0, nsamples, sample_timeval);

Variables of type timeval are a little harder to work with than variables of type time_t; the latter can be treated as integers and added, subtracted, or compared to each other accordingly. For timeval variables, however, these operations take multiple stages, as you have to deal with both elements (and, for example, deal with rolling the microseconds above 1000000 correctly). Therefore, we have dedicated functions (or macros) to do these things for timeval data:

TV_EQ(tv1,tv2)                 tv1 == tv2
TV_LT(tv1,tv2)                 tv1 < tv2
TV_LTEQ(tv1,tv2)               tv1 <= tv2
TV_GT(tv1,tv2)                 tv1 > tv2
TV_GTEQ(tv1,tv2)               tv1 >= tv2
TV_DOUBLE(tv1)                 casts tv1 into a double precision float
timeval_add(&tv1,&tv2)         tv1 = tv1 + tv2
timeval_subtract(&tv1,&tv2)    tv1 = tv1 - tv2

Note that the final two functions take pointers to timeval variables, and change the values of their first arguments.

There are also standard C functions that accomplish the same thing as most of the above functions and macros; consult the man pages for timeradd(3), timersub(3), and timercmp(3) for more details.
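
To make the microsecond carry and borrow handling concrete, the sketch below performs the arithmetic by hand. The helper names tv_sum, tv_diff, and tv_to_double are hypothetical and are not the library functions listed above. The sketch assumes both inputs are normalized (0 <= tv_usec < 1000000).

```c
#include <sys/time.h>

/* Hypothetical helper: return a + b, carrying any microsecond
 * overflow into the seconds field. */
struct timeval tv_sum(struct timeval a, struct timeval b)
{
    struct timeval res;

    res.tv_sec  = a.tv_sec  + b.tv_sec;
    res.tv_usec = a.tv_usec + b.tv_usec;

    if (res.tv_usec >= 1000000) {
        res.tv_sec  += 1;
        res.tv_usec -= 1000000;
    }

    return res;
}

/* Hypothetical helper: return a - b, borrowing one second when the
 * microsecond difference is negative. */
struct timeval tv_diff(struct timeval a, struct timeval b)
{
    struct timeval res;

    res.tv_sec  = a.tv_sec  - b.tv_sec;
    res.tv_usec = a.tv_usec - b.tv_usec;

    if (res.tv_usec < 0) {
        res.tv_sec  -= 1;
        res.tv_usec += 1000000;
    }

    return res;
}

/* Hypothetical helper: convert a timeval to seconds as a double. */
double tv_to_double(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1000000.0;
}
```

For example, adding 10.7 s and 5.6 s yields tv_sec = 16 and tv_usec = 300000 rather than 15 s and 1300000 microseconds.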

Changing Retriever Time Offsets

In the PCM a user can elect to retrieve additional data via the “Offsets” column in the retriever table (see the section Specifying Variables to Retrieve and Conversions and Transforms to Apply). If the desired offsets differ by site, these values can be adjusted in the process. This code should typically be placed in the init_process hook. In the example code below, implemented in Python, the amount of extra data retrieved is changed from whatever value was set in the PCM to 8 months of data:

# If the site is twp and facility is not C3 get 8 months
# of extra data.
if (mydata.site == 'twp' and mydata.facility != 'C3'):
    beg_offset = 86400 * 8 * 30
    dsproc.set_retriever_time_offsets(dsid_in, beg_offset, 0)

Finding Midnight and Other Time Manipulations

If you need to find midnight of the day of a given sample time, either for reasons of formatting (e.g., calculation of day fraction) or to create a sample_time array (e.g., every minute starting at midnight), you can use the standard C functions gmtime(3) and timegm(3), and work with struct tm, which is defined as follows:

struct tm {
int tm_sec; /* seconds */
int tm_min; /* minutes */
int tm_hour; /* hours */
int tm_mday; /* day of the month */
int tm_mon; /* month */
int tm_year; /* year */
int tm_wday; /* day of the week */
int tm_yday; /* day in the year */
int tm_isdst; /* daylight saving time */
};

Computers work well with “seconds since 1/1/1970 00:00:00”, but there are times when we need to use a more human-friendly representation. The standard C method for dealing with this situation is the type struct tm.

The C function gmtime(3) converts a time_t variable to type struct tm, while the function timegm(3) converts it back. Thus, to find midnight on the day of sample time t, do this:

time_t t, tmid;
struct tm *tm;
...
tm=gmtime(&t);
tm->tm_hour=tm->tm_min=tm->tm_sec=0;
tmid=timegm(tm);

Note that there are some subtleties with how the struct tm is allocated; the memory location returned by gmtime(3) may be overwritten by subsequent calls, so if you need your struct tm variable to persist (or be thread-safe) you should either copy it to a new location or use gmtime_r(3) instead.
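
A minimal sketch of the gmtime_r(3) variant is shown below. The function name midnight_utc is hypothetical; the feature-test macro is an assumption that applies to glibc, where timegm(3) is not declared in strict ISO C mode.

```c
#define _DEFAULT_SOURCE   /* for timegm(3) and gmtime_r(3) on glibc */
#include <time.h>

/* Hypothetical helper: return 00:00:00 UTC of the day containing t,
 * or (time_t)-1 if t cannot be converted. */
time_t midnight_utc(time_t t)
{
    struct tm tm_buf;   /* caller-owned storage, not shared between calls */

    if (!gmtime_r(&t, &tm_buf)) return (time_t)-1;

    tm_buf.tm_hour = 0;
    tm_buf.tm_min  = 0;
    tm_buf.tm_sec  = 0;

    return timegm(&tm_buf);
}
```

Because the broken-down time lives in a local struct tm, later calls to gmtime(3) elsewhere in the process cannot overwrite it.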

Further Examples

Formatting

The struct tm allows a convenient way to do formatting, as well. The following example creates an ARM standard YYYYMMDD.HHMMSS timestamp for a given time t:

char *format_ymdhms(time_t t) {
    /* static buffer stays valid after return; overwritten by the next call */
    static char ymd[16];
    struct tm *tm;
    tm=gmtime(&t);
    snprintf(ymd,sizeof(ymd),"%04d%02d%02d.%02d%02d%02d",
        tm->tm_year+1900,
        tm->tm_mon+1,
        tm->tm_mday,
        tm->tm_hour,
        tm->tm_min,
        tm->tm_sec);
    return(ymd);
}

Building Sample Times by Hand

You can use struct tm directly, along with timegm(3) to convert any human-understandable time format manually into a time_t. To find midnight on day 20050503, simply do this:

struct tm tm;
time_t tmid;
tm.tm_year=2005-1900;
tm.tm_mon=5-1; // zero offset in tm_mon
tm.tm_mday=3;
tm.tm_hour=tm.tm_min=tm.tm_sec=0;
tmid=timegm(&tm);

A similar method (along with a call to sscanf(3)) can be used to convert any generic string of the format YYYYMMDD.HHMMSS into time_t. There are also functions in dsproc that will convert time_t (or timeval) variables into standard-looking text strings; see the documentation for format_secs1970(), format_timeval(), and format_time_values() for more information.
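
The sscanf(3) approach mentioned above can be sketched as follows. The function name parse_ymdhms is hypothetical and is not a dsproc function; the string is assumed to be in UTC, and field ranges are not validated (timegm(3) normalizes out-of-range values rather than rejecting them).

```c
#define _DEFAULT_SOURCE   /* for timegm(3) on glibc */
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: parse a "YYYYMMDD.HHMMSS" string (UTC) into *t.
 * Returns 1 on success, 0 if the string does not match the format. */
int parse_ymdhms(const char *str, time_t *t)
{
    struct tm tm;
    int year, mon, day, hour, min, sec;

    if (sscanf(str, "%4d%2d%2d.%2d%2d%2d",
               &year, &mon, &day, &hour, &min, &sec) != 6) {
        return 0;
    }

    memset(&tm, 0, sizeof(tm));
    tm.tm_year = year - 1900;   /* years since 1900 */
    tm.tm_mon  = mon - 1;       /* zero offset in tm_mon */
    tm.tm_mday = day;
    tm.tm_hour = hour;
    tm.tm_min  = min;
    tm.tm_sec  = sec;

    *t = timegm(&tm);
    return 1;
}
```

A string with no time part, such as "20050503", fails the field count check and returns 0.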

Calculating Julian Day plus Dayfraction

The following code converts a sample_time array into a double precision variable with the format jday.dayfrac:

time_t *sample_time, tmid;
struct tm *tm;
double *jday; // don't forget to allocate
int i;
...
for (i=0;i<nsamples;i++) {
    tm=gmtime(&sample_time[i]);
    jday[i]=tm->tm_yday; // tm_yday is zero-based (0 = January 1)
    tm->tm_hour=tm->tm_min=tm->tm_sec=0;
    tmid=timegm(tm);
    jday[i] += (sample_time[i]-tmid)/86400.;
}

Finding the Start Day of a Given Run

You can determine the start day of a given run by either calculating it using the above methods or reading it directly from the command line.

We have already discussed how to use a struct tm to calculate (for example) the YYYYMMDD of the first input sample point, but there is another way to accomplish the same thing: reading the command line directly. This might be preferable if, for example, you are reading several days of data and the first one is missing; in that case, the day of your first input data point might differ from the first day you actually asked for, and you may wish to recover the latter to construct an output sample time array.

The main() function inside your procedure code has direct access to the command line:

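char start_day[9]; /* YYYYMMDD plus terminating null */
int main(int argc, char *argv[])
{
    int c;
    /* stop at argc - 1 so that argv[c+1] is always a valid argument */
    for (c=0;c<argc-1;c++) {
        if (strcmp(argv[c],"-b")==0) {
            snprintf(start_day,sizeof(start_day),"%s",argv[c+1]);
            break;
        }
    }
    dsproc_vap_main(...)
}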

You could also use getopt(3) or something similar to parse the command line.
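
A minimal getopt(3) sketch is shown below. The helper name get_begin_date is hypothetical, and only the -b option from the example above is listed in the option string. A real process would need to list every option it accepts, because how unlisted options that take arguments are skipped depends on the getopt implementation, and GNU getopt may also reorder argv.

```c
#define _DEFAULT_SOURCE   /* exposes getopt(3) on glibc in strict modes */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: copy the argument of -b into buf.
 * Returns 1 if -b was found, 0 otherwise. */
int get_begin_date(int argc, char *argv[], char *buf, size_t buflen)
{
    int opt;
    int found = 0;

    opterr = 0;   /* do not print messages for options not listed here */
    optind = 1;   /* start scanning from the first argument */

    while ((opt = getopt(argc, argv, "b:")) != -1) {
        if (opt == 'b') {
            snprintf(buf, buflen, "%s", optarg);
            found = 1;
        }
    }

    optind = 1;   /* leave getopt reset for any later parsing */
    return found;
}
```

In main() this could be called as get_begin_date(argc, argv, start_day, sizeof(start_day)) before handing control to the dsproc main function.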

Debugging

Dumping Internal Data Structures

When debugging or validating a process it is very helpful to dump the contents of one or more of the internal data structures. There are three internal structures: one which stores the retrieved data, another that stores transformed data, and the last which stores the data to be documented in output files. The functions dsproc_dump_retrieved_datasets (), dsproc_dump_transformed_datasets (), and dsproc_dump_output_datasets () are provided to dump each of these internal structure(s).

To dump the retrieved data structure, use the code below:

if (dsproc_get_debug_level() > 1) {

    dsproc_dump_retrieved_datasets(
        "./", "post_retrieval_data.debug", 0);
}

Valid locations for dumping the retrieved data include:

  • post_retrieval_hook()

  • pre_transform_hook()

  • post_transform_hook()

  • process_data_hook()

To dump the transformed data structure use the code below:

if (dsproc_get_debug_level() > 1) {

    dsproc_dump_transformed_datasets(
        "./", "post_transform_data.debug", 0);
}

Valid locations for dumping the transformed data include:

  • post_transform_hook()

  • process_data_hook()

To dump the output data structure use the code below:

if (dsproc_get_debug_level() > 1) {

    dsproc_dump_output_datasets(
        "./", "process_data_output_data.debug", 0);
}

The only valid location for dumping the output data is the process_data_hook().

Transformation Recipes

Use the following techniques to build in transformations.

Smoothing Using a Running Mean

You can use the transformation functionality of the PCM to smooth the input data by creating a running mean with a defined window around each sample.

If the data is not being transformed to a larger interval, do this by using the ‘Use Mapping’ option in the PCM Coordinate System column and mapping the datastream to itself. An example of such a mapping is provided below.

_images/cookbook_smooth_data.png

Then set up a transformation parameter file to force the transformation to use averaging, and set the width in seconds that you want to average across for each sample. The width parameter value is the size of the averaging window around each sample in the output. To create a 5 minute running mean, create a transform parameter file for the coordinate system. For the example above, this means creating a file named smooth_5min with contents:

time:transform = TRANS_BIN_AVERAGE;
time:width = 300;

The directory structure and location of transformation parameter files are discussed in the tutorial section Defining Transform Parameters in a Configuration File.

If a process is being averaged to a larger interval, define a transformation interval, start, end, and length in the PCM, and also define a value for a width parameter in a transform parameter file as described above.

Filling in Gaps in Coordinate Dimensions

WARNING: At this time, no QC bits are set and there is no other way to identify whether a value in the destination grid is based on real values from input datastreams or has been estimated through interpolation. Therefore, this method of filling gaps is not currently recommended for production ARM data!

You can use the transformation functionality of the PCM to interpolate between gaps in coordinate dimension values that are larger than the expected bin width. If the interval of the output coordinate dimension is smaller than that of the input, such estimation via interpolation occurs naturally, because the transform method used to go from a larger to a smaller interval is interpolation.

To force this to occur on a datastream that is not being interpolated (meaning the process does not include a transformation that decreases the coordinate dimension interval of the output datastream relative to the input datastream), define a coordinate system transformation that specifies start, end, and length transform parameters on the coordinate dimension in which you want to fill gaps, equal to the start, end, and length parameters in the input datastream. This technique can only be applied to datastreams that have a constant interval length across all observations.

In the example below, the input datastream is mfsr.b1, which has an averaging interval of 20 seconds. When the intervals of input and output are equal, the interpolation transformation method is used, which automatically interpolates across gaps.

_images/cookbook_filling_gaps.png
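
The gap-filling behaviour described above can be sketched in a few lines of Python. This is an illustration only and assumes simple linear interpolation onto a regular grid defined by start, interval, and length; the function name fill_gaps and the sample data are hypothetical, and the actual ADI interpolation may treat edges and QC differently.

```python
def fill_gaps(times, values, start, interval, length):
    """Linearly interpolate input samples onto a regular output grid.

    Illustrative sketch only. Requires at least two input samples with
    strictly increasing times that span the output grid.
    """
    grid = [start + i * interval for i in range(length)]
    filled = []
    j = 0
    for t in grid:
        # Advance to the input segment [times[j], times[j + 1]] containing t.
        while j + 1 < len(times) - 1 and times[j + 1] <= t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        v0, v1 = values[j], values[j + 1]
        filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return grid, filled


# Hypothetical 20-second data with the samples at 60 s and 80 s missing.
times = [0, 20, 40, 100, 120]
values = [1.0, 2.0, 3.0, 6.0, 7.0]

grid, filled = fill_gaps(times, values, start=0, interval=20, length=7)
print(list(zip(grid, filled)))
```

The values produced for 60 and 80 seconds look exactly like measured samples in the output, which is the concern raised in the warning at the top of this section.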

Setting Coordinate System Variable Values, Units, and Type on the Fly

If a non-time coordinate variable’s values, data type, or units cannot be known until run time, they must be set in the code. Use the dsproc_set_coordsys_trans_param function to set these transformation parameter values. NOTE: Currently this capability is not available in Python.

IDL example that sets the units and values of the coordinate variable ‘range’ in a coordinate system named ‘output’:

; The range dimension will have a length of 140
range_data = make_array(140, /FLOAT)
range_data[0] = 15
for i =1, 99 do begin
    range_data[i] = range_data[i-1] + 30
endfor
for i =100, 139 do begin
    range_data[i] = range_data[i-1] + 100
endfor
status = ds.set_coordsys_trans_param( $
    "output", "range", "values", $
    data_type = 4,  range_data)
if status eq 0 then begin
    debug_string = string(format = '(A)', $
       'Could not set range trans param = values')
    ds.error, status_string,  msg = debug_string
    return, -1
endif

; Set the range units
range_units = 'm'
status = ds.set_coordsys_trans_param( $
    "output", "range", "units", $
    data_type = 7,  range_units)
if status eq 0 then begin
    debug_string = string(format = '(A)', $
       'Could not set range trans param = units')
    ds.error, status_string,  msg = debug_string
    return, -1
endif
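
As a quick check of the values the IDL loops above produce, the same array can be built in plain Python. This sketch does not call ADI (as noted, the capability is not available in Python); it only reproduces the arithmetic so the spacing is easy to verify.

```python
# Rebuild the 140-element range coordinate from the IDL example:
# first value 15, spacing of 30 up to index 99, spacing of 100 from index 100 on.
range_data = [15.0]
for i in range(1, 100):
    range_data.append(range_data[i - 1] + 30.0)
for i in range(100, 140):
    range_data.append(range_data[i - 1] + 100.0)

# Length, first value, last 30-spaced value, and final value.
print(len(range_data), range_data[0], range_data[99], range_data[139])
# prints: 140 15.0 2985.0 6985.0
```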

How to Define Multiple DODs to a Single Datastream

To accommodate DOD differences across operating sites and over time, it is necessary to define more than one version of a DOD to a given datastream, and to provide a mechanism by which a process can determine which version should be used at which site, or for what processing period.

  1. You can create additional DOD versions for a datastream by using the DOD Configuration Form in the PCM by selecting the icon that looks like a floppy disk. This brings up a window into which a unique version number can be entered in the upper right hand corner.

  2. Specify versions by site and time in a datastream DOD configuration hash table in a DOD version configuration file named <datastream_name>-<data_level>.dsdods, located in $VAP_HOME/conf/vap/<process_name>, as shown in the following example.

%gDSDODS = (
    'rain.b1' => {
        'sgp.C1' => {
            '1992-01-01 00:00:00' => '1.0',
            '2010-06-21 00:00:00' => '3.1',
        },
        'twp.C3' => {
            '1992-01-01 00:00:00' => '1.0',
            '2010-11-30 01:16:00' => '3.1',
        },
    },
);
  3. Load the file into the database:

$> db_load_dsdods <hash file to load>
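
The hash table above can be read as a lookup from site, facility, and time to a DOD version. The Python sketch below illustrates one plausible reading, in which the version in effect is the entry with the latest start time that is not after the data time. That selection rule, the function name dod_version, and the dictionary form are assumptions for illustration and are not taken from the ADI source.

```python
from datetime import datetime

# Same content as the %gDSDODS hash above, expressed as a Python dict.
DSDODS = {
    "rain.b1": {
        "sgp.C1": {"1992-01-01 00:00:00": "1.0", "2010-06-21 00:00:00": "3.1"},
        "twp.C3": {"1992-01-01 00:00:00": "1.0", "2010-11-30 01:16:00": "3.1"},
    },
}


def dod_version(datastream, site_facility, when):
    """Return the version whose start time is the latest one not after
    `when`, or None if `when` precedes every entry (assumed rule)."""
    fmt = "%Y-%m-%d %H:%M:%S"
    best = None
    for start_str, version in DSDODS[datastream][site_facility].items():
        start = datetime.strptime(start_str, fmt)
        if start <= when and (best is None or start > best[0]):
            best = (start, version)
    return best[1] if best else None


print(dod_version("rain.b1", "sgp.C1", datetime(2005, 3, 1)))
print(dod_version("rain.b1", "twp.C3", datetime(2010, 11, 30, 1, 16)))
```

Under this reading, data from sgp.C1 in 2005 would use version 1.0, and data from twp.C3 at or after 2010-11-30 01:16:00 would use version 3.1.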

Error Handling

The following examples describe the error handling processes you can use.

How to Set Process Status to Unsuccessful

Use the DSPROC_ERROR convenience macro to describe the error encountered and set the exit value to ‘Unsuccessful’, as shown in the following example:

UserData *mydata;
mydata = (UserData *)calloc(1, sizeof(UserData));
if (!mydata) {
    DSPROC_ERROR(
        DSPROC_ENOMEM,
        "Memory allocation error creating user data structure\n");
    return((void *)NULL);
}

How to Set a Warning

A warning documents a possible problem but does not result in an exit value of ‘Unsuccessful’, as shown in the following example:

UserData *mydata;
mydata = (UserData *)calloc(1, sizeof(UserData));
if (!mydata) {
    DSPROC_WARNING(
        "Memory allocation error may have occurred creating user data structure\n");
    return((void *)NULL);
}

How to Set Debug Messages

As an alternative to defining a warning message, debug messages can be reported based on the debug level under which the process was executed. There are five debug levels and a function for each. The example below is for the lowest debug level of 1. At this level the message will be reported if a process is run with a -D command line option:

/* Loop over the output datastreams that contain the variable */
/* and identify its name in the output datastream of interest. */
for (i = 0; i < status; i++) {
   if(dsid == out_targets[i]->ds_id) break;
}
if(i == status) {
    DSPROC_DEBUG_LV1("the variable %s is not mapped to the output "
       "datastream of interest, therefore we can't determine the value "
       "of its time_variability_min attribute in that datastream\n", var->name);
    return(0);
}

Adjusting the Process Interval of Output Datastreams

The time associated with the first sample of each output observation corresponds to the start date given as an input parameter to the process. The time is assumed to be GMT. The last sample in each observation corresponds to the sample that falls just prior to the end date provided on the command line. A process that runs daily with a sampling interval of one hour starts at 01:00:00 GMT (HH:MM:SS) and ends at 24:00:00.

Sometimes it is beneficial to maintain the length of the processing interval but adjust the start time of each observation so that each output file captures a particular feature of the data. For a daily processing interval, a common adjustment is an offset from GMT to local time, so that each observation captures the complete solar cycle for one day.

The dsproc_set_processing_interval_offset function applies such an offset. This function should be called in the init_process_hook, which is executed prior to looping across the process period in processing intervals. An example of adjusting the offset to capture solar days for the Southern Great Plains (SGP), Ganges Valley, India (pgh), and Cape Cod, MA (pvc) is provided below:

/* ---------------------------------------------------------
 * To process complete solar days, adjust the processing
 *  interval for the timezone of the specified site.
 * ------------------------------------------------------- */

if      (strcmp(site, "sgp") == 0) { site_tz = -6.0; }
else if (strcmp(site, "pgh") == 0) { site_tz =  5.5; }
else if (strcmp(site, "pvc") == 0) { site_tz = -5.0; }
else {

    DSPROC_ERROR(
        "Unsupported Site",
        "Could not find timezone for site: %s\n",
        site);

    free(sashe);
    return((void *)NULL);
}

DSPROC_DEBUG_LV1(
    "Adjusting processing interval for %s timezone: %.1f\n",
    site, site_tz);

sashe->site      = site;
sashe->tz_offset = (time_t)(site_tz * -3600.0);
dsproc_set_processing_interval_offset(sashe->tz_offset);
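
The arithmetic in the C example can be checked with a short Python sketch. It assumes that the offset is simply added to the GMT start of each processing interval; that reading, and the names SITE_TZ_HOURS, tz_offset_seconds, and interval_start, are illustrative assumptions rather than part of the dsproc API.

```python
from datetime import datetime, timedelta

# Time zone of each site in hours relative to GMT, as in the C example.
SITE_TZ_HOURS = {"sgp": -6.0, "pgh": 5.5, "pvc": -5.0}


def tz_offset_seconds(site):
    """Same arithmetic as the C code: site_tz * -3600."""
    return int(SITE_TZ_HOURS[site] * -3600.0)


def interval_start(day_gmt, site):
    """GMT time at which the shifted daily interval would begin
    (assumes the offset is added to the GMT start of the interval)."""
    return day_gmt + timedelta(seconds=tz_offset_seconds(site))


day = datetime(2011, 5, 1)
for site in ("sgp", "pgh", "pvc"):
    print(site, tz_offset_seconds(site), interval_start(day, site))
```

For sgp the offset is 21600 seconds, so the interval would begin at 06:00:00 GMT, which is local midnight. For pgh the offset is -19800 seconds, so the interval would begin at 18:30:00 GMT on the previous day.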

Setting Split Mode of Output Files

VAPs by default always create a new file when data is stored. Ingests by default create daily files that split at midnight. Sometimes it is desirable to split output files more or less frequently than these defaults.

The dsproc_set_datastream_split_mode function provides a mechanism for adjusting when output files split into new files. This function should be called in the init_process_hook, which is executed prior to looping across the process period in processing intervals. An example of adjusting the split interval to produce hourly, daily, monthly, and yearly files is shown below.

/*  To split on hour */
dsproc_set_datastream_split_mode(dsid_out, SPLIT_ON_HOURS, 0, 1);

/*  To split on day */
dsproc_set_datastream_split_mode(dsid_out, SPLIT_ON_HOURS, 0, 24);

/*  To split on month */
dsproc_set_datastream_split_mode(dsid_out, SPLIT_ON_MONTHS, 1, 1);

/*  To split on year */
dsproc_set_datastream_split_mode(dsid_out, SPLIT_ON_MONTHS, 1, 12);
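
To see what the last two arguments in the calls above imply, the sketch below lists the hours of the day or months of the year at which a new file would begin. It assumes that the third argument is the first split point (hour 0 to 23, or month 1 to 12) and the fourth is the interval in hours or months; that interpretation and the helper names hour_splits and month_splits are assumptions for illustration, not the libdsproc3 implementation.

```python
def hour_splits(split_start, split_interval):
    """Hours of the day at which a new file would begin (assumed semantics)."""
    return list(range(split_start, 24, split_interval))


def month_splits(split_start, split_interval):
    """Months of the year at which a new file would begin (assumed semantics)."""
    return list(range(split_start, 13, split_interval))


print("hourly :", hour_splits(0, 1))
print("daily  :", hour_splits(0, 24))
print("monthly:", month_splits(1, 1))
print("yearly :", month_splits(1, 12))
```

Under these assumptions, an interval of 24 hours yields a single split at hour 0 (one file per day) and an interval of 12 months yields a single split in month 1 (one file per year).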

Python Cookbook

Error Handling

The following examples and code blocks omit error handling. If a function returns a status or other indication of a problem, check it and use the information it provides to guide your efforts.

Get Output Datastream Id

Get the ID of an output datastream.

Python Implementation

Python Example:

dsid = dsproc.get_output_datastream_id(TUTORIAL30S_C1_DS_NAME, TUTORIAL30S_C1_DS_LEVEL)

Get Output Dataset

Return a pointer to the output dataset for a specific datastream and observation.

Python Implementation

Python Example:

dsid = dsproc.get_output_datastream_id(TUTORIAL30S_C1_DS_NAME, TUTORIAL30S_C1_DS_LEVEL)
out_ds = dsproc.get_output_dataset(dsid, 0)

Accessing a Variable’s Data Value

Get a data index for a multi-dimensional variable.

Python Implementation

Python Example:

dsid = dsproc.get_output_datastream_id(TUTORIAL30S_C1_DS_NAME, TUTORIAL30S_C1_DS_LEVEL)
bb_var = dsproc.get_output_var(dsid, "backscatter", 0)
value = dsproc.get_var_data_index(bb_var)

Accessing a Variable’s Attribute Value

Get a copy of an attribute value from a dataset or variable.

Python Implementation

Python Example:

dsid = dsproc.get_output_datastream_id(TUTORIAL30S_C1_DS_NAME, TUTORIAL30S_C1_DS_LEVEL)
bb_var = dsproc.get_output_var(dsid, "backscatter", 0)
units = dsproc.get_att(bb_var, "units")
att_value = dsproc.get_att_value(bb_var, "units", units.get_type())

Getting Global Attribute Value

Get a copy of a global attribute value from a dataset or variable.

Python Implementation

Python Example:

dsid = dsproc.get_output_datastream_id(TUTORIAL30S_C1_DS_NAME, TUTORIAL30S_C1_DS_LEVEL)
bb_var = dsproc.get_output_var(dsid, "backscatter", 0)
units = dsproc.get_att(bb_var, "units")
att_value = dsproc.get_att_value(bb_var, "units", units.get_type())

Getting Variable Data Index

Get a data index for a multi-dimensional variable.

Python Implementation

Python Example:

dsid = dsproc.get_output_datastream_id(TUTORIAL30S_C1_DS_NAME, TUTORIAL30S_C1_DS_LEVEL)
bb_var = dsproc.get_output_var(dsid, "backscatter", 0)
backscatter = dsproc.get_var_data_index(bb_var)

Setting Variable Attribute Value

Set the value of an attribute on a variable.

Python Implementation

Python Example:

dsid = dsproc.get_output_datastream_id(TUTORIAL30S_C1_DS_NAME, TUTORIAL30S_C1_DS_LEVEL)
bb_var = dsproc.get_output_var(dsid, "backscatter", 0)
units = dsproc.get_att(bb_var, "units")
att_value = dsproc.get_att_value(bb_var, "units", units.get_type())
status = dsproc.set_att_value(bb_var, "units", cds.FLOAT, att_value[0])

Getting Dimension

Get a dimension from a dataset.

Python Implementation

Python Example:

dsid = dsproc.get_output_datastream_id(TUTORIAL30S_C1_DS_NAME, TUTORIAL30S_C1_DS_LEVEL)
out_ds = dsproc.get_output_dataset(dsid, 0)
backscatter_dim = dsproc.get_dim(out_ds, "backscatter")