DRAFT
Layering the WRF I/O API
on Top of the
Models-3/EDSS I/O API

Carlie J. Coats, Jr.
MCNC Environmental Modeling Center
carlie@jyarborough.com

Introduction

The exercise of "wrapperizing" the Models-3/EDSS I/O API (M3IO) within the WRF I/O API (WRFIO) is dominated by just a few issues, all of which are quite resolvable. This paper gives a suggested implementation of the M3IO external I/O package within WRFIO, along with the metadata dictionary needed by the WRF "configure/build" system, additional requirements that this places on WRFIO, thoughts for an enhanced M3IO, and suggestions for augmenting WRFIO.

The current plan is a two-stage one:

  1. to construct a restricted prototype WRFIO ext_m3io package on top of a minimally modified version of M3IO, for use with the Summer 2002 air quality forecast effort; and

  2. to construct an enhanced version of the Models-3/EDSS I/O API (M4IO), together with a full-function WRFIO ext_m4io package built on top of it.

As this document is being written, the effort to implement (1) is underway. This effort is bearing fruit in terms of finding "holes", ambiguities, and other difficulties in the WRFIO specification.


Dataset Definition Issues

M3IO takes the point of view that the API should encapsulate state to the greatest degree possible, and avoid shared state between itself and its callers. For example, a dataset must be defined in one atomic operation, in which the user fills in all the fields in a dataset definition and then calls the OPEN3() routine. Similarly, to get a file's metadata, one makes a single DESC3() call and is returned the complete description of a dataset, together with all its metadata. As is appropriate for providing the level of commonality and integrity required of an API used for multi-model environmental modeling applications, for model coupling, and for easy callability from legacy Fortran-77 applications, the set of metadata required is fixed by a prescriptive standard; it is an error not to supply the complete set of metadata needed by OPEN3() during data set creation.

On the other hand, the WRFIO paradigm is to supply tools that allow the definition of an open-ended set of metadata, at the modeler's discretion, but with the added complication of considerable shared state between WRFIO and its callers. The ext_pkg_get|put*() routines in WRFIO allow the user to define a dataset incrementally by means of repeated calls. Provided that the "configure/build" system can ensure that all of the definitions expected by M3IO are made (between calls to ext_pkg_open_for_write_begin() and ext_pkg_open_for_write_commit()), there would seem to be no conceptual difficulty in wrapperizing M3IO within WRFIO. A complete dictionary of the metadata which must be set in order to completely define an M3IO-implemented data set is given in the section below, Required-Attribute Dictionary.
Ensuring that the "configure/build" system generates all the required definitions and metadata attributions during data set creation is a fundamental constraint of this "wrapperizing" approach.
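
The incremental-definition-then-atomic-commit idea can be sketched as follows. This is a minimal Python illustration, not the ext_m3io implementation (which would be Fortran): the class and method names are hypothetical, and only the attribute names are taken from the Required-Attribute Dictionary in this paper.

```python
# Hypothetical sketch: accumulate "put" calls, then refuse to commit an
# incomplete definition. Attribute names follow this paper's dictionary.
REQUIRED_GLOBAL = (
    "DOMAIN_NAME", "PROJ_TYPE", "PROJ_ALPHA", "PROJ_BETA", "PROJ_GAMMA",
    "PROJ_XCENT", "PROJ_YCENT", "DOMAIN_XORIG", "DOMAIN_YORIG",
    "DOMAIN_XCELL", "DOMAIN_YCELL", "DOMAIN_NCOLS", "DOMAIN_NROWS",
    "DOMAIN_NTHIK", "DOMAIN_NLAYS", "DOMAIN_VTYPE", "DOMAIN_VTOP",
    "DOMAIN_LEVELS", "START_DATETIME", "TIME_STEP", "PROG_NAME",
    "DATASET_DESC",
)


class PendingDataset:
    """Holds metadata between the 'open begin' and 'open commit' calls."""

    def __init__(self):
        self.attrs = {}
        self.committed = False

    def put_global(self, name, value):
        if self.committed:
            raise RuntimeError("data set is no longer in define mode")
        self.attrs[name] = value

    def missing(self):
        """Names of required attributes that have not yet been supplied."""
        return [n for n in REQUIRED_GLOBAL if n not in self.attrs]

    def commit(self):
        gaps = self.missing()
        if gaps:
            raise ValueError("incomplete data set definition: " + ", ".join(gaps))
        self.committed = True
        # The complete definition would now go to one OPEN3()-style call.
        return dict(self.attrs)
```

The point of the design is that the shared state lives entirely inside the wrapper; the layer underneath still sees one atomic definition.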

One problem with this open-ended operational paradigm is the lack of routines -- within both WRFIO and M3IO as they presently exist -- for inquiring about the existence, names, and types of metadata that have been stored in a WRF data set, as we describe in the section, Additional Requirements, below.

A minor implementation issue is that M3IO only supports input and output of fields of types INTEGER, REAL, and REAL8, whereas WRF also has fields of type LOGICAL. It will be very simple to add support for the additional LOGICAL type required by WRF, and it has recently become understood how to add the CHARACTER(LEN=<n>) types requested by EPA for emissions and observational data. Changes for these new types will not break backward data, source, or link compatibility of M3IO.

A larger issue is what to do with attributes not expected by M3IO (which has a fixed list of the attributes it expects, as documented below in the section Required-Attribute Dictionary); we conclude that the wrapper should go ahead and read or write these attributes at the raw netCDF (or PVM or other) lower I/O level, so that the resulting data sets are "M3IO-plus", with additional metadata not maintained by M3IO proper. (Due to the selective direct access nature of the underlying netCDF and PVM M3IO-implementation layers, the existence of additional attributes not accessed by the existing system is irrelevant to that system.) This WRF requirement, implied by the open-ended set of "put attribute" calls, could lead to an augmentation of the existing M3IO for the creation of, inquiry about, and access to additional metadata within M3IO datasets (probably the cleanest way to implement this WRF requirement). This issue, as well as the previous type extension issue, is discussed further below in the section, Enhancements to the Models-3 I/O API.

Another issue is that of time-dependent metadata, implied by the ext_pkg_put_*_td_type() routines: netCDF does not have provisions for implementing time dependent metadata, and consequently neither does M3IO. Presently, these routines are not used by the WRF model itself, and the draft document indicates "not yet implemented". One way to implement these is by auxiliary variables in the data set; doing so puts some additional constraints on WRFIO, as we describe below in the section, Additional Requirements.

A final issue is that of WRF "stagger": for M3IO files, all variables have the same grid structure, so that (for an MM5/MAQSIP example) the dot point variables (for which WRF stagger is "XY") inhabit different files from the cross point variables (for which WRF stagger is none, coded "-"). The three obvious potential solutions to this problem are (1) to extend M3IO to support stagger; (2) to dimension files to the extent of "XY" stagger, pad all output variables out to the extent of those dimensions (and subset on input), and add extra per-variable attributes describing the stagger-padding, with Z-staggered variables being written both to the 2-D and the 3-D M3IO data sets that represent a particular WRFIO data set; or (3) to let a WRFIO data set be implemented by multiple files, one for each stagger type. We recommend alternative (1) below, in the section, Enhancements to the Models-3 I/O API.
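
For comparison, the pad-and-subset mechanics of alternative (2) might look like the sketch below. It is illustrative Python only; the function names are hypothetical, and it assumes that a staggered dimension carries exactly one more point than the unstaggered one, which this paper does not itself state.

```python
def staggered_shape(ncols, nrows, stagger):
    """(rows, cols) extent of a variable with the given stagger code.

    Assumption: a staggered dimension has one extra point.
    """
    code = stagger.upper()
    return (nrows + (1 if "Y" in code else 0),
            ncols + (1 if "X" in code else 0))


def pad_to_xy(field, ncols, nrows, fill=0.0):
    """Pad a 2-D field (list of rows) out to the full "XY"-stagger extent."""
    full_rows, full_cols = staggered_shape(ncols, nrows, "XY")
    padded = [list(row) + [fill] * (full_cols - len(row)) for row in field]
    padded += [[fill] * full_cols for _ in range(full_rows - len(padded))]
    return padded


def subset_from_xy(padded, ncols, nrows, stagger):
    """Recover the variable's own extent from a padded record on input."""
    rows, cols = staggered_shape(ncols, nrows, stagger)
    return [row[:cols] for row in padded[:rows]]
```

Every unstaggered variable pays for an extra row and column on disk under this scheme, which is one reason to prefer extending M3IO itself.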

Due to the difficulties of maintaining the data structures necessary to store some indefinite number of additional non-standard metadata of multiple unknown types and sizes, the initial M3IO implementation of WRFIO to be used for the Summer 2002 air quality forecasting will not support any metadata except per-variable stagger and those metadata already standard for M3IO.


Field I/O and Data Distribution Issues

The WRFIO ext_pkg_read|write_field() calls map fairly cleanly to M3IO READ3() and WRITE3() calls -- the biggest difference being the domain and patch dimensionality and stagger info contained in the calls, and the added per-level optional selectivity offered by the M3IO READ3(). The WRFIO notions of "output frames" and ext_pkg_get_next_var() (etc.) can easily be implemented by maintaining an internal per-dataset state variable that keeps the current-frame simulation clock and current variable for the dataset.

Initially, we propose to support two modes of operation, based on data distribution -- the modes that do not do distributed gather/scatter operations within WRFIO -- with place-holders in place to extend to a third which does implement internal WRFIO gather/scatter, as described below. (As we understand it, this restriction is consistent with the current functionalities supported by the netCDF implementation, as well.)

Question 1:
NetCDF doesn't have a notion of a LOGICAL type for variables and attributes. Is this mapped to netCDF INTEGER, and if so, is this guaranteed to be portable?

Question 2:
What is the role of the COM argument to ext_m3io_write_field() and ext_m3io_read_field()? Can it be used to distinguish between the non-gather/scatter first and second modes, and the (internal gather/scatter) third?

Full-Domain to Full-Domain
In this scenario, full-domain gathers are performed by the driver layer prior to calling ext_m3io_write_field() or subsequent to ext_m3io_read_field(). There is a single unified full-domain data set produced. This case can be recognized on the M3IO-package side by the fact that the patch-extent arguments are the same as the domain-dimension arguments (and the COM argument is turned off?).

Distributed to Distributed (same patch decomposition)
In this scenario, distributed (parallel per-patch) calls are made to ext_m3io_write_field() and ext_m3io_read_field(). These routines do I/O on a distributed data set (i.e., with different files for different patches) that is already correctly partitioned across the computational nodes, for distributed (this-patch) inputs or outputs. For this mode, patch-extent arguments are different from domain-dimension arguments (and the COM argument is turned off?).

Distributed to Full-Domain
In this scenario, distributed (parallel per-patch) calls are made to ext_m3io_write_field() and ext_m3io_read_field(). These routines do I/O on a full-domain data set and then do gather/scatter operations to generate distributed (this-patch) inputs or outputs. This mode of operation is distinguished by the fact that the COM argument is turned on (?).

Distributed to Distributed (different patch decompositions)
In this scenario, distributed (parallel per-patch) calls are made to ext_m3io_write_field() and ext_m3io_read_field(). These routines do I/O on distributed data sets (i.e., with different files for different patches) whose patch decompositions differ from the current WRF patch decomposition. We do not intend to support this mode. (Is it necessary to detect it, and flag it as an error?)
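
The way the package side might tell the supported modes apart can be sketched as below. This is illustrative Python with a hypothetical function name; in particular, treating COM as a simple on/off flag is an assumption, since its role is exactly what Question 2 asks about.

```python
def io_mode(domain_dims, patch_dims, com_enabled):
    """Classify a read/write call from its extent arguments.

    Assumption: COM acts as an on/off flag selecting internal gather/scatter.
    """
    if com_enabled:
        return "distributed-to-full-domain"
    if tuple(patch_dims) == tuple(domain_dims):
        return "full-domain-to-full-domain"
    return "distributed-to-distributed"
```

Nothing in these arguments reveals a mismatched patch decomposition, so the unsupported fourth mode could only be caught by comparing against metadata stored in the files themselves.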


Dates and Times; Sequencing Issues

M3IO clock objects use pairs INTEGER JDATE,JTIME to represent time accurate to (exact integer) seconds, using a coding of YYYYDDD:HHMMSS; time steps are exact integers coded H*MMSS. There is a complete set of routines for date and time manipulation; the entire system is robust with respect to negative time steps and "denormalized" dates and times such as 1999476:-234567 (this represents April 19, 2000 at 00:13:53 -- a date&time 23 hours, 45 minutes, and 67 seconds before the start of the 476-365=111th day of the year 2000).

WRFIO clock objects use character(len=??) strings (of various inconsistent lengths in the documentation), documented as accurate to milliseconds and formatted YYYY-MM-DD-HH-MM-SS.SSS. What to do if the millisecond field for an M3IO implementation of WRFIO is nonzero is an open problem; however, it is not unreasonable that for many (or even most) applications, the WRF model time step will be an integer number of seconds, and this issue can be avoided.

On the other hand, there is a request for an enhanced I/O API that maintains an even more accurate notion of clock object, as documented below in the section, Enhancements to the Models-3 I/O API. With Fortran-90/95 function overloading, one can even make this enhancement transparent to existing applications, as a (hidden-by-F90) M3IO extension. The M3IO direct access mode of operation, and hence its deterministic record-number requirement, can be preserved either by the use of rational-number fraction fields, or by double-precision fraction-fields (which have sufficient numerical precision that millisecond-level error tolerances can be preserved even for very long time step sequences). For the initial M3IO version for the Summer 2002 air quality forecasting effort, we will require exact-integer time steps; for the full M4IO, we will choose one of the enhanced formulations.
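
The denormalized date&time example above can be checked with a short normalization routine. The Python below is an illustrative sketch, not the M3IO library code; the function names are hypothetical, and it assumes a non-negative JDATE in the Gregorian calendar.

```python
def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def hms_to_seconds(hms):
    """Convert an H*MMSS-coded integer (possibly negative) to seconds."""
    sign = -1 if hms < 0 else 1
    hms = abs(hms)
    return sign * ((hms // 10000) * 3600 + (hms // 100 % 100) * 60 + hms % 100)


def normalize(jdate, jtime):
    """Return the equivalent (YYYYDDD, HHMMSS) with 0 <= time < 24 hours."""
    year, day = divmod(jdate, 1000)
    seconds = hms_to_seconds(jtime)
    day += seconds // 86400        # floor division handles negative times
    seconds %= 86400
    while day < 1:                 # borrow days from earlier years
        year -= 1
        day += 366 if is_leap(year) else 365
    while day > (366 if is_leap(year) else 365):
        day -= 366 if is_leap(year) else 365
        year += 1
    hhmmss = (seconds // 3600) * 10000 + (seconds % 3600 // 60) * 100 + seconds % 60
    return year * 1000 + day, hhmmss
```

For the example in the text, normalize(1999476, -234567) gives (2000110, 1353): day 110 of leap year 2000 is April 19, at 00:13:53.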

An issue arises with respect to time-step sequences. For M3IO, time step sequences are characterized by a starting date, a starting time, and a time step. The correspondence between elements of a time step sequence and "record numbers" is a deterministic one, as made possible by the facts that the relevant arithmetic is exact integer arithmetic and that the time step is constant for the entire time step sequence. This will create a "show-stopper" problem for data sets of high temporal resolution, if a WRF driver were constructed to use an adaptive time step scheme. For coarser output time steps (i.e., substantially larger than the solver time-step), we would resort to the scheme used by the MCPL() output module for MM5 to deal with the non-determinism of pre-Version-3 MM5 time-keeping: find the model time step that contains the desired input or output time step, thereby keeping the temporal error to at most one model time step (probably good enough for hourly input, for example).
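
Both ideas in this paragraph, the deterministic record number and the "containing model time step" fallback, can be sketched briefly. The Python below is illustrative only: the function names are hypothetical, the time step is assumed positive, and the fallback is a guess at the spirit of the MCPL() scheme rather than a transcription of it.

```python
import bisect
from datetime import datetime, timedelta


def record_number(start, step, when):
    """1-based record index of `when`, or None if it is not on the sequence.

    Exact arithmetic: `when` must be start + n*step for an integer n >= 0.
    """
    count, remainder = divmod(when - start, step)
    if count < 0 or remainder != timedelta(0):
        return None
    return count + 1


def containing_step(model_times, target):
    """Index of the (possibly irregular) model step that contains `target`.

    `model_times` is a sorted list of step-start times.
    """
    index = bisect.bisect_right(model_times, target) - 1
    if index < 0:
        raise ValueError("target precedes the first model time")
    return index
```

With adaptive time steps only the second function is usable, and the record it selects can be off from the requested time by up to one model time step.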

A special case of this is the case of time-independent data: presently, I can see no means in WRFIO to define that a variable is time-independent (at least in the documentation to which I have access), nor do I see provision for time independent variables implied in what I read of the Registry. On the other hand, there are a number of variables for which it is clearly worthwhile to note that they are time independent: z, pb, pb8w, msft, msfu, msfv, f, e, sina, cosb, ht, etc.

Another issue is the WRFIO concept of input and output frames. Since M3IO does selective direct access I/O, it naturally performs I/O operations when they are requested, and the sequential-file "frame" concept is not even relevant. As far as I can tell, however, this is an issue only in that the external M3IO package for WRFIO must maintain and update the state variables related to frames (ext_m3io_end_of_frame(DataHandle,Status) must increment the specified data set's date&time, for example).
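
The per-dataset frame state might be as small as the following. This is a hypothetical Python sketch; resetting the variable cursor at end of frame is an assumption, since how the frame routines interact with get_next_var is an open question in this paper.

```python
class FrameState:
    """Frame bookkeeping that the wrapper keeps for one open data set."""

    def __init__(self, start_seconds, step_seconds, var_names):
        self.clock = start_seconds       # current-frame date&time, in seconds
        self.step = step_seconds
        self.var_names = list(var_names)
        self.cursor = 0                  # index of the next variable to report

    def get_next_var(self):
        """Next variable name in the current frame, or None when exhausted."""
        if self.cursor >= len(self.var_names):
            return None
        name = self.var_names[self.cursor]
        self.cursor += 1
        return name

    def end_of_frame(self):
        """Advance the clock by one time step and restart the variable list."""
        self.clock += self.step
        self.cursor = 0                  # assumption: the cursor resets per frame
```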

QUESTION: Given the WRFIO concept of a "current frame" (with its own date&time) as maintained by routines ext_pkg_get_next_time(DataHandle, DateStr, Status), ext_pkg_set_time(DataHandle, DateStr, Status), ext_pkg_end_of_frame(DataHandle, Status), and ext_pkg_get_next_var(DataHandle, VarName, Status), and given that the routines ext_pkg_read_field(DataHandle, DateStr, ...) and ext_pkg_write_field(DataHandle, DateStr, ...) have a possibly different DateStr as one of the arguments, how do these latter two routines interact with the current-frame date&time? Also, how do the first three (dealing with potentially changing the current frame) interact with get_next_var?


Required-Attribute Dictionary

As a consequence of its single-operation "build a file from a complete file-definition" mode of operation, and consistent with the integrity requirements for the multi-model nature of environmental modeling applications, it is required that during the "dry-run" file-creation phase, the WRF execute a set of "put-metadata" calls sufficient to complete an M3IO file definition, and specify these metadata by the names recognized by the ext_m3io package. It is an error to call ext_pkg_open_for_write_commit() without having specified the entire set of metadata for a complete M3IO data set definition.

The dictionaries of required "global" and per-variable metadata are given below.

Global Metadata

Constraint: For the initial M3IO implementation (for use in the Summer 2002 air quality forecasting effort), only the M3IO standard global metadata, as documented below, will be supported.

Question: What is the difference between WRFIO "global metadata" and "domain metadata"? Both have unique per-data-set values, applicable to the entire data set...

The following are global metadata that must be set while the file is still in the definition phase.

DOMAIN_NAME (character(len≤32))

PROJ_TYPE (integer)
Identifier-Token for the horizontal map-projection type, as documented at URL /products/ioapi/GRIDS.html#horiz
  • 1: Lat-Lon
  • 2: Lambert Conformal Conic
  • 3: General Tangent Mercator
  • 4: General Stereographic
  • 5: Universal Transverse Mercator
  • 6: Secant Polar Stereographic
  • 7: Equatorial Mercator
  • 8: General Transverse Mercator

PROJ_ALPHA (real8)
First defining angle for the map projection, as documented in the URL above, in degrees

PROJ_BETA (real8)
Second defining angle for the map projection, as documented in the URL above, in degrees

PROJ_GAMMA (real8)
Third defining angle for the map projection, as documented in the URL above, in degrees

PROJ_XCENT (real8)
Longitude of the Cartesian origin for the map projection, as documented in the URL above, in degrees

PROJ_YCENT (real8)
Latitude of the Cartesian origin for the map projection, as documented in the URL above, in degrees

DOMAIN_XORIG (real8)
Cartesian X coordinate for the starting (1,1) corner of the domain, as documented in the URL above, in meters.

DOMAIN_YORIG (real8)
Cartesian Y coordinate for the starting (1,1) corner of the domain, as documented in the URL above, in meters.

DOMAIN_XCELL (real8)
Cartesian cell-size in the X direction, as documented in the URL above, in meters.

DOMAIN_YCELL (real8)
Cartesian cell-size in the Y direction, as documented in the URL above, in meters.

DOMAIN_NCOLS (integer)
Number of dot point columns in the domain

DOMAIN_NROWS (integer)
Number of dot point rows in the domain

DOMAIN_NTHIK (integer)
Boundary thickness (# of cells) for the domain; usually +1 for an (unthickened) external boundary, or -1 for an internal boundary.

DOMAIN_NLAYS (integer)
Number of levels for the domain

DOMAIN_VTYPE (integer)
Vertical coordinate type for the domain, as documented at URL /products/ioapi/GRIDS.html#horiz and to be extended for WRF below:
  • 1: Hydrostatic sigma-P (e.g., for MM4, MM5)
  • 2: Non-hydrostatic sigma-P
  • 3: Sigma-Z
  • 4: Pressure (Pa)
  • 5: Altitude Z (M above sea level)
  • 6: Height H (M above ground level)
  • 7,8,...: New vertical coordinate types to be defined for WRF, as documented below.

DOMAIN_VTOP (real)
Domain top for sigma coordinates (Pa for Sigma-P, M for Sigma-Z)

DOMAIN_LEVELS (real(0:NLAYS) )
Vertical coordinate values for the level-surfaces.
Constraint: Prior establishment of DOMAIN_NLAYS is required.

START_DATETIME (character(len=24) )
starting date and time, given according to WRF conventions

TIME_STEP (character(len=24) )(?)
dataset time step, given according to WRF time step conventions (which are TBD?)
NOTE: to the extent that WRF data sets have both time independent and time stepped variables, this will need to be a per-variable attribute that must either be zero (indicating a time independent variable) or have a common value for the entire data set (for the time stepped variables)

PROG_NAME (character(len=*))
Name of the program creating the data set.

DATASET_DESC (character(len≤4800))
dataset description (think of this as up to 60 lines of 80 characters each, with NEWLINE=ACHAR(10) as the delimiter)

In addition, the M3IO runtime system always automatically maintains the following additional metadata, which are available on datasets opened for input:

DOMAIN_NVARS_2D (integer)
Number of 2-D variables for the domain

DOMAIN_NVARS_3D (integer)
Number of 3-D variables for the domain

CREATION_DATETIME (character(len=24))
Date and time (GMT) that the dataset was created, given according to WRF conventions

UPDATE_DATETIME (character(len=24))
Date and time (GMT) that the dataset was last updated, given according to WRF conventions

EXECUTION_ID (character(len=80) )
Execution-ID for the program execution that created the dataset, according to Models-3 conventions.

UPDATE_DESC (character(len=4800) )
Run/Execution-description for the program execution that created the dataset, according to Models-3 conventions.

Per-Variable Metadata

Constraint 1: For the initial Summer 2002 M3IO air quality forecast implementation, only STAGGER and the standard M3IO global and per-variable metadata, as documented below, will be supported.

Constraint 2: The indicated data set must be in define-mode (i.e., dry-run, prior to commit) in order to set these metadata.

Constraint 3: Before these (or any per-variable) metadata are specified, the variable itself must have been registered by a prior (dry-run) ext_m3io_write_field() call.

VAR_TSTEP (character(len=24) )
Time step specification for the indicated variable. Must either be 0 (indicating that the variable is time independent) or agree with global attribute TIME_STEP.

VAR_UNITS (character(len≤32) )
Units specification for the indicated variable; should be MKS / UDUNITS compliant.

VAR_DESC (character(len≤80) )
one-line text description of the variable.

VAR_LEVELS (integer)
Number of layers for the variable (either 1 or else matches DOMAIN_NLAYS).

VAR_TYPE (integer)
Data type of the variable (INTEGER, LOGICAL, REAL, REAL8).

VAR_STAGGER (character(len≤32) )
NEW: Stagger specification for the indicated variable; should be one of the following: "", "X", "Y", "Z", "XY", "XZ", "YZ", "XYZ". Case is not significant.
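
The per-variable constraints listed above lend themselves to simple checks at "put" time. The Python below is an illustrative sketch with hypothetical function names; it assumes that the layer count of a 3-D variable is compared against DOMAIN_NLAYS.

```python
VALID_STAGGER = {"", "X", "Y", "Z", "XY", "XZ", "YZ", "XYZ"}


def check_stagger(value):
    """Return the canonical (upper-case) stagger code, or raise ValueError."""
    code = value.strip().upper()     # case is not significant
    if code not in VALID_STAGGER:
        raise ValueError("invalid VAR_STAGGER: %r" % value)
    return code


def check_var_levels(var_levels, domain_nlays):
    """A variable is single-layer or spans every layer of the domain."""
    if var_levels not in (1, domain_nlays):
        raise ValueError("VAR_LEVELS must be 1 or %d" % domain_nlays)
    return var_levels
```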


Additional Requirements for the WRF I/O API

I'm not sure of the meaning of WARNING and FATAL in the error-value parameters in the list below; that issue should be re-visited. Moreover, from what I can infer from WRFV1/external/IOAPI , the current list of status codes is altogether too closely tied to the current netCDF implementation of WRFIO.

NOTE
The WRF documentation to which I have access describes WRF date&time representation variously as character(len=19), "accurate to exact milliseconds," and "as exemplified by the format 0000-01-00:00:00.0000". These are mutually inconsistent, and this inconsistency needs to be resolved. Moreover, the code and documentation do not consistently specify the field-delimiters; I've coded my wrappers to look for any non-digits.
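
The "look for any non-digits" approach to parsing can be sketched as follows. This is illustrative Python, not the wrapper code itself: the function names are hypothetical, and rejecting nonzero milliseconds reflects the exact-integer-seconds restriction proposed for the initial implementation.

```python
import re
from datetime import date


def parse_wrf_datetime(text):
    """Split a WRF date&time string at any run of non-digits.

    Returns (year, month, day, hour, minute, second, milliseconds).
    """
    fields = re.findall(r"\d+", text)
    if len(fields) < 6:
        raise ValueError("expected at least six numeric fields: %r" % text)
    year, month, day, hour, minute, second = (int(f) for f in fields[:6])
    fraction = fields[6] if len(fields) > 6 else ""
    millis = int((fraction + "000")[:3])     # truncate to milliseconds
    return year, month, day, hour, minute, second, millis


def to_jdate_jtime(text):
    """Convert to M3IO-style (YYYYDDD, HHMMSS); fractional seconds are an error."""
    year, month, day, hour, minute, second, millis = parse_wrf_datetime(text)
    if millis != 0:
        raise ValueError("M3IO clocks are exact integer seconds")
    jdate = year * 1000 + date(year, month, day).timetuple().tm_yday
    return jdate, hour * 10000 + minute * 100 + second
```

Because only the digit runs matter, the same routine accepts "-", "_", ":", or blank as field delimiters.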

Standard representation for time-deltas
(This may be already present, but I can't find it). What I would suggest is that time deltas use a format adapted from the date&time representation: something like either <sign>[[H*:]MM:]SS.SSS or <sign>H*:MM:SS.SSS where the sign, hours, and minutes fields are optional. Note that since "month" and "year" do not have invariant meanings, there should not be month-fields and year-fields in a time step representation.
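
A parser for the first suggested form, <sign>[[H*:]MM:]SS.SSS, might look like the sketch below. It is illustrative Python with a hypothetical function name; returning integer milliseconds and accepting one to three fraction digits are choices of the sketch, not part of the proposal.

```python
import re

# Optional sign, optional hours and minutes, seconds, optional fraction.
_DELTA = re.compile(
    r"^([+-]?)(?:(?:(\d+):)?(\d{1,2}):)?(\d{1,2})(?:\.(\d{1,3}))?$"
)


def parse_delta_ms(text):
    """Parse a time delta and return it as a signed integer of milliseconds."""
    match = _DELTA.match(text.strip())
    if match is None:
        raise ValueError("not a valid time delta: %r" % text)
    sign, hours, minutes, seconds, fraction = match.groups()
    total = (int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds)) * 1000
    total += int((fraction or "").ljust(3, "0"))   # "5" means 500 ms
    return -total if sign == "-" else total
```

Keeping the result in integer milliseconds preserves the exact arithmetic on which the record-number calculation depends.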

NOTE In module_date_time, the routines geth_idts and geth_newdate seem to imply that time-deltas should be integers. This is an error if, as the comments in that very code indicate, date and time representation is "YYYY-MM-DD HH:MM:SS.ffff".

Specification of the vertical coordinate type(s)
There is a list of current candidates. I need to know how many items there are in this list, and what the names of them are. There also need to be standard parameter tokens for the corresponding WRF vertical coordinate type IDs...which should go in one of the WRF "constants" modules--which one?

Provision for Time Independent Variables
I do not see provision in the documentation nor what I interpret of the Registry for stating that a variable (such as terrain height, map scale factors, map rotation angles, and reference atmosphere) is time independent. There need to be two extensions to the WRF to support this: augmentation of the Registry so that it allows the designation of a variable as time independent or time stepped, and a means within the WRF API to communicate this fact to WRFIO. For this latter, the standard could be that a dry-run call to ext_pkg_write_field() for which the DateStr argument is identically zero (i.e., 00:00:00.0000) designates the corresponding variable as time independent.

Grid and Map Projection characterization
Mathematically, the notion of a grid is subordinate to that of a map projection:
A map projection is a mapping from a domain on the surface of the Earth into a rectangle in a two-dimensional Cartesian space R2. The common mesoscale examples -- Lambert conformal conic, polar secant stereographic, equatorial Mercator -- require not only a set of (typically 3) defining angles, but also a Cartesian origin for their specification. It is not necessarily the case that the Cartesian origin falls at an origin "central lat-lon" coincident with the defining angles of the projection; to assume that it does would seem to be unwarranted. Moreover, at the Meso-Gamma and finer scales targeted by the WRF, there will be a desire (at least for air quality applications) to support the so-called Universal Transverse Mercator (UTM) map projections.

A grid has a map projection and a "lattice structure" defined in terms of that projection, typically needing the following, of which I can find only the last two in the Registry:

  • starting corner X1, Y1 (e.g., the SW corner of the (1,1)-cell)
  • cell-size DX, DY
  • dimensions NX, NY

One can not assume that the grid must be centered relative to the Cartesian origin of the map projection, especially as one goes to smaller scales (and there can be particular modeling advantages to having the grid off-center, if by so doing one can line up a coordinate axis with the prevailing winds of the domain modeled).

As far as I can tell, the WRF does not have any proper characterization of grids relative to the map projections within which they "live".
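
To make the lattice characterization concrete, the sketch below shows how a starting corner, cell size, and dimensions locate every cell in the projection's Cartesian coordinates. It is illustrative Python; the class and method names are hypothetical, and all coordinates are assumed to be in meters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid:
    x1: float   # X of the SW corner of cell (1,1)
    y1: float   # Y of the SW corner of cell (1,1)
    dx: float   # cell size in X
    dy: float   # cell size in Y
    nx: int     # number of columns
    ny: int     # number of rows

    def cell_center(self, col, row):
        """Cartesian center of the cell at 1-based (col, row)."""
        return (self.x1 + (col - 0.5) * self.dx,
                self.y1 + (row - 0.5) * self.dy)

    def extent(self):
        """(xmin, ymin, xmax, ymax) of the whole grid."""
        return (self.x1, self.y1,
                self.x1 + self.nx * self.dx, self.y1 + self.ny * self.dy)

    def is_centered_on_origin(self, tol=1.0e-6):
        """True only in the special case that a centered-grid assumption covers."""
        xmin, ymin, xmax, ymax = self.extent()
        return abs(xmin + xmax) <= tol and abs(ymin + ymax) <= tol
```

An off-center grid is simply one whose x1, y1 are not -nx*dx/2, -ny*dy/2; without the starting corner there is no way to express it.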

Routines for the definition of time-dependent metadata
  • ext_pkg_def_glb_td_char(DataHandle,Length,Status)
  • ext_pkg_def_glb_td_type(DataHandle,Count,Status)
  • ext_pkg_def_var_td_char(DataHandle,Var,Length,Status)
  • ext_pkg_def_var_td_type(DataHandle,Var,Count,Status)

INTEGER, INTENT( IN ):: DataHandle, Length, Count
CHARACTER(len=*), INTENT( IN ):: Var
INTEGER, INTENT( OUT ):: Status

Constraint 1: These methods must be called while the data set is in its data definition phase, prior to the call to ext_pkg_open_for_write_commit()

Constraint 2: These methods must be called prior to any call to a time dependent metadata routine ext_pkg_put_td*()

NOTE: the draft-document prototypes for routines ext_pkg_get_glb_td_*() and ext_pkg_get_dom_td_*() need DateStr as an additional argument.

Program termination
In order for WRFIO data sets to have correctly updated headers (or the equivalent), all program terminations must cause the call of ext_pkg_exit(). STOP and CALL EXIT() should be forbidden.

WRF_ERR_INCOMPLETE_DS_DEF parameter
It is a fatal error for data set creation to attempt to do a "commit" operation with a data set definition that is yet incomplete.

WRF_ERR_DEFN_CONSTRAINT parameter
Constraint violation (e.g., setting a per-variable metadatum prior to establishment of the variable via dry-run ext_pkg_write_field() call.)

WRF_ERR_FATAL_ parameters for the following failures:
Possibly one additional error-code WRF_ERR_FUBAR, or else specific error codes for the following:
  • ext_m3io_init() failed
  • ext_m3io_init() not yet called
  • ext_m3io_exit() failed
  • ext_m3io_open_dataset_for_read() failed
  • ext_m3io_open_dataset_for_write_commit() when the supplied data set definition is not complete: WRF_ERR_INCOMPLETE_DS_DEF
  • ext_m3io_inquire_opened() when the datahandle is incorrect for the specified file name.
  • ext_m3io_close() failed, when the indicated file exists but the close operation failed.
  • ext_m3io_*_md*(): metadata Element not in data set.


Suggestions for Enhancing the WRF I/O API

Time-interpolation method ext_pkg_interp_field( <args>)
where the argument list <args> is the same as that for ext_pkg_read_field()
Constraint: The indicated variable must be of type real or real8.

The corresponding M3IO routine, INTERP3(FNAME,VNAME,<date&time>,...), has proven quite valuable. It interpolates variable VNAME from file FNAME to the specified date and time (or returns a failure-status if this is not possible). The routine handles I/O optimization for the caller (i.e., maintains double buffering) behind the scenes.
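
The double-buffering idea can be sketched as follows. This Python is illustrative only and is not the INTERP3() implementation: the class name and the read_record callback are hypothetical, times are integer seconds, and a request that falls exactly on the last record would need extra handling that is omitted here.

```python
class TimeInterpolator:
    """Linear interpolation in time, with two cached bracketing records."""

    def __init__(self, read_record, start, step):
        self._read = read_record   # callable: 0-based record index -> list of floats
        self._start = start        # starting time, seconds
        self._step = step          # time step, seconds (assumed > 0)
        self._index = None         # index of the record held in _lo
        self._lo = None
        self._hi = None

    def _load(self, index):
        if index == self._index:
            return                              # both buffers already valid
        if self._index is not None and index == self._index + 1:
            self._lo = self._hi                 # reuse the later buffer
        else:
            self._lo = self._read(index)
        self._hi = self._read(index + 1)
        self._index = index

    def interp(self, when):
        index, offset = divmod(when - self._start, self._step)
        if index < 0:
            raise ValueError("requested time precedes the data set")
        self._load(index)
        weight = offset / self._step
        return [a + weight * (b - a) for a, b in zip(self._lo, self._hi)]
```

As the model clock advances, each record is read only once, which is the I/O optimization the caller gets for free.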

Time-Derivative method ext_pkg_ddt_field( <args>)
Constraint: The indicated variable must be of type real or real8.

Given the double-buffering of a time-interpolation routine, it is easy to construct a time-derivative routine using the same buffer system. Such a routine is occasionally useful, e.g., for getting a rainfall rate from the prognostic cumulative rainfall variable.
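
Given the two bracketing records, the derivative itself is a finite difference. A minimal Python sketch, with a hypothetical function name and the time step in seconds:

```python
def time_derivative(lo, hi, step_seconds):
    """Finite-difference d/dt between two bracketing records, per second."""
    return [(b - a) / step_seconds for a, b in zip(lo, hi)]
```

For example, cumulative rainfall rising from 10.0 to 19.0 over 3600 seconds gives a rate of 0.0025 per second, in whatever units the cumulative variable uses.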

Windowed Output
For research and diagnostic purposes it may be worthwhile to make special provisions for the capability to output selected variables at high temporal frequency (e.g., the model time step), but only for a specified window into the model domain (resource requirements may make full-domain model time step output impractical).

Layer-Selective input method ext_pkg_read_field_level( <args>)
where the argument list <args> augments that for ext_pkg_read_field() by adding an additional selector for model level.

Air quality modeling experience shows that there is relatively frequent use of the selective-read operations that pick out just one level (especially the model-bottom level) from a 3-D field. Given a selective direct-access lower API-layer such as netCDF, this is quite easy to implement.

Metadata-name-inquiry functions
Presently, only if the modeler already knows beforehand that a variable named foo has an attribute named bar and that the attribute has type qux, can the modeler inquire for the value of the attribute (and similarly for "domain" and "global" attributes).

Full functionality requires a complete set of inquiry functions that retrieve the names and types of all the per-variable, domain, and global metadata. The end result would be a fully reflective metadata interface for WRFIO data sets.
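
What such a reflective interface would provide can be sketched in a few lines. The Python below is illustrative only; the class, its methods, and the use of Python type names as type tokens are all hypothetical.

```python
class MetadataStore:
    """Attributes keyed by scope: "GLOBAL" or a variable name."""

    def __init__(self):
        self._attrs = {}            # scope -> {attribute name: value}

    def put(self, scope, name, value):
        self._attrs.setdefault(scope, {})[name] = value

    def get(self, scope, name):
        return self._attrs[scope][name]

    def inquire(self, scope):
        """List (name, type, size) for every attribute in the given scope."""
        found = []
        for name, value in sorted(self._attrs.get(scope, {}).items()):
            items = value if isinstance(value, (list, tuple)) else [value]
            kind = type(items[0]).__name__ if items else "unknown"
            size = len(value) if isinstance(value, (str, list, tuple)) else 1
            found.append((name, kind, size))
        return found
```

With inquire() available, a generic tool can copy or display a data set's metadata without knowing beforehand which attributes exist.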

UNITS
I would like to suggest that units given for variables in the Registry should be compatible to the extent possible with the MKS UCAR UDUNITS package. This reflects upon capitalization issues, so that one has, e.g., M instead of m for "meters". (This is an operational requirement, rather than a software-system requirement...)

Data set opening modes "unknown" and possibly "create"
One of the most useful M3IO file-open modes is UNKNOWN, which behaves as follows:
  1. The caller must supply a data set definition.
  2. If the file does not exist, create it according to the caller-supplied definition.
  3. If the file exists, check that its definition is consistent with the caller-supplied definition.
  4. In particular, if the caller-supplied starting date&time is later than the file's starting date&time, the former must be a valid time step for the latter.
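
The UNKNOWN-mode rules above can be sketched as follows. This is illustrative Python, not M3IO code: the function name and the dictionary keys "start" and "step" (integer seconds) are hypothetical, and the case of a caller-supplied start earlier than the file's start is left unchecked because the rules above do not address it.

```python
def open_unknown(existing, requested):
    """Return the definition in effect after an UNKNOWN-mode open.

    `existing` is the on-disk definition, or None if the file does not exist;
    `requested` is the caller-supplied definition.
    """
    if existing is None:
        return dict(requested)                   # rule 2: create the file
    for key, value in requested.items():
        if key != "start" and existing.get(key) != value:
            raise ValueError("inconsistent definition for %r" % key)   # rule 3
    offset = requested["start"] - existing["start"]
    if offset > 0 and offset % existing["step"] != 0:
        raise ValueError("start is not a valid time step of the file")  # rule 4
    return existing
```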
One of the least useful M3IO file-open modes is "create" mode: if the file exists, delete it and create a new file according to the caller-supplied definition. (This one was added at the insistence of the visualization people, who wanted it for scratch files, and who alone are allowed to use it: By agreement between the heads of EPA ORD and MCNC Environmental Modeling Center, modelers are forbidden to use this mode, on pain of being sentenced to not less than three years nor more than five years, on a 386/25 running Windows 3.0 and Microsoft Fortran 1.0. :-) If the visualization people get wind of this mode, they may insist upon it too...


M4IO: Enhancements to the Models-3 I/O API

There are a number of potential modifications to the Models-3 I/O API that would ameliorate incompatibilities between it and the WRF I/O API. Some of these would cause source-code, link, or data incompatibilities with the existing M3IO (which has been carefully maintained to preserve backward compatibility), and would at the very least lead to a "flag day". For this reason, and because the WRF-Chem air quality model (which, because of its planned use of SMOKE, is one of the chief targets of this effort) will in fact be a fourth generation air quality model (as the EPA's Models-3 is of the third generation), it is attractive to create a new library -- M4IO -- as a follow-on to M3IO. At the same time (if for no other reason than SMOKE compatibility), it is desirable to make compatibility between M3IO and M4IO as great as possible. It may be possible to make this almost entirely transparent by using Fortran-90 subroutine overloading for M4IO (although the Fortran-90 type system is not strong enough to do all that the API needs to do).

NOTE: Given the time and level of effort required to implement all the M3IO enhancements given below, and the level of agreement with EPA needed for M4IO acceptance by them, we propose that for the initial WRFIO-to-M3IO implementation to be used for the Summer 2002 air quality forecasting effort, we use the current version (Version 2.1) of M3IO, and require the following constraints upon WRF configurations used with it:

Potential changes/enhancements from M3IO that go into the new M4IO are the following:

Additional LOGICAL and CHARACTER(LEN=<n>) field types
Currently, M3IO supports READ3() and WRITE3() only for fields of types INTEGER, REAL, and REAL8 (INTERP3() makes sense only for the last two of these). It would be almost trivial to add support for LOGICAL. It has long been an EPA request to support fields of type CHARACTER(LEN=<n>), and we have recently worked out how to implement this cleanly as well.
Does not break backwards data, source, nor link compatibility.
NOTE: netCDF does not support LOGICAL; do we handle this by overloading INTEGER?

New "extra-attribute" M3IO routines
LOGICAL INQATT3(FNAME,VNAME,NATTS,ANAMES,ATYPES,ASIZES)
CHARACTER*(*) FNAME ! logical file name
CHARACTER*(*) VNAME ! variable name, or "ALL"
INTEGER NATTS ! number of attributes
CHARACTER*(*) ANAMES( MXATTS3 ) ! attribute names
INTEGER ATYPES( MXATTS3 ) ! " types (M3REAL,M3INT,M3DBLE)
INTEGER ASIZES( MXATTS3 ) ! " sizes/lengths
LOGICAL RDATT3(FNAME,VNAME,ANAME,ATYPE,AMAX,ASIZE,AVAL)
LOGICAL RDATTC(FNAME,VNAME,ANAME,CVAL )
LOGICAL WRATT3(FNAME,VNAME,ANAME,ATYPE,AMAX,AVAL)
LOGICAL WRATTC(FNAME,VNAME,ANAME,CVAL )
CHARACTER*(*) FNAME ! logical file name
CHARACTER*(*) VNAME ! variable name, or "ALL"
CHARACTER*(*) ANAME ! attribute name
INTEGER ATYPE ! attribute type (M3REAL,M3INT,M3DBLE)
INTEGER AMAX ! attribute dimensionality
INTEGER ASIZE ! attribute actual size
REAL AVAL( AMAX ) ! attribute value (numeric)
CHARACTER*(*) CVAL ! attribute value (character-string)
Does not break backwards data, source, nor link compatibility.
Work currently underway (prototype version coded and in test at this time); should fit into Summer 2002 air quality prototype and be used for the initial implementation of STAGGER.

New "disk-synchronization" M3IO routine
LOGICAL SYNC3(FNAME)
CHARACTER*(*) FNAME ! logical file name
Does not break backwards data, source, nor link compatibility.
Work currently underway (prototype version coded and in test at this time); should fit into Summer 2002 air quality prototype and be used for the initial M3IO implementation.

Higher-resolution date and time support
WRF uses millisecond resolution; is this the appropriate way to go? There have been arguments in favor of three other approaches for date&time objects, each of which offers even finer than millisecond temporal resolution (where below, date has a resolution of 1 day, and time has a resolution of 1 second):
  • <date&time> = <INTEGER YYYYDDD date> + <INTEGER HHMMSS time> + <REAL8 0.SSS... fractional seconds>
  • <date&time> = <INTEGER YYYYDDD date> + <REAL8 HHMMSS.SS... time>
  • <date&time> = <INTEGER YYYYDDD date> + <INTEGER HHMMSS time> + <INTEGER FRAC(2)=(NUMERATOR,DENOMINATOR) rational-number fractional seconds>
Note that for the first two of these, REAL8 provides sufficient numerical precision to allow for the deterministic calculation of time step record numbers accurate to WRF's millisecond tolerances, even for very long time step sequences. Multi-century runs would run into difficulty, however. The rational-number approach has the virtue of exact arithmetic; however, it is much more complex both for us to implement and for the average modeler to use. Currently, I (CJC) favor the first of these approaches, in terms of maximizing the combination of usability and precision.
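To make the record-number argument concrete, here is a minimal sketch of the first approach. It is written in Python purely for illustration (the real package would be Fortran), with Python's float standing in for REAL8. The helper names to_seconds and record_number, and the 1 ms tolerance default, are assumptions of this sketch and not part of M3IO or WRFIO.

```python
from datetime import date

def to_seconds(jdate, jtime, frac=0.0):
    """Split a (YYYYDDD, HHMMSS, 0.SSS...) date&time into exact integer
    seconds since a fixed origin plus the fractional-second part."""
    year, jday = divmod(jdate, 1000)
    days = date(year, 1, 1).toordinal() + jday - 1
    hours, minutes, seconds = jtime // 10000, (jtime // 100) % 100, jtime % 100
    whole = days * 86400 + hours * 3600 + minutes * 60 + seconds
    return whole, frac

def record_number(start, now, step_seconds, tol=1.0e-3):
    """Return the 1-based time step record number of `now` relative to
    `start`, or None if `now` is not on a time step boundary (within tol)."""
    s_whole, s_frac = to_seconds(*start)
    n_whole, n_frac = to_seconds(*now)
    # Integer seconds subtract exactly; only the fractions use floating point.
    elapsed = (n_whole - s_whole) + (n_frac - s_frac)
    steps = round(elapsed / step_seconds)
    if steps < 0 or abs(elapsed - steps * step_seconds) > tol:
        return None
    return steps + 1

print(record_number((2002001, 0, 0.0), (2002001, 10000, 0.5), 0.5))
```

Keeping the whole-second part in integers means that only the sub-second remainder is exposed to rounding, which is the property the first approach relies on.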
Breaks backwards source and link compatibility.

32-character variable names
M3IO currently supports 16-character variable names and units designations; WRF names are presently mandated to be at most 31 characters. (NOTE: A length of 32 bytes potentially gives alignment improvements to the internal data structures used for implementation.)
Does not break backwards data nor source compatibility. Breaks backwards link compatibility.
INTERIM STEP: modify M3IO internals so that it requires trimmed name lengths to be at most 16.
Done, 1/20/2002

New vertical coordinate types for WRF
This one is trivial to add, without breaking backwards compatibility--it just adds new parameters in the PARMS include-file.
Does not break backwards data, source nor link compatibility.

Increased level-dimension support
Currently, M3IO stores a maximum of 101 full-level values in file headers and file-description data structures. This number could be increased as desired.
What is an appropriate maximum for WRF? 256? 512?
Does not break backwards data nor source compatibility; breaks link compatibility.

Support for WRF "stagger"
There are two ways to accomplish this, as described above in the section Dataset Definition Issues. Both add a per-variable STAGGER attribute to M3IO data sets, and then either (1) use it to adjust the actual dimensions for input and output, or (2) pad all output variables out to the extent of those dimensions (and subset on input), on the basis of STAGGER. The former alternative gives smaller data set sizes and more efficient I/O, but at substantially greater code complexity and with the probability of hard-to-diagnose program crashes if old-M3IO programs are fed new-M3IO data sets.

The consequence of alternative (1) is that different variables within an M3IO data set would have different dimensionality, breaking forwards compatibility -- old implementations would be unable to read new "staggered" variables without scrambling them, and might generate hard-to-diagnose program errors in the process.
Does not break backwards data compatibility. Breaks backwards source and link compatibility, potentially in ways that lead to hard-to-diagnose bugs.
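As a rough illustration of the padding alternative (2), the following Python sketch pads a 2-D field out to a common extent on output and subsets it again on input. The names pad_field, subset_field, and FILL are hypothetical, as is the fill value; the sketch assumes only that each variable's true extent can be recovered from its STAGGER attribute.

```python
FILL = -9.999e36  # hypothetical "missing" marker used for the padding cells

def pad_field(field, nrows, ncols, fill=FILL):
    """Pad a 2-D field (list of rows) out to the common (nrows, ncols) extent."""
    if len(field) > nrows or any(len(row) > ncols for row in field):
        raise ValueError("field is larger than the common extent")
    padded = [list(row) + [fill] * (ncols - len(row)) for row in field]
    padded += [[fill] * ncols for _ in range(nrows - len(field))]
    return padded

def subset_field(padded, nrows, ncols):
    """Recover the variable's true (nrows, ncols) extent on input."""
    return [row[:ncols] for row in padded[:nrows]]

# An unstaggered 2x3 field stored in a file whose common extent is 3x4.
field = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
stored = pad_field(field, 3, 4)
restored = subset_field(stored, 2, 3)
```

With this scheme every variable in the file has the same dimensions, so an old reader sees well-formed (if padded) data rather than scrambled data.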

NAMELIST support for logical names
M3IO by default uses environment variables (initialized by script commands "setenv foo /bar/qux/dingbats.dat") to bind data set logical names (as used by model code) to physical path names (as used by system calls), whereas WRFIO has traditionally used NAMELISTs. If possible, the relevant M3IO utility routines will be modified to use either environment variables, or an M3IO_NAMELIST, ideally in that priority order, to evaluate the bindings of logical names. M3IO_NAMELIST will itself be either the logical name for the namelist-file, or else will be the physical path-name for a file to be found in the current working directory.

NOTE: Since the M3IO environment-variable utilities are themselves C called from the Fortran, it remains to be seen whether one can make this C code actually interpret the NAMELISTs. Perhaps the way to do this is to make the initialization routine do a putenv()?

Does not break backwards data, source nor link compatibility.
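The putenv() idea in the note above can be sketched as follows, in Python for illustration only. The simplified namelist layout, the function names parse_bindings and bind_from_namelist, and the file paths are all assumptions of this sketch; the real M3IO_NAMELIST format has not yet been defined.

```python
import os

def parse_bindings(namelist_text):
    """Parse  name = 'path'  lines from a simplified namelist group."""
    bindings = {}
    for raw in namelist_text.splitlines():
        line = raw.strip().rstrip(",")
        # Skip blanks, the "&GROUP" header, the "/" terminator, and comments.
        if not line or line.startswith(("&", "/", "!")) or "=" not in line:
            continue
        name, _, value = line.partition("=")
        bindings[name.strip()] = value.strip().strip("'\"")
    return bindings

def bind_from_namelist(bindings, env=None):
    """Install namelist bindings without overriding existing environment
    variables, so the environment keeps priority over the namelist."""
    env = os.environ if env is None else env
    for name, path in bindings.items():
        env.setdefault(name, path)
    return env

text = """&M3IO_NAMELIST
  foo = '/from/namelist.dat',
  met = '/data/met.ncf',
/"""
env = bind_from_namelist(parse_bindings(text), env={"foo": "/from/env.dat"})
```

Doing this once at initialization would leave the existing environment-variable lookup routines untouched: they would simply find the extra bindings already in the environment.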

Geodetic-Spheroid metadata
Presently, the knowledge of what spheroid is used for the map projection (and therefore for the geo-referencing of the data) is implicit, maintained in the head of the modeler. Especially for coupled environmental modeling systems, this is less than satisfactory; the data sets should contain a complete and accurate characterization of their georegistration.
Does not break backwards data compatibility. Breaks backwards source and link compatibility, potentially in ways that lead to hard-to-diagnose bugs.

User-Defined metadata
The present M3IO maintains a fixed set of global and per-variable metadata, with additional metadata stored as user-defined text in the (4800-character) file description and run/update description fields. The goal would be to provide additional interface-methods that yield a reflective interface to this user-defined metadata. We do not propose to relax the standards for mandatory metadata.
Does not break backwards data, source nor link compatibility.

Fortran-90 overloaded-interface implementation
Careful implementation of this would allow a backwards-compatible implementation that uses Fortran-90 polymorphism to handle the different kinds of time steps that may occur. Michael Metcalf's convert program can be of some help here, although we have seen that it does not deal with INCLUDE-files correctly, distorts the layout of comments and code, and requires some manual fixup and checking.
Does not break backwards data compatibility. Minimizes effects upon source compatibility. Breaks link compatibility. Causes portability difficulties due to the variety of different ways vendors implement modules.

New Geospatial-Element Cell Complex data type
This one isn't necessary for WRF (as it is presently conceived), nor for existing applications (although it would be useful for adding additional capabilities to SMOKE). However, it does fulfill a request from EPA for means to handle time-stepped/time-independent geospatial-coverage and finite element data in an efficient and powerful manner. It currently exists in prototype form (not included in the standard M3IO release); the EPA proposal describing its original conception can be found here
Does not break backwards data, source nor link compatibility.

Additional software-implementation layers
Currently, the M3IO is layered on top of two software libraries used for physical data storage/communication: netCDF (using files, and the netCDF2 Fortran interface), and PVM3, using the mailbox interface. It is perhaps desirable to extend or modify this in a number of ways:
  • Add additional lower software-layers -- particularly MPI2
  • Update the netCDF interface to use netCDF3
  • Revise the name-binding mechanism for greater regularity in syntax (e.g., setenv foo MPI:/bar/stuff)

Does not break backwards data, source nor link compatibility.



WRF Implementation Wishlist

wrf_io_flags.h, wrf_status_codes.h location
These are of general use for all external I/O packages and should not be squirreled away under the io_netcdf leaf subdirectory.

UTM map projection support
UTM map projections are a de facto standard for urban-scale air quality applications, and supporting them will be useful to the urban-scale air quality modeling community (which already has its emissions databases configured in terms of UTM; note that emissions database development is one of the most resource-intensive activities in such air quality applications). Satisfying this request would largely be a matter of getting the correct preprocessor support; moreover, since almost all the original terrain and land cover databases are defined with respect to UTM, this should not be difficult (and would potentially avoid the loss of resolution that currently occurs when such data is re-mapped to another map projection).

Names and Keywords
Don't use variable names that happen to be the same as Fortran keywords, e.g., DATA in the argument list or implementation of ext_pkgget_glb_md_type().




Layered WRFIO/M3IO Implementation Strategy

Date&Time Routines

The first requisite for layering WRFIO on top of an external M3IO package is to implement the simpler related objects and methods--such as date&time conversion routines--on top of which the API is built.

QUESTION 1:
Can we nail down that the WRF time-delta representation is as suggested above: similar to the WRF date&time representation, except that it may have a leading minus sign, it has only hour, minute, and seconds fields, and the leading fields may be missing? e.g.,
12.375 twelve-and-three-eighths seconds
08-00-00.000 eight hours
-08-00-00.000 negative eight hours
168-00-00.000 one hundred sixty-eight hours (one week)
Note that the use of a hyphen as the delimiter in WRF date&time representation makes the construction and interpretation of negative time-differences rather trickier!

QUESTION 2:
Can we assume that the separator in the date&time character strings is always a hyphen (i.e., so that date-strings look like "YYYY-MM-DD-HH-MM-SS.SSS")?

QUESTION 3:
Are WRF date&time objects always normalized? Or can things like 2001-12-32-01-00-00.000 (that "ought to be" 1 A.M. on Jan. 1, 2002) happen?

SUBROUTINE wrftime2m3io(DateStr,JDate,JTime[, Frac])
character(len=*), intent(in):: DateStr
integer, intent(out):: JDate, JTime
REAL8, intent(out):: Frac

Overloaded module routine that converts from WRF "YYYY-MM-DD-HH-MM-SS.SSS" character string time representation to (extended) M3IO integer YYYYDDD:HHMMSS[:0.xxxxD0] time representation.

SUBROUTINE m3iotime2wrf(JDate,JTime[, Frac],DateStr)
integer, intent(in):: JDate, JTime
REAL8, intent(in):: Frac
character(len=19), intent(out):: DateStr

Overloaded module routine that converts from M3IO integer YYYYDDD:HHMMSS[:0.xxxxD0] time representation to WRF character string time representation.
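The two conversions above can be sketched as follows. This is an illustrative Python sketch rather than the Fortran module routines themselves. It assumes the answers to QUESTIONS 2 and 3 are "yes": the separator is always a hyphen, and date&time strings are always normalized (a string such as 2001-12-32-01-00-00.000 is rejected with an error rather than rolled over).

```python
from datetime import date, timedelta

def wrftime2m3io(datestr):
    """'YYYY-MM-DD-HH-MM-SS.SSS' -> (JDATE=YYYYDDD, JTIME=HHMMSS, SECFRAC)."""
    fields = datestr.strip().split("-")
    if len(fields) != 6:
        raise ValueError("expected YYYY-MM-DD-HH-MM-SS.SSS: " + datestr)
    year, month, day, hour, minute = (int(f) for f in fields[:5])
    whole, _, frac = fields[5].partition(".")
    # date() raises ValueError for non-normalized dates such as Dec. 32.
    jday = date(year, month, day).timetuple().tm_yday
    jdate = year * 1000 + jday
    jtime = hour * 10000 + minute * 100 + int(whole)
    secfrac = float("0." + frac) if frac else 0.0
    return jdate, jtime, secfrac

def m3iotime2wrf(jdate, jtime, secfrac=0.0):
    """(JDATE, JTIME[, SECFRAC]) -> 'YYYY-MM-DD-HH-MM-SS.SSS'."""
    year, jday = divmod(jdate, 1000)
    day = date(year, 1, 1) + timedelta(days=jday - 1)
    hour, minute, second = jtime // 10000, (jtime // 100) % 100, jtime % 100
    # Round to milliseconds, clamped so rounding never carries into seconds.
    millis = min(int(secfrac * 1000 + 0.5), 999)
    return "%04d-%02d-%02d-%02d-%02d-%02d.%03d" % (
        day.year, day.month, day.day, hour, minute, second, millis)

print(wrftime2m3io("2002-07-04-12-30-15.250"))   # (2002185, 123015, 0.25)
print(m3iotime2wrf(2002185, 123015, 0.25))       # 2002-07-04-12-30-15.250
```

The optional SECFRAC argument plays the role of the bracketed Frac argument in the overloaded interfaces: omitting it gives the standard M3IO representation, supplying it gives the extended one.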

SUBROUTINE WRFDT2M3IO( DTStr, TStep[, TFrac] )
character(len=*), intent(in):: DTStr
integer, intent(out):: TStep
REAL8, intent(out):: TFrac

Overloaded module routine that converts a standard WRF time-delta (once that is defined) into M3IO time-delta representation.
Note that M3IO already has functions that convert back and forth between seconds and M3IO time-delta representation.

SUBROUTINE M3IODT2WRF( TStep[, TFrac], DTStr )
integer, intent(in):: TStep
REAL8, intent(in):: TFrac
character(len=*), intent(out):: DTStr

Overloaded module routine that converts an M3IO time-delta representation into a standard WRF time-delta (once that is defined).
Note that M3IO already has functions that convert back and forth between seconds and M3IO time-delta representation.
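Pending an answer to QUESTION 1, the time-delta conversions might look like the following Python sketch. It assumes the format suggested there (optional leading minus sign; hyphen-separated hour, minute, and seconds fields; leading fields optional) and assumes that a negative delta carries its sign on both the integer H*MMSS part and the fractional part. Both assumptions would need to be confirmed.

```python
def wrfdt2m3io(dtstr):
    """WRF time-delta string -> (TSTEP as signed H*MMSS integer, TFRAC)."""
    text = dtstr.strip()
    sign = 1
    if text.startswith("-"):          # leading minus, not a field separator
        sign, text = -1, text[1:]
    fields = text.split("-")
    if not 1 <= len(fields) <= 3:
        raise ValueError("expected [[HH-]MM-]SS.SSS: " + dtstr)
    fields = ["0"] * (3 - len(fields)) + fields   # missing leading fields
    whole, _, frac = fields[2].partition(".")
    total = int(fields[0]) * 3600 + int(fields[1]) * 60 + int(whole)
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    tfrac = float("0." + frac) if frac else 0.0
    return sign * (hours * 10000 + minutes * 100 + seconds), sign * tfrac

def m3iodt2wrf(tstep, tfrac=0.0):
    """(TSTEP[, TFRAC]) -> WRF time-delta string 'HH-MM-SS.SSS'."""
    negative = tstep < 0 or tfrac < 0
    t = abs(tstep)
    secs = (t // 10000) * 3600 + ((t // 100) % 100) * 60 + t % 100
    total_ms = round((secs + abs(tfrac)) * 1000)
    secs, millis = divmod(total_ms, 1000)
    hours, rem = divmod(secs, 3600)
    minutes, seconds = divmod(rem, 60)
    return "%s%02d-%02d-%02d.%03d" % (
        "-" if negative else "", hours, minutes, seconds, millis)

print(wrfdt2m3io("168-00-00.000"))   # (1680000, 0.0)
print(m3iodt2wrf(-80000))            # -08-00-00.000
```

Stripping the leading minus before splitting on hyphens is what sidesteps the delimiter ambiguity noted under QUESTION 1.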

Mapping WRFIO Data Sets to M3IO Data Sets

Each WRF data set will be implemented as a set of M3IO data sets, as necessitated by the fact that individual M3IO data sets have homogeneous time step, layer, and horizontal grid structures. The decision of just how "stagger" is implemented will determine just how many M3IO data sets there will be for each WRF data set.

Another issue that comes up in this regard is the issue of name bindings: both the WRFIO and the M3IO systems achieve directory independence by mapping logical names for data sets into physical (path) names for files. Logical names are program-properties but do not imply actual file-system location or implementation by other means. This independence allows M3IO transparently to support both persistent file-based data sets and PVM-mailbox based communications channels for coupling a set of cooperating processes (as will be used initially to couple SMOKE with WRF). M3IO uses environment variables (e.g., set using the csh setenv command) to bind logical names to physical names; WRF traditionally uses NAMELISTs. It may be possible to enhance the underlying M3IO environment variable routines so that they support both means of name binding. If so, this allows us to perform a transparent "behind the scenes" upgrade that supports both means of name bindings when the enhanced routines become available.

For the initial implementation to be used for the Summer 2002 air quality forecast effort, we propose that the ext_m3io_ package behave as follows:

Design/Implementation Structure: module_m3io.F90 and wrappers-file ext_m3io.F90

This module encapsulates the state necessary to maintain the WRF datasets, as well as the input and output frames for each of them. It is USEd by the subroutines in ext_m3io.F90, which implements all of the ext_m3io_*() wrapper calls that give the WRFIO external-package API interface. The ext_m3io_*() routines provide Fortran-77 style (in practice, this means call-by-reference) implicit calling interfaces rather than explicit interfaces; this is necessitated by the fact that the subroutines (such as ext_m3io_write_field() which writes both 2D and 3D fields of types INTEGER, LOGICAL, REAL, and REAL8) are overloaded to act upon what an explicit Fortran-90 interface would see as incompatible argument lists. (IMNHO, the lack of a "void pointer" type is a serious defect in the Fortran-9x standards, but we can't really do anything about it.)

Wrapper-routines in ext_m3io.F90
Each of the routines in the ext_m3io_*() API has an implementation that wraps M3IO calls for WRF, and maintains the state tables in module_m3io accordingly.

PARAMETERs in module_m3io
The M3IO parameters from PARMS3.fh are INCLUDEd. Additionally, there are
  • INTEGER, PARAMETER:: WRF_ERR_FUBAR: otherwise-unclassified WRFIO-M3IO error.
  • INTEGER, PARAMETER:: MXWRFIO
  • INTEGER, PARAMETER:: WRF_ERR_INCOMPLETE_DS_DEF
  • INTEGER, PARAMETER:: FS_PENDING: token to indicate a data set is currently in the dry-run state.
  • tbd...

Data Structures in module_m3io
tbd...

State Variables in module_m3io
tbd... We need to maintain:
  • Current number NWRFIO of data sets
  • List FNAME of current data set names.
  • Lists FID2D and FID3D of STATE3 ID's for the 2D and 3D M3IO files mapped to the current data set.
  • Lists of standard metadata values for data sets currently in the dry-run state.
  • tbd...
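Since the data structures are still to be determined, the following Python sketch is only one possible shape for this state table, not a design decision. The class names, the begin/commit method names, and the capacity value are hypothetical. NWRFIO, FNAME, FID2D, FID3D, and the dry-run state correspond to the items listed above.

```python
from dataclasses import dataclass, field

MXWRFIO = 64  # hypothetical capacity; the draft does not yet fix a value

@dataclass
class WrfDataset:
    fname: str                      # WRF data set name
    fid2d: int = -1                 # ID of the 2D M3IO file, once opened
    fid3d: int = -1                 # ID of the 3D M3IO file, once opened
    pending: bool = True            # True while in the dry-run state
    pending_metadata: dict = field(default_factory=dict)

class DatasetTable:
    def __init__(self, capacity=MXWRFIO):
        self.capacity = capacity
        self.entries = {}

    @property
    def nwrfio(self):
        """Current number of data sets."""
        return len(self.entries)

    def begin(self, fname):
        """Register a data set in the dry-run state."""
        if fname in self.entries:
            raise ValueError("data set already open: " + fname)
        if self.nwrfio >= self.capacity:
            raise RuntimeError("too many WRF data sets")
        self.entries[fname] = WrfDataset(fname)
        return self.entries[fname]

    def commit(self, fname, fid2d, fid3d):
        """Leave the dry-run state once the M3IO files have been opened."""
        ds = self.entries[fname]
        ds.fid2d, ds.fid3d, ds.pending = fid2d, fid3d, False
        return ds

table = DatasetTable(capacity=2)
table.begin("wrfout").pending_metadata["TITLE"] = "demo"
committed = table.commit("wrfout", fid2d=1, fid3d=2)
```

Holding the metadata collected during the dry run alongside the file IDs is what would let the wrapper issue a single atomic M3IO open once the definition is complete.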

Internal Subroutines in module_m3io
INTEGER m3io_get_att_<type>(FID,VID,ATTNAME,ATT)
INTEGER, intent(IN):: FID ! m3io file ID
INTEGER, intent(IN):: VID ! m3io vble ID, or 0 for global
CHARACTER(len=*), intent(IN):: ATTNAME ! attribute-name
<type>, intent(OUT):: ATT ! attribute
Retrieve the indicated per-variable attribute for the indicated variable, or the indicated global dataset attribute, from the indicated M3IO file.
There is one such function for each type in {character(len=*), integer, real, REAL8 }. These functions are used to retrieve only the non-M3IO variable or data set attributes.

INTEGER m3io_put_att_<type>(FID,VID,ATTNAME,ATT)
INTEGER, intent(IN):: FID ! m3io file ID
INTEGER, intent(IN):: VID ! m3io vble ID, or 0 for global
CHARACTER(len=*), intent(IN):: ATTNAME ! attribute-name
<type>, intent(IN):: ATT ! attribute
Store the indicated per-variable attribute for the indicated variable, or the indicated global dataset attribute, in the indicated M3IO file.
There is one such function for each type in {character(len=*), integer, real, REAL8 }. These functions are used to store only the non-M3IO variable or data set attributes.
Constraint: Dataset must be in define-mode, else the call is an error.

SUBROUTINE wrftime2m3io(WRFTIME,JDATE,JTIME[,SECFRAC])
CHARACTER(LEN=*), INTENT( IN ):: WRFTIME
INTEGER , INTENT( OUT ):: JDATE, JTIME
REAL8 , INTENT( OUT ):: SECFRAC
This polymorphic routine implements the conversion from character string based WRF date&time representation to numeric M3IO or extended M3IO date&time representation.

SUBROUTINE m3iotime2wrf(JDATE,JTIME,[SECFRAC,]WRFTIME)
INTEGER , INTENT( IN ):: JDATE, JTIME
REAL8 , INTENT( IN ):: SECFRAC
CHARACTER(LEN=*), INTENT( OUT ):: WRFTIME
This polymorphic routine implements the conversion from the numeric M3IO or extended M3IO date&time representation to character string based WRF date&time representation.






References

  1. Models-3/EDSS I/O API Homepage

  2. I/O API Coupling Mode page

  3. MCNC Environmental Modeling Center, The MM5 MCPL() Meteorology-Coupler I/O API Module.

  4. Carlie J. Coats, Jr., MCNC Environmental Modeling Center, Coupling-Mode Extensions for the Models-3 I/O API, notes for a talk delivered at the EPA Workshop on the Multimedia Integrated Modeling System, Research Triangle Park, NC, August 15-17 2000. /projects/ppar/coupling_mode.html

  5. MCNC Environmental Modeling Center, Sparse Matrix Operator Kernel Emissions (SMOKE) Modeling System. /products/smoke/index.html


  6. OpenMP Home Page. http://www.openmp.org/

  7. PVM Home Page. http://www.epm.ornl.gov:80/pvm/

  8. NetCDF Home Page. https://www.unidata.ucar.edu/software/netcdf/

  9. Metcalf and Reid's convert.f90 ftp://ftp.numerical.rl.ac.uk/pub/MandR/convert.f90



Send comments to
Carlie J. Coats, Jr.
carlie@jyarborough.com