# Changes from Previous I/O API Versions

For Version 2, the I/O API was substantially rewritten to support thread-safe task-parallel use in modeling. In general, data access operations may be called at the same time, provided that they do not access the same variable from the same file. In particular, because INTERP3() and DDTVAR3() always do substantial linear-algebra computations (linear interpolation and re-scaled differencing, respectively), while they only occasionally do actual I/O (because they manage behind-the-scenes double-buffering for the interpolation buffers), substantial gains in parallel efficiency may be obtained by using the OpenMP PARALLEL SECTIONS facility to perform multiple INTERP3() calls in task-parallel. Additionally, the (relatively fine-grained) KF-file access functions KFOPEN(), KFINDX(), KFREAD(), and KFWRITE() may be called in parallel even when acting on the same variable(s) of the same file(s).

## Extensions for Coupling Concurrent Models

As part of the MCNC Practical Parallel Project, MCNC has developed an extended Model-Coupling Mode for the I/O API. This mode, implemented using PVM 3.4 mailboxes, allows the user to specify in the run-script whether "file" means a physical file on disk or a PVM mailbox-based communications channel (a virtual file), on the basis of the value of the file's logical name. For models exchanging data via virtual files of the I/O API's coupling mode, the I/O API schedules the various processes on the basis of data availability:

• OPEN3() calls for read-access to virtual files that haven't yet been opened for write access by some other process put the caller to sleep until the file is opened; and
• READ3(), INTERP3(), or DDTVAR3() calls for virtual-file data which has not yet been written put the reading process to sleep until the data arrives, at which point the reader is awakened and given the data it requested.

There are two requirements on the modeler:

• structuring reads and writes so as to avoid deadlocks (two or more models, each asleep while waiting for input from the other); and
• providing enough feedback to prevent one process from "racing ahead" of the others. In a one-way coupled system, this may mean the introduction of artificial synchronization files which exist solely to provide this feedback.
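The sleep-until-available behavior described above can be illustrated with a small sketch. It is written in Python purely for illustration and is not how the PVM-based coupling mode is implemented; the class `VirtualFile`, its methods, and the variable name `TA` are hypothetical.

```python
import threading


class VirtualFile:
    """Toy stand-in for a mailbox-backed virtual file (hypothetical class)."""

    def __init__(self):
        self._data = {}                       # (variable, date, time) -> values
        self._cond = threading.Condition()

    def write(self, variable, jdate, jtime, values):
        # Store the record and wake any reader sleeping on it.
        with self._cond:
            self._data[(variable, jdate, jtime)] = values
            self._cond.notify_all()

    def read(self, variable, jdate, jtime, timeout=None):
        # Sleep until the requested record has been written.
        key = (variable, jdate, jtime)
        with self._cond:
            if not self._cond.wait_for(lambda: key in self._data, timeout):
                raise TimeoutError(f"no data for {key}")
            return self._data[key]


def demo():
    """Start a reader before the writer; the reader gets the data once written."""
    vfile = VirtualFile()
    result = []
    reader = threading.Thread(
        target=lambda: result.append(vfile.read("TA", 1996245, 120000, timeout=5.0))
    )
    reader.start()
    vfile.write("TA", 1996245, 120000, [281.5, 282.0])
    reader.join()
    return result[0]
```

Because records are looked up by variable name, date, and time, the reader does not depend on the order in which the writer produces them.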

This has several advantages from the model-engineering point of view:

• The same programs work unchanged both in standalone mode (reading input from files and writing output to files) and in coupled-model mode (reading and writing selected inputs or outputs to/from PVM mailboxes).
• Readers and writers do not need to know about each other in detail. In particular, any reader only needs to know that some writer will put the variables it needs into the mailbox. Writers don't care whether readers even exist or not.
• One writer can supply multiple readers without special programming (and without needing to know who they are). For example, in a coupled system with the MM5/MCIP meteorology model, the SMOKE emissions model, and the MAQSIP air quality model, MM5 produces 5 time-stepped output "virtual files", some variables of two of which are read by SMOKE and all of which are read by MAQSIP; and SMOKE produces one output "virtual file" read by MAQSIP. MAQSIP produces a "synchronization file" read by MM5/MCIP and used to keep MM5/MCIP from running ahead and exhausting available mailbox-buffer space.
• Since data is tagged by variable-name, simulation date, and time, the system is not subject to data scrambling because of implicit programming assumptions about the data ordering in the way that stream-like communications channels are.

In order to support the needs of surface-water, lake and estuary, and bay modeling, at the request of EPA-AREAL we have added two routines to the I/O API which will access an entire time step sequence of data in a single operation ("read or write N time steps of data starting at date and time D:T with step DT for variable V to/from file F"). The new routines are READ4D() and WRITE4D(), and very much resemble READ3() and WRITE3(), except for the specification of an entire time step sequence as arguments, the restriction to single-variable operations (ALLVAR3 is not supported for these), and the restriction to file types CUSTOM3, GRIDDED3, BNDARY3, and TSRIES3.
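As a rough illustration of what "N time steps starting at D:T with step DT" covers, the following Python sketch lists the date-and-time pairs in such a sequence. It assumes dates are coded as YYYYDDD and times and time steps as HHMMSS integers; the function names are hypothetical and are not part of the I/O API.

```python
from datetime import datetime, timedelta


def to_datetime(jdate, jtime):
    """Convert (YYYYDDD, HHMMSS) integers to a datetime."""
    year, day_of_year = divmod(jdate, 1000)
    hours, rest = divmod(jtime, 10000)
    minutes, seconds = divmod(rest, 100)
    return datetime(year, 1, 1) + timedelta(
        days=day_of_year - 1, hours=hours, minutes=minutes, seconds=seconds
    )


def from_datetime(stamp):
    """Convert a datetime back to (YYYYDDD, HHMMSS) integers."""
    jdate = stamp.year * 1000 + stamp.timetuple().tm_yday
    jtime = stamp.hour * 10000 + stamp.minute * 100 + stamp.second
    return jdate, jtime


def step_sequence(jdate, jtime, tstep, nsteps):
    """List the (date, time) pairs for nsteps records spaced tstep (HHMMSS) apart."""
    hours, rest = divmod(tstep, 10000)
    minutes, seconds = divmod(rest, 100)
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    start = to_datetime(jdate, jtime)
    return [from_datetime(start + k * delta) for k in range(nsteps)]
```

For example, `step_sequence(1996366, 220000, 10000, 4)` steps hourly across the end of 1996 into day 1 of 1997.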

## Portability Enhancements

For I/O API version 2, conditional compilation directives were modified to make it easier to port the I/O API to other platforms. Four issues are prominent:
• Name mangling (trailing underscores) conventions;
• CHARACTER string-passing conventions;
• Native length for REAL, INTEGER, and DOUBLE;
• Dynamic memory allocation mechanism, whether via Cray-pointers and MALLOC or by automatic arrays (as in g77 and F-90).
Preprocessor definitions and conditional compilation directives recognizing many of these have been provided in the code, as follows:
• FLDMN=1 specifies Feldman-style name mangling (trailing underscore) and CHARACTER string-passing conventions (pass strings as C char * arguments, and add extra "hidden" int value arguments for string length at the end of the argument list, in the same order as the string arguments);
• REAL8=1 specifies that Fortran REAL has the same type as C double, whereas REAL8=0 specifies the correspondence with C float.
• AUTO_ARRAYS=1 specifies availability of off-the-stack allocation for local arrays.
This provides support for at least the following additional platforms and compilers:
• gcc/g77 (FLDMN=1, REAL8=0, and AUTO_ARRAYS=1) on a wide variety of platforms (including gcc/g77 on Win32 via the Cygnus port).
• Win32 (Windows 3.1/win32s, Windows 95, and Windows NT/x86) using either the Digital or the Absoft Fortran-90 compilers and the Microsoft C compiler (for Absoft and DVF5, this requires that the Fortran pre-processing be done in advance on a UNIX system; FLDMN=1, REAL8=0, and AUTO_ARRAYS=1 work for many of these combinations if you specify UNIX calling conventions; hard-coded support has also been added for DVF6 using the so-called Microsoft STDCALL calling conventions);
• Windows NT/Alpha using the Digital compilers (also requires Fortran pre-processing in advance; FLDMN=1, REAL8=0, and AUTO_ARRAYS=1);
• <suggestions? Do you want to donate us a machine (or machine-time) for it?>

## File Extensions for KF Event Data

The Kain-Fritsch parameterization for convective clouds (as found in the MM5 and Eta meteorological models, and as adapted for air quality models by McHenry) generates data which does not fit the basic I/O API model of data which always occurs on a regular time step sequence [<date>:<time>:<time-step>:<record-count>] for all the cells in an entire array. Instead, convective cloud events happen on a cell-by-cell basis, each event having its own starting location [<column>,<row>] on a grid, as well as its own [<starting-date>:<starting-time>:<duration>] which define its lifetime. We have constructed an additional file type KFEVNT3 with data structures appropriate for this data, together with additional operations KFOPEN(), KFINDX(), KFREAD(), and KFWRITE() in the I/O API to store, index, and retrieve this kind of data.

## Modeling Conventions, September 1996

A number of changes are being made in modeling conventions for the September 1996 freeze/release of the I/O API and related models and analysis/visualization software. Most of these have to do with strict adherence to usage of MKS (SI) units.

### Vertical Coordinates

• File header attribute VGLVS3D(0:MXLAYS3) in FDESC3.EXT now uses 0-based subscripting rather than 1-based.

This change does not affect the files themselves (since the size of the VGLVS3D array does not change), but only affects FDESC3.EXT and the interpretation of subscripts to the array VGLVS3D contained in it.

• Layer 1 is the bottom layer of the model.

• For sigma-P coordinates (both hydrostatic and nonhydrostatic), file header attribute VGTOP3D in FDESC3.EXT should be specified in Pascals rather than millibars.

• For pressure coordinates (VGTYP3D=VGPRES3), pressure levels VGLVS3D(0:MXLAYS3) should be specified in Pascals rather than millibars.

• For sigma-Z (VGTYP3D=VGSIGZ3), the formula gives zero on the "model bottom" and 1 on the "model top":

sigma = (z - h) / (H - h)

where h = h(x,y) is the terrain height and H is the height of the model top.
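A minimal Python sketch of a normalization with the stated properties (zero at the terrain surface, 1 at the model top) is given here for illustration only; the function and argument names are hypothetical.

```python
def sigma_z(z, terrain_height, model_top):
    """Normalized height: 0 at the terrain surface, 1 at the model top."""
    return (z - terrain_height) / (model_top - terrain_height)
```

With terrain at 250 m and a model top at 10000 m, a height of 5125 m lies halfway between the two and maps to sigma = 0.5.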

Diagrams showing the relationship of the grid and its layers to the header attributes VGLVS3D, etc., are available in Postscript, X bitmap, JPEG, and GIF image formats.

### Horizontal Coordinates

• (A reiteration, not a change:) Horizontal grid descriptions are cell-based, rather than point-based. In particular, (XORIG,YORIG) are the coordinates of the lower-left corner of the (1,1)-cell.

• (The model is an MKS model:) Except in Lat-Lon coordinate systems, grid-descriptive parameters XORIG, YORIG, XCELL, and YCELL are specified in meters.
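The cell-based convention can be made concrete with a short Python sketch (illustrative only; the function names are hypothetical). Since (XORIG,YORIG) is the lower-left corner of cell (1,1), cell centers are offset by half a cell from the origin.

```python
import math


def cell_center(col, row, xorig, yorig, xcell, ycell):
    """Coordinates of the center of cell (col, row), with 1-based indices."""
    return (xorig + (col - 0.5) * xcell, yorig + (row - 0.5) * ycell)


def containing_cell(x, y, xorig, yorig, xcell, ycell):
    """1-based (col, row) of the cell containing the point (x, y)."""
    col = math.floor((x - xorig) / xcell) + 1
    row = math.floor((y - yorig) / ycell) + 1
    return (col, row)
```

The sketch does no bounds checking against NCOLS and NROWS; a caller would need to reject indices outside the grid.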

### Temporal Coordinates

(A reiteration, not a change:) Dates and times in I/O API files are assumed to be in Greenwich Mean Time.

### Standard Environment Variables

Environment variable "IOAPI_LOG_WRITE" controls whether each successful call to WRITE3() generates a log message or not. The default value of this environment variable is "Y", indicating that log messages will be written (compatible with the previous behavior).

### Sample programs

A set of sample programs is now available, demonstrating some useful ways to use the I/O API, how the modeling conventions work, and how the two fit together. The programs were designed not only to be demonstrative, but also to do some useful work:
• Program LATLON computes single-layer time-independent gridded and/or boundary files containing variables "LAT" and "LON", latitudes and longitudes at cell centers for the specified grid and/or its boundary.
• Program PRESZ optionally reads in a file containing gridded terrain heights, prompts the user for a (hydrostatic) vertical grid, then constructs a time-independent layered gridded file with variables "ZH", "ZF", "PRESH", and "PRESF".
• Program SFCMET reads in an ASCII hourly surface meteorology observation file and puts out its data in an ID-referenced-type hourly time-stepped I/O API file.

## New Concepts

You must use the new INCLUDE files rather than keeping the old versions. Source code can presently be found in the directories:
/pub/storage/env/proj/ioapi on the EPA workstation cluster; and
/home/xcc/m3io on sequoia.

C bindings: There are now C include-files and C wrappers around the public I/O API routines and almost all of the utility routines.

The public routines have been changed to permit name arguments to be CHARACTER*(*) with actual length at most 16, for files and variables (internally, the I/O API copies the actual name arguments to its own CHARACTER*16 buffers). This makes the API more robust (you no longer need to pad to exactly length-16), as well as friendlier (you can use immediate-mode strings -- e.g. use 'SO4' as a variable-name).

For metadata tracking (suggested by Becky Bagdasarian): the I/O API will look for environment variable "EXECUTION_ID", to be stored in file headers to identify the exact program execution which produced the file. For files opened for writing, it will record the execution-ID (as a CHARACTER*80 string) in the file header, and will report it appropriately. It can be retrieved by getting the file description using routine DESC3() and then examining EXECN3D.

We introduce a new timestep-structure "circular-buffer" (or "restart") for files. The circular-buffer time step structure allows you to minimize the disk space consumed while at the same time ensuring that enough data is stored to disk to allow you to restart a computation. It is defined as follows:

• For circular-buffer files, the header-timestep attribute TSTEP3D is negative, and is the additive inverse of the actual time-step.
• The files keep two active time steps -- the "even" and the "odd" time step.
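The two rules above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the assignment of even-numbered steps to slot 0 and odd-numbered steps to slot 1 is an assumption, not something the description specifies.

```python
def is_circular_buffer(tstep3d):
    """Circular-buffer files are flagged by a negative TSTEP3D."""
    return tstep3d < 0


def actual_time_step(tstep3d):
    """Recover the real time step from the header attribute TSTEP3D."""
    return -tstep3d if tstep3d < 0 else tstep3d


def buffer_slot(step_number):
    """Slot holding a step: 0 for 'even' steps, 1 for 'odd' steps (assumed layout)."""
    return step_number % 2
```

Each new time step overwrites the slot written two steps earlier, so the file never holds more than two time steps of data.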

We introduce "BUFFERED" virtual files to provide a mechanism that is safer and more robust than COMMON blocks for sharing data among modules within the same program. These "files" are actually an in-memory mechanism for sharing data between modules in the same program; they are created and read from and written to just as ordinary files are. Only two active time steps are kept in memory (as two active disk records are kept for circular-buffer files, above); memory allocation, etc., is handled behind the scenes by the I/O API when these virtual files are created.

You cause a file with, say, logical name FOO to be BUFFERED by the way you assign the logical name: setenv FOO BUFFERED instead of setenv FOO <file path name>. Since READ3() and INTERP3() check the date and time associated with the data they retrieve, the I/O API will catch and report instances when you attempt to use data in one module before it has been generated in another (unlike COMMON blocks, which will blithely let you attempt to use variables that haven't been set yet). Since the decision as to whether a file is BUFFERED or is a real disk-file is made at program-launch, on the basis of setenvs in the script, the calling program doesn't know (nor need to know) whether a file is BUFFERED or not. This provides the opportunity to save -- at will -- a program's intermediate values to disk for further analysis.
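The following Python sketch illustrates the idea of an in-memory "file" that keeps two time steps and reports reads of data that has not been written. It is a toy model under stated assumptions, not the I/O API implementation; the class `BufferedFile` and the variable name `NO2` are hypothetical.

```python
class BufferedFile:
    """Toy in-memory 'file' that keeps only the two most recent time steps."""

    def __init__(self):
        self._steps = {}                      # (jdate, jtime) -> {variable: values}
        self._order = []                      # time steps, oldest first

    def write(self, variable, jdate, jtime, values):
        key = (jdate, jtime)
        if key not in self._steps:
            self._steps[key] = {}
            self._order.append(key)
            if len(self._order) > 2:          # drop the oldest time step
                del self._steps[self._order.pop(0)]
        self._steps[key][variable] = values

    def read(self, variable, jdate, jtime):
        # Unlike a COMMON block, a read of unwritten data is reported.
        try:
            return self._steps[(jdate, jtime)][variable]
        except KeyError:
            raise LookupError(
                f"{variable} not available for {jdate}:{jtime:06d}"
            ) from None
```

A module that asks for a time step another module has not yet produced (or that has already been discarded) gets an error instead of silently stale values.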

New basic data type options and descriptions are now available: individual variables may now be arrays of integers, reals, or double-precision, instead of real only. The basic data type of each variable is indicated by the VTYPE3D array in file description data structures in the FDESC3.EXT INCLUDE-file; "magic-number" values M3INT, M3REAL, and M3DBLE (defined in PARMS3.EXT) indicate variables of types INTEGER, REAL, and DOUBLE PRECISION, respectively.

New data structure-type SMATRX3 for sparse matrices used in new emissions modeling. The sparse matrices are stored in the so-called "skyline-transpose" representation. For these matrices, the interpretation of dimensioning attributes, memory layout, and multiplication with vectors V is as follows:

    NROWS3 = number of matrix rows
    NCOLS3 = max number of nonzero columns in a row
    NLAYS3 = 1
    NVARS3 = 1 (or do we want to allow for the possibility
             of multiple matrices using the same indexing
             scheme?  -- i.e., one INDX but multiple COEF's
             in the memory layout below.)
    SINDX3 maps into variable-index for NMAX below
    LINDX3 maps into variable-index for INDX below.

    INTEGER  NMAX( NROWS3D )
    INTEGER  INDX( NCOLS3D, NROWS3D )
    REAL     COEF( NCOLS3D, NROWS3D )
    COMMON / ASPARSE /  NMAX, INDX, COEF        !  memory layout

    P( j ) = \sum_{i=1}^{NMAX(j)} COEF(i,j) V( INDX( i,j ) )
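The matrix-vector product defined by this layout can be sketched in Python as follows (for illustration only; the function name is hypothetical). The Fortran element COEF(i,j) corresponds to `coef[j][i]` in the nested lists used here, and the stored index values are assumed to remain 1-based as in the Fortran arrays.

```python
def sparse_multiply(nmax, indx, coef, v):
    """Compute P(j) = sum_i COEF(i,j) * V(INDX(i,j)) for each matrix row j.

    nmax[j]    -- number of nonzero entries actually used in row j
    indx[j][i] -- 1-based index into v for entry i of row j
    coef[j][i] -- coefficient for entry i of row j
    """
    result = []
    for j in range(len(nmax)):
        total = 0.0
        for i in range(nmax[j]):
            total += coef[j][i] * v[indx[j][i] - 1]   # 1-based -> 0-based
        result.append(total)
    return result
```

Entries beyond `nmax[j]` in a row are padding and are never touched, which is what lets every row share the same fixed width NCOLS3D.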

(Internal change at the request of Kathy Pearson): internal implementation-flag array "TFLAG" becomes "TIMESTAMP", whose values are 2-vectors containing components for the date and time of the corresponding record, using Models-3 date and time conventions: TIMESTAMP( var, rec ) = (YYYYDDD,HHMMSS).

## PARMS3.EXT: Dimensioning and Constants

New or changed dimensioning parameters
• MXVARS3 maximum number of variables per file changes from 60 to 120
• MXDESC3 maximum number of description lines changes from 20 to 60
• MXLAYS3 new maximum number of layers parameter is 100; it is used to dimension the vertical grid description in FDESC3.EXT

New missing-value parameters in PARMS3.EXT:

• AMISS3 = -9.0E36,
• IMISS3 = -9999, and
• CMISS3 = '????????????????' = 16*'?'

The intent is to use BADVAL3, IMISS3, and CMISS3 as the standard REAL, INTEGER, and CHARACTER-string "missing" values, and always to test for BADVAL3 as X < AMISS3. Note that BADVAL3 and AMISS3 are generally-unused values, safely in range of floating-point arithmetic for all M3/EDSS machines, and BADVAL3 < AMISS3 on all such machines (i.e., the test is roundoff-safe on any reasonable hardware).
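The threshold test can be illustrated in Python as below. This is a sketch only: the numeric value used for BADVAL3 is an assumption (the text requires only that BADVAL3 < AMISS3), and the helper names are hypothetical. The `as_single` helper rounds a value to 32-bit precision to show that the comparison survives single-precision roundoff.

```python
import struct

AMISS3 = -9.0e36          # threshold value given in the text
BADVAL3 = -9.999e36       # assumed value; the text only requires BADVAL3 < AMISS3
IMISS3 = -9999
CMISS3 = "?" * 16


def as_single(x):
    """Round a Python float to 32-bit (single) precision and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


def is_missing_real(x):
    """Roundoff-safe test for a 'missing' REAL value."""
    return x < AMISS3
```

Testing `x < AMISS3` rather than `x == BADVAL3` avoids depending on exact bit-for-bit equality after a value has been written, converted, and read back.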

Other new parameters in PARMS3.EXT:

• SMATRX3 sparse matrix data type
• FSRDWR3 file opening mode: old, read-write
• FSNEW3 file opening mode: new, read-write
• FSUNKN3 file opening mode: unknown, read-write
• FSCREA3 file opening mode: truncate/create new file for read-write (remove file if it already exists)
• VGSGPH3 vertical coordinate type: hydrostatic sigma-P
• VGSGPN3 vertical coordinate type: nonhydrostatic sigma-P
• VGSIGZ3 vertical coordinate type: sigma-Z
• VGPRES3 vertical coordinate type: pressure (mb)
• VGZVAL3 vertical coordinate type: Z (m above terrain)
• VGHVAL3 vertical coordinate type: H (m above sea level)
• UTMGRD3 horizontal coordinate type: universal transverse mercator (UTM)
• M3INT basic data type for variables: integer
• M3REAL basic data type: real
• M3DBLE basic data type: double precision

## IODECL3.EXT: Declaration of routines

IODECL3.EXT now declares routines INTERP3() and DDTVAR3(); it no longer declares the obsolete routine CREATE3().

## FDESC3.EXT: Grid and File Descriptions

Grid description definitions were changed extensively. For horizontal grids and coordinates, two new description parameters, (XCENT3D,YCENT3D), were added to FDESC3.EXT. These describe the Lat-Lon coordinates (or, for offset-UTM, the standard-UTM coordinates) of the center of the Cartesian coordinate system (i.e., Cartesian (0,0) has these as its Lat-Lon or UTM coordinates). The complete vertical grid description (previously not specified in file descriptions) was also added. Vertical grid descriptions provide the following information:
• vertical coordinate type VGTYP3D, which takes as values the following "magic numbers" defined in PARMS3.EXT:
  • VGSGPH3 = 1 hydrostatic sigma-P
  • VGSGPN3 = 2 nonhydrostatic sigma-P
  • VGSIGZ3 = 3 sigma-Z
  • VGPRES3 = 4 pressure (mb)
  • VGZVAL3 = 5 Z (m above sea level)
  • VGHVAL3 = 6 H (m above ground)
  • IMISS3 = -9999 "other" or "not applicable"
• the full-level boundary values VGLVS3D( 0:NLAYS3D ); and
• (relevant for sigma-coordinates only) the model top VGTOP3D.

A new maximum time step number attribute MXREC3D for files was added to FDESC3.EXT. It allows, for example, an analysis program to determine easily not only the beginning (as it could do earlier, in terms of SDATE3D:STIME3D) but also the end of the time period for which a file contains data.
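As an illustration of that calculation, the Python sketch below derives the date and time of the last record from the starting date and time, the time step, and MXREC3D. It assumes a positive time step coded as HHMMSS, dates coded as YYYYDDD, and that record 1 falls at SDATE3D:STIME3D; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def last_time_step(sdate3d, stime3d, tstep3d, mxrec3d):
    """Return (YYYYDDD, HHMMSS) of the final record in a time-stepped file."""
    year, day_of_year = divmod(sdate3d, 1000)
    start = datetime(year, 1, 1) + timedelta(
        days=day_of_year - 1,
        hours=stime3d // 10000,
        minutes=stime3d // 100 % 100,
        seconds=stime3d % 100,
    )
    step = timedelta(
        hours=tstep3d // 10000,
        minutes=tstep3d // 100 % 100,
        seconds=tstep3d % 100,
    )
    end = start + (mxrec3d - 1) * step        # record 1 is the starting time
    return (
        end.year * 1000 + end.timetuple().tm_yday,
        end.hour * 10000 + end.minute * 100 + end.second,
    )
```

For an hourly file starting at 1996245:000000 with 25 records, the last record falls exactly one day later.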

Individual variables may now be arrays of basic data type INTEGER, REAL, or DOUBLE PRECISION, instead of real only. The type of each variable is indicated by the VTYPE3D array in file description data structures in the FDESC3.EXT INCLUDE-file; it takes the "magic-number" values (defined in PARMS3.EXT):

• M3INT = 4 for INTEGER variables
• M3REAL = 5 for REAL variables, and
• M3DBLE = 6 for DOUBLE PRECISION variables

## OPEN3()/CREATE3()

The changes to OPEN3()/CREATE3() semantics are as follows:

CREATE3() goes away.

OPEN3( FNAME, FMODE, PGNAME ) takes a new argument, FMODE, (replacing the READ-ONLY/READ-WRITE flag) which takes the following magic numbers defined in PARMS3.EXT as its values:

FSRDWR3 = 2 for "old, read-write (update)"
FSNEW3 = 3 for "new, read-write"
FSUNKN3 = 4 for "unknown read-write" (create if necessary; otherwise perform consistency-check with the supplied definition).
FSCREA3 = 5 for "create/truncate read-write" (remove any existing file and create new file with the supplied definition).

For files opened "old", the file must already exist, or else OPEN3() will return FALSE (which matches the previous behavior of OPEN3()).

For files opened "new", the behavior matches the previous CREATE3(): the file must NOT exist; the caller must have supplied a file description in the FDESC3.EXT commons for use by OPEN3(), which then constructs the new file according to the caller-supplied description.

For files opened "unknown", the file may or may not exist; the caller must have supplied a file description in the FDESC3.EXT commons; and the behavior depends upon whether the file exists or not: if it does, the file is opened and the description from the file's header is checked for consistency with the description supplied by the caller. If these are consistent, OPEN3() returns TRUE; if not, it closes the file again and returns FALSE. If the file does not exist, OPEN3() will create a new file according to the caller-supplied description (just as it would if the mode had been "new").

For files opened "create/truncate", the caller must have supplied a file description in the FDESC3.EXT commons. OPEN3() first checks validity of this description (returning FALSE if IOAPI_CHECK_HEADERS is set and the file description is not consistent), then closes the file if it is already open. If the file exists, it deletes it, and then creates a new file according to the supplied file description. NOTE: Joan Novak (EPA) and Ed Bilicki (MCNC) have declared as a software standard that modeling programs may not use FSCREA3 as the mode for opening files. FSCREA3 is reserved for analysis/data extraction programs only.

OPEN3() now writes significant portions of a file's description to the program log upon success at opening a file.
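The mode semantics described in this section can be summarized as a small decision function. The Python sketch below is illustrative only: the function name and the action labels are hypothetical, and returning failure for mode "new" when the file already exists is an assumption (the text says only that the file must not exist).

```python
FSRDWR3, FSNEW3, FSUNKN3, FSCREA3 = 2, 3, 4, 5   # values listed above


def open3_outcome(fmode, exists, consistent=True):
    """Return (success, action) for an OPEN3() call, following the prose rules.

    exists     -- whether the file is already on disk
    consistent -- whether the file header matches the caller's description
    """
    if fmode == FSRDWR3:                      # "old": file must exist
        return (True, "open") if exists else (False, "none")
    if fmode == FSNEW3:                       # "new": file must NOT exist
        return (False, "none") if exists else (True, "create")
    if fmode == FSUNKN3:                      # "unknown": open-and-check or create
        if not exists:
            return (True, "create")
        return (True, "open") if consistent else (False, "close")
    if fmode == FSCREA3:                      # "create/truncate"
        return (True, "delete+create") if exists else (True, "create")
    raise ValueError(f"unrecognized FMODE {fmode}")
```

The sketch leaves out the validity check of the caller-supplied description itself, which applies to every mode that may create a file.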

## WRITE3()

The granularity of WRITE3() has been changed to permit write-granularity at the level of time steps of individual variables for GRIDDED, BOUNDARY, and CUSTOM files. The argument list now looks like:
    WRITE3( <filename>, <variable-name>, <date>, <time>, <buffer> )

If the file type is GRIDDED, BOUNDARY, or CUSTOM, then the variable-name argument may be either a valid variable name (in which case it will write exactly that variable from the buffer to the file), or ALLVAR3 (defined to be 'ALL' in PARMS3.EXT), in which case the behavior of WRITE3() is as defined in the previous version, i.e., to write an entire time step of data from the buffer, interpreted as an array of all the variables, to the file. If the file is of any other type, the variable-name argument must be 'ALL' (and the behavior is as defined earlier).

## CHECK3()

The change to WRITE3() changes the semantics of CHECK3(), so that it must have argument-list
    CHECK3( <filename>, <variable-name>, <date>, <time> )

and the semantics is that .TRUE. is returned iff the indicated time step is available for the indicated variable. Note that 'ALL' is accepted as a variable-name; in that case, CHECK3() returns TRUE iff all variables are present for the indicated date and time. This means it still returns FALSE even if some variables are available for that date and time, but others are not.
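A minimal Python sketch of these semantics follows. It models the file as a set of records that have been written; the function signature and the variable names `NO2` and `O3` are hypothetical and chosen only for illustration.

```python
ALLVAR3 = "ALL"


def check3(available, variables, vname, jdate, jtime):
    """Return True iff the requested time step is available.

    available -- set of (variable, jdate, jtime) records already written
    variables -- names of all variables defined in the file
    """
    if vname == ALLVAR3:
        return all((v, jdate, jtime) in available for v in variables)
    return (vname, jdate, jtime) in available
```

With only NO2 written for a given time step, a check on NO2 succeeds while a check on 'ALL' for the same date and time fails.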

## New I/O API Function CLOSE3()

A new I/O API function, CLOSE3(), has been requested so that open/close operations are more symmetric. It has argument-list
    CLOSE3( <filename> )

and the semantics is that .TRUE. is returned iff the file was successfully flushed to disk and closed.

## New I/O API Function DDTVAR3()

For GRIDDED, BOUNDARY, or CUSTOM files, DDTVAR3() returns the mean time derivative (per second) for the indicated variable for the time step containing the indicated date and time. Note that for time independent files this derivative is of course zero.
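The two computations involved, linear interpolation in time and re-scaled differencing, can be sketched in Python as follows. This illustrates only the arithmetic on two bracketing time steps, with times expressed in seconds; the function names are hypothetical and no I/O or buffering is modeled.

```python
def interpolate(v0, v1, t0, t1, t):
    """Linear interpolation between two bracketing time steps (times in seconds)."""
    weight = (t - t0) / (t1 - t0)
    return [a + weight * (b - a) for a, b in zip(v0, v1)]


def mean_time_derivative(v0, v1, t0, t1):
    """Mean rate of change per second over the time step [t0, t1]."""
    return [(b - a) / (t1 - t0) for a, b in zip(v0, v1)]
```

For a one-hour step, a value that rises from 0 to 36 has a mean derivative of 0.01 per second, and its interpolated value a quarter of the way through the step is 9.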

## New Utility Routines and Support Structures

In order to keep track of horizontal grids and coordinate systems, and to make their definitions easily available to programs without the necessity to recompile them every time a new grid is defined, we introduce a "grid-and-coordinate-description" file GRIDDESC, and a family of utility routines as follows:

GRIDDESC is the logical name for a text file with two segments. Each segment has a 1-line header (which by convention provides titles for the columns in the data records), a sequence of data records, and a terminal record with name field blank (i.e., ' '). The first segment is the coordinate system description segment and consists of text records giving coordinate-system name and descriptive parameters P_ALP, P_BET, P_GAM, XCENT, and YCENT. The second segment is the grid-description segment, and consists of text records giving grid name, related coordinate-system name and descriptive parameters XORIG, YORIG, XCELL, YCELL, NCOLS, NROWS, and NTHIK. Each data record is list-formatted (i.e., items are separated by either blanks or commas, and names are quoted strings) and consists of three lines, as appropriate:

COORD-NAME
P_ALP, P_BET, P_GAM
XCENT, YCENT

or

GRID-NAME
COORD-NAME, XORIG, YORIG, XCELL, YCELL
NCOLS, NROWS, NTHIK

There are at most 32 coordinate systems and 256 grids listed in one of these files. These files are small enough to be archived easily with a study, and have a sufficiently simple format that new ones can easily be constructed "by hand."
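To make the layout concrete, here is a hedged Python sketch of a reader for the format as described above. It is not DSCGRID(): the function names and the sample contents are hypothetical, names are assumed to be quoted with single quotes, and the terminal record is assumed to be a single line holding a blank name.

```python
import re

_TOKEN = re.compile(r"'([^']*)'|([^\s,']+)")


def tokens(line):
    """Split one list-formatted line into strings (quoted) and numbers."""
    out = []
    for quoted, bare in _TOKEN.findall(line):
        # Accept Fortran-style D exponents in numeric fields.
        out.append(float(bare.upper().replace("D", "E")) if bare else quoted)
    return out


def read_segment(lines, field_names):
    """Read three-line records until a record whose name is blank."""
    next(lines)                               # skip the one-line column header
    records = {}
    for name_line in lines:
        toks = tokens(name_line)
        name = toks[0] if toks else ""
        if not str(name).strip():             # terminal record ends the segment
            break
        values = tokens(next(lines)) + tokens(next(lines))
        records[name] = dict(zip(field_names, values))
    return records


def parse_griddesc(text):
    """Return (coordinate systems, grids) as dictionaries keyed by name."""
    lines = iter(text.splitlines())
    coords = read_segment(lines, ["P_ALP", "P_BET", "P_GAM", "XCENT", "YCENT"])
    grids = read_segment(
        lines,
        ["COORD", "XORIG", "YORIG", "XCELL", "YCELL", "NCOLS", "NROWS", "NTHIK"],
    )
    return coords, grids


# Hypothetical file contents, made up for this example.
SAMPLE = """\
! coordinate systems
'LAM_40N'
30.0, 60.0, -90.0
-90.0, 40.0
' '
! grids
'GRID_A'
'LAM_40N', -54000.0, -42000.0, 12000.0, 12000.0
9, 7, 1
' '
"""
```

Calling `parse_griddesc(SAMPLE)` yields one coordinate system and one grid; all numeric fields come back as floats, so a caller would convert NCOLS, NROWS, and NTHIK to integers.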

Logical function DSCGRID() manages access to GRIDDESC (in fact, serves as an operational definition of the GRIDDESC file format), and gets grid and coordinate system descriptive parameters COORDNAME, COORDTYPE, P_ALP, P_BET, P_GAM, XCENT, YCENT, XORIG, YORIG, XCELL, YCELL, NCOLS, NROWS, and NTHIK for the specified grid name. Returns TRUE iff the requested grid is found in the GRIDDESC file. LOGICAL ENTRY DSCOORD() of DSCGRID() gets coordinate-system descriptive parameters P_ALP, P_BET, P_GAM, XCENT, and YCENT for the specified coordinate system name (also returning TRUE iff the coordinate system is found in the GRIDDESC file).

New date-and-time functions

DAYMON: find month and day-of-month for <jdate>
DT2STR: Construct string "HH:MM:SS Month DD, YYYY" for <jdate-&-time>
GETDTTIME: get current wall-clock date and time
HHMMSS: construct string "HH:MM:SS" for <time>
JULIAN: find Julian day number for <month> <day> <year>
MMDDYY: construct string "Month DD, YYYY" for <jdate>
WKDAY: get day-of-week (1...7) for <jdate>
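Two of these conversions can be sketched in Python as follows, for illustration only. The sketch assumes that <jdate> is coded as YYYYDDD and that "Julian day number" here means the day number within the year (1 to 366); the argument order is a choice made for this example, not the routines' actual calling sequence.

```python
from datetime import date, timedelta


def julian(year, month, day):
    """Day number within the year (1..366) for a calendar date."""
    return date(year, month, day).timetuple().tm_yday


def daymon(jdate):
    """(month, day-of-month) for a date given as YYYYDDD."""
    year, day_of_year = divmod(jdate, 1000)
    d = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    return d.month, d.day
```

The two functions are inverses of one another for a fixed year: 1 September 1996 is day 245, and 1996245 maps back to month 9, day 1.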

New utility functions

DSCOORD: get description of named coordinate system
DSCGRID: get description of named grid
ENVINT: get INTEGER value of logical name from the environment
ENVREAL: get REAL value of logical name from the environment
ENVSTR: get CHARACTER-STRING value of logical name from the environment
ENVYN: get LOGICAL value of logical name from the environment
FIND1, FIND2, FIND3, FIND4: find integer key-tuple in sorted keytuple-table
GCD: greatest common divisor function
GETDFILE: open and return unit number for direct access Fortran file with specified logical name
GETEFILE: open and return unit number for sequential Fortran file with specified logical name
GETDBLE: prompt user for DOUBLE and get response, with default and range checking
GETNUM: prompt user for INTEGER, etc.
GETREAL: prompt user for REAL, etc.
GETYN: prompt user for "Yes-No" answer, etc.
GRIDOPS: select and compute various comparison operations
INDEX1: unsorted-name-table lookup for character-string key
JUNIT: return a "safe" Fortran unit number
LEN2: number of leading blanks in string
M3ERR: warning message; or error message with SHUT3() and CALL EXIT( 2 )
M3EXIT: exit message with SHUT3() and CALL EXIT( <status> )
NAMEVAL: get value of environment variable (for Fortran)
POLY: degree-d polynomial interpolation function
TRIMLEN: string length, not counting trailing blanks
UPCASE: make string into ALL CAPS
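As an illustration of two entries in this list, the Python sketch below shows a sorted key-tuple lookup in the spirit of FIND1 through FIND4 and a trailing-blank-insensitive length in the spirit of TRIMLEN. The function names, the use of tuples for keys, and the convention of returning -1 for a missing key are assumptions of this sketch, not the routines' documented interfaces.

```python
from bisect import bisect_left


def find_key(key, table):
    """Binary search for a key tuple in a sorted table.

    Returns the 1-based position of the key, or -1 when it is absent
    (the -1 convention is an assumption of this sketch).
    """
    pos = bisect_left(table, key)
    if pos < len(table) and table[pos] == key:
        return pos + 1
    return -1


def trimlen(string):
    """Length of a string, not counting trailing blanks."""
    return len(string.rstrip(" "))
```

Because the table is sorted, the lookup takes a number of comparisons proportional to the logarithm of the table size rather than to the table size itself.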