" -v" to the value of the file's logical name in order to tell the
I/O API to perform disk-synch operations before every input
and after every output operation on that file:
...
setenv QUX "/tmp/mydir/volatiledata.mymodel -v"
netCDF tries to optimize I/O by not writing a file's header (which is needed in order to interpret the file's contents) out to disk until either a "synch" operation is performed or the file is closed. As a result, non-volatile output files are unreadable until the program that writes them does a SHUT3() or M3EXIT(), and they stay unreadable if it crashes unexpectedly. The extra "synch" operation performed for volatile files does cause some (usually small) performance penalty, but it allows other programs to read I/O API files while they are still being written, and prevents data loss upon program crashes.
NOTE: There is a bug in the interaction of netCDF 3.4 and the SGI IRIX6 operating system NFS implementation that sometimes causes NFS-based data exchange between concurrently-running programs using volatile files to fail. If you plan to do this, either go back to netCDF 3.3.1 or before, or go to netCDF 3.5-beta2 or later.
COMMONs - we've detected
some obscure use-before-calculate bugs by replacing
COMMONs with BUFFERED virtual files.)
To set up a buffered virtual file, setenv the value of
the file's logical name to the
value BUFFERED (instead of to the pathname of a real
physical file), as given below:
...
#
# myprogram uses "qux" for internal data sharing:
#
setenv qux BUFFERED
...
/user/mydir/myprogram
...
Restrictions:
M3REAL.
M3REAL.
setenv FOO "virtual BAR"
setenv IOAPI_KEEP_NSTEPS 3
declares that FOO is the logical name of a virtual
file whose physical name (in terms of PVM mailbox names) is
BAR. The additional environment variable
IOAPI_KEEP_NSTEPS determines the number of time
steps to keep in PVM mailbox buffers: if it is 3 (as here), and
there are already 3 time steps of variable QUX in the
mailboxes for virtual file FOO, then writing a fourth
time step of QUX to FOO causes the earliest
time step of QUX to be erased, leaving only time steps 2,
3, and 4. This limit is necessary so that the coupled modeling
system does not require an unbounded amount of memory for
sustained operation. If not set, IOAPI_KEEP_NSTEPS defaults
to 2 (the minimum needed to support INTERP3()'s
double-buffering).
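The retention rule described above can be sketched in C as a small sliding window. This is only an illustration of the bookkeeping, not the I/O API's PVM mailbox code: the names step_window, window_init, window_write, window_has, and MAX_KEEP are hypothetical, and the window tracks time-step numbers for one variable without holding any data.

```c
#include <assert.h>

#define MAX_KEEP 16   /* hypothetical upper bound for this sketch */

/* Hypothetical mailbox bookkeeping for ONE variable of ONE virtual file:
 * it remembers only the most recent `keep` time-step numbers. */
typedef struct {
    int keep;             /* value of IOAPI_KEEP_NSTEPS               */
    int count;            /* number of time steps currently retained  */
    int steps[MAX_KEEP];  /* retained time-step numbers, oldest first */
} step_window;

void window_init(step_window *w, int keep)
{
    assert(w != 0 && keep >= 1 && keep <= MAX_KEEP);
    w->keep  = keep;
    w->count = 0;
}

/* Record a newly written time step, erasing the oldest retained
 * step first when the window is already full. */
void window_write(step_window *w, int step)
{
    if (w->count == w->keep) {
        for (int i = 1; i < w->count; i++)   /* drop steps[0] */
            w->steps[i - 1] = w->steps[i];
        w->count--;
    }
    w->steps[w->count++] = step;
}

/* Return 1 if `step` is still available to readers, 0 otherwise. */
int window_has(const step_window *w, int step)
{
    for (int i = 0; i < w->count; i++)
        if (w->steps[i] == step)
            return 1;
    return 0;
}
```

With keep set to 3, writing steps 1 through 4 leaves steps 2, 3, and 4 in the window, matching the QUX example.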
When the modeler launches multiple models that each read or write a virtual file, the (UNIX) environments in which they are launched must all agree on that file's physical name (usually achieved by sourcing some script that contains the relevant setenv commands).
For models exchanging data via virtual files of the I/O API's coupling mode, the I/O API schedules the various processes on the basis of data availability:
- OPEN3() calls for read-access to virtual files that haven't yet been opened for write access by some other process put the caller to sleep until the file is opened; and
- READ3(), INTERP3(), or DDTVAR3() calls for virtual-file data which has not yet been written put the reading process to sleep until the data arrives, at which point the reader is awakened and given the data it requested.
setenv <logical name> BIN:<path name>
Note that this assignment is on a file-by-file basis, so that a program may use several different I/O API files with different modes for different purposes. As a special case, this allows ordinary "I/O API M3TOOLS" programs such as M3CPLE to serve as translators back and forth between I/O API native binary and I/O API netCDF.
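Because the mode is chosen per file from the value of its logical name, the selection can be pictured as a simple classification of that value. The following C sketch is a hypothetical illustration only: it is not the I/O API's own parser, the names file_mode and classify_logical_value are invented here, and it does not handle the trailing " -v" volatile flag.

```c
#include <assert.h>
#include <string.h>

typedef enum { MODE_NETCDF, MODE_BUFFERED, MODE_VIRTUAL, MODE_BINARY } file_mode;

/* Hypothetical classifier for the value of a logical-name environment
 * variable.  On return, *target points at the physical name or path
 * inside `value` (or at an empty string for BUFFERED files). */
file_mode classify_logical_value(const char *value, const char **target)
{
    assert(value != NULL && target != NULL);

    if (strcmp(value, "BUFFERED") == 0) {        /* in-memory file     */
        *target = "";
        return MODE_BUFFERED;
    }
    if (strncmp(value, "virtual ", 8) == 0) {    /* PVM-based coupling */
        *target = value + 8;
        return MODE_VIRTUAL;
    }
    if (strncmp(value, "BIN:", 4) == 0) {        /* native binary file */
        *target = value + 4;
        return MODE_BINARY;
    }
    *target = value;                             /* default: netCDF    */
    return MODE_NETCDF;
}
```

A caller would typically pass the result of getenv() for the logical name, after checking that it is not NULL.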
Introduction
This section describes the structure of the files for a new underlying ("BINFIL3") binary mode for the EDSS/Models-3 I/O API, to supplement the existing (and default) netCDF-file mode, the in-memory BUFFERED mode, and the PVM-based virtual mode.
Since this mode uses native machine binary representation as its underlying data representation layer, it should offer somewhat greater performance than the machine-independent lower layers (netCDF, PVM) do, for applications where I/O performance is critical. On the other hand, it is very desirable to keep the header metadata in a portable format, so that user-level programs can still read the data on binary-incompatible platforms and perform the appropriate data conversion themselves. For this reason, header metadata is stored in portable formats, as described below.
The sequence of data structures in these files is modeled somewhat after the structure of netCDF files, although the implementation mechanisms to store some of the metadata in a machine-independent fashion are to some extent borrowed from ideas found in other formats, e.g., GRIB.
Implementation Considerations: Restrictions and Limitations
- Initially, the supported platforms are ones with UNIXoid Fortrans (as listed below), but not Win32 nor Cray. Of these latter, Cray is the more difficult (made more difficult by the fact that I don't have access to one of their systems any more...)
  - OSF/Alpha from DEC^H^H^HCompaq^H^H^H^H^H^H HP
  - HP/UX
  - IBM AIX
  - Sun
  - SGI
  - Linux
    - x86 with gcc/g77, gcc/lf95, pgcc/pgf90, gcc/pgf90, or icc/ifc;
    - Alpha with gcc/g77 or cc/fort;
    - ia64 with gcc/g77 or ecc/efc;
    - [PPC970 with either gcc/g77, gcc and Absoft f90, or IBM xlc/xlf should not be difficult but hasn't been done yet]
  - [Mac OS-X with either gcc/g77 or xlc/xlf should not be difficult but hasn't been done yet either, AFAIK]
- Initially, the supported data types are those needed for current air quality modeling (and excluding the grid-nest and stream-hydrology data types):
  - CUSTOM3
  - GRDDED3
  - BNDARY3
  - IDDATA3
  - PROFIL3
  - SMATRX3
- Initially, the following (as far as I know, unused) two I/O routines are not supported:
  - READ4D
  - WRITE4D
Implementation Strategy
- Implementation is in C, interfacing to Fortran in the same manner as the rest of the I/O API C code.
- Uses C stdio, and particularly uses fseeko() for seeks (instead of fseek()), in order to interoperate with large file systems (implies Linux glibc version > 2.0).
- Implementation is in file iobin3.c.
- INIT3 calls INITBIN3
- FLUSH3 calls and other required disk synchronizations use new routine SYNCFID that unifies calls to FLUSHBIN3 and NF_SYNC
- For BINFIL3 files,
  - CRTFIL3 calls CRTBIN3
  - OPNFIL3 calls OPNBIN3
  - RDTFLAG calls RDBFLAG
  - WRTFLAG calls WRBFLAG
  - RDVARS calls RDBVARS
  - WRVARS calls WRBVARS
  - XTRACT3 calls XTRBIN3
  - CLOSE3 calls CLOSEBIN3
- OPNLOG3 (called from OPEN3) now logs the implementation-layer used
- SHUT3 does a sequence of CLOSEBIN3 calls
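The preference for fseeko() over fseek() comes down to the offset type: fseeko() takes an off_t, which can be 64 bits wide, whereas fseek() takes a long. A minimal sketch of a positioned read using stdio follows. The helper name read_at is hypothetical and is not a routine from iobin3.c, and the _FILE_OFFSET_BITS definition is the usual glibc way of requesting a 64-bit off_t on 32-bit systems, assumed here rather than taken from the I/O API build.

```c
#define _FILE_OFFSET_BITS 64   /* request a 64-bit off_t from glibc */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: read `nbytes` bytes starting at absolute byte
 * `offset` of `fp` into `buf`.
 * Returns 0 on success, -1 if the seek fails or the read comes up short. */
int read_at(FILE *fp, off_t offset, void *buf, size_t nbytes)
{
    assert(fp != NULL && buf != NULL);

    if (fseeko(fp, offset, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, nbytes, fp) == nbytes ? 0 : -1;
}
```

The same pattern with fwrite() in place of fread() would cover positioned writes.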
Metadata Format
The following representations of primitive data types of significance to the I/O API are used to store metadata in a portable fashion (so that the metadata can be interpreted on platforms other than the originating platform) in I/O API BINFIL3 files. In principle, this lets the application programmer use the BINFIL3 layer of the I/O API to read the data on any platform, determine the transformations necessary to interpret it on his platform, and then perform the transformations on the data and use it.
INT4 - represented by a 4-byte string, in little-endian order:
  BYTE_0(X) contains (unsigned char)(X&255), i.e., the least significant byte of X
  BYTE_1(X) contains (unsigned char)((X/256)&255)
  BYTE_2(X) contains (unsigned char)((X/65536)&255)
  BYTE_3(X) contains (unsigned char)((X/16777216)&255)
REAL - represented by a character string formatted with format equivalent to the Fortran FORMAT 1PE15.9, followed by a trailing ASCII NULL
DOUBLE - represented by a character string formatted as 1PD27.19, followed by a trailing ASCII NULL
NAME - Equivalent to a Fortran CHARACTER*16 type (fixed-length 16-byte string, padded on the right by blanks; not nul-terminated as a C string would be.)
LINE - Equivalent to a Fortran CHARACTER*80 type (fixed-length 80-byte string, padded on the right by blanks)
STRING - Equivalent to the Mac Fortran internal representation of a Fortran CHARACTER*(*) variable (with blank-padding on the right), i.e., as a C "struct hack"
  struct{
    INT4 length;
    char contents[ length ];
  } ;
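As a rough illustration of these representations, the C sketch below encodes an INT4, a REAL, and a NAME. All function names are hypothetical and none of this is taken from iobin3.c. It assumes that the C conversion "%15.9E" is an acceptable stand-in for Fortran 1PE15.9, and it uses unsigned shifts for the bytes, which agree with the division formulas above for non-negative values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Store a 32-bit integer as 4 bytes, least significant byte first. */
void int4_encode(int32_t x, unsigned char out[4])
{
    uint32_t u = (uint32_t)x;   /* two's-complement bit pattern of x */
    out[0] = (unsigned char)( u        & 255u);
    out[1] = (unsigned char)((u >>  8) & 255u);
    out[2] = (unsigned char)((u >> 16) & 255u);
    out[3] = (unsigned char)((u >> 24) & 255u);
}

/* Rebuild the integer from its 4 little-endian bytes. */
int32_t int4_decode(const unsigned char in[4])
{
    uint32_t u = (uint32_t)in[0]
               | ((uint32_t)in[1] <<  8)
               | ((uint32_t)in[2] << 16)
               | ((uint32_t)in[3] << 24);
    int32_t x;
    memcpy(&x, &u, sizeof x);   /* reinterpret the bits as signed */
    return x;
}

/* Format a REAL as 15 characters plus a trailing NUL.  Returns 0 on
 * success, or -1 when the text does not fit in 15 characters (a
 * negative value needs 16; the text above does not say how that case
 * is handled, so this sketch reports it instead of guessing). */
int real_encode(float x, char out[16])
{
    int n = snprintf(out, 16, "%15.9E", (double)x);
    return (n == 15) ? 0 : -1;
}

/* Blank-pad `s` on the right into a 16-byte NAME field (no NUL). */
void name_encode(const char *s, char out[16])
{
    size_t n = strlen(s);
    assert(n <= 16);
    memset(out, ' ', 16);
    memcpy(out, s, n);
}
```

A reader on a different platform can apply int4_decode() to the header bytes regardless of its own byte order, which is the point of storing the metadata this way.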
File Data Structure Design
The structure of a BINFIL3 file is as follows:
Header Section
INT4 IOAPI_VRSN: I/O API Version
Machine/Compiler Architecture Metadata
INT4 BYTE_ORDER: Byte order, i.e., the C subscripts at which BYTE_0, BYTE_1, BYTE_2, BYTE_3 would occur if we think of an integer as a C union: union{ int idata; char cdata[4]; } ;
INT4 INTSIZE: size of Fortran "INTEGER"
INT4 REALSIZE: size of Fortran "REAL"
INT4 DBLESIZE: size of Fortran "DOUBLE PRECISION"
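The union mentioned for BYTE_ORDER can be used directly to discover where each byte of an integer lands on the current machine. The sketch below is an assumption-laden illustration: the function name byte_positions is hypothetical, it uses uint32_t rather than int for a fixed width, and it does not show how the four subscripts are packed into the single BYTE_ORDER field, since the text does not say.

```c
#include <assert.h>
#include <stdint.h>

/* On return, pos[k] is the subscript in cdata[] at which BYTE_k is
 * stored on this machine (k = 0 is the least significant byte). */
void byte_positions(int pos[4])
{
    union { uint32_t idata; unsigned char cdata[4]; } u;

    assert(pos != 0);
    u.idata = 0x03020100u;          /* BYTE_k holds the value k */
    for (int i = 0; i < 4; i++)
        pos[u.cdata[i]] = i;        /* value k was found at subscript i */
}
```

On a little-endian machine this yields the subscripts 0, 1, 2, 3, and on a big-endian machine 3, 2, 1, 0.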
Per-File Metadata
NAME GRIDNAME: grid name
NAME UPDATE_NAME: name of the last program writing to file
LINE EXECUTION: value of environment variable EXECUTION_ID
LINE FILE_DESC[ MXDESC3=60 ]: array containing file description (set by programmer during OPEN3())
LINE UPDATE_DESC[ MXDESC3=60 ]: array containing run description, from file with logical name SCENFILE
Dimension/Type Metadata
INT4 FTYPE: File data type: CUSTOM3, GRDDED3, BNDARY3, IDDATA3, PROFIL3, or SMATRX3
INT4 GDTYP: map projection type
LATGRD3=1 (Lat-Lon),
LAMGRD3=2 (Lambert conformal conic),
MERGRD3=3 (general tangent Mercator),
STEGRD3=4 (general tangent stereographic),
UTMGRD3=5 (UTM, a special case of Mercator),
POLGRD3=6 (polar secant stereographic),
EQMGRD3=7 (equatorial secant Mercator), or
TRMGRD3=8 (transverse secant Mercator)
INT4 VGTYP: vertical coordinate type
VGSGPH3=1 (hydrostatic sigma-P),
VGSGPN3=2 (nonhydrostatic sigma-P),
VGSIGZ3=3 (sigma-Z),
VGPRES3=4 (pressure (mb)),
VGZVAL3=5 (Z (m above sea level)), or
VGHVAL3=6 (H (m above ground))
INT4 NCOLS: number of grid columns
INT4 NROWS: number of grid rows
INT4 NLAYS: number of layers
INT4 NTHIK: for BNDARY3 files, perimeter thickness (cells), or for SMATRX3 files, number of matrix-columns (unused for other file types)
Temporal Metadata
INT4 SDATE: starting date, coded YYYYDDD according to Models-3 conventions
INT4 STIME: starting time, coded HHMMSS according to Models-3 conventions
INT4 TSTEP: time step, coded HHMMSS according to Models-3 conventions
INT4 NRECS: current number of time step records in the file (1-based Fortran-style counting)
Spatial Metadata
DOUBLE P_ALPHA: first map projection descriptive parameter
DOUBLE P_BETA: second map projection descriptive parameter
DOUBLE P_GAMMA: third map projection descriptive parameter
DOUBLE X_CENTER: Longitude of the Cartesian map projection coordinate-origin (location where X=Y=0)
DOUBLE Y_CENTER: Latitude of the Cartesian map projection coordinate origin (map units)
DOUBLE X_ORIGIN: Cartesian X-coordinate of the lower left corner of the (1,1) grid cell (map units)
DOUBLE Y_ORIGIN: Cartesian Y-coordinate of the lower left corner of the (1,1) grid cell (map units)
DOUBLE X_CELLSIZE: X-coordinate cell dimension (map units)
DOUBLE Y_CELLSIZE: Y-coordinate cell dimension (map units)
REAL VGTOP: model-top, for sigma vertical-coordinate types
REAL VGLEVELS[0:NLAYS+1]: array of vertical coordinate level values; level 1 of the grid goes from vertical coordinate VGLEVELS[0] to VGLEVELS[1], etc.
Per-Variable Metadata
NAME VNAME[ NVARS ]: array of variable names
NAME UNITS[ NVARS ]: array of units or 'none'
LINE VDESC[ NVARS ]: array of variable descriptions
INT4 VTYPE[ NVARS ]: array of variable types:
M3BYTE = 1
M3INT = 4
M3REAL = 5
M3DBLE = 6
Additional attributes
Not implemented at this time.
Eventually: TBD, as necessary for the WRF extensions placed in I/O API Version 2.2. At this point, we anticipate that the implementation will be in terms of a sequence of <name-type-value> triplets
Data Section
sequence of time step records
Time Step Header
INT4 FLAGS[2,NVARS]: array of data-availability flags (with Fortran-style left-major, 1-based subscripting):
FLAGS[1,V] are the dates for the data record, encoded YYYYDDD
FLAGS[2,V] are the times for the data record, encoded HHMMSS
FLAGS[1,V] and FLAGS[2,V] are in consecutive memory/disk locations.
(NOTE: This amount of data is not functionally necessary; however, it is included for historical reasons involving the convenience of visualization-system programmers.)
Time step Contents:
array of data records, subscripted by variable 1, ..., NVARS:
<type> array of data for this variable and time step. Data is in native machine binary format.
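Given this layout, the size of a time step record and the position of a given record can be estimated with simple arithmetic. The C sketch below rests on several assumptions that the text above does not state: that each flag occupies a fixed number of bytes passed in as flag_bytes, that every variable of a GRDDED3 file holds NCOLS*NROWS*NLAYS elements, and that records are fixed-size and stored back to back after a header of known length. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one GRDDED3 time step record: 2*NVARS flag integers
 * followed by one data array per variable.
 * ASSUMPTIONS: `flag_bytes` bytes per flag; NCOLS*NROWS*NLAYS
 * elements per variable; elem_bytes[v] bytes per element of
 * variable v (native sizes, e.g. REALSIZE or DBLESIZE). */
int64_t gridded_record_bytes(int ncols, int nrows, int nlays,
                             int nvars, const int elem_bytes[],
                             int flag_bytes)
{
    assert(ncols > 0 && nrows > 0 && nlays > 0 && nvars > 0);

    int64_t cells = (int64_t)ncols * nrows * nlays;
    int64_t total = (int64_t)2 * nvars * flag_bytes;   /* FLAGS[2,NVARS] */
    for (int v = 0; v < nvars; v++)
        total += cells * elem_bytes[v];
    return total;
}

/* Byte offset of time step record `rec` (1-based, as NRECS is counted),
 * ASSUMING fixed-size records placed directly after the header. */
int64_t record_offset(int64_t header_bytes, int64_t record_bytes, int rec)
{
    assert(rec >= 1);
    return header_bytes + (int64_t)(rec - 1) * record_bytes;
}
```

An offset computed this way is the kind of value that would be handed to fseeko() before reading a record.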