The 7.4 compilers require different -TARG flags than do 7.3.x and before. For example,
for an Origin 3800 (where hinv reports
24 400 MHZ IP35 Processors
CPU: MIPS R12000 Processor Chip Revision: 3.5
...
one would use the following sets of ARCHFLAGS compiler
flags in Makeinclude.${BIN} with the different
Fortran-90 compiler versions:
- -TARG:platform=ip35,processor=r12000
  for 7.3.x and before
- -TARG:platform=ip35 -TARG:processor=r12000
  for 7.4 and later
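In Makeinclude.${BIN}, the corresponding ARCHFLAGS setting might look like the following sketch (the platform/processor values come from the hinv example above; keep only the form your f90 compiler version accepts):

```make
# Sketch of an ARCHFLAGS setting for the Origin 3800 example above;
# uncomment whichever -TARG form matches your Fortran-90 compiler
# version, and substitute your own platform/processor values.
ARCHFLAGS = -TARG:platform=ip35,processor=r12000
# ARCHFLAGS = -TARG:platform=ip35 -TARG:processor=r12000
```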
There are a number of problems with both the I/O API and netCDF
with the newer (version 7.4) SGI compilers:
[Added 12/18/2003: SGI claims to have fixed this in the
latest patch for F90 version 7.4.1 (bug # 895393); I haven't
had time to test it yet, though. -- CJC]
- NetCDF and IRIX 7.4 compilers:
Experience indicates that the IRIX 7.4 compilers
will not correctly build the netCDF library used by the
I/O API. Although the make
seems to succeed on that platform,
make test fails almost immediately;
attempts to use the libnetcdf.a that
was built will also lead to program crashes.
At present, the only workaround we have is to use a
libnetcdf.a built using IRIX 7.3 or
earlier compilers.
- I/O API and IRIX 7.4
f90:
The IRIX 7.4
f90 compiler refuses to recognize
industry-standard practice for linking
BLOCK DATA subprograms from libraries.
For the upcoming I/O API Version 3, we have
put into place a workaround-hack that puts a
conditionally-compiled non-Fortran-conforming SGI-only
CALL INITBLK3
at the start of subroutine INIT3.
The IRIX 7.4 f90 compiler also
thoroughly mangles the buffering of log-output in ways
that we have not yet managed to decipher completely,
much less repair. The outcome is that log output will
show up in scrambled order. (Note that
industry-standard mapping of WRITE(*,...)
onto unbuffered UNIX standard output still happens with
version 7.3 and must be preserved, but fails with
version 7.4.)
Back to "Troubleshooting" Contents
NetCDF Error Troubleshooting
- NetCDF and IRIX 7.4 and later compilers:
As noted under the SGI compiler problems above, the
IRIX 7.4 compilers will not correctly build the netCDF
library used by the I/O API: the make
seems to succeed on that platform, but
make test fails almost immediately, and
attempts to use the libnetcdf.a that
was built will also lead to program crashes. Use a
libnetcdf.a built with IRIX 7.3 or earlier compilers.
- NetCDF Error Troubleshooting Generalities:
All the netCDF "magic numbers" are defined
in the I/O API NETCDF.EXT file (which
is the I/O API name for the file netCDF calls
src/fortran/netcdf.inc). Errors defined
in netCDF 2.x have positive values in the range 1...32
(except for NCSYSERR which is -1); errors
newly defined for netCDF 3.x are in the range -60...-1.
General methodology: find the error-number and then try
to figure out what's wrong from the name of the
corresponding PARAMETER.
Note that UCAR re-defined some of these errors between
versions 3.3.1 and 3.4 of netCDF (while leaving the
various library versions link-compatible), so you may
have to look at the src/fortran/netcdf.inc
for the version of the netCDF libnetcdf.a
you are linking with, if this is different from the
version used to build your libioapi.a.
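Following the general methodology above, the lookup can be scripted. Here is a minimal Bourne-shell sketch; the two-PARAMETER netcdf.inc below is a fabricated stand-in for illustration only, so point grep at the real src/fortran/netcdf.inc from the netCDF version you link with:

```shell
# Fabricated stand-in for src/fortran/netcdf.inc, for illustration
# only; use the real file from your netCDF distribution instead.
cat > netcdf.inc <<'EOF'
      integer nf_einval
      parameter (nf_einval = -36)
      integer nf_einvalcoords
      parameter (nf_einvalcoords = -40)
EOF

# Find the PARAMETER whose value matches the reported error number:
grep -i 'parameter' netcdf.inc | grep -e '-40'
```

Here the single match shows that error -40 is nf_einvalcoords, i.e., "coordinates out of range".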
-
I get "netCDF error -1"
This is NCSYSERR, meaning the system wouldn't
give you permission for what you wanted to do. Most
probably it means you need to check permissions on either
the file you're trying to create or access, or on the
directories in its directory path.
-
I get "netCDF error 2"
"Not a netcdf id", which can happen both if the
file honestly isn't a netCDF file, and also if it
is a netCDF file but wasn't shut correctly. (Unless you've
declared a file "volatile" by
setenv <file> <path> -v,
netCDF doesn't update the file header until you call
SHUT3() or M3EXIT().)
-
I get "netCDF error 4"
"Invalid Argument", but almost certainly this
means you're using netCDF library 2.x with an I/O API
library built for netCDF version 3.x (NCAR accidentally
changed one of the "magic numbers" used in
opening files when they upgraded netCDF from 2.x to 3.x).
-
I get "netCDF error -31"
This is a variant of the system permission problem.
A directory spec with an extra nonexistent
component, e.g., /foo/bar/qux/zorp when
you really mean /foo/bar/zorp and
/foo/bar/qux doesn't exist, seems to cause
Error -31. It can also happen when you try to open too many
netCDF files simultaneously (although the I/O API
has additional traps around this).
Or on a Cray vector machine, this may mean you're
running up against your memory limit. (On Crays, netCDF v3.x
dynamically-allocates a fairly large buffer to optimize
I/O for each file; this allocation may well push you over
your (interactive or queue) memory limit. For netCDF v3.4,
there are tricks you can play with environment variables
to manipulate these buffer sizes.) This error has also
turned up with some of the more obscure file-permission
problems.
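For the extra-nonexistent-component case, a Bourne-shell sketch that walks the directory components of the target path (using the example path from above) and reports any that are missing:

```shell
# Report every missing directory component on the way to the target;
# /foo/bar/qux/zorp is the example path from the text above.
path=/foo/bar/qux/zorp
dir=$(dirname "$path")
while [ "$dir" != "/" ] && [ "$dir" != "." ]
do
    if [ ! -d "$dir" ]
    then
        echo "missing directory: $dir"
    fi
    dir=$(dirname "$dir")
done
```

Fixing (or correcting the spelling of) the first missing component reported usually clears the error.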
-
I get "netCDF error -40"
Probably means you tried to read data past the last
date-and-time on the file. (The I/O API runs
netCDF in "verbose mode", so that netCDF
will always print all error messages,
including this one.) It can also happen when the calling
program is running in parallel but a non-MP-enabled
version of the I/O API library was linked in.
- List of netCDF 3.3.1 errors,
with attempted annotations
- ncnoerr = nf_noerr = 0:
no error has been detected at this time.
- ncenfile = nc_syserr = -31:
see above
- ncebadid = nf_ebadid = -33:
not a netcdf ID (might indicate a bug in
I/O API internals, or attempt to use a
coupling-mode virtual file in a program linked
to an I/O API library without
coupling-mode enabled)
- nceexist = nf_eexist = -35:
attempting to create a new file when the file
already exists (from
OPEN3() with
status argument FSNEW3)
- nceinval = nf_einval = -36:
invalid argument (see above about
"incompatible netCDF and I/O API
versions")
- nceperm = nf_eperm = -37:
attempted write to a read only file
- nf_enotindefine = -38:
operation not allowed in data mode (would indicate
a bug in I/O API internals)
- nceindef = nf_eindefine = -39:
operation not allowed in define mode (would
indicate a bug in I/O API internals)
- ncecoord = nf_einvalcoords = -40:
coordinates out of range -- probably, attempt
to read past the last date-and-time on the
file. Can also be caused by running a program
in parallel with the non-MP-enabled version of
the I/O API library. (Otherwise, would
indicate a bug in I/O API internals)
- ncemaxds = nf_emaxdims = -41:
maxncdims exceeded (would indicate a bug in
I/O API internals)
- ncename = nf_enameinuse = -42:
string match to name in use: indicates that
you're trying to have two different variables
with the same name when creating a file
- ncenoatt = nf_enotatt = -43:
attribute not found: would indicate that a
file is not a correct I/O API file,
because it is missing some of the required
FDESC3 header-components
- ncemaxat = nf_emaxatts = -44:
maxncattrs exceeded (would indicate
a bug in I/O API internals)
- ncebadty = nf_ebadtype = -45:
not a netcdf data type: you are trying to
create a file for which some value of
VGTYP3D(<variable>) in
FDESC3 is not one of
M3INT, M3REAL, or
M3DBLE
- ncebadd = nf_ebaddim = -46:
invalid dimension ID (would indicate a bug in
I/O API internals)
- nceunlim = nf_eunlimpos = -47:
ncunlimited in the wrong index: Could be
caused by incorrectly-set (or un-set) grid
dimensions
NCOLS3D,
NROWS3D, NLAYS3D, or
NTHIK3D (else would indicate a bug
in I/O API internals).
- ncemaxvs = nf_emaxvars = -48:
maxncvars exceeded (would indicate a bug in
I/O API internals--probably means somebody
changed INCLUDE-file
PARMS3.EXT
inappropriately for the target machine.)
- ncenotvr = nf_enotvar = -49:
variable not found (attempt to read or write a
variable not actually in the file; would
indicate a bug in I/O API internals)
- nceglob = nf_eglobal = -50:
action prohibited on ncglobal varid (would
indicate a bug in I/O API internals)
- ncenotnc = nf_enotnc = -51:
not a netcdf file: File not recognized as a
netCDF file (possibly empty; possibly not
closed properly, e.g., no
SHUT3()
or M3EXIT()).
- ncests = nf_ests = -52:
In Fortran, string too short (shouldn't happen
with I/O API)
- ncentool = nf_emaxname = -53:
variable-name or attribute-name too long (would
indicate a bug in I/O API internals)
- nf_eunlimit = -54:
something went wrong with the time dimension in
a file; might indicate a bug in I/O API
internals
- nf_enorecvars = -55:
attempting to time-step a time-independent file;
would indicate a bug in I/O API internals
- nf_echar = -56:
Attempt to convert between text & numbers (would
indicate a bug in I/O API internals)
- nf_eedge = -57:
subscript out-of-bounds error (would indicate
a bug in I/O API internals)
- nf_estride = -58:
illegal stride (won't happen with I/O API)
- nf_ebadname = -59:
variable name contains illegal characters
- nf_erange = -60:
math result not representable (could not convert from
native machine floating-point format to XDR/IEEE
floating-point format; should be Cray PVP-only)
- NC_ENOMEM = -61:
internal netCDF memory allocation failure
- ncfoobar = 32:
Something is messed up, and netCDF doesn't have
an error number for it, or doesn't understand how or why
the mess-up happened.
- other errors: should be
OS errors, as defined in the system's
/usr/include/sys/errno.h
Other Problems
- Why do the log-output scrambling, netCDF build
failures, crashes, etc. happen with the SGI
Version 7.4 compilers?
See the notes on the SGI Version 7.4 compilers above.
- Why does the I/O API "hang" inside
env*() calls on my Linux box, using the
Portland compilers?
Analysis due to Robert Elleman, Dept of Atmospheric
Sciences, University of Washington: When programs
are compiled with the Portland compilers, without
the -mp flag (as is the default
for mcip) but the I/O API is compiled
with this flag (as is the I/O API default),
the program will hang (i.e., appear to freeze, consuming
all available computational resources but making no
evident progress).
Solution: either use the
-mp compile flag for all
compiles -- both program and library, or use it for
neither.
General principle: Make sure the program
compile-flags and the I/O API compile-flags (and
the netCDF compile-flags!) are consistent!
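One crude way to audit that consistency is to grep the saved build transcripts for the flag in question. This is only a sketch, assuming (hypothetically) that you captured each build's output to the log files named below; the file names are placeholders, not part of any distribution:

```shell
# Count how many compile lines in each (hypothetical) build log
# mention the -mp flag; a program built without -mp linked against
# an -mp library is exactly the hang scenario described above.
for log in model_build.log ioapi_build.log netcdf_build.log
do
    if [ -f "$log" ]
    then
        echo "$log: $(grep -c -e '-mp' "$log") line(s) mention -mp"
    else
        echo "$log: no such log"
    fi
done
```

Counts of zero for one component and nonzero for another are the mismatch to look for.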
- "Why do I have trouble with my
LOGFILE
on my SGI?"
There is a problem with SGI f90
Version 7.4 and initialization of COMMON
blocks. The Fortran language standard specifies that
COMMON blocks must be initialized by
BLOCK DATA subprograms, but (since the
actual operations of compiling and linking are not
covered by the language standard, which considers them
"implementation details") does not specify
just how to ensure that the BLOCK DATA
subprogram is linked in with the rest of the executable.
Usual and customary industry practice is that the use
of a statement
EXTERNAL FOOBLOCK
in either the main program or in other subroutines that
are called should ensure that BLOCK DATA FOOBLOCK
is linked into the final executable. This does not
happen with SGI f90 Version 7.4,
even in very simple test cases.
Note that BLOCK DATA INITBLK3 is needed to
initialize I/O API internal data structures,
including the unit number for LOGFILE and
the number of I/O API files currently open;
fortuitously, the latter seems to be initialized to zero
(which is correct); the former is not initialized
correctly, leading to failures to open and use a
LOGFILE when you try to specify one.
Note that this error does not seem to happen with
SGI f90 Version 7.3 or earlier.
I have submitted this problem to SGI in an error
report. Their reply is to suggest the use of
non-standard
CALL INITBLK3, which would
need to be done by every internal I/O API
routine that references the STATE3 internal
data structures.
--CJC
- "Why do I get messages about unresolved symbols
with names like
__mp_getlock,
__mp_unlock, or something else with
_mp in it?"
This probably means that you are using a version of the
libioapi.a that is enabled for
OpenMP parallel
usage, but have not activated the system parallel
libraries. On SGI, this means that you need to add a
directive -lmp at the end of the library
list in the final compile command that links the
executable (and similarly for other compilers).
See the variable OMPLIBS defined in
your machine/compiler's Makeinclude.${BIN}.
- "Why are my program log and my Fortran-style files
missing or screwed up? And where did these
fort.<nn> files come from?"
On some systems (notably Sun and SGI), there are
incompatibilities in run-time libraries between
f77 and f90.
The upshot is that on these systems, you can link
together Fortran-77 and C using f77,
or Fortran-90 and C using f90, but
you can't link together Fortran-77 and Fortran-90.
The default I/O API distribution is built using
f77 and runs into this problem when your
model code is built using f90. The
solution is to rebuild the I/O API using
f90.
- Problems with RedHat 7.0 Linux (thanks to Zion Wang for
chasing this down):
RH7 uses quite-nonstandard gcc v2.96 and
glibc versions; there are patches available at URL
http://www.redhat.com/support/errata/rh7-errata-bugfixes.html
RH7's gcc v2.96 does not work with the standard
edition of the Portland Group F90 compiler; there is a
version which does work; see URL
http://www.pgroup.com/faq.htm: (UPDATE on: RED HAT 7.0
and 3.2 RELEASE COMPILERS!)
- "My program does a
segmentation fault
on the OPEN3 call when I attempt to create a
new file!"
Probably the file description was not completely filled
in. This has been observed, for example, when one of
the variable names VNAME3D(I) in
FDESC3.EXT was not set correctly. (What
actually happens is that the FDESC3.EXT data
structures are initialized to zero by the linker; then
the netCDF internals don't handle strings that contain
just ASCII zeros correctly).
- "My program wrote the data out but I can't
read/
ncdump/PAVE it now!"
Probably the file wasn't shut correctly. (Unless
you've declared a file "volatile" by
setenv <file> <path> -v,
netCDF doesn't update the file header until you call
SHUT3() or M3EXIT().)
- "The log says OPEN3() could not open the file,
and specifies the logical name rather than
the physical file name."
This usually means that the script which ran your program
failed to correctly execute the setenv that defines
the logical name for the file. Try using the env
command in the script before you run the program, in order
to get started debugging your script.
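A Bourne-shell version of that check looks like the sketch below; the logical name MYFILE and the path are hypothetical examples, and csh scripts would use setenv instead of export:

```shell
# Set a logical name the way a run-script would, then verify it is
# really in the environment before starting the program.
MYFILE=/tmp/mydata.ncf     # hypothetical logical name and path
export MYFILE
env | grep MYFILE
```

If the grep prints nothing, the program will see only the logical name, never the physical path.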
- "Why does the linker say
ncabor_
or open3_ (etc.) is an undefined symbol?"
There are three probable causes we've been observing:
- Link command-line order:
Probably, the command line that links your
program has -lnetcdf
before -lioapi
instead of after. (Most UNIX linkers only try
to resolve symbols against libraries that come
later on the command line, and don't go backwards.
E.g., if you have
!!! INCORRECT !!! f77 -o foo foo.o ...
-lnetcdf -lioapi
the linker
won't know where to go to find netCDF
functions that are called in the I/O API;
instead, if you use
!! CORRECT: f77 -o foo foo.o ... -lioapi -lnetcdf
then
the linker will scan "-lnetcdf" to
find functions called in "-lioapi".)
Another possibility is that you are doing
multilingual programming, and
using maybe "cc" or "g++"
or something else to do the link step. If so,
you need to explicitly list the libraries that
f77 would include. The list of these is vendor
dependent but frequently looks something like
... -lf77 -lU77 -lm
One way to find out is to try to use
f77 in verbose mode
(f77 -v ... on most UNIX
systems) to do the linking: it may not find
the needed C++ libraries, but it will tell
you what libraries it needed for the Fortran
part of the I/O API and you can then
modify your original link command to use them.
- Compiler name inconsistencies
Compilers "mangle" the names of
Fortran COMMON blocks, functions,
and subroutines in various ways (usually turning
them into lower case, and then prefixing or
postfixing them with one (or, for
gcc/g77, sometimes two)
underscores). This will be a problem when
you use the Intel or Portland Group compilers
on Linux systems that come with a
system-installed libnetcdf.a
(which will have been built with
gcc/g77).
The precise mangling behavior depends upon the
compiler, your system defaults file for the
compiler, and the compile/link command lines
themselves. (It can also happen that
netCDF was built without the
Fortran or C++ support that your
model was expecting.) A useful UNIX utility for
diagnosing these problems is nm,
which reports what linker-visible symbols are
present in executable, object
(.o), or library
(.so and .a) files.
So if you see a linker error message like
symbol foo_ not found (referenced in bar.o)
then do the following sorts of things:
nm foo.o | grep -i foo
nm libnetcdf.a | grep -i foo
nm libioapi.a | grep -i foo
etc, and maybe
man -k foo
to try to find which program-component has the
differently-mangled symbol that the linker
needs. Then go back and review the compiler
flags used in the build-process for that
component.
The latest release of
I/O API Version 2.2 has a script
nm_test.csh to help you with this:
run
nm_test.csh <obj-file>
<lib-file> <symbol>
- Bad compiler
installation/configuration
Sometimes you'll find that the missing
symbol was in a system routine that the
compiler should have known about but somehow
(maybe bad compiler-installation) didn't.
That one happened to me earlier this week
(as I write this May 3, 2002) on an HP
system.
- "PAVE reports bad values -- -9.xxE37 or something!"
This is a PAVE bug, not an I/O API bug: the original
person who wrote the file-reader for PAVE couldn't be
bothered to use the I/O API, but instead used raw
netCDF reads without proper data-structure and error
checking. NetCDF fills in "holes" in its files
with a particular fill-value that you are seeing, and this
is an indication that the data for that variable and time
step was never written to the file. This happens, for
example, at the starting time for an
MM5/MCPL run, for some of the variables which aren't
calculated until after the run is in progress.
This is fixed in PAVE Version 2 and later.
- "I get an error message that looks something like"
>>> WARNING in subroutine CHKFIL3 <<<
Inconsistent file attribute NVARS for file FOO
Value from file: 6
Value from caller: 9
This means that
- File
FOO already exists
- You are trying to open it as "unknown"
(FSTATUS=FSUNKN3) in the call
to OPEN3
- The file description from within file
FOO's header does not match the
file description you have supplied in the
FDESC3 COMMONs.
For the I/O API, you can't change a file's definition
once it has already been created. What you probably
want to do is to delete the existing file (or move it
somewhere else), and re-run your program--this time
creating a new file according to the description you
supply.
To: Models-3/EDSS I/O API: The Help Pages
Send comments to
Carlie J. Coats, Jr.
carlie.coats@baronams.com