I/O API Troubleshooting


NOTE

If you run into trouble with I/O API related programs, it is useful to know the versions of all the software components. The CVS-related program ident reports the versioning keywords embedded in (binary) object, library, or executable files. For example, I can run the following sequence of commands on my desktop machine to find the versioning information of various binary components:
% cd $HOME/apps/$BIN
% ident init3.o
    —reports the INIT3 version:  init3.F 87 2015-01-07 17:37:58Z coats
% ident libioapi.a
    —reports the INIT3 and M3UTILIO versions
% ident m3stat
    —reports the INIT3, M3UTILIO, and netCDF versions
    
Each I/O API source file will have its version embedded in the file's header-comment, e.g.
    !! Version "$Id: ERRORS.html 102 2018-04-09 16:01:49Z coats $"
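
If ident is not installed, much the same information can be pulled out with grep. The following is a minimal Bourne-shell sketch, not part of the I/O API: the function name show_ids and the sample file contents are made up for illustration, and it assumes a grep that supports the -a and -o options (GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical helper (not part of the I/O API): list the "$Id: ... $"
# keywords embedded in a file, roughly what "ident" reports.
show_ids() {
    # -a: treat binary files as text;  -o: print only the matching part
    grep -a -o '\$Id: [^$]*\$' "$1"
}

# Demonstration on a throwaway file carrying a made-up keyword string
sample=$(mktemp)
printf 'junk\000bytes $Id: demo.f 1 2020-01-01 00:00:00Z nobody $ more\n' > "$sample"
show_ids "$sample"
rm -f "$sample"
```

On a real build one would call it as, for example, show_ids libioapi.a from the directory that holds the library.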
    


Errors after retrofitting MODULE M3UTILIO

Since MODULE M3UTILIO itself INCLUDEs the standard I/O API include-files and also has INTERFACE-blocks for (almost all of) the public I/O API functions, when you retrofit USE M3UTILIO into an old code, you must remove these INCLUDE-statements, as well as the declarations and EXTERNAL statements for the public I/O API functions. If you miss some of these, you may see compile errors like the following:
...
/home/coats/ioapi-3.2/ioapi/PARMS3.EXT(66): error #6401: The attributes of this name conflict with those made accessible by a USE statement.   [NAMLEN3]
...
/home/coats/ioapi-3.2/m3tools/m3tproc.f90(102): error #6401: The attributes of this name conflict with those made accessible by a USE statement.   [GETNUM]
...
    
or
...
Error: Symbol 'getnum' at (1) conflicts with symbol from module 'm3utilio', use-associated at (2)
...
    
or similar messages from other compilers.

To fix these errors, remove the corresponding INCLUDE-statements, function-declarations, and EXTERNAL statements.
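
To find the leftovers before the compiler does, one can search the source file for them. The sketch below is only an illustration: the function name find_leftovers and the sample program are hypothetical, and the search covers the three standard include-files (PARMS3.EXT, FDESC3.EXT, IODECL3.EXT) plus any EXTERNAL statement. Plain type declarations such as INTEGER GETNUM are not caught and still need to be looked for by function name.

```shell
#!/bin/sh
# Hypothetical helper: flag lines in a Fortran source file that probably
# have to go once "USE M3UTILIO" has been added.
find_leftovers() {
    # First alternative: INCLUDE of a standard I/O API include-file
    #   (the "." stands for either quote character).
    # Second alternative: any EXTERNAL statement or attribute; EXTERNALs for
    #   routines that are not I/O API functions are fine, so review by eye.
    grep -n -i -E 'include +.(PARMS3|FDESC3|IODECL3)\.EXT|(^|[ ,])external([ :,]|$)' "$1"
}

# Demonstration on a made-up source file
src=$(mktemp)
cat > "$src" <<'EOF'
      PROGRAM DEMO
      USE M3UTILIO
      INCLUDE 'PARMS3.EXT'
      INTEGER, EXTERNAL :: GETNUM
      EXTERNAL  GETNUM
      END
EOF
find_leftovers "$src"
rm -f "$src"
```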



Missing-symbol issues

Problem: When I try to link, I get "missing symbol" errors.

If the symbols begin with nc_ or nf_, this may be a netCDF3-netCDF4 or netCDF-library order problem.

Otherwise, especially if the symbols have an mp somewhere in them, you probably need to add the appropriate OpenMP flags to the compile and link commands used to build your program.

Note that for two decades, the I/O API has supported safe OpenMP parallelism—in both models and in certain of the m3tools programs. That means that to build programs, you need to use the compile- and link-options appropriate to this fact: see the variables OMPFLAGS (for the compile-stage) and OMPLIBS (for the link-stage) from the Makeinclude.${BIN} used to build your I/O API library, in order to see what is needed for your compiler.
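
A quick way to see those two variables is to grep them out of the Makeinclude file. This is a sketch only: the helper name show_omp_settings is hypothetical, and the stand-in file below uses values that a gfortran-based Makeinclude might plausibly contain, which is an assumption rather than a copy of any I/O API file.

```shell
#!/bin/sh
# Hypothetical helper: print the OpenMP settings from a Makeinclude file.
show_omp_settings() {
    grep -E '^ *(OMPFLAGS|OMPLIBS) *=' "$1"
}

# Stand-in file for demonstration; the values are assumed, not copied
# from the I/O API distribution.
mk=$(mktemp)
cat > "$mk" <<'EOF'
FC       = gfortran
OMPFLAGS = -fopenmp
OMPLIBS  = -fopenmp
EOF
show_omp_settings "$mk"
rm -f "$mk"
```

In practice one would point it at the Makeinclude.${BIN} in the I/O API source directory, then add the reported OMPFLAGS value to the compile command and the OMPLIBS value to the link command of the program being built.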



Compiler/system-library compatibility issues

In general, you are best off if you can build the whole modeling system (libnetcdf.a, libpvm3.a, libioapi.a, and your model(s) CMAQ, SMOKE, etc.) with a common compiler set and a common set of compile-flags. When this is not done, there are a number of compatibility issues with mixed compiler sets, and with the GNU 3.x-4.x compiler set these get worse. Some of these problems show up at link time; others at run-time. In particular, the following are known to have problems:


Warning messages with recent Intel compilers

Starting with their Version 16 compilers, Intel introduced a new compiler flag -qopenmp to enable OpenMP and deprecated the previous -openmp. The older flag now produces a "deprecated flag" warning from the compiler. Changing the Makeinclude.*ifort* files to use -qopenmp eliminates this warning for the latest Intel compilers, at the cost of making the Makeinclude files (and any make-systems built on them) incompatible with Intel compilers older than Version 16.
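
If one build setup has to serve both old and new Intel compilers, the flag can be chosen from the compiler's major version. The function below is a hypothetical sketch of that choice, using the Version 16 cut-over described above; how the major version number is obtained is left to the caller, since the output format of the compiler's version report varies.

```shell
#!/bin/sh
# Hypothetical helper: choose the OpenMP flag for an Intel compiler,
# given its major version number as the only argument.
intel_omp_flag() {
    if [ "$1" -ge 16 ]; then
        echo "-qopenmp"      # Version 16 and later
    else
        echo "-openmp"       # older compilers
    fi
}

intel_omp_flag 15
intel_omp_flag 17
```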



"Internal compiler error" Problems

At least some versions of the Intel compilers icc and ifort cannot handle the internal complexity of some routines (usually iobin3.c) when compiling with full optimization: one will see error messages like the following when running make for the I/O API (where I've used backslashes to fold the compile-line to make it readable):
cd /nas01/depts/ie/cempd/apps/CMAQ/v5.1/Linux2_x86_64ifortopenmpi;     \
  icc -c -DIOAPI_PNCF=1 -DAUTO_ARRAYS=1 -DF90=1 -DFLDMN=1 -DFSTR_L=int \
  -DIOAPI_NO_STDOUT=1 -DAVOID_FLUSH=1 -DBIT32=1  -O3 -traceback -xHost \
  -DVERSION='3.2-nocpl' /nas01/depts/ie/cempd/apps/CMAQ/v5.1/ioapi/iobin3.c
/nas01/depts/ie/cempd/apps/CMAQ/v5.1/ioapi/iobin3.c(1111) (col. 29): internal error: 0_1529

compilation aborted for /nas01/depts/ie/cempd/apps/CMAQ/v5.1/ioapi/iobin3.c (code 4)
make: *** [iobin3.o] Error 4
    

A measure that generally works is to re-do the last compile-command manually, but with a lower optimization, and then re-do the make. It is useful to cut-and-paste the last command into a sub-shell (enclosing the command by parentheses), with the "-O3" eliminated or reduced to "-O", as in the following example:

( cd /nas01/depts/ie/cempd/apps/CMAQ/v5.1/Linux2_x86_64ifortopenmpi;   \
  icc -c -DIOAPI_PNCF=1 -DAUTO_ARRAYS=1 -DF90=1 -DFLDMN=1 -DFSTR_L=int \
  -DIOAPI_NO_STDOUT=1 -DAVOID_FLUSH=1 -DBIT32=1 -traceback -xHost      \
  -DVERSION='3.2-nocpl' /nas01/depts/ie/cempd/apps/CMAQ/v5.1/ioapi/iobin3.c )
! make
    

This trick is also useful when trying to do highly-optimized builds of other models that contain large, complex routines (WRF, CMAQ, ...).



Link errors with pgf90 and ifort on Linux

General Principle: various Fortran compilers "mangle" subroutine names (etc.) for the linker in various ways.

Note: 64-bit mode under Linux adds further issues.

Note added 2/24/2009: Aparna Vemuri of EPRI reports troubles with recent gcc compiler systems and Portland Group pgf90: the gcc Fortran name mangling system has changed, requiring a change in compile flags. For mixed pgf90/gcc builds, one can either remove the -Msecond_underscore flag from FOPTFLAGS in the Makeinclude.Linux2_x86_64pg_gcc* or else change the line CC = pgcc to CC = gcc in the Makeinclude.Linux2_x86_64pg_pgcc* files. These modifications have been made to the 2/24/2009 release of the Makeinclude.Linux2_x86_64pg_gcc*, with the older flags commented out, for use by those who need them.

In particular, GNU Fortrans (g77 and g95) have different name mangling behavior than is the default with Portland Group pgf90. Vendor-supplied netCDF libraries libnetcdf.a always use the GNU Fortran conventions, and as such are incompatible with the default compilation flags for SMOKE or CMAQ. For the Linux/Portland Group/SMOKE or CMAQ combination, you have two choices:

This Portland Group inconsistency is exactly why the I/O API is supplied with multiple Makeinclude.Linux2_x86pg* files in the first place. Note that the I/O API supplies a script nm_test.csh and a make target

make nametest
to help you identify such problems.
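
The same kind of check can be done by hand: list the symbols in a library with nm and look at how the Fortran names end. The sketch below is not the nm_test.csh script; the function name mangling_style is hypothetical, and it only classifies a symbol by its trailing underscores. It rests on the usual conventions that g77 by default appends a second underscore to names that already contain one, while pgf90 appends a single underscore unless -Msecond_underscore is given.

```shell
#!/bin/sh
# Hypothetical helper: classify a linker symbol by its trailing underscores.
mangling_style() {
    case "$1" in
        *__) echo "two trailing underscores" ;;
        *_)  echo "one trailing underscore" ;;
        *)   echo "no trailing underscore" ;;
    esac
}

mangling_style nf_open__
mangling_style nf_open_
```

The symbols to feed it would come from something like nm libnetcdf.a piped through grep -i nf_open; if the library and the object files that call it disagree on the style, the link will fail with missing symbols.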



gcc/g77 on x86_64 problems

Added 4/4/2005

Internal compiler errors have shown up with gcc/g77 on some Linux distributions for x86_64, particularly with Fedora Core 3 and Red Hat Enterprise Linux Version 3 for x86_64: the symptom is a sequence of messages such as the following:

error: unable to find a register to spill in
class `AREG'
/work/IOAPI/ioapi/currec.f:93: error: this is the insn:
(insn:HI 145 171 170 8 (parallel [
            (set (reg:SI 3 bx [95])
                (div:SI (reg/v:SI 43 r14 [orig:67 secs ] [67])
                    (reg/v:SI 2 cx [orig:68 step ] [68])))
            (set (reg:SI 1 dx [96])
                (mod:SI (reg/v:SI 43 r14 [orig:67 secs ] [67])
                    (reg/v:SI 2 cx [orig:68 step ] [68])))
            (clobber (reg:CC 17 flags))
        ]) 264 {*divmodsi4_cltd} (insn_list:REG_DEP_ANTI 92
(insn_list:REG_DEP_OUTPUT 91 (insn_list 140 (insn_list 84
(insn_list:REG_DEP_ANTI 139 (nil))))))
    (expr_list:REG_DEAD (reg/v:SI 43 r14 [orig:67 secs ] [67])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (expr_list:REG_UNUSED (reg:SI 1 dx [96])
                (nil)))))
...confused by earlier errors, bailing out
    
A workaround is to weaken architecture/optimization flags for binary type Linux2_x86_64 to get around this compiler bug -- eliminating the -fschedule-insns and -march=opteron optimization flags from "Makeinclude.Linux2_x86_64" will tend to get rid of the problem. Note that this same compiler bug will bite you when trying to build lots of other stuff (TCL/TK, plplot, NCAR graphics), on FC3/gcc/g77 systems, and the same fix seems to work for many other problems as well.
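
As an illustration of that edit, the following sketch strips the two flags from a Makeinclude-style file and prints the result. The function name strip_flags is hypothetical, and the FOPTFLAGS line in the stand-in file is an assumed example, not the actual contents of Makeinclude.Linux2_x86_64.

```shell
#!/bin/sh
# Hypothetical helper: print a Makeinclude file with the two troublesome
# flags removed.  Each flag is matched as a whole word, so a different flag
# such as -fschedule-insns2 would be left alone.
strip_flags() {
    sed -E -e 's/ -fschedule-insns( |$)/\1/g' \
           -e 's/ -march=opteron( |$)/\1/g' "$1"
}

# Demonstration on a stand-in file (flag line is an assumed example)
mk=$(mktemp)
cat > "$mk" <<'EOF'
FOPTFLAGS = -O2 -fschedule-insns -march=opteron -funroll-loops
EOF
strip_flags "$mk"
rm -f "$mk"
```

Redirect the output to a new file and compare it with the original before replacing anything.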



relocation error link issues for x86_64 Linux

If sizes of individual arrays or of COMMON blocks exceed 2GB on the x86_64 platforms, Intel ifort and icc will give you failures, with messages about relocation errors at link-time. The problem is that the default "memory model" doesn't support huge arrays and huge code-sets properly. The "medium" memory model supports huge arrays, and the "large" memory model supports both huge arrays and huge code-sets. To get around this, you will need to add
-mcmodel=medium -shared-intel
to your compile and link flags (for the medium model), and then recompile everything including libioapi.a and libnetcdf.a using these flags. Note that this generates a new binary type that should not be mixed with the default-model binaries. There is a new binary type BIN=Linux2_x86_64ifort_medium for this, and there is a sample Makeinclude file for it, to demonstrate these flags:
Makeinclude.Linux2_x86_64ifort_medium

Other compilers and other non-Linux x86_64 platforms will have similar problems, but the solutions are compiler specific.
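
To judge in advance whether an array crosses the 2GB line, multiply the number of elements by the bytes per element. The function below is a hypothetical back-of-the-envelope check, assuming a shell with 64-bit integer arithmetic and taking 2GB to mean 2^31 bytes; the array dimensions in the demonstration are made up.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an array of $1 elements, each $2 bytes
# long, is larger than 2 GB (taken here as 2^31 bytes).
exceeds_2gb() {
    bytes=$(( $1 * $2 ))
    limit=$(( 2 * 1024 * 1024 * 1024 ))
    [ "$bytes" -gt "$limit" ]
}

# Made-up example: a REAL*8 array dimensioned (1000,1000,300)
if exceeds_2gb $(( 1000 * 1000 * 300 )) 8; then
    echo "needs -mcmodel=medium"
else
    echo "default memory model is enough"
fi
```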



Error: Program Exception - illegal instruction on Linux

Thanks to Christopher G. Nolte, Ph.D., US EPA Office of Research and Development for his M3USER mailing-list comments on this one.

This is probably the result of compiling either the library or the model (or both) for a different processor-model than you are running it on.

Background: As it creates new processor generations, Intel adds new and more powerful arithmetic instructions with each generation. Code that uses these new instructions will give the above-shown error message when you attempt to run it on an older processor. There were four major iterations of the SSE family of instructions (starting with the Pentium III generation of processors and continuing through Westmere); each was mildly faster than its predecessor. SandyBridge introduced the much more powerful AVX family of instructions, and Haswell introduced AVX2. Note that each processor generation supports all the previous generations of instructions (but not, of course, vice versa).

Various compilers give access to these instructions, using compiler directives specific to that compiler. Some examples are:

Intel ifort/icc:
-x... directives:
-xHost: Use all the instructions for this machine
-xSSE4.2: Nehalem or later
-xAVX: SandyBridge or later
-xCORE-AVX2: Haswell or later

GNU gfortran/gcc:
-march=... -mtune=... directives: the first of these governs which instruction set is used; the second controls how the optimizer tunes code for it
-march=native -mtune=native: this machine's architecture
-march=corei7 -mtune=corei7: Nehalem or later (SSE4.2)
-march=corei7-avx -mtune=corei7-avx: SandyBridge or later (AVX)
-march=core-avx2 -mtune=core-avx2: Haswell or later (AVX2)

Portland Group pgf90/pgcc:
Default is this machine's architecture (dangerous if you have multiple different-generation machines!)
-tp=nehalem: Nehalem or later (SSE4.2)
-tp=sandybridge: SandyBridge or later (AVX)
-tp=haswell: Haswell or later (AVX2)

Well designed modeling codes will get approximately a 20-25% performance boost from using SSE4.2 instructions, a further 70-80% boost from AVX, and a further 25-30% from AVX2. Because of its sloppy coding, WRF will get less than half that much speedup, and CMAQ even less than that (because these codes are so bottlenecked by main-memory operations that improving the arithmetic doesn't help much). Note that Intel and AMD have also improved the memory systems of the various processor generations, giving a further 5-10% performance boost per processor generation for that reason (independent of which instruction set you're using).

See https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions and https://en.wikipedia.org/wiki/Advanced_Vector_Extensions for more information about the SSE and AVX families of new instructions.
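
Before picking one of the flags above, it helps to check what the oldest machine you will run on actually supports. The sketch below is an illustration only: the function name best_isa is hypothetical, and it expects a space-separated list of CPU feature names as they appear on the "flags" line of /proc/cpuinfo on Linux (sse4_2, avx, avx2).

```shell
#!/bin/sh
# Hypothetical helper: report the newest of the instruction families
# discussed above that appears in a space-separated CPU flag list.
best_isa() {
    case " $1 " in
        *" avx2 "*)   echo "AVX2" ;;
        *" avx "*)    echo "AVX" ;;
        *" sse4_2 "*) echo "SSE4.2" ;;
        *)            echo "older than SSE4.2" ;;
    esac
}

# Demonstration with a made-up flag list
best_isa "fpu sse sse2 sse4_2 avx"
```

On a Linux machine the real list can be supplied with best_isa "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)". Run it on every machine that will execute the program, and compile for the oldest result.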



IRIX 7.4 Problems

Added 12/18/2003

SGI F90 compiler-flag problems: It seems that SGI version 7.4 and later Fortran compilers demand a different set of TARG flags than do 7.3.x and before. For example, for an Origin 3800, where hinv reports

    24 400 MHZ IP35 Processors CPU: MIPS R12000 Processor Chip Revision: 3.5 ...

one would use the following sets of ARCHFLAGS compiler flags in Makeinclude.${BIN} with the different Fortran-90 compiler versions:

There are a number of problems with both the I/O API and netCDF with the newer (version 7.4) SGI compilers:

Added 12/18/2003
SGI claims to have fixed this in the latest patch for F90 version 7.4.1 (bug # 895393); I haven't had time to test it yet, though. -- CJC


NetCDF Error Troubleshooting



Other Problems



To: Models-3/EDSS I/O API: The Help Pages

Send comments to

Carlie J. Coats, Jr.
carlie@jyarborough.com