PARALLEL_NOTES                                                6 Oct 2004

The following are some observations related to running the CMAQ CCTM in a
Linux MPICH cluster environment where the various computational nodes are
connected via the Network File System (NFS).

Running the CCTM for the first day of the 32km tutorial on a parallel
cluster with 8 processors took 1.5 minutes of wall time, whereas running it
on a single processor took 4.3 minutes. The poor scaling of the parallel
run is due to the small size of the problem (domain) relative to the
proportion of I/O processing, which is inherently serial. In addition, the
relative inefficiency of NFS at moving network packets across the local
network fast enough to keep the CPUs on the participating nodes busy may
contribute. Clusters that have local high-bandwidth interconnects, or that
are isolated from general network traffic (e.g. Scyld Beowulf), will not
suffer this problem as much; on such clusters, even relatively small
problem sizes will still run faster in parallel than in serial.

We infrequently experienced the situation where a run on 8 processors (one
local, 3 remote dual-CPU Intel XEON boxes) would start and then hang, but
re-launching the run succeeded. This may be due to network latency
problems.

We had difficulties running in the MPICH cluster associated with the
standard practice of automounting directories. It was necessary to "hard
mount" the data directory to which we wrote the CMAQ outputs.

To make our MPICH Linux cluster work, we had to put all the machines that
we wanted to use in the ~/.rhosts file. Otherwise, we would get a
"permission denied" error when launching mpirun.

One operational note: in addition to the main log file, produced by the
"processor 0" task, the runs produce ancillary log files for each of the
other tasks. If you re-execute a run that intends to write logs with the
same names as these ancillary files, the run will probably hang. You must
dispose of these ancillary log files first.
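The "hard mount" mentioned above can be expressed as an /etc/fstab entry
along the following lines (a sketch only; "fileserver:/data" and the mount
point are placeholder names, not paths from our cluster):

```
# /etc/fstab entry for a hard-mounted NFS data directory.
# "hard" makes the client retry NFS operations indefinitely rather than
# returning an error; "intr" allows those retries to be interrupted.
fileserver:/data   /data   nfs   rw,hard,intr   0   0
```

Unlike an automounted directory, a hard mount stays attached for the life
of the run, so long-running writers such as the CCTM output tasks do not
lose the mount mid-run.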
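The ~/.rhosts setup mentioned above looks roughly like the following (host
and user names are illustrative, not our actual machines); each machine
that participates in the run is listed, one per line, optionally followed
by the user name:

```
# ~/.rhosts (must exist on every node; keep it readable only by the
# owner, e.g. chmod 600 ~/.rhosts, or rsh may refuse the connection)
node0 modeler
node1 modeler
node2 modeler
node3 modeler
```

This is needed because MPICH, by default, starts remote tasks with rsh,
which consults ~/.rhosts to authorize the connection; a missing entry
produces the "permission denied" error from mpirun noted above.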
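To make the scaling figures above concrete, the speedup and parallel
efficiency implied by the quoted timings can be computed as follows (a
quick sketch; the numbers are the ones reported above):

```shell
# Speedup and parallel efficiency for the quoted CCTM timings:
#   serial run:   4.3 minutes on 1 processor
#   parallel run: 1.5 minutes on 8 processors
awk 'BEGIN {
    speedup = 4.3 / 1.5        # ratio of serial to parallel wall time
    efficiency = speedup / 8   # fraction of ideal 8x scaling achieved
    printf "speedup %.2f  efficiency %.2f\n", speedup, efficiency
}'
```

A speedup of roughly 2.9 on 8 processors (about 36% efficiency) is what
one expects when a substantial serial fraction, here dominated by I/O,
limits the parallel portion of the run.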
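One way to dispose of the ancillary log files before re-launching a run is
a short cleanup step such as the following (a sketch; RUNDIR and the
CTM_LOG_* name pattern are assumptions, so check the file names your run
scripts actually produce):

```shell
# Remove ancillary per-task log files left over from a previous run so a
# re-launch does not hang trying to write logs that already exist.
# RUNDIR and the CTM_LOG_* pattern are assumptions -- adjust to your setup.
RUNDIR=${RUNDIR:-.}
rm -f "$RUNDIR"/CTM_LOG_*
```

Adding a step like this to the run script, just before mpirun is invoked,
avoids having to remember the cleanup by hand.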