UCVM on Frontier


Current Target Specs - Size of CyberShake Meshes

Dimensions: 460.8 x 774.4 x 50.56 km
Rotation angle: 36 degrees counterclockwise
Corner points:
      -126.18649, 39.75063 (W)
      -121.85281, 42.27791 (N)
      -116.72395, 36.56162 (E)
      -120.90294, 34.22243 (S)

* Sample meshes for s3446 are 5760 x 9680 x 632 ~ 35B points.  
* A run on 96 nodes (56 cores/node) takes around 10 minutes.

80m spacing
5760 * 80 = 460,800m
9680 * 80 = 774,400m
632 *  80 =  50,560m
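
The arithmetic above can be reproduced with a short script. This is only a sketch that restates the numbers quoted in this section; the function and constant names are illustrative and are not part of UCVM.

```python
# Values quoted in this section for the s3446 sample mesh.
NX, NY, NZ = 5760, 9680, 632     # grid points along each axis
SPACING_M = 80                   # grid spacing in metres
NODES, CORES_PER_NODE = 96, 56   # job size quoted for the sample run


def extent_km(n_points, spacing_m=SPACING_M):
    """Extent covered by n_points at spacing_m metres, in km (n * spacing, as above)."""
    return n_points * spacing_m / 1000.0


def total_points(nx=NX, ny=NY, nz=NZ):
    """Total number of mesh points in the volume."""
    return nx * ny * nz


if __name__ == "__main__":
    print("extent (km):", extent_km(NX), extent_km(NY), extent_km(NZ))
    print("total points:", total_points())
    print("cores in the sample run:", NODES * CORES_PER_NODE)
```

The total comes to 35,238,297,600 points, which is the "~ 35B" figure quoted above, and the sample run uses 96 x 56 = 5376 cores.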

The volume is rotated: it spans 5760 points along the E/W direction before the whole box is rotated 36 degrees counterclockwise.
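
To picture what the rotation does to the box, the sketch below rotates the four corners of an unrotated 460.8 x 774.4 km rectangle in a flat, local x/y frame. It is a planar illustration only: it does not use UCVM's map projection, the choice of the origin corner as the pivot is an assumption, and it will not reproduce the longitude/latitude corner points listed above.

```python
import math


def rotate_ccw(x, y, angle_deg):
    """Rotate the point (x, y) counterclockwise about the origin by angle_deg."""
    t = math.radians(angle_deg)
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))


def rotated_box_corners(length_x_km, length_y_km, angle_deg):
    """Corners of a box anchored at the origin, after a counterclockwise rotation."""
    corners = [(0.0, 0.0), (length_x_km, 0.0),
               (length_x_km, length_y_km), (0.0, length_y_km)]
    return [rotate_ccw(x, y, angle_deg) for x, y in corners]


if __name__ == "__main__":
    for x, y in rotated_box_corners(460.8, 774.4, 36.0):
        print(f"{x:10.2f} km  {y:10.2f} km")
```

The rotation preserves the side lengths, so the box is still 460.8 km by 774.4 km; only its orientation changes.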

ucvm2mesh specs

UCVM Rebuild

git pull
./ucvm_setup.py -r -a -d -p $UCVM_INSTALL_PATH

Testing the UCVM installation on Frontier

We are implementing tests of the development version of UCVM used for CyberShake NorCal. At the end of the install, we expect the following models to be available:

  • CCA
  • sf1d

Building Process

Building on the head node is very slow, so the build needs to run on a compute node. Compute nodes are not network accessible, however, so do the git clone and the largefile downloads on the head node, then request a compute node when you are ready to run make.


UCVM_INSTALL_PATH /lustre/orion/proj-shared/geo156/pmaech/scratch/TARGET_UCVM_SFCVM/ucvm_install

Setup Frontier Modules

[login03.frontier ~]$ module list

Currently Loaded Modules:
  1) craype-x86-trento                       7) cray-dsmml/0.2.2       13) darshan-runtime/3.4.0
  2) craype-network-ofi                      8) cray-libsci/  14) hsi/default
  3) perftools-base/22.12.0                  9) PrgEnv-cray/8.3.3      15) lfs-wrapper/0.0.1
  4) xpmem/2.6.2-2.5_2.22__gd067c3f.shasta  10) cray-python/   16) DefApps/default
  5) cray-pmi/6.1.8                         11) libfabric/     17) libtool/2.4.6
  6) craype/2.7.19                          12) gcc/10.3.0             18) cray-mpich/8.1.23

UCVM is typically built by keeping the default modules and loading these:

  • module load cray-python
  • module load libtool/2.4.6
  • module load libfabric
  • module load gcc/10.3.0

Testing with two accounts showed that this configuration built the UCVM binaries and that the tests (except CCA) passed.

Example Installation

A UCVM installation has been built on Frontier at: /ccs/home/mei/scratch/TARGET_UCVM_SFCVM/ucvm_install

   source conf/ucvm_env.sh
   which ucvm_query
   ucvm_query -H

Next, run test/run_testing to perform some basic unit tests.

Install Script on Frontier

hn=`hostname -d`

export MY_TOP=$ppwd/scratch

export UCVM_SALLOC_ENV="-A geo156 -q debug"
export LD_LIBRARY_PATH=/opt/cray/libfabric/$LD_LIBRARY_PATH
export LIBRARY_PATH=/opt/cray/libfabric/$LIBRARY_PATH


git clone https://github.com/SCECcode/ucvm.git -b withSFCVM UCVM

cd $UCVM_SRC_PATH/largefiles
./get_largefiles.py -m sfcvm,cca,cvmsi,cvms

cd $UCVM_SRC_PATH/largefiles; ./stage_largefiles.py

cd $UCVM_SRC_PATH; ./ucvm_setup.py -d -a -p $UCVM_INSTALL_PATH &> ucvm_setup_install.log

cd $UCVM_SRC_PATH; make check &> make_check.log

echo "..EXITING.."

Interactive Session to Build UCVM or Run Acceptance Tests

salloc -A geo156 -N 1 -t 1:30:00 -J UCVM_Tests -q debug

Frontier library not loading

This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by UCVM configure 22.7.0, which was
generated by GNU Autoconf 2.69.  Invocation command line was

  $ ./configure --enable-silent-rules --with-fftw-include-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/fftw/include --with-fftw-lib-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/fftw/lib --with-etree-include-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/euclid3/include --with-etree-lib-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/euclid3/lib --with-hdf5-include-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/hdf5/include --with-hdf5-lib-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/hdf5/lib --with-openssl-include-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/openssl/include --with-openssl-lib-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/openssl/lib --with-tiff-include-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/tiff/include --with-tiff-lib-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/tiff/lib --with-sqlite-include-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/sqlite/include --with-sqlite-lib-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/sqlite/lib --with-curl-include-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/curl/include/curl --with-curl-lib-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/curl/lib --with-proj-include-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/proj/include --with-proj-lib-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/lib/proj/lib --enable-model-cca --with-cca-lib-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/model/cca/lib --with-cca-include-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/model/cca/include --enable-model-cvms --with-cvms-include-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/model/cvms/include --with-cvms-lib-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/model/cvms/lib 
--with-cvms-model-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/model/cvms/src --enable-model-cvmsi --with-cvmsi-lib-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/model/cvmsi/lib --with-cvmsi-include-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/model/cvmsi/include --with-cvmsi-model-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/model/cvmsi/model/i26 --enable-model-sfcvm --with-sfcvm-lib-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/model/sfcvm/lib --with-sfcvm-include-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/model/sfcvm/include --with-sfcvm-model-path=/lustre/orion/geo156/scratch/dean316/build_UCVM/build/model/sfcvm/src --prefix=/lustre/orion/geo156/scratch/dean316/build_UCVM/build

## --------- ##
## Platform. ##
## --------- ##

hostname = login13
uname -m = x86_64
uname -r = 5.14.21-150400.24.46_12.0.83-cray_shasta_c
uname -s = Linux
uname -v = #1 SMP Tue May 23 03:16:47 UTC 2023 (c6cda89)

/usr/bin/uname -p = x86_64
/bin/uname -X     = unknown

/bin/arch              = x86_64
/usr/bin/arch -k       = unknown
/usr/convex/getsysinfo = unknown
/usr/bin/hostinfo      = unknown
/bin/machine           = unknown
/usr/bin/oslevel       = unknown
/bin/universe          = unknown

PATH: /sw/frontier/lfs-wrapper/0.0.1/bin/lfs
PATH: /opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/bin
PATH: /opt/cray/pe/mpich/8.1.23/bin
PATH: /sw/sources/hpss/bin
PATH: /sw/frontier/spack-envs/base/opt/cray-sles15-zen3/gcc-10.3.0/darshan-runtime-3.4.0-g5tkbmgrfje7vnnh7ppfb6s5b7frivrl/bin
PATH: /opt/cray/pe/gcc/10.3.0/bin
PATH: /opt/cray/libfabric/
PATH: /sw/frontier/spack-envs/base/opt/cray-sles15-zen3/gcc-10.3.0/libtool-2.4.6-4kukgkkoovpfysxbav23pbrddwn7kjbm/bin
PATH: /opt/cray/pe/python/
PATH: /opt/conda/bin
PATH: /opt/clmgr/sbin
PATH: /opt/clmgr/bin
PATH: /opt/sgi/sbin
PATH: /opt/sgi/bin
PATH: /sw/frontier/bin
PATH: /ccs/home/dean316/.local/bin
PATH: /usr/local/bin
PATH: /usr/bin
PATH: /bin
PATH: /opt/bin
PATH: /opt/c3/bin
PATH: /usr/lib/mit/bin
PATH: /opt/puppetlabs/bin
PATH: /sbin

## ----------- ##
## Core tests. ##
## ----------- ##

configure:2426: checking for a BSD-compatible install
configure:2494: result: /usr/bin/install -c
configure:2505: checking whether build environment is sane
configure:2560: result: yes
configure:2711: checking for a thread-safe mkdir -p
configure:2750: result: /usr/bin/mkdir -p
configure:2757: checking for gawk
configure:2773: found /usr/bin/gawk
configure:2784: result: gawk
configure:2795: checking whether make sets $(MAKE)
configure:2817: result: yes
configure:2846: checking whether make supports nested variables
configure:2863: result: yes
configure:3032: checking for ranlib
configure:3048: found /usr/bin/ranlib
configure:3059: result: ranlib
configure:3087: checking build system type
configure:3101: result: x86_64-pc-linux-gnu
configure:3121: checking host system type
configure:3134: result: x86_64-pc-linux-gnu
configure:3213: checking for style of include used by make
configure:3241: result: GNU
configure:3267: checking whether to compile using MPI
configure:3274: result: yes
configure:3330: checking for mpicc
configure:3346: found /opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/bin/mpicc
configure:3357: result: mpicc
configure:3431: checking for gcc
configure:3458: result: mpicc
configure:3687: checking for C compiler version
configure:3696: mpicc --version >&5
gcc (GCC) 10.3.0 20210408 (Cray Inc.)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO

configure:3707: $? = 0
configure:3696: mpicc -v >&5
mpicc for MPICH version 8.1.23
Using built-in specs.
Target: x86_64-suse-linux
Configured with: ../cpe-gcc-10.3.0-202104220029.0777bcc28ac1d/configure --prefix=/opt/cray/pe/gcc/10.3.0/snos --disable-nls --libdir=/opt/cray/pe/gcc/10.3.0/snos/lib --enable-languages=c,c++,fortran --with-gxx-include-dir=/opt/cray/pe/gcc/10.3.0/snos/include/g++ --with-slibdir=/opt/cray/pe/gcc/10.3.0/snos/lib --with-system-zlib --enable-shared --enable-__cxa_atexit --build=x86_64-suse-linux --with-ppl --with-cloog --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.3.0 20210408 (Cray Inc.) (GCC) 
configure:3707: $? = 0
configure:3696: mpicc -V >&5
gcc: error: unrecognized command-line option '-V'
configure:3707: $? = 1
configure:3696: mpicc -qversion >&5
gcc: error: unrecognized command-line option '-qversion'; did you mean '--version'?
configure:3707: $? = 1
configure:3727: checking whether the C compiler works
configure:3749: mpicc    conftest.c  >&5
/usr/bin/ld: warning: libfabric.so.1, needed by /opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib/libmpi_gnu_91.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: /opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib/libmpi_gnu_91.so: undefined reference to `fi_version@FABRIC_1.0'
/usr/bin/ld: /opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib/libmpi_gnu_91.so: undefined reference to `fi_dupinfo@FABRIC_1.3'
/usr/bin/ld: /opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib/libmpi_gnu_91.so: undefined reference to `fi_strerror@FABRIC_1.0'
/usr/bin/ld: /opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib/libmpi_gnu_91.so: undefined reference to `fi_freeinfo@FABRIC_1.3'
/usr/bin/ld: /opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib/libmpi_gnu_91.so: undefined reference to `fi_fabric@FABRIC_1.1'
/usr/bin/ld: /opt/cray/pe/mpich/8.1.23/ofi/gnu/9.1/lib/libmpi_gnu_91.so: undefined reference to `fi_getinfo@FABRIC_1.3'
collect2: error: ld returned 1 exit status
configure:3753: $? = 1
configure:3791: result: no
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_TARNAME "ucvm"
| #define PACKAGE_VERSION "22.7.0"
| #define PACKAGE_STRING "UCVM 22.7.0"
| #define PACKAGE_BUGREPORT "software@scec.org"
| #define PACKAGE_URL ""
| #define PACKAGE "ucvm"
| #define VERSION "22.7.0"
| /* end confdefs.h.  */
| int
| main ()
| {
|   ;
|   return 0;
| }
configure:3796: error: in `/lustre/orion/geo156/scratch/dean316/build_UCVM/UCVM':
configure:3798: error: C compiler cannot create executables
See `config.log' for more details
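
The log above shows the problem we are seeing on Frontier: the link step cannot find libfabric.so.1, so configure reports that the C compiler cannot create executables. This is the workaround.

Copy the top of config.log, which contains the configure invocation, into a file named r:

head -10 config.log > r

Edit r so that it contains only the configure command, then run that command by hand:

sh ./r

make install

Then run:

./ucvm_setup.py -a -r -d -p YOUR_UCVM_INSTALL_PATH

Finally, check that the conf directory contains ucvm_env.sh.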
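
Two quick checks relate to this section: whether libfabric.so.1 can be found in any directory listed in LD_LIBRARY_PATH (the log reports it as "not found"), and whether conf/ucvm_env.sh exists under the install path. The sketch below is a hypothetical helper, not part of UCVM; the function names are illustrative, and it assumes that the conf directory sits directly under the path given to ucvm_setup.py with -p.

```python
import os
from pathlib import Path


def find_on_library_path(lib_name, library_path):
    """Return the first match for lib_name in a colon-separated path list, or None."""
    for entry in library_path.split(os.pathsep):
        if entry and (Path(entry) / lib_name).exists():
            return str(Path(entry) / lib_name)
    return None


def has_ucvm_env(install_path):
    """True if <install_path>/conf/ucvm_env.sh exists (assumed install layout)."""
    return (Path(install_path) / "conf" / "ucvm_env.sh").is_file()


if __name__ == "__main__":
    found = find_on_library_path("libfabric.so.1",
                                 os.environ.get("LD_LIBRARY_PATH", ""))
    print("libfabric.so.1:", found or "not found on LD_LIBRARY_PATH")
    install = os.environ.get("UCVM_INSTALL_PATH", "")
    print("ucvm_env.sh present:", has_ucvm_env(install))
```

If the first check prints "not found", the configure failure shown in the log is likely to recur until the libfabric library directory is added to LD_LIBRARY_PATH.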

Examples of Typical Configuration Errors

srun -N8 -n512 block:cyclic ${BIN_DIR}/ucvm2mesh_mpi -f ./norcal_ucvm2mesh.conf
srun: error: Unable to create step for job 1875534: More processors requested than permitted
srun -N8 -c -m block:cyclic ${BIN_DIR}/ucvm2mesh_mpi -f ./norcal_ucvm2mesh.conf
srun: error: Invalid numeric value "-m" for --cpus-per-task.
srun -N8 -c --ntasks-per-core=1 -m block:cyclic ${BIN_DIR}/ucvm2mesh_mpi -f ./norcal_ucvm2mesh.conf
srun: error: Invalid numeric value "--ntasks-per-core=1" for --cpus-per-task.
srun -N8 -c --ntasks-per-node=64 --ntasks-per-core=1 -m block:cyclic ${BIN_DIR}/ucvm2mesh_mpi -f ./norcal_ucvm2mesh.conf
srun: error: Invalid numeric value "--ntasks-per-node=64" for --cpus-per-task.
srun -N8 -n256 -m block:cyclic ${BIN_DIR}/ucvm2mesh_mpi -f ./norcal_ucvm2mesh.conf
#PMPI_Type_create_darray(448): Invalid argument array_of_psizes
srun -N8 -n512 -m block:cyclic ${BIN_DIR}/ucvm2mesh_mpi -f ./norcal_ucvm2mesh.conf
srun: error: Unable to create step for job 1871180: More processors requested than permitted

[0]   expected 1000(processes) divisible by 2000(core count)
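
Several of these errors can be caught before submitting the job. The sketch below is a hypothetical pre-flight check, not part of UCVM. It assumes 56 usable cores per node (the figure quoted earlier on this page), and it assumes that ucvm2mesh_mpi expects the number of MPI tasks to equal px*py*pz from the configuration file, with each mesh dimension divisible by the matching process count. Those last two assumptions are consistent with the array_of_psizes and "divisible by" messages above, but they should be confirmed against the ucvm2mesh documentation.

```python
# Cores available to srun per Frontier node, as quoted earlier on this page.
CORES_PER_NODE = 56


def check_layout(nodes, ntasks, grid, procs, cores_per_node=CORES_PER_NODE):
    """Return a list of problems found; an empty list means none were found.

    grid  = (nx, ny, nz) mesh dimensions
    procs = (px, py, pz) process decomposition (assumed ucvm2mesh convention)
    """
    problems = []
    if ntasks > nodes * cores_per_node:
        problems.append(
            f"{ntasks} tasks exceed {nodes} nodes x {cores_per_node} cores = "
            f"{nodes * cores_per_node}"
        )
    px, py, pz = procs
    if px * py * pz != ntasks:
        problems.append(f"px*py*pz = {px * py * pz} does not equal {ntasks} tasks")
    for axis, n, p in zip("xyz", grid, procs):
        if n % p != 0:
            problems.append(f"n{axis} = {n} is not divisible by p{axis} = {p}")
    return problems


if __name__ == "__main__":
    mesh = (5760, 9680, 632)
    # The decompositions below are illustrative, not taken from norcal_ucvm2mesh.conf.
    print(check_layout(8, 512, mesh, (8, 8, 8)))
    print(check_layout(8, 256, mesh, (8, 8, 4)))
```

With 8 nodes the limit is 8 x 56 = 448 tasks, which would explain why -n512 is rejected with "More processors requested than permitted" while -n256 is accepted by srun.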

Related Entries