Portable High Performance Linpack (HPL) 2.2, from http://www.netlib.org/benchmark/hpl/
'HPL' here does not seem to stand for 'Highly Parallel Linpack' as used in TOP500, etc.
Adapted for NEC SX Aurora Tsubasa: Make.Aurora_mpi was created from './makes/Make.Linux_Intel64_ncc'.
Note)
The benchmark result is not final or formal, because:
- The code is used out of the box: the source is unmodified and no special compile options are added.
General considerations about Linpack on SX Aurora Tsubasa:
- Linpack is designed to measure peak performance, not the sustained performance of real applications.
- SX Aurora Tsubasa is designed to deliver good performance on sparse matrix calculations and on random access to large data sets.
This is supported by its hardware (fewer cores, large vector registers, a sophisticated memory pipeline, etc.)
and by its vectorizing compilers.
As programs are optimized or heuristics are applied, applications lose parallelism and their data access patterns become random,
so it becomes harder to achieve high instruction/data cache hit rates.
- In Linpack, the computation is done in the BLAS (Basic Linear Algebra Subroutines) library:
https://en.wikipedia.org/wiki/LINPACK
We may say we are just evaluating the library, not the vector compiler.
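For reference, HPL solves a dense system Ax = b of order N by LU factorization with row partial pivoting. It turns the measured wall time t into a rate by using a nominal operation count. The dominant 2/3 N^3 term comes from updating the trailing submatrix with BLAS matrix-matrix multiplies (DGEMM), so for large N the score mostly reflects the BLAS library. The second line is a worked illustration only: N = 10^5 and the 1 TFLOPS rate are hypothetical numbers, not measured values.

```latex
\text{ops}(N) = \tfrac{2}{3}N^{3} + O(N^{2}), \qquad
\text{Gflops} = \frac{\text{ops}(N)}{t \times 10^{9}}

N = 10^{5}:\quad \tfrac{2}{3}N^{3} \approx 6.67 \times 10^{14}\ \text{flop}, \qquad
t \approx \frac{6.67 \times 10^{14}\ \text{flop}}{10^{12}\ \text{flop/s}} \approx 667\ \text{s}
```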
We are using:
/opt/nec/ve/nlc/1.0.0/lib/libblas_sequential.a
The library path is set by:
/opt/nec/ve/mpi/1.1.0/bin/necmpivars.sh
Compilation and execution are done on an NEC SX Aurora Tsubasa server with NEC MPI installed:
- 'mpincc': compiles the program.
- 'mpirun': runs the compiled executable.
The NEC MPI runtime environment is set up by the following shell script. Source it before running any MPI tools:
$ source /opt/nec/ve/mpi/[version]/bin/necmpivars.sh
This line is already included in the compile and run scripts referred to below:
$ sh ./compile.sh
$ cd bin/Aurora_mpi
$ sh ./run.sh
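The repository's compile.sh and run.sh are not reproduced here. The following is only a hypothetical sketch of what such scripts might do. The variable NEC_MPI_VARS and the default of 4 MPI processes are illustrative assumptions, not values taken from those scripts. The parts that come from HPL itself are `make arch=Aurora_mpi`, which selects Make.Aurora_mpi, and the resulting executable bin/Aurora_mpi/xhpl.

```shell
#!/bin/sh
# Hypothetical sketch of compile.sh / run.sh; the real scripts may differ.
# NEC_MPI_VARS and the default process count (4) are illustrative assumptions.

# NEC MPI environment script (adjust the version directory to your install).
NEC_MPI_VARS="${NEC_MPI_VARS:-/opt/nec/ve/mpi/1.1.0/bin/necmpivars.sh}"

# Source the NEC MPI environment, failing with a message if it is missing.
load_nec_mpi_env() {
    if [ ! -r "$NEC_MPI_VARS" ]; then
        echo "cannot read $NEC_MPI_VARS" >&2
        return 1
    fi
    . "$NEC_MPI_VARS"
}

# Build HPL with Make.Aurora_mpi; HPL places the executable in bin/Aurora_mpi/xhpl.
build_hpl() {
    load_nec_mpi_env || return 1
    make arch=Aurora_mpi
}

# Run xhpl from bin/Aurora_mpi with the given number of MPI processes.
# The cd happens in a subshell, so the caller's directory is unchanged.
run_hpl() {
    nprocs="${1:-4}"
    load_nec_mpi_env || return 1
    (cd bin/Aurora_mpi && mpirun -np "$nprocs" ./xhpl)
}
```

From the HPL top directory, `build_hpl && run_hpl 8` would build the benchmark and then run it with 8 processes.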
The Linpack performance is printed to stdout (in the Gflops column of HPL's output) after about 5 minutes of execution.
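To read the result as TFLOPS, the Gflops value can be extracted from the saved stdout. This is a small sketch; `hpl_tflops` is a hypothetical helper name. It assumes the HPL 2.2 result format, where each result line starts with the T/V code (for example WR11C2R4) and ends with the Gflops value.

```shell
#!/bin/sh
# Sketch: print the Gflops column of each HPL result line, converted to TFLOPS.
# Assumes HPL 2.2 result lines start with a T/V code like WR11C2R4 and that
# Gflops is the last of at least 7 columns. Reads files given as arguments,
# or stdin if none.
hpl_tflops() {
    awk '/^W[RC][0-9]/ && NF >= 7 { printf "%.3f\n", $NF / 1000 }' "$@"
}
```

For example, `sh ./run.sh | tee hpl.log` followed by `hpl_tflops hpl.log` prints one TFLOPS value per problem size in HPL.dat.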
To collect PROGINF (program execution statistics):
- Add the -proginf flag to 'mpincc'.
- Export VE_PROGINF before 'mpirun':
$ export VE_PROGINF=YES
PROGINF reports statistics for the whole program, including the prolog (setup) and epilog (finalization) code.
Its figures may therefore look slower than the Linpack result, because they include overhead from code outside the core loop.
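The two PROGINF steps can be wrapped so that VE_PROGINF is set for one run only and does not leak into later runs in the same shell. This is a sketch; `run_with_proginf` is a hypothetical helper name. It assumes the executable was already built with -proginf as described above.

```shell
#!/bin/sh
# Sketch: run an executable (built with -proginf) under mpirun with
# VE_PROGINF=YES. The export happens in a subshell, so the calling shell's
# environment is left unchanged.
run_with_proginf() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_with_proginf NPROCS EXECUTABLE [ARGS...]" >&2
        return 2
    fi
    nprocs="$1"
    shift
    (
        export VE_PROGINF=YES
        mpirun -np "$nprocs" "$@"
    )
}
```

For example, `run_with_proginf 4 ./xhpl` runs the benchmark with PROGINF output enabled for that run only.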
To use FTRACE, which shows the distribution of execution time across subroutines:
NOTE) Code compiled with the -ftrace flag may run very slowly.
- Add the -ftrace flag to 'mpincc'.
- Run the executable with mpirun as usual; ftrace.out.* files are then created in the current directory.
- Convert the ftrace.out.* files with the 'ftrace' command. The result is printed to stdout:
$ /opt/nec/ve/bin/ftrace -f ftrace.out.*
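If no ftrace.out.* files exist, the shell passes the unexpanded pattern to ftrace. The sketch below checks for that first and prints a hint instead. `show_ftrace` and the FTRACE_BIN override are hypothetical names. The default path and the `-f` option are the ones shown above.

```shell
#!/bin/sh
# Sketch: summarize ftrace.out.* files in the current directory.
# FTRACE_BIN is an illustrative override; the default is the path used above.
FTRACE_BIN="${FTRACE_BIN:-/opt/nec/ve/bin/ftrace}"

show_ftrace() {
    set -- ftrace.out.*
    # An unmatched glob stays literal, so check that the first match exists.
    if [ ! -e "$1" ]; then
        echo "no ftrace.out.* in $(pwd); was the executable built with -ftrace?" >&2
        return 1
    fi
    "$FTRACE_BIN" -f "$@"
}
```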
Other useful tools:
- 'nreadelf': checks whether an executable was compiled for NEC SX Aurora Tsubasa.
$ /opt/nec/ve/bin/nreadelf -h a.out
If the machine line shows 'NEC VE architecture', 'a.out' was compiled for NEC SX Aurora Tsubasa.
- '/opt/nec/ve/bin/ps': displays processes running on the VEs (Vector Engines).
$ export -n VE_NODE_NUMBER; /opt/nec/ve/bin/ps -ef
See the help for other options:
$ /opt/nec/ve/bin/ps --help
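The two checks above can be scripted, for example to confirm that the benchmark binary targets the VE and that xhpl is actually running on the VEs. This is a sketch. `is_ve_binary`, `ve_ps_hpl`, and the NREADELF/VE_PS overrides are hypothetical names. The ps helper unsets VE_NODE_NUMBER only inside a subshell, so ps does not see it, much like `export -n` above, and the calling shell keeps its own setting.

```shell
#!/bin/sh
# Sketch: helpers around nreadelf and the VE ps. NREADELF and VE_PS are
# illustrative overrides; the default paths are the ones used in this README.
NREADELF="${NREADELF:-/opt/nec/ve/bin/nreadelf}"
VE_PS="${VE_PS:-/opt/nec/ve/bin/ps}"

# Succeed if the ELF header of $1 reports the NEC VE architecture.
is_ve_binary() {
    "$NREADELF" -h "$1" 2>/dev/null | grep -q 'NEC VE architecture'
}

# List VE processes whose line mentions xhpl. VE_NODE_NUMBER is removed
# only in the subshell that runs ps, not in the calling shell.
ve_ps_hpl() {
    (unset VE_NODE_NUMBER; "$VE_PS" -ef) | grep xhpl
}
```

For example, `is_ve_binary bin/Aurora_mpi/xhpl && echo "built for VE"` checks the executable, and `ve_ps_hpl` lists the running benchmark processes.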