LIKWID and SLURM
Jobs on HPC systems are nowadays commonly managed through the job scheduler SLURM. If you want access to the performance monitoring features of your system, some cluster-specific flags might be required when submitting a job. When a job starts, it is commonly restricted to the requested resources, which might interfere with the execution of LIKWID (and other tools). This page contains helpful hints for users as well as configuration ideas for administrators.
Check with your compute center whether they run some sort of job-specific monitoring that might interfere with your reading of hardware performance counters. Usually, it is possible to disable the job-specific monitoring for individual jobs with additional parameters during job submission.
To avoid possible security and privacy concerns, it is advisable to set the paranoid value (see likwid-perfctr) to 0 only if a compute job has allocated a compute node exclusively. Exclusive usage also avoids contention on shared local resources during benchmarking.
A suitable way for HPC clusters with Slurm is to configure a prolog that detects if a job is running exclusively on a node and then sets /proc/sys/kernel/perf_event_paranoid to 0. Correspondingly, an epilog is needed that sets it back to the default value of 2.
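As an illustration, a minimal prolog sketch could look like the following. Detecting exclusivity via scontrol's OverSubscribe field is an assumption here; sites may use a different check (see the constraint-based approach further below).

```bash
#!/bin/bash
# Minimal prolog sketch: open up perf_event access only for node-exclusive jobs.
# Assumptions: runs as root on the compute node, SLURM_JOB_ID is set, and
# exclusive jobs are reported with OverSubscribe=NO by scontrol.
if scontrol -o show job "$SLURM_JOB_ID" | grep -q "OverSubscribe=NO" ; then
    echo 0 > /proc/sys/kernel/perf_event_paranoid
fi
```

The matching epilog simply writes the default value of 2 back to /proc/sys/kernel/perf_event_paranoid.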
Reading of GPU performance counters by non-admin users is often disabled on HPC clusters due to security concerns. When trying to read them, you will get a message referring you to the Nvidia documentation about ERR_NVGPUCTRPERM.
A suitable way for HPC clusters with Slurm is to configure a prolog that detects whether a job is running exclusively on a node and then
- stops all systemd services accessing the GPU devices, for example nvidia-persistenced,
- unloads all relevant nvidia kernel modules, for example
modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nv_peer_mem nvidia
- reloads the nvidia kernel module with the required parameter,
modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0
- and finally starts the services again.
A corresponding epilog also needs to be created, where modprobe nvidia NVreg_RestrictProfilingToAdminUsers=1 is used instead.
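A rough sketch of such a prolog body is shown below. It assumes that nvidia-persistenced is the only service holding the GPU devices and that the module list from above applies; adapt both to your system.

```bash
#!/bin/bash
# GPU profiling prolog sketch (assumptions: only nvidia-persistenced holds
# the GPU devices and the job has the node exclusively).
systemctl stop nvidia-persistenced
modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nv_peer_mem nvidia
# Reload the driver with profiling enabled for non-admin users
modprobe nvidia NVreg_RestrictProfilingToAdminUsers=0
systemctl start nvidia-persistenced
```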
Be warned that such a prolog and epilog increase the job start/end duration, because the restart of the nvidia systemd services in particular can take some time, possibly up to one minute. A workaround would be a SPANK plugin that makes enabling access to the performance counters optional via a job submission parameter.
Integrating hardware performance counting into SLURM is a two-step process. If there is some system monitoring in place that also requires hardware performance counter access, you can allow it for distinct jobs as long as they use the nodes exclusively. We at NHR@FAU use the SLURM constraint hwperf for jobs that require access and otherwise let the system monitoring collect the counts. First, at job submission, a SLURM submit filter checks whether the nodes are requested exclusively when hwperf is set. Afterwards, in the prologue/epilogue scripts, we set the permissions and ownerships to allow LIKWID and other perf_event-based tools.
Accessing hardware performance counters should only be allowed in user-exclusive environments for security reasons. The following excerpt from a Lua job submit filter (job_submit.lua) enforces this:
-- all jobs with constraint hwperf need to allocate the node exclusively
for feature in string.gmatch(job_desc.features or "", "[^,]*") do
    if ( feature == "hwperf" and job_desc.shared ~= 0 ) then
        slurm.log_info("slurm_job_submit: job from uid %u with constraint hwperf but not exclusive", job_desc.user_id)
        slurm.user_msg("--constraint=hwperf only available for node-exclusive jobs with --exclusive")
        return 2029 -- slurm.ERROR ESLURM_INVALID_FEATURE
    end
end
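With such a filter in place, users request counter access at submission time, for example (job_script.sh is just a placeholder name):

```bash
sbatch --exclusive --constraint=hwperf job_script.sh
```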
Here is a combined prologue/epilogue version for CPU profiling. For Nvidia GPU profiling, you have to add the relevant calls described above. A SLURM submit filter as shown above should ensure that only jobs with exclusive node usage can set the hwperf constraint used here:
if [[ "$SLURM_JOB_CONSTRAINTS" =~ "hwperf" ]] ; then
chown $SLURM_JOB_USER /var/run/likwid.lock
# Also grant permission to use performance counters via perf interface (e.g. with vtune)
echo 0 > /proc/sys/kernel/perf_event_paranoid
fi
if [[ "$SLURM_JOB_CONSTRAINTS" =~ "hwperf" ]] ; then
chown $MONITORING_USER /var/run/likwid.lock
# Also disable permission to use performance counters via perf interface (e.g. with vtune)
echo 2 > /proc/sys/kernel/perf_event_paranoid
fi
You could also use the even stricter paranoid value of 4, but it is only provided by some Linux distributions so far.
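To check which value is currently active on a node, query the sysctl directly:

```bash
# Show the currently active perf_event_paranoid setting
sysctl kernel.perf_event_paranoid
# equivalently:
cat /proc/sys/kernel/perf_event_paranoid
```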