AddARMSupport
The ARM support in LIKWID is based on the perf_event Linux interface. This required less work than for x86 chips, where a non-perf_event backend is also required. If your system does not provide any perf_event units (check with ls /sys/bus/event_source/devices/), you cannot use LIKWID on that machine.
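The check above can also be done programmatically. The following is a minimal sketch, not part of LIKWID; the function name list_perf_event_units is made up for this example. It lists the entries of a directory and reports how many it found.

```c
#include <dirent.h>
#include <stdio.h>

/* Print every entry found below `dir` and return how many there are,
 * or -1 if the directory cannot be opened.
 * Hypothetical helper for illustration; LIKWID does not provide it. */
int list_perf_event_units(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        /* Skip "." and ".." as well as hidden entries. */
        if (entry->d_name[0] == '.')
            continue;
        printf("%s\n", entry->d_name);
        count++;
    }
    closedir(d);
    return count;
}
```

Calling list_perf_event_units("/sys/bus/event_source/devices") and getting a result of 0 or -1 means that no perf_event units are available on the machine.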
First, LIKWID requires some IDs from the hardware to identify the platform. Although the hwloc library gathers most information, LIKWID additionally reads the following fields from /proc/cpuinfo:
- CPU architecture: Probably 7 for ARMv7 or 8 for ARMv8
- CPU implementer: Differentiates the chips by implementers like Marvell, NVIDIA, ...
- CPU part: Differentiates the chips of a vendor
- CPU variant: Rarely used but sometimes reflects number of cores, size of L3 cache or similar
- CPU revision: Just for completeness
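To collect these IDs yourself, a small parser is enough. The sketch below is not LIKWID's own parsing code; the function name parse_cpuinfo_field is hypothetical. It assumes the usual "key : value" layout of /proc/cpuinfo lines, with values either decimal or prefixed with 0x.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Extract the numeric value of one /proc/cpuinfo line.
 * Returns 1 and stores the value if `line` starts with `key` followed by
 * optional whitespace, a colon and a number; returns 0 otherwise.
 * Hypothetical helper for illustration only. */
int parse_cpuinfo_field(const char *line, const char *key, unsigned long *value)
{
    size_t keylen = strlen(key);
    if (strncmp(line, key, keylen) != 0)
        return 0;

    /* Skip the padding between the key and the colon. */
    const char *p = line + keylen;
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;
    if (*p != ':')
        return 0;
    p++;

    /* Base 0 accepts both "8" and "0xd07". */
    char *end = NULL;
    unsigned long parsed = strtoul(p, &end, 0);
    if (end == p)
        return 0;

    *value = parsed;
    return 1;
}
```

Feeding each line of /proc/cpuinfo to this function with the keys "CPU architecture", "CPU implementer" and "CPU part" yields the values needed for the defines.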
When you have these IDs at hand (in hexadecimal format), add them to src/includes/topology.h. Here is a snippet of this file:
#define ARMV7_FAMILY 0x7U
#define ARMV8_FAMILY 0x8U
/* ARM vendors */
#define DEFAULT_ARM 0x41U
#define NVIDIA_ARM 0x4EU
[...]
/* ARM */
#define ARM_CORTEX_A57 0xD07U
#define CAV_THUNDERX 0x0A0U
[...]
The FAMILY IDs correspond to CPU architecture, the ARM vendors IDs to CPU implementer and the ARM IDs to CPU part. Please use reasonable abbreviations.
With this information, the chip can be identified, but LIKWID adds some more data for the user, namely a chip architecture description and a short name. The description is only used for output, but the short name is used later in the performance monitoring part. Both have to be added to src/topology.c. Snippet:
[...]
static char* cavium_thunderx_str = "Cavium Thunder X (ARMv8)";
static char* arm_cortex_a57 = "ARM Cortex A57 (ARMv8)";
static char* arm_cortex_a53 = "ARM Cortex A53 (ARMv8)";
[...]
static char* short_arm8 = "arm8";
static char* short_arm8_cav_tx2 = "arm8_tx2";
static char* short_arm8_cav_tx = "arm8_tx";
[...]
So add a descriptive string and a short name here. If the vendor publishes a short name for the chip, please use it. Intel provides long and short names like Intel Cascadelake X and CLX. I failed at using them, so please, be better than me ;)
The file src/topology.c contains a function topology_setName(), which consists of a set of nested switch-case statements based on the IDs added to src/includes/topology.h before. Search for ARMV7 or ARMV8 because the function is quite long. There you add the description and short name for the new chip. Here is a snippet of it:
switch ( cpuid_info.family )
{
    case ARMV8_FAMILY:
        switch (cpuid_info.vendor)
        {
            case DEFAULT_ARM:
                switch (cpuid_info.part)
                {
                    case ARM_CORTEX_A57:
                        cpuid_info.name = arm_cortex_a57;
                        cpuid_info.short_name = short_arm8;
                        break;
                    [...]
                }
                break;
            [...]
        }
        break;
    [...]
}
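To make the pattern concrete, here is a self-contained sketch of how a new case slots into such a nested switch. It is not the real topology_setName(): the struct cpu_info_sketch is a simplified stand-in for cpuid_info, and ARM_EXAMPLE_CHIP, its ID 0xFFF and both of its strings are hypothetical placeholders.

```c
/* IDs as listed in the topology.h snippet. */
#define ARMV8_FAMILY     0x8U
#define DEFAULT_ARM      0x41U
#define ARM_CORTEX_A57   0xD07U
/* Hypothetical part ID for illustration; use the value from /proc/cpuinfo. */
#define ARM_EXAMPLE_CHIP 0xFFFU

/* Simplified stand-in for cpuid_info (assumption: only these fields). */
struct cpu_info_sketch {
    unsigned int family;
    unsigned int vendor;
    unsigned int part;
    const char *name;
    const char *short_name;
};

static const char *arm_cortex_a57 = "ARM Cortex A57 (ARMv8)";
static const char *short_arm8 = "arm8";
static const char *arm_example_chip = "ARM Example Chip (ARMv8)"; /* hypothetical */
static const char *short_arm8_example = "arm8_example";           /* hypothetical */

/* Returns 1 if the chip was recognized and named, 0 otherwise. */
int set_name_sketch(struct cpu_info_sketch *info)
{
    switch (info->family) {
    case ARMV8_FAMILY:
        switch (info->vendor) {
        case DEFAULT_ARM:
            switch (info->part) {
            case ARM_CORTEX_A57:
                info->name = arm_cortex_a57;
                info->short_name = short_arm8;
                return 1;
            case ARM_EXAMPLE_CHIP: /* the newly added chip */
                info->name = arm_example_chip;
                info->short_name = short_arm8_example;
                return 1;
            default:
                break;
            }
            break;
        default:
            break;
        }
        break;
    default:
        break;
    }
    return 0;
}
```

The only change needed for a new chip of an already known vendor is one additional case in the innermost switch; a new vendor needs an additional case one level higher.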
With these settings, you should be able to run likwid-topology and get proper output (except cache information). Currently, there is no interface on ARM platforms from which cache information can be gathered.
In order to get performance monitoring support for your chip, three new files are required:
- perfmon_<name>.h: Main header for the chip
- perfmon_<name>_counters.h: Counter/register definitions
- perfmon_<name>_events.txt: Event definitions
No underscores or similar characters are allowed in <name>, but please choose a reasonable name so that the files can be found again later.
The first file should be perfmon_<name>_counters.h. It commonly consists of three tables: a list of counters, a list of units (sets of counters using the same device) and a table that maps LIKWID types to the perf_event units. In each of the following tables, the first entry is the template and the second an example:
- List of counters:
#define NUM_COUNTERS_<UPPERCASE_NAME> X
static RegisterMap <name>_counter_map[NUM_COUNTERS_<UPPERCASE_NAME>] = {
    {COUNTERNAME, UNIQUE_ID, UNIT, CONFIG_REG, COUNTER_REG1, COUNTER_REG2, DEVICE_ID, OPTION_MASK},
    {"PMC0", PMC0, PMC, 0x0, 0x0, 0, 0, 0x0}, // for ARM only the COUNTERNAME, UNIQUE_ID and UNIT are of interest
};
- List of units (units are defined in src/includes/register_types.h, but adding new types is not recommended):
static BoxMap <name>_box_map[NUM_UNITS] = {
    [UNITNAME] = {CONTROL_REG, STATUS_REG, CLEAR_REG, STATUS_REG_OFFSET, IS_PCI, DEVICE_ID, COUNTER_WIDTH},
    [PMC] = {0, 0, 0, 0, 0, 0, 48}, // for ARM only the COUNTER_WIDTH is of interest
};
- Translation map:
static char* <name>_translate_types[NUM_UNITS] = {
    [UNITNAME] = "path_to_perf_event_directory_containing_the_'type'_file_and_'format'_folder",
    [PMC] = "/sys/bus/event_source/devices/cpu",
};
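To verify that a path in the translation map is usable, you can check that its type file holds an integer ID. The sketch below is a standalone helper, not LIKWID code; the name read_perf_event_type is hypothetical, and how LIKWID itself consumes the file is not shown here.

```c
#include <stdio.h>

/* Read the integer stored in "<device_dir>/type".
 * Returns the type ID, or -1 if the file is missing or malformed.
 * Hypothetical helper for illustration only. */
int read_perf_event_type(const char *device_dir)
{
    char path[512];
    int n = snprintf(path, sizeof(path), "%s/type", device_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1; /* path did not fit into the buffer */

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    int type = -1;
    if (fscanf(fp, "%d", &type) != 1)
        type = -1;
    fclose(fp);
    return type;
}
```

A call such as read_perf_event_type("/sys/bus/event_source/devices/cpu") should return a non-negative number on a system where that unit exists; -1 suggests that the path in the translation map is wrong for this machine.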
The most tedious work when adding a new chip is typing, copying or parsing the list of supported events. But you are lucky: you want to add an ARM chip, and these chips provide a common set of events. The list of events is a plain text file that is transformed into a header during compilation.
The format for the events is fixed:
EVENT_<EVENTNAME> <EVENT_ID> <USABLE_COUNTERS>
UMASK_<EVENTNAME> <UMASK>
An example for ARM platforms for the event INST_RETIRED:
EVENT_INST_RETIRED 0x08 PMC
UMASK_INST_RETIRED 0x00
The <USABLE_COUNTERS> entry is compared to the counter names and only the beginning has to match, so PMC matches PMC0, PMC1, ... What to write here depends on how you named the counters in the list of counters in perfmon_<name>_counters.h.
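The prefix rule can be illustrated with a few lines of C. This is a sketch of the matching idea only, not LIKWID's actual implementation; the function name counter_is_usable is hypothetical, and it assumes a single counter prefix per event.

```c
#include <string.h>

/* Return non-zero if `counter_name` begins with `usable_counters`,
 * e.g. "PMC" matches "PMC0" and "PMC1".
 * Hypothetical helper for illustration only. */
int counter_is_usable(const char *usable_counters, const char *counter_name)
{
    size_t prefix_len = strlen(usable_counters);
    return strncmp(counter_name, usable_counters, prefix_len) == 0;
}
```

With this rule, an event declared for PMC can be scheduled on every counter whose name starts with PMC, while an event declared for PMC0 is restricted to that single counter.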
The main header perfmon_<name>.h is quite simple for ARM platforms as it contains only four lines:
#include <perfmon_<name>_events.h>
#include <perfmon_<name>_counters.h>
static int perfmon_numCounters<UPPERCASE_NAME> = NUM_COUNTERS_<UPPERCASE_NAME>;
static int perfmon_numArchEvents<UPPERCASE_NAME> = NUM_ARCH_EVENTS_<UPPERCASE_NAME>;
The performance module is defined in src/perfmon.c.
First, add the main header: #include <perfmon_<name>.h>. The next step is comparable to the topology_setName() function. The function is called perfmon_init_maps() and also contains a set of nested switch-case statements. Search for ARMV7 or ARMV8 as the function is quite long. Here we register, for a CPU family, vendor and part, the lists/tables defined before in perfmon_<name>_counters.h and perfmon_<name>_events.txt.
Here is a snippet:
switch ( cpuid_info.family )
{
    [...]
    case ARMV8_FAMILY:
        switch ( cpuid_info.vendor)
        {
            case DEFAULT_ARM:
                switch (cpuid_info.part)
                {
                    case ARM_CORTEX_A57: // the define in src/includes/topology.h
                        eventHash = a57_arch_events; // <name>_arch_events generated at compilation
                        perfmon_numArchEvents = perfmon_numArchEventsA57; // defined by you in perfmon_<name>.h
                        perfmon_numCounters = perfmon_numCountersA57; // defined by you in perfmon_<name>.h
                        counter_map = a57_counter_map; // <name>_counter_map defined in perfmon_<name>_counters.h
                        box_map = a57_box_map; // <name>_box_map defined in perfmon_<name>_counters.h
                        translate_types = a57_translate_types; // <name>_translate_types defined in perfmon_<name>_counters.h
                        break;
                    [...]
                }
                [...]
        }
        [...]
}
As a final step, add your chip to the print_supportedCPUs() function in src/topology.c.
Add the new chip to README.md in the section https://github.com/RRZE-HPC/likwid/blob/master/README.md#supported-architectures
The LIKWID wiki contains one page per supported architecture with tables of available counters, restrictions and further information. Unfortunately, I had to use HTML tables instead of Markdown tables. Copy an existing ARM architecture file to get the structure and fill in all information.