Perf Documentation
The Perf profiler leverages Hardware Performance Counters (HPCs) to collect low-level hardware metrics from CPUs. It provides valuable insights into memory access patterns, kernel performance, and overall system efficiency.
The profiling workflow in MemSysExplorer consists of two core actions, as provided by the main interface:
Profiling (`profiling`) – Captures runtime execution metrics by specifying the required executable.
Metric Extraction (`extract_metrics`) – Analyzes generated reports to extract memory and performance-related metrics.
When using the `both` action, profiling and metric extraction are performed sequentially.
Important
MemSysExplorer GitHub Repository
Refer to the codebase for the latest update: https://github.com/duca181/MemSysExplorer/tree/apps_dev/apps/profilers/perf
To learn more about license terms and third-party attribution, refer to the 6. Licensing and Attribution page.
Required Arguments
To run the Perf profiler, specific arguments are required based on the chosen action. Perf can analyze three levels of memory: l1, l2, and l3. The level to analyze is specified with the `level` argument. The required arguments are defined in the code as follows:
@classmethod
def required_profiling_args(cls):
    """
    Return required arguments for the profiling method.
    """
    return ["executable", "level"]

@classmethod
def required_extract_args(cls, action):
    """
    Return required arguments for the extract_metrics method.
    """
    if action == "extract_metrics":
        return ["report_file"]
    else:
        return []
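As an illustration of how these two methods could drive argument validation, here is a minimal sketch. The class name `PerfArgsSketch` and the helper `missing_args` are hypothetical and not part of MemSysExplorer; only the two classmethods mirror the code shown above. The actual validation logic in the codebase may differ.

```python
from typing import Dict, List


class PerfArgsSketch:
    """Hypothetical stand-in for the profiler class (name is an assumption)."""

    @classmethod
    def required_profiling_args(cls) -> List[str]:
        # Mirrors the documented method.
        return ["executable", "level"]

    @classmethod
    def required_extract_args(cls, action: str) -> List[str]:
        # Mirrors the documented method.
        if action == "extract_metrics":
            return ["report_file"]
        return []


def missing_args(action: str, provided: Dict[str, object]) -> List[str]:
    """Return the required argument names that are absent or None.

    Hypothetical helper: assumes `profiling` and `both` need the profiling
    arguments, and that extraction arguments come from required_extract_args.
    """
    required: List[str] = []
    if action in ("profiling", "both"):
        required += PerfArgsSketch.required_profiling_args()
    required += PerfArgsSketch.required_extract_args(action)
    return [name for name in required if provided.get(name) is None]


if __name__ == "__main__":
    # Only the executable is supplied, so `level` is reported as missing.
    print(missing_args("profiling", {"executable": "./executable"}))
```

Under this sketch, the `both` action needs only the profiling arguments, because `required_extract_args` returns an empty list for any action other than `extract_metrics`.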
Example Usage
Below are three examples of how to execute the profiling tool with different actions:
Profiling the application:
python main.py --profiler perf --action profiling --level l1 --executable ./executable
Extracting metrics from an existing report:
python main.py --profiler perf --action extract_metrics --level l1 --report_file ./report_file.ncu-rep
Performing both profiling and metric extraction:
python main.py --profiler perf --action both --level l1 --executable ./executable
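When these invocations are scripted, for example to sweep all three memory levels, the command line can be assembled programmatically. The sketch below uses only the flags shown in the examples above; the helper `build_command` and its validation rules are assumptions for illustration, not part of MemSysExplorer.

```python
from typing import List, Optional

VALID_LEVELS = ("l1", "l2", "l3")


def build_command(
    action: str,
    level: str,
    executable: Optional[str] = None,
    report_file: Optional[str] = None,
) -> List[str]:
    """Assemble a main.py invocation as an argument list (hypothetical helper)."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {VALID_LEVELS}, got {level!r}")

    cmd = ["python", "main.py", "--profiler", "perf",
           "--action", action, "--level", level]

    if action in ("profiling", "both"):
        if executable is None:
            raise ValueError(f"action {action!r} requires an executable")
        cmd += ["--executable", executable]
    elif action == "extract_metrics":
        if report_file is None:
            raise ValueError("action 'extract_metrics' requires a report_file")
        cmd += ["--report_file", report_file]
    else:
        raise ValueError(f"unknown action: {action!r}")
    return cmd


if __name__ == "__main__":
    # Print one command per memory level without executing anything.
    for lvl in VALID_LEVELS:
        print(" ".join(build_command("both", lvl, executable="./executable")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `main.py`.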
Note
Perf was originally designed to work on Intel CPUs, so some metrics might be unavailable on other architectures.
Sample Output
This profiler generates output traces that follow the standardized format defined by the MemSysExplorer Application Interface.
Troubleshooting
If you encounter issues while running or extracting metrics using the Perf profiler, consider the following checks:
Ensure `perf` is installed and available in your environment.
You can verify this by running:
which perf
perf --version
Check whether hardware performance counters are accessible.
On many Linux systems, user access to counters is restricted by default. You may need to reduce the kernel’s perf event restriction level:
sudo sh -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'
Alternatively, configure access with:
sudo sysctl -w kernel.perf_event_paranoid=-1
Ensure you are running on a supported architecture. MemSysExplorer’s perf integration is designed and tested primarily on Intel CPUs. Some counters may be missing or unsupported on AMD, ARM, or virtualized environments.
Check for compatibility with your Linux kernel version and `perf` version.
MemSysExplorer assumes perf version 6.x or later. Run:
uname -r
perf --version
to check kernel and perf versions.
If the profiler fails silently or skips metrics, the cause is likely unsupported or inaccessible counters. Consider testing a different memory level (`--level l1`, `l2`, or `l3`) or switching to another compatible platform.
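The first checks in this list can be combined into a small preflight script. The sketch below is not part of MemSysExplorer, and all function names are hypothetical. It assumes that `perf --version` prints a line of the form `perf version X.Y...`, and it reads the restriction level from `/proc/sys/kernel/perf_event_paranoid`, the same file written by the commands above.

```python
import re
import shutil
import subprocess
from pathlib import Path
from typing import Optional, Tuple

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def perf_on_path() -> bool:
    """Return True if a `perf` binary is found on PATH."""
    return shutil.which("perf") is not None


def read_paranoid_level(path: Path = PARANOID_PATH) -> Optional[int]:
    """Return the perf_event_paranoid value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def parse_perf_version(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `perf --version` output.

    Assumes output such as 'perf version 6.5.13'; returns None otherwise.
    """
    match = re.search(r"perf version (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    if not perf_on_path():
        print("perf not found on PATH")
    else:
        result = subprocess.run(
            ["perf", "--version"], capture_output=True, text=True
        )
        version = parse_perf_version(result.stdout)
        print(f"perf version (major, minor): {version}")
        if version is not None and version[0] < 6:
            print("perf is older than 6.x")

    level = read_paranoid_level()
    print(f"perf_event_paranoid: {level}")
```

A `None` result from either helper means the value could not be determined; it does not by itself indicate that counters are inaccessible.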