4. Tools
The MemSysExplorer application framework includes a collection of analysis and utility tools designed to parse, process, and visualize profiler output data. These tools enable researchers to extract insights from memory traces, compute cache behavior metrics, and manage profiling metadata.
Note
Some of the tools in this framework are developed as part of ongoing research. In particular, working set size estimation and memory characterization are active areas of work. We explore different techniques for estimating memory bandwidth while minimizing storage overhead, including sampling-based approaches and compact trace representations. Contributors interested in extending these methods should refer to our common library (see Common Profiler Library Documentation).
4.1 Profiler Output Parsers
These tools parse binary protobuf (.pb) files generated by profilers and convert them to human-readable formats.
4.1.1 trace_parser.py
Purpose: Parse binary protobuf memory trace files (memtrace_<pid>.pb) and convert them to JSON, CSV, or summary format.
Location: tools/trace_parser.py
Usage:
```bash
# Print summary to stdout (default)
python3 tools/trace_parser.py memtrace_12345.pb

# Export as JSON
python3 tools/trace_parser.py memtrace_12345.pb --format json

# Export to CSV file
python3 tools/trace_parser.py memtrace_12345.pb --format csv --output trace.csv

# Filter by thread ID
python3 tools/trace_parser.py memtrace_12345.pb --thread 12345 --format csv

# Limit number of events
python3 tools/trace_parser.py memtrace_12345.pb --limit 1000 --format json
```
Command-Line Options:
| Flag | Default | Description |
|---|---|---|
| `--format` | `summary` | Output format: `json`, `csv`, or `summary` |
| `--output` | stdout | Output file path |
| `--thread` | None | Filter events by thread ID |
| `--limit` | None | Maximum number of events to include |
| | 2 | JSON indentation level |
Output Fields:
- `timestamp`: Event timestamp
- `thread_id`: Thread ID for the memory access
- `address`: Memory address (hexadecimal)
- `mem_op`: Operation type (`READ` or `WRITE`)
- `hit_miss`: Cache result (`HIT` or `MISS`)
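As an illustration of how these fields can be consumed, the following minimal sketch computes a per-operation miss rate from a CSV export. It assumes the CSV has a header row whose column names match the field names above; the helper names `hit_miss_summary` and `miss_rate` are hypothetical, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def hit_miss_summary(csv_file):
    """Count (mem_op, hit_miss) pairs in an exported trace CSV.

    Assumes a header row with 'mem_op' and 'hit_miss' columns.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        counts[(row["mem_op"], row["hit_miss"])] += 1
    return counts


def miss_rate(counts, mem_op):
    """Return misses / (hits + misses) for one operation type."""
    hits = counts[(mem_op, "HIT")]
    misses = counts[(mem_op, "MISS")]
    total = hits + misses
    return misses / total if total else 0.0


# In-memory sample standing in for a file such as trace.csv (made-up values).
sample = io.StringIO(
    "timestamp,thread_id,address,mem_op,hit_miss\n"
    "100,12345,0x7fff5a3c1000,READ,MISS\n"
    "101,12345,0x7fff5a3c1000,READ,HIT\n"
    "102,12345,0x7fff5a3c1008,WRITE,MISS\n"
    "103,12345,0x7fff5a3c1000,READ,HIT\n"
)
counts = hit_miss_summary(sample)
print(f"READ miss rate: {miss_rate(counts, 'READ'):.2f}")
print(f"WRITE miss rate: {miss_rate(counts, 'WRITE'):.2f}")
```

To run this against a real export, pass an open file handle (for example `open('trace.csv', newline='')`) instead of the in-memory sample.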
4.1.2 timeseries_parser.py
Purpose: Parse binary protobuf time-series WSS metrics files (timeseries_<pid>.pb) and convert them to JSON, CSV, or summary format.
Location: tools/timeseries_parser.py
Usage:
```bash
# Print summary to stdout (default)
python3 tools/timeseries_parser.py timeseries_ls_12345.pb

# Export as JSON
python3 tools/timeseries_parser.py timeseries_ls_12345.pb --format json

# Export to CSV file
python3 tools/timeseries_parser.py timeseries_ls_12345.pb --format csv --output output.csv

# Filter by thread ID
python3 tools/timeseries_parser.py timeseries_ls_12345.pb --thread 12345 --format csv
```
Command-Line Options:
| Flag | Default | Description |
|---|---|---|
| `--format` | `summary` | Output format: `json`, `csv`, or `summary` |
| `--output` | stdout | Output file path |
| `--thread` | None | Filter samples by thread ID |
| | 2 | JSON indentation level |
Output Fields (per sample):
- `window_number`: Sampling window index
- `thread_id`: Thread ID
- `read_count`: Number of read operations in window
- `write_count`: Number of write operations in window
- `total_refs`: Total memory references
- `wss_exact`: Exact working set size (unique addresses)
- `wss_approx`: Approximate WSS (HyperLogLog estimate)
- `timestamp`: Sample timestamp
- `read_size_histogram`: Distribution of read sizes (1, 2, 4, 8, 16, 32, 64, other bytes)
- `write_size_histogram`: Distribution of write sizes
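The per-sample fields lend themselves to simple aggregation. The sketch below assumes the JSON export is a list of sample objects keyed by the field names above (the exact top-level layout is an assumption), and `summarize_samples` is a hypothetical helper; the sample values are made up.

```python
import json


def summarize_samples(samples):
    """Aggregate per-window samples into a few headline numbers."""
    if not samples:
        return {"windows": 0, "write_fraction": 0.0,
                "peak_wss_exact": 0, "peak_window": None}
    reads = sum(s["read_count"] for s in samples)
    writes = sum(s["write_count"] for s in samples)
    total = reads + writes
    # Window with the largest exact working set size.
    peak = max(samples, key=lambda s: s["wss_exact"])
    return {
        "windows": len(samples),
        "write_fraction": writes / total if total else 0.0,
        "peak_wss_exact": peak["wss_exact"],
        "peak_window": peak["window_number"],
    }


# Made-up samples standing in for a parsed JSON export.
samples = json.loads("""
[
  {"window_number": 0, "read_count": 60, "write_count": 40, "wss_exact": 80},
  {"window_number": 1, "read_count": 90, "write_count": 10, "wss_exact": 120},
  {"window_number": 2, "read_count": 50, "write_count": 50, "wss_exact": 95}
]
""")
summary = summarize_samples(samples)
print(summary)
```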
4.1.3 timeparser_plot.py
Purpose: Generate visualization plots from time-series protobuf data.
Location: tools/timeparser_plot.py
Usage:
```bash
# Display plot interactively
python3 tools/timeparser_plot.py timeseries_ls_12345.pb

# Save plot to file
python3 tools/timeparser_plot.py timeseries_ls_12345.pb --output my_plot.png

# Filter by thread ID
python3 tools/timeparser_plot.py timeseries_ls_12345.pb --thread 12345 --output thread_plot.png
```
Command-Line Options:
| Flag | Default | Description |
|---|---|---|
| `--output` | None | Output image file (if not provided, displays GUI) |
| `--thread` | None | Filter by thread ID |
Generated Plots:
The tool creates a multi-grid figure with 5 subplots:
Read Count - Total reads with size breakdown (1B, 2B, 4B, 8B, etc.)
Write Count - Total writes with size breakdown
WSS Exact - Exact working set size over time
WSS Approx - Approximate WSS (HyperLogLog) over time
WSS Absolute Error - Difference between exact and approximate WSS
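The quantity behind the last subplot can be reproduced outside the plotting tool. This is a minimal sketch, assuming each sample is a dictionary with `window_number`, `wss_exact`, and `wss_approx` keys; the helper `wss_errors` is hypothetical, the relative error is an extra column not mentioned above, and the numbers are made up.

```python
def wss_errors(samples):
    """Return (window_number, absolute_error, relative_error) per sample."""
    rows = []
    for s in samples:
        exact = s["wss_exact"]
        approx = s["wss_approx"]
        abs_err = abs(exact - approx)
        # Guard against empty windows where the exact WSS is zero.
        rel_err = abs_err / exact if exact else 0.0
        rows.append((s["window_number"], abs_err, rel_err))
    return rows


# Made-up samples for illustration.
samples = [
    {"window_number": 0, "wss_exact": 100, "wss_approx": 98},
    {"window_number": 1, "wss_exact": 200, "wss_approx": 205},
    {"window_number": 2, "wss_exact": 0, "wss_approx": 0},
]
for window, abs_err, rel_err in wss_errors(samples):
    print(f"window {window}: abs error {abs_err}, rel error {rel_err:.3f}")
```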
4.2 Memory Analysis Tools
4.2.1 reuse_distance.py
Purpose: Calculate reuse distance for cache behavior analysis. Reuse distance is the number of unique addresses accessed between consecutive accesses to the same address. The tool is currently in legacy mode.
Location: tools/reuse_distance.py
Usage:
```bash
# Calculate reuse distance (output to reuse_<input_name>.txt)
python3 tools/reuse_distance.py trace.csv

# Specify output file
python3 tools/reuse_distance.py trace.csv --output reuse_results.txt

# Use windowed tracking (memory-efficient for large traces)
python3 tools/reuse_distance.py trace.csv --window-size 100000
```
Command-Line Options:
| Flag | Default | Description |
|---|---|---|
| `--output` | `reuse_<input_name>.txt` | Output file for reuse distances |
| `--window-size` | -1 (unlimited) | Memory window size (-1 for unlimited, >0 for windowed tracking) |
Input Format:
Supports two trace formats:
- CSV: `timestamp,addr,op,size`
- Legacy: `timestamp address operation size` (space-separated)
Output Format:
Each line contains an address and its list of reuse distances:
```
0x7fff5a3c1000: [5, 12, 3, 8]
0x7fff5a3c1008: [2, 45, 7]
```
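To make the definition concrete, here is a minimal sketch of reuse distance computed with a simple recency-ordered list. It is an illustration of the metric, not the implementation in `reuse_distance.py`; the function name `reuse_distances` is hypothetical, it ignores the windowed mode, and it only reports addresses that are accessed more than once (whether the tool does the same is not stated here).

```python
from collections import defaultdict


def reuse_distances(addresses):
    """Map each reused address to its list of reuse distances.

    The reuse distance of an access is the number of distinct other
    addresses touched since the previous access to the same address.
    """
    stack = []  # distinct addresses, least recently used first
    distances = defaultdict(list)
    for addr in addresses:
        if addr in stack:
            idx = stack.index(addr)
            # Everything after idx was accessed more recently than addr.
            distances[addr].append(len(stack) - 1 - idx)
            stack.pop(idx)
        stack.append(addr)
    return dict(distances)


# Access pattern A B C A B B using made-up addresses.
trace = [0x1000, 0x1008, 0x1010, 0x1000, 0x1008, 0x1008]
result = reuse_distances(trace)
for addr, dists in result.items():
    print(f"{addr:#x}: {dists}")
```

The list-based lookup costs time proportional to the number of distinct addresses per access, which is fine for small traces; large traces generally call for a tree-based or windowed approach.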
4.3 Metadata Extraction Tools
These tools extract build and environment metadata for profiling context.
4.3.1 environment_capture
Purpose: Capture environment variables and system information for profiling context. Provides both a C library and Python wrapper.
File Locations:
| Component | Path |
|---|---|
| C Header | |
| C Source | |
| Python Wrapper | |
C Library API
Data Structure:
```c
typedef struct {
    char *hostname;           // Machine hostname
    char *os_name;            // Operating system name
    char *os_version;         // OS version string
    char *architecture;       // CPU architecture (e.g., "x86_64")
    char *working_directory;  // Current working directory
    char **env_names;         // Array of environment variable names
    char **env_values;        // Array of environment variable values
    size_t env_count;         // Number of captured variables
} system_environment_t;
```
Functions:
| Function | Description |
|---|---|
| `environment_capture_create()` | Create and populate a system environment structure. Returns `NULL` on failure. |
| `environment_capture_destroy()` | Free the environment structure and all allocated memory. |
| `environment_capture_get_var()` | Get a specific environment variable value. Returns `NULL` if the variable is not found. |
| | Print environment information to stdout (for debugging). |
| | Get current timestamp in nanoseconds. |
| | Get current process ID. |
C Usage Example:
```c
#include "environment_capture.h"
#include <stdio.h>

int main(void) {
    system_environment_t* env = environment_capture_create();
    if (env == NULL) {
        fprintf(stderr, "Failed to capture environment\n");
        return 1;
    }

    printf("Hostname: %s\n", env->hostname);
    printf("OS: %s %s\n", env->os_name, env->os_version);
    printf("Architecture: %s\n", env->architecture);

    // Look up a single captured variable; the result is checked before use
    const char* user = environment_capture_get_var(env, "USER");
    if (user) {
        printf("User: %s\n", user);
    }

    environment_capture_destroy(env);
    return 0;
}
```
Python Wrapper API
Properties:
| Property | Description |
|---|---|
| `hostname` | System hostname |
| `os_name` | Operating system name (e.g., “Linux”) |
| `os_version` | OS version/kernel release |
| `architecture` | CPU architecture (e.g., “x86_64”) |
| `working_directory` | Current working directory at capture time |
| | Current process ID |
| | Capture timestamp in nanoseconds |
Methods:
| Method | Description |
|---|---|
| `get_variable()` | Get a single environment variable value. |
| `get_all_variables()` | Get dictionary of all environment variables. |
| `to_dict()` | Convert entire capture to a dictionary (for JSON serialization). |
Python Usage Example:
```python
import json

from tools.environment_capture import EnvironmentCapture

env = EnvironmentCapture()

# Access system properties
print(f"Hostname: {env.hostname}")
print(f"OS: {env.os_name} {env.os_version}")
print(f"Architecture: {env.architecture}")
print(f"Working Directory: {env.working_directory}")

# Get specific environment variables
user = env.get_variable('USER')
home = env.get_variable('HOME')

# Get all environment variables
all_vars = env.get_all_variables()

# Export to dictionary (for JSON)
metadata = env.to_dict()
with open('environment_metadata.json', 'w') as f:
    json.dump(metadata, f, indent=2)
```
Integration with BaseMetadata
The EnvironmentCapture class integrates with BaseMetadata.py to provide unified metadata collection across all profilers.
Fields populated by EnvironmentCapture:
| Metadata Field | Source |
|---|---|
| | |
| | |
| | |
| | |
| | |
| | |
4.3.2 makefile_parser.py
Purpose: Extract build metadata from Makefiles, including variables, targets, dependencies, and compiler settings.
Location: tools/makefile_parser.py
Python Usage:
```python
from tools.makefile_parser import MakefileParser, get_profiler_build_metadata

# Parse a single Makefile
parser = MakefileParser()
metadata = parser.parse_makefile('/path/to/Makefile')
print(f"Variables: {metadata['variables']}")
print(f"Targets: {metadata['targets']}")
print(f"Compiler Info: {metadata['compiler_info']}")
print(f"Version Info: {metadata['version_info']}")

# Parse all profiler Makefiles
profiler_metadata = get_profiler_build_metadata('/path/to/profilers')
```
Extracted Information:
- `variables`: Makefile variable assignments
- `targets`: Build targets and their dependencies
- `phony_targets`: List of .PHONY targets
- `includes`: Included Makefile paths
- `version_info`: Version-related variables
- `compiler_info`: Compiler settings (CC, CXX, etc.)
- `paths`: Path-related variables
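For readers curious how this kind of extraction can work, the sketch below pulls variables, targets, and .PHONY targets out of Makefile text with regular expressions. It is a simplified illustration, not the logic used by `makefile_parser.py`: the function `parse_makefile_text` and its patterns are hypothetical, and it ignores line continuations, conditionals, pattern rules, and includes.

```python
import re

# NAME =, :=, ?= or += VALUE
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*(\?=|:=|\+=|=)\s*(.*)$")
PHONY = re.compile(r"^\.PHONY\s*:\s*(.*)$")
# target: dependencies (a ':' not followed by '=')
TARGET = re.compile(r"^([A-Za-z0-9_./-]+)\s*:(?!=)\s*(.*)$")


def parse_makefile_text(text):
    """Extract variables, targets, and .PHONY targets from Makefile text."""
    result = {"variables": {}, "targets": {}, "phony_targets": []}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip recipe lines (tab-indented), blank lines, and comments.
        if line.startswith("\t") or not stripped or stripped.startswith("#"):
            continue
        match = PHONY.match(line)
        if match:
            result["phony_targets"].extend(match.group(1).split())
            continue
        match = ASSIGNMENT.match(line)
        if match:
            # Later assignments simply overwrite earlier ones in this sketch.
            result["variables"][match.group(1)] = match.group(3).strip()
            continue
        match = TARGET.match(line)
        if match:
            result["targets"][match.group(1)] = match.group(2).split()
    return result


# Made-up Makefile content for illustration.
makefile_text = "\n".join([
    "# Example Makefile",
    "CC := gcc",
    "CFLAGS = -O2 -Wall",
    "VERSION ?= 1.2.0",
    ".PHONY: all clean",
    "all: main",
    "main: main.c",
    "\t$(CC) $(CFLAGS) -o main main.c",
    "clean:",
    "\trm -f main",
])
parsed = parse_makefile_text(makefile_text)
print(parsed)
```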
4.3.3 profiler_flag_parser.py
Purpose: Extract profiler-specific command-line flags from profiler source code.
Location: tools/profiler_flag_parser.py
Python Usage:
```python
from tools.profiler_flag_parser import ProfilerFlagParser, extract_all_flags

# Parse a single profiler
parser = ProfilerFlagParser()
flags = parser.extract_flags('/path/to/profiler_dir')
print(f"Command Flags: {flags['command_flags']}")
print(f"Base Commands: {flags['base_commands']}")
print(f"Environment Variables: {flags['environment_variables']}")

# Extract flags from all profilers
all_flags = extract_all_flags('/path/to/profilers')
```
Extracted Information:
- `command_flags`: Profiler-specific CLI flags
- `base_commands`: Base command executables (e.g., `drrun`, `ncu`, `perf`)
- `configuration_options`: Configuration settings
- `environment_variables`: Required environment variables
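One simple way to approximate flag discovery is to scan source text for quoted string literals that look like command-line flags. The sketch below does only that; it is an assumption about how such extraction could be done, not a description of `profiler_flag_parser.py`, and the helper `find_flag_literals` and the sample source lines are made up.

```python
import re

# A quoted literal starting with '-' or '--' followed by a letter.
FLAG_LITERAL = re.compile(r"""["'](--?[A-Za-z][A-Za-z0-9_-]*)["']""")


def find_flag_literals(source_text):
    """Return flag-like string literals in order of first appearance."""
    seen = []
    for flag in FLAG_LITERAL.findall(source_text):
        if flag not in seen:
            seen.append(flag)
    return seen


# Made-up profiler wrapper source for illustration.
source = "\n".join([
    'cmd = ["drrun", "-c", client_path, "--", executable]',
    'parser.add_argument("--config", default="cache.cfg")',
    "parser.add_argument('--verbose')",
    'cmd.append("-c")',
])
flag_literals = find_flag_literals(source)
print(flag_literals)
```

A bare `--` separator and ordinary strings such as file names are skipped because the pattern requires a letter right after the dashes.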
4.4 Quick Reference
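| Tool | Input | Output | Use Case |
|---|---|---|---|
| `trace_parser.py` | `memtrace_<pid>.pb` | JSON, CSV, summary | Parse memory traces |
| `timeseries_parser.py` | `timeseries_<pid>.pb` | JSON, CSV, summary | Parse WSS time-series |
| `timeparser_plot.py` | `timeseries_<pid>.pb` | PNG image | Visualize WSS trends |
| `reuse_distance.py` | CSV or legacy trace | Text file | Cache behavior analysis |
| `environment_capture` | System state | Python dict | Capture system metadata |
| `makefile_parser.py` | Makefile | Python dict | Extract build config |
| `profiler_flag_parser.py` | Profiler source | Python dict | Extract CLI flags |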
4.5 Common Workflows
4.5.1 Analyze Memory Behavior from Protobuf Output
This workflow demonstrates how to analyze memory behavior from DynamoRIO profiler output.
Step 1: Generate protobuf files from DynamoRIO
Run your workload with the DynamoRIO profiler to generate .pb output files.
Step 2: Parse time-series data for WSS trends
```bash
# Get summary statistics
python3 tools/timeseries_parser.py output/timeseries_myworkload_12345.pb

# Export to CSV for further analysis
python3 tools/timeseries_parser.py output/timeseries_myworkload_12345.pb \
    --format csv --output wss_data.csv
```
Step 3: Visualize WSS over time
```bash
python3 tools/timeparser_plot.py output/timeseries_myworkload_12345.pb \
    --output wss_plot.png
```
Step 4: Parse memory trace for detailed analysis
```bash
# Get trace summary
python3 tools/trace_parser.py output/memtrace_12345.pb

# Export first 10000 events to CSV
python3 tools/trace_parser.py output/memtrace_12345.pb \
    --format csv --limit 10000 --output trace_sample.csv
```
Step 5: Calculate reuse distance (optional)
```bash
python3 tools/reuse_distance.py trace_sample.csv --output reuse_analysis.txt
```
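Once the reuse distances are written out, they can be post-processed with a few lines of Python. The sketch below assumes the `address: [d1, d2, ...]` line format shown in section 4.2.1 and computes the fraction of reuses whose distance is below a chosen capacity, which corresponds to hits in an idealized fully associative LRU cache holding that many distinct addresses. The helper names are hypothetical.

```python
import ast


def parse_reuse_line(line):
    """Split 'address: [d1, d2, ...]' into (address, list_of_distances)."""
    addr, _, dists = line.partition(":")
    return addr.strip(), ast.literal_eval(dists.strip())


def fraction_within(lines, capacity):
    """Fraction of reuses with distance strictly below `capacity`."""
    total = 0
    within = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _, dists = parse_reuse_line(line)
        total += len(dists)
        within += sum(1 for d in dists if d < capacity)
    return within / total if total else 0.0


# Lines in the documented output format; for a real run, iterate over
# open('reuse_analysis.txt') instead.
lines = [
    "0x7fff5a3c1000: [5, 12, 3, 8]",
    "0x7fff5a3c1008: [2, 45, 7]",
]
print(f"Reuses within 8 addresses: {fraction_within(lines, 8):.3f}")
```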
4.5.2 Generate Complete Metadata
This workflow shows how to combine metadata extraction tools.
```python
import json

from tools.environment_capture import EnvironmentCapture
from tools.makefile_parser import get_single_profiler_build_metadata
from tools.profiler_flag_parser import ProfilerFlagParser

# Capture environment
env = EnvironmentCapture()

# Get build metadata
build_meta = get_single_profiler_build_metadata('profilers/dynamorio')

# Get profiler flags
parser = ProfilerFlagParser()
flags = parser.extract_flags('profilers/dynamorio')

# Combine into complete metadata
complete_metadata = {
    'environment': env.to_dict(),
    'build': build_meta,
    'profiler_flags': flags,
}

# Save to JSON
with open('complete_metadata.json', 'w') as f:
    json.dump(complete_metadata, f, indent=2)
```
4.6 Installation Notes
Protobuf Parsers
The protobuf parsers (trace_parser.py and timeseries_parser.py) require the generated Python protobuf modules. These are auto-generated when building the common library:
```bash
cd profilers/common
make
```
This generates memory_trace_pb2.py and timeseries_metrics_pb2.py in the profilers/common/proto/ directory.
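A quick way to confirm the build produced both modules is to check for the files before running the parsers. This is a minimal sketch; `missing_proto_modules` is a hypothetical helper, and the relative path assumes the script runs from the repository root.

```python
from pathlib import Path


def missing_proto_modules(proto_dir,
                          names=("memory_trace_pb2", "timeseries_metrics_pb2")):
    """Return the generated module names that are not present in proto_dir."""
    proto_dir = Path(proto_dir)
    return [name for name in names if not (proto_dir / f"{name}.py").is_file()]


missing = missing_proto_modules("profilers/common/proto")
if missing:
    print("Missing generated modules:", ", ".join(missing))
    print("Run `make` in profilers/common to generate them.")
else:
    print("All generated protobuf modules are present.")
```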
Visualization Tool
The timeparser_plot.py tool requires matplotlib:
```bash
pip3 install matplotlib
```
Metadata Tools
The metadata extraction tools (environment_capture.py, makefile_parser.py, profiler_flag_parser.py) have no additional dependencies beyond Python’s standard library.
For issues or feature requests, please refer to the project’s GitHub repository.