4. Tools

The MemSysExplorer application framework includes a collection of analysis and utility tools designed to parse, process, and visualize profiler output data. These tools enable researchers to extract insights from memory traces, compute cache behavior metrics, and manage profiling metadata.

Note

Some of the tools in this framework are developed as part of ongoing research efforts. In particular, working set size estimation and memory characterization are active areas of research: we are exploring techniques for estimating memory bandwidth while minimizing storage overhead, including sampling-based approaches and compact trace representations. Contributors interested in extending these methods should refer to our common library (see Common Profiler Library Documentation).

4.1 Profiler Output Parsers

These tools parse binary protobuf (.pb) files generated by profilers and convert them to human-readable formats.

4.1.1 trace_parser.py

Purpose: Parse binary protobuf memory trace files (memtrace_<pid>.pb) and convert them to JSON, CSV, or summary format.

Location: tools/trace_parser.py

Usage:

# Print summary to stdout (default)
python3 tools/trace_parser.py memtrace_12345.pb

# Export as JSON
python3 tools/trace_parser.py memtrace_12345.pb --format json

# Export to CSV file
python3 tools/trace_parser.py memtrace_12345.pb --format csv --output trace.csv

# Filter by thread ID
python3 tools/trace_parser.py memtrace_12345.pb --thread 12345 --format csv

# Limit number of events
python3 tools/trace_parser.py memtrace_12345.pb --limit 1000 --format json

Command-Line Options:

  Flag           Default    Description
  --format       summary    Output format: json, csv, or summary
  --output, -o   stdout     Output file path
  --thread       None       Filter events by thread ID
  --limit        None       Maximum number of events to include
  --indent       2          JSON indentation level

Output Fields:

  • timestamp: Event timestamp

  • thread_id: Thread ID for the memory access

  • address: Memory address (hexadecimal)

  • mem_op: Operation type (READ or WRITE)

  • hit_miss: Cache result (HIT or MISS)
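As a quick illustration of consuming the parser's CSV output, the sketch below computes per-operation counts and a hit rate from a few hand-written rows. The column names are assumed from the Output Fields list above and may differ slightly from the header trace_parser.py actually emits.

```python
import csv
import io
from collections import Counter

# Sample rows shaped like trace_parser.py --format csv output
# (column names assumed from the Output Fields list; values made up).
sample = """timestamp,thread_id,address,mem_op,hit_miss
1000,12345,0x7fff5a3c1000,READ,HIT
1001,12345,0x7fff5a3c1008,WRITE,MISS
1002,12345,0x7fff5a3c1000,READ,HIT
1003,12346,0x7fff5a3c2000,READ,MISS
"""

ops = Counter()   # READ/WRITE counts
hits = 0
total = 0
for row in csv.DictReader(io.StringIO(sample)):
    ops[row["mem_op"]] += 1
    total += 1
    hits += row["hit_miss"] == "HIT"

hit_rate = hits / total
print(f"reads={ops['READ']} writes={ops['WRITE']} hit_rate={hit_rate:.2f}")
```

The same loop works on a real exported file by replacing `io.StringIO(sample)` with an open file handle.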

4.1.2 timeseries_parser.py

Purpose: Parse binary protobuf time-series WSS metrics files (timeseries_<pid>.pb) and convert them to JSON, CSV, or summary format.

Location: tools/timeseries_parser.py

Usage:

# Print summary to stdout (default)
python3 tools/timeseries_parser.py timeseries_ls_12345.pb

# Export as JSON
python3 tools/timeseries_parser.py timeseries_ls_12345.pb --format json

# Export to CSV file
python3 tools/timeseries_parser.py timeseries_ls_12345.pb --format csv --output output.csv

# Filter by thread ID
python3 tools/timeseries_parser.py timeseries_ls_12345.pb --thread 12345 --format csv

Command-Line Options:

  Flag           Default    Description
  --format       summary    Output format: json, csv, or summary
  --output, -o   stdout     Output file path
  --thread       None       Filter samples by thread ID
  --indent       2          JSON indentation level

Output Fields (per sample):

  • window_number: Sampling window index

  • thread_id: Thread ID

  • read_count: Number of read operations in window

  • write_count: Number of write operations in window

  • total_refs: Total memory references

  • wss_exact: Exact working set size (unique addresses)

  • wss_approx: Approximate WSS (HyperLogLog estimate)

  • timestamp: Sample timestamp

  • read_size_histogram: Distribution of read sizes (1, 2, 4, 8, 16, 32, 64, other bytes)

  • write_size_histogram: Distribution of write sizes
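To make the wss_exact field concrete: it is the count of unique addresses touched within a sampling window. The toy sketch below computes that quantity from a hand-made address stream; the windows and addresses are invented for illustration.

```python
from collections import defaultdict

# Toy address stream as (window, address) pairs. Exact WSS per window is
# the number of distinct addresses touched in that window -- the quantity
# the wss_exact field reports.
events = [
    (0, 0x1000), (0, 0x1040), (0, 0x1000),
    (1, 0x1000), (1, 0x2000), (1, 0x2040), (1, 0x2000),
]

unique = defaultdict(set)
for window, addr in events:
    unique[window].add(addr)

wss = {w: len(addrs) for w, addrs in unique.items()}
print(wss)  # window -> exact working set size
```

The wss_approx field replaces the exact per-window set with a HyperLogLog estimate so the profiler does not have to store every distinct address.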

4.1.3 timeparser_plot.py

Purpose: Generate visualization plots from time-series protobuf data.

Location: tools/timeparser_plot.py

Usage:

# Display plot interactively
python3 tools/timeparser_plot.py timeseries_ls_12345.pb

# Save plot to file
python3 tools/timeparser_plot.py timeseries_ls_12345.pb --output my_plot.png

# Filter by thread ID
python3 tools/timeparser_plot.py timeseries_ls_12345.pb --thread 12345 --output thread_plot.png

Command-Line Options:

  Flag           Default    Description
  --output, -o   None       Output image file (if not provided, displays GUI)
  --thread       None       Filter by thread ID

Generated Plots:

The tool creates a single figure containing a grid of five subplots:

  1. Read Count - Total reads with size breakdown (1B, 2B, 4B, 8B, etc.)

  2. Write Count - Total writes with size breakdown

  3. WSS Exact - Exact working set size over time

  4. WSS Approx - Approximate WSS (HyperLogLog) over time

  5. WSS Absolute Error - Difference between exact and approximate WSS

4.2 Memory Analysis Tools

4.2.1 reuse_distance.py

Purpose: Calculate reuse distances for cache behavior analysis. The reuse distance of a memory access is the number of unique addresses accessed between consecutive accesses to the same address. The tool is currently in legacy mode.

Location: tools/reuse_distance.py

Usage:

# Calculate reuse distance (output to reuse_<input_name>.txt)
python3 tools/reuse_distance.py trace.csv

# Specify output file
python3 tools/reuse_distance.py trace.csv --output reuse_results.txt

# Use windowed tracking (memory-efficient for large traces)
python3 tools/reuse_distance.py trace.csv --window-size 100000

Command-Line Options:

  Flag            Default            Description
  --output, -o    reuse_<input>.txt  Output file for reuse distances
  --window-size   -1 (unlimited)     Memory window size (-1 for unlimited, >0 for windowed tracking)

Input Format:

Supports two trace formats:

  • CSV: timestamp,addr,op,size

  • Legacy: timestamp address operation size (space-separated)

Output Format:

Each line contains an address and its list of reuse distances:

0x7fff5a3c1000: [5, 12, 3, 8]
0x7fff5a3c1008: [2, 45, 7]
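For intuition, here is a minimal, self-contained sketch of the reuse-distance definition above. It is not the implementation in tools/reuse_distance.py, just the same metric computed naively over an in-memory access list.

```python
from collections import defaultdict

def reuse_distances(addresses):
    """For each repeated address, count the unique *other* addresses
    accessed between consecutive accesses to it (the definition used
    in the text above)."""
    last_seen = {}               # address -> index of its previous access
    result = defaultdict(list)
    for i, addr in enumerate(addresses):
        if addr in last_seen:
            between = set(addresses[last_seen[addr] + 1 : i])
            between.discard(addr)
            result[addr].append(len(between))
        last_seen[addr] = i
    return dict(result)

trace = ["A", "B", "C", "A", "B", "B"]
print(reuse_distances(trace))  # {'A': [2], 'B': [2, 0]}
```

A naive set-per-access approach like this is O(n²) in the worst case; the tool's --window-size option exists precisely to bound the tracked history on large traces.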

4.3 Metadata Extraction Tools

These tools extract build and environment metadata for profiling context.

4.3.1 environment_capture

Purpose: Capture environment variables and system information for profiling context. Provides both a C library and Python wrapper.

File Locations:

  Component       Path
  C Header        profilers/common/include/environment_capture.h
  C Source        profilers/common/src/environment_capture.c
  Python Wrapper  tools/environment_capture.py

C Library API

Data Structure:

typedef struct {
    char *hostname;           // Machine hostname
    char *os_name;            // Operating system name
    char *os_version;         // OS version string
    char *architecture;       // CPU architecture (e.g., "x86_64")
    char *working_directory;  // Current working directory

    char **env_names;         // Array of environment variable names
    char **env_values;        // Array of environment variable values
    size_t env_count;         // Number of captured variables
} system_environment_t;

Functions:

  Function                                 Description
  environment_capture_create()             Create and populate a system environment structure; returns NULL on failure.
  environment_capture_destroy(env)         Free the environment structure and all allocated memory.
  environment_capture_get_var(env, name)   Get a specific environment variable value; returns NULL if not found.
  environment_capture_print(env)           Print environment information to stdout (for debugging).
  environment_capture_timestamp_ns()       Get the current timestamp in nanoseconds.
  environment_capture_process_id()         Get the current process ID.

C Usage Example:

#include "environment_capture.h"
#include <stdio.h>

int main() {
    system_environment_t* env = environment_capture_create();
    if (env == NULL) {
        fprintf(stderr, "Failed to capture environment\n");
        return 1;
    }

    printf("Hostname: %s\n", env->hostname);
    printf("OS: %s %s\n", env->os_name, env->os_version);
    printf("Architecture: %s\n", env->architecture);

    const char* user = environment_capture_get_var(env, "USER");
    if (user) {
        printf("User: %s\n", user);
    }

    environment_capture_destroy(env);
    return 0;
}

Python Wrapper API

Properties:

  Property           Description
  hostname           System hostname
  os_name            Operating system name (e.g., "Linux")
  os_version         OS version/kernel release
  architecture       CPU architecture (e.g., "x86_64")
  working_directory  Current working directory at capture time
  process_id         Current process ID
  timestamp_ns       Capture timestamp in nanoseconds

Methods:

  Method               Description
  get_variable(name)   Get a single environment variable value; returns None if not found.
  get_all_variables()  Get a dictionary of all environment variables.
  to_dict()            Convert the entire capture to a dictionary (for JSON serialization).

Python Usage Example:

from tools.environment_capture import EnvironmentCapture
import json

env = EnvironmentCapture()

# Access system properties
print(f"Hostname: {env.hostname}")
print(f"OS: {env.os_name} {env.os_version}")
print(f"Architecture: {env.architecture}")
print(f"Working Directory: {env.working_directory}")

# Get specific environment variables
user = env.get_variable('USER')
home = env.get_variable('HOME')

# Get all environment variables
all_vars = env.get_all_variables()

# Export to dictionary (for JSON)
metadata = env.to_dict()
with open('environment_metadata.json', 'w') as f:
    json.dump(metadata, f, indent=2)

Integration with BaseMetadata

The EnvironmentCapture class integrates with BaseMetadata.py to provide unified metadata collection across all profilers.

Fields populated by EnvironmentCapture:

  Metadata Field     Source
  hostname           env.hostname
  os_name            env.os_name
  os_version         env.os_version
  architecture       env.architecture
  working_directory  env.working_directory
  capture_timestamp  env.timestamp_ns
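The field mapping above amounts to a straightforward attribute-to-key copy, sketched below. The populate_metadata helper is hypothetical, and FakeEnvironmentCapture is a stdlib-only stand-in so the sketch runs without the project installed; the real BaseMetadata.py API may differ.

```python
import os
import platform
import socket
import time

class FakeEnvironmentCapture:
    """Stand-in for tools.environment_capture.EnvironmentCapture,
    built from the standard library for illustration."""
    def __init__(self):
        self.hostname = socket.gethostname()
        self.os_name = platform.system()
        self.os_version = platform.release()
        self.architecture = platform.machine()
        self.working_directory = os.getcwd()
        self.timestamp_ns = time.time_ns()

def populate_metadata(env):
    # Mirror of the field-mapping table above (hypothetical helper).
    return {
        "hostname": env.hostname,
        "os_name": env.os_name,
        "os_version": env.os_version,
        "architecture": env.architecture,
        "working_directory": env.working_directory,
        "capture_timestamp": env.timestamp_ns,
    }

meta = populate_metadata(FakeEnvironmentCapture())
print(sorted(meta))
```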

4.3.2 makefile_parser.py

Purpose: Extract build metadata from Makefiles, including variables, targets, dependencies, and compiler settings.

Location: tools/makefile_parser.py

Python Usage:

from tools.makefile_parser import MakefileParser, get_profiler_build_metadata

# Parse a single Makefile
parser = MakefileParser()
metadata = parser.parse_makefile('/path/to/Makefile')

print(f"Variables: {metadata['variables']}")
print(f"Targets: {metadata['targets']}")
print(f"Compiler Info: {metadata['compiler_info']}")
print(f"Version Info: {metadata['version_info']}")

# Parse all profiler Makefiles
profiler_metadata = get_profiler_build_metadata('/path/to/profilers')

Extracted Information:

  • variables: Makefile variable assignments

  • targets: Build targets and their dependencies

  • phony_targets: List of .PHONY targets

  • includes: Included Makefile paths

  • version_info: Version-related variables

  • compiler_info: Compiler settings (CC, CXX, etc.)

  • paths: Path-related variables
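As a rough illustration of what variable extraction involves, the sketch below pulls assignments out of an in-memory Makefile string with a single regular expression. It is illustrative only; makefile_parser.py's actual parsing is presumably more complete, since it also handles targets, includes, and .PHONY lists.

```python
import re

# Toy Makefile text (made up for illustration).
makefile = """\
CC = gcc
CXX := g++
CFLAGS += -O2 -Wall
VERSION = 1.2.3
all: main.o
\tgcc -o app main.o
"""

# Match simple =, :=, and += assignments at the start of a line;
# rule lines ("all: main.o") and recipe lines (tab-indented) don't match.
assign = re.compile(r"^(\w+)\s*[:+?]?=\s*(.*)$")
variables = {}
for line in makefile.splitlines():
    m = assign.match(line)
    if m:
        variables[m.group(1)] = m.group(2)

print(variables)
```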

4.3.3 profiler_flag_parser.py

Purpose: Extract profiler-specific command-line flags from profiler source code.

Location: tools/profiler_flag_parser.py

Python Usage:

from tools.profiler_flag_parser import ProfilerFlagParser, extract_all_flags

# Parse a single profiler
parser = ProfilerFlagParser()
flags = parser.extract_flags('/path/to/profiler_dir')

print(f"Command Flags: {flags['command_flags']}")
print(f"Base Commands: {flags['base_commands']}")
print(f"Environment Variables: {flags['environment_variables']}")

# Extract flags from all profilers
all_flags = extract_all_flags('/path/to/profilers')

Extracted Information:

  • command_flags: Profiler-specific CLI flags

  • base_commands: Base command executables (e.g., drrun, ncu, perf)

  • configuration_options: Configuration settings

  • environment_variables: Required environment variables
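For a flavor of this kind of extraction, the sketch below scans a snippet of source text for --flag-style tokens. It is illustrative only, not how profiler_flag_parser.py is implemented, and the snippet being scanned is invented.

```python
import re

# Invented source snippet standing in for profiler code.
source = '''
cmd = ["drrun", "-t", "memtrace", "--trace-after", "1000", "--verbose"]
os.environ.setdefault("DR_HOME", "/opt/dynamorio")
'''

# Collect unique "--flag" tokens (letters, digits, underscores, hyphens).
flags = sorted(set(re.findall(r"--[\w-]+", source)))
print(flags)  # ['--trace-after', '--verbose']
```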

4.4 Quick Reference

  Tool                     Input                Output              Use Case
  trace_parser.py          .pb (protobuf)       JSON, CSV, summary  Parse memory traces
  timeseries_parser.py     .pb (protobuf)       JSON, CSV, summary  Parse WSS time-series
  timeparser_plot.py       .pb (protobuf)       PNG image           Visualize WSS trends
  reuse_distance.py        CSV or legacy trace  Text file           Cache behavior analysis
  environment_capture.py   System state         Python dict         Capture system metadata
  makefile_parser.py       Makefile             Python dict         Extract build config
  profiler_flag_parser.py  Profiler source      Python dict         Extract CLI flags

4.5 Common Workflows

4.5.1 Analyze Memory Behavior from Protobuf Output

This workflow demonstrates how to analyze memory behavior from DynamoRIO profiler output.

Step 1: Generate protobuf files from DynamoRIO

Run your workload with the DynamoRIO profiler to generate .pb output files.

Step 2: Parse time-series data for WSS trends

# Get summary statistics
python3 tools/timeseries_parser.py output/timeseries_myworkload_12345.pb

# Export to CSV for further analysis
python3 tools/timeseries_parser.py output/timeseries_myworkload_12345.pb \
    --format csv --output wss_data.csv

Step 3: Visualize WSS over time

python3 tools/timeparser_plot.py output/timeseries_myworkload_12345.pb \
    --output wss_plot.png

Step 4: Parse memory trace for detailed analysis

# Get trace summary
python3 tools/trace_parser.py output/memtrace_12345.pb

# Export first 10000 events to CSV
python3 tools/trace_parser.py output/memtrace_12345.pb \
    --format csv --limit 10000 --output trace_sample.csv

Step 5: Calculate reuse distance (optional)

python3 tools/reuse_distance.py trace_sample.csv --output reuse_analysis.txt

4.5.2 Generate Complete Metadata

This workflow shows how to combine metadata extraction tools.

from tools.environment_capture import EnvironmentCapture
from tools.makefile_parser import get_single_profiler_build_metadata
from tools.profiler_flag_parser import ProfilerFlagParser
import json

# Capture environment
env = EnvironmentCapture()

# Get build metadata
build_meta = get_single_profiler_build_metadata('profilers/dynamorio')

# Get profiler flags
parser = ProfilerFlagParser()
flags = parser.extract_flags('profilers/dynamorio')

# Combine into complete metadata
complete_metadata = {
    'environment': env.to_dict(),
    'build': build_meta,
    'profiler_flags': flags
}

# Save to JSON
with open('complete_metadata.json', 'w') as f:
    json.dump(complete_metadata, f, indent=2)

4.6 Installation Notes

Protobuf Parsers

The protobuf parsers (trace_parser.py and timeseries_parser.py) require the generated Python protobuf modules. These are auto-generated when building the common library:

cd profilers/common
make

This generates memory_trace_pb2.py and timeseries_metrics_pb2.py in the profilers/common/proto/ directory.

Visualization Tool

The timeparser_plot.py tool requires matplotlib:

pip3 install matplotlib

Metadata Tools

The metadata extraction tools (environment_capture.py, makefile_parser.py, profiler_flag_parser.py) have no additional dependencies beyond Python’s standard library.

For issues or feature requests, please refer to the project’s GitHub repository.