4. Tools

The MemSysExplorer application framework includes a collection of analysis and utility tools designed to parse, process, and visualize profiler output data. These tools enable researchers to extract insights from memory traces, compute cache behavior metrics, and manage profiling metadata.

Note

Some of the tools in this framework are developed as part of ongoing research efforts. In particular, working set size estimation and memory characterization are active areas of research: we are exploring techniques for estimating memory bandwidth while minimizing storage overhead, including sampling-based approaches and compact trace representations. Contributors interested in extending these methods should refer to our common library (see Common Profiler Library Documentation).

4.1 Profiler Output Parsers

These tools parse binary protobuf (.pb) files generated by profilers and convert them to human-readable formats.

4.1.1 trace_parser.py

Purpose: Parse binary protobuf memory trace files (memtrace_<pid>.pb) and convert them to JSON, CSV, or summary format.

Location: tools/trace_parser.py

Usage:

# Print summary to stdout (default)
python3 tools/trace_parser.py memtrace_12345.pb

# Export as JSON
python3 tools/trace_parser.py memtrace_12345.pb --format json

# Export to CSV file
python3 tools/trace_parser.py memtrace_12345.pb --format csv --output trace.csv

# Filter by thread ID
python3 tools/trace_parser.py memtrace_12345.pb --thread 12345 --format csv

# Limit number of events
python3 tools/trace_parser.py memtrace_12345.pb --limit 1000 --format json

Command-Line Options:

  Flag           Default    Description
  --format       summary    Output format: json, csv, or summary
  --output, -o   stdout     Output file path
  --thread       None       Filter events by thread ID
  --limit        None       Maximum number of events to include
  --indent       2          JSON indentation level

Output Fields:

  • timestamp: Event timestamp

  • thread_id: Thread ID for the memory access

  • address: Memory address (hexadecimal)

  • mem_op: Operation type (READ or WRITE)

  • hit_miss: Cache result (HIT or MISS)
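As a quick illustration of consuming the parser's CSV output, the sketch below computes per-operation counts and a hit rate from a few hand-written rows. The column names are assumed from the Output Fields list above and may differ slightly from the header trace_parser.py actually emits.

```python
import csv
import io
from collections import Counter

# Sample rows shaped like trace_parser.py --format csv output
# (column names assumed from the Output Fields list; values made up).
sample = """timestamp,thread_id,address,mem_op,hit_miss
1000,12345,0x7fff5a3c1000,READ,HIT
1001,12345,0x7fff5a3c1008,WRITE,MISS
1002,12345,0x7fff5a3c1000,READ,HIT
1003,12346,0x7fff5a3c2000,READ,MISS
"""

ops = Counter()   # READ/WRITE counts
hits = 0
total = 0
for row in csv.DictReader(io.StringIO(sample)):
    ops[row["mem_op"]] += 1
    total += 1
    hits += row["hit_miss"] == "HIT"

hit_rate = hits / total
print(f"reads={ops['READ']} writes={ops['WRITE']} hit_rate={hit_rate:.2f}")
```

The same loop works on a real exported file by replacing `io.StringIO(sample)` with an open file handle.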

4.1.2 timeseries_parser.py

Purpose: Parse binary protobuf time-series WSS metrics files (timeseries_<pid>.pb) and convert them to JSON, CSV, or summary format.

Location: tools/timeseries_parser.py

Usage:

# Print summary to stdout (default)
python3 tools/timeseries_parser.py timeseries_ls_12345.pb

# Export as JSON
python3 tools/timeseries_parser.py timeseries_ls_12345.pb --format json

# Export to CSV file
python3 tools/timeseries_parser.py timeseries_ls_12345.pb --format csv --output output.csv

# Filter by thread ID
python3 tools/timeseries_parser.py timeseries_ls_12345.pb --thread 12345 --format csv

Command-Line Options:

  Flag           Default    Description
  --format       summary    Output format: json, csv, or summary
  --output, -o   stdout     Output file path
  --thread       None       Filter samples by thread ID
  --indent       2          JSON indentation level

Output Fields (per sample):

  • window_number: Sampling window index

  • thread_id: Thread ID

  • read_count: Number of read operations in window

  • write_count: Number of write operations in window

  • total_refs: Total memory references

  • wss_exact: Exact working set size (unique addresses)

  • wss_approx: Approximate WSS (HyperLogLog estimate)

  • timestamp: Sample timestamp

  • read_size_histogram: Distribution of read sizes (1, 2, 4, 8, 16, 32, 64, other bytes)

  • write_size_histogram: Distribution of write sizes
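To make the wss_exact field concrete: it is the count of unique addresses touched within a sampling window. The toy sketch below computes that quantity from a hand-made address stream; the windows and addresses are invented for illustration.

```python
from collections import defaultdict

# Toy address stream as (window, address) pairs. Exact WSS per window is
# the number of distinct addresses touched in that window -- the quantity
# the wss_exact field reports.
events = [
    (0, 0x1000), (0, 0x1040), (0, 0x1000),
    (1, 0x1000), (1, 0x2000), (1, 0x2040), (1, 0x2000),
]

unique = defaultdict(set)
for window, addr in events:
    unique[window].add(addr)

wss = {w: len(addrs) for w, addrs in unique.items()}
print(wss)  # window -> exact working set size
```

The wss_approx field replaces the exact per-window set with a HyperLogLog estimate so the profiler does not have to store every distinct address.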

4.1.3 timeparser_plot.py

Purpose: Generate visualization plots from time-series protobuf data.

Location: tools/timeparser_plot.py

Usage:

# Display plot interactively
python3 tools/timeparser_plot.py timeseries_ls_12345.pb

# Save plot to file
python3 tools/timeparser_plot.py timeseries_ls_12345.pb --output my_plot.png

# Filter by thread ID
python3 tools/timeparser_plot.py timeseries_ls_12345.pb --thread 12345 --output thread_plot.png

Command-Line Options:

  Flag           Default    Description
  --output, -o   None       Output image file (if not provided, displays GUI)
  --thread       None       Filter by thread ID

Generated Plots:

The tool creates a single figure containing a grid of five subplots:

  1. Read Count - Total reads with size breakdown (1B, 2B, 4B, 8B, etc.)

  2. Write Count - Total writes with size breakdown

  3. WSS Exact - Exact working set size over time

  4. WSS Approx - Approximate WSS (HyperLogLog) over time

  5. WSS Absolute Error - Difference between exact and approximate WSS

4.2 Memory Analysis Tools

4.2.1 reuse_distance.py

Purpose: Calculate reuse distances for cache behavior analysis. The reuse distance of a memory access is the number of unique addresses accessed between consecutive accesses to the same address. The tool is currently in legacy mode.

Location: tools/reuse_distance.py

Usage:

# Calculate reuse distance (output to reuse_<input_name>.txt)
python3 tools/reuse_distance.py trace.csv

# Specify output file
python3 tools/reuse_distance.py trace.csv --output reuse_results.txt

# Use windowed tracking (memory-efficient for large traces)
python3 tools/reuse_distance.py trace.csv --window-size 100000

Command-Line Options:

  Flag            Default            Description
  --output, -o    reuse_<input>.txt  Output file for reuse distances
  --window-size   -1 (unlimited)     Memory window size (-1 for unlimited, >0 for windowed tracking)

Input Format:

Supports two trace formats:

  • CSV: timestamp,addr,op,size

  • Legacy: timestamp address operation size (space-separated)

Output Format:

Each line contains an address and its list of reuse distances:

0x7fff5a3c1000: [5, 12, 3, 8]
0x7fff5a3c1008: [2, 45, 7]
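For intuition, here is a minimal, self-contained sketch of the reuse-distance definition above. It is not the implementation in tools/reuse_distance.py, just the same metric computed naively over an in-memory access list.

```python
from collections import defaultdict

def reuse_distances(addresses):
    """For each repeated address, count the unique *other* addresses
    accessed between consecutive accesses to it (the definition used
    in the text above)."""
    last_seen = {}               # address -> index of its previous access
    result = defaultdict(list)
    for i, addr in enumerate(addresses):
        if addr in last_seen:
            between = set(addresses[last_seen[addr] + 1 : i])
            between.discard(addr)
            result[addr].append(len(between))
        last_seen[addr] = i
    return dict(result)

trace = ["A", "B", "C", "A", "B", "B"]
print(reuse_distances(trace))  # {'A': [2], 'B': [2, 0]}
```

A naive set-per-access approach like this is O(n²) in the worst case; the tool's --window-size option exists precisely to bound the tracked history on large traces.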

4.3 Metadata Extraction Tools

These tools extract build and environment metadata for profiling context.

4.3.1 environment_capture

Purpose: Capture environment variables and system information for profiling context. Provides both a C library and Python wrapper.

File Locations:

  Component       Path
  C Header        profilers/common/include/environment_capture.h
  C Source        profilers/common/src/environment_capture.c
  Python Wrapper  tools/environment_capture.py

C Library API

Data Structure:

typedef struct {
    char *hostname;           // Machine hostname
    char *os_name;            // Operating system name
    char *os_version;         // OS version string
    char *architecture;       // CPU architecture (e.g., "x86_64")
    char *working_directory;  // Current working directory

    char **env_names;         // Array of environment variable names
    char **env_values;        // Array of environment variable values
    size_t env_count;         // Number of captured variables
} system_environment_t;

Functions:

  Function                                 Description
  environment_capture_create()             Create and populate a system environment structure; returns NULL on failure.
  environment_capture_destroy(env)         Free the environment structure and all allocated memory.
  environment_capture_get_var(env, name)   Get a specific environment variable value; returns NULL if not found.
  environment_capture_print(env)           Print environment information to stdout (for debugging).
  environment_capture_timestamp_ns()       Get the current timestamp in nanoseconds.
  environment_capture_process_id()         Get the current process ID.

C Usage Example:

#include "environment_capture.h"
#include <stdio.h>

int main() {
    system_environment_t* env = environment_capture_create();
    if (env == NULL) {
        fprintf(stderr, "Failed to capture environment\n");
        return 1;
    }

    printf("Hostname: %s\n", env->hostname);
    printf("OS: %s %s\n", env->os_name, env->os_version);
    printf("Architecture: %s\n", env->architecture);

    const char* user = environment_capture_get_var(env, "USER");
    if (user) {
        printf("User: %s\n", user);
    }

    environment_capture_destroy(env);
    return 0;
}

Python Wrapper API

Properties:

  Property           Description
  hostname           System hostname
  os_name            Operating system name (e.g., "Linux")
  os_version         OS version/kernel release
  architecture       CPU architecture (e.g., "x86_64")
  working_directory  Current working directory at capture time
  process_id         Current process ID
  timestamp_ns       Capture timestamp in nanoseconds

Methods:

  Method               Description
  get_variable(name)   Get a single environment variable value; returns None if not found.
  get_all_variables()  Get a dictionary of all environment variables.
  to_dict()            Convert the entire capture to a dictionary (for JSON serialization).

Python Usage Example:

from tools.environment_capture import EnvironmentCapture
import json

env = EnvironmentCapture()

# Access system properties
print(f"Hostname: {env.hostname}")
print(f"OS: {env.os_name} {env.os_version}")
print(f"Architecture: {env.architecture}")
print(f"Working Directory: {env.working_directory}")

# Get specific environment variables
user = env.get_variable('USER')
home = env.get_variable('HOME')

# Get all environment variables
all_vars = env.get_all_variables()

# Export to dictionary (for JSON)
metadata = env.to_dict()
with open('environment_metadata.json', 'w') as f:
    json.dump(metadata, f, indent=2)

Integration with BaseMetadata

The EnvironmentCapture class integrates with BaseMetadata.py to provide unified metadata collection across all profilers.

Fields populated by EnvironmentCapture:

  Metadata Field     Source
  hostname           env.hostname
  os_name            env.os_name
  os_version         env.os_version
  architecture       env.architecture
  working_directory  env.working_directory
  capture_timestamp  env.timestamp_ns
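The field mapping above amounts to a straightforward attribute-to-key copy, sketched below. The populate_metadata helper is hypothetical, and FakeEnvironmentCapture is a stdlib-only stand-in so the sketch runs without the project installed; the real BaseMetadata.py API may differ.

```python
import os
import platform
import socket
import time

class FakeEnvironmentCapture:
    """Stand-in for tools.environment_capture.EnvironmentCapture,
    built from the standard library for illustration."""
    def __init__(self):
        self.hostname = socket.gethostname()
        self.os_name = platform.system()
        self.os_version = platform.release()
        self.architecture = platform.machine()
        self.working_directory = os.getcwd()
        self.timestamp_ns = time.time_ns()

def populate_metadata(env):
    # Mirror of the field-mapping table above (hypothetical helper).
    return {
        "hostname": env.hostname,
        "os_name": env.os_name,
        "os_version": env.os_version,
        "architecture": env.architecture,
        "working_directory": env.working_directory,
        "capture_timestamp": env.timestamp_ns,
    }

meta = populate_metadata(FakeEnvironmentCapture())
print(sorted(meta))
```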

4.3.2 makefile_parser.py

Purpose: Extract build metadata from Makefiles, including variables, targets, dependencies, and compiler settings.

Location: tools/makefile_parser.py

Python Usage:

from tools.makefile_parser import MakefileParser, get_profiler_build_metadata

# Parse a single Makefile
parser = MakefileParser()
metadata = parser.parse_makefile('/path/to/Makefile')

print(f"Variables: {metadata['variables']}")
print(f"Targets: {metadata['targets']}")
print(f"Compiler Info: {metadata['compiler_info']}")
print(f"Version Info: {metadata['version_info']}")

# Parse all profiler Makefiles
profiler_metadata = get_profiler_build_metadata('/path/to/profilers')

Extracted Information:

  • variables: Makefile variable assignments

  • targets: Build targets and their dependencies

  • phony_targets: List of .PHONY targets

  • includes: Included Makefile paths

  • version_info: Version-related variables

  • compiler_info: Compiler settings (CC, CXX, etc.)

  • paths: Path-related variables
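As a rough illustration of what variable extraction involves, the sketch below pulls assignments out of an in-memory Makefile string with a single regular expression. It is illustrative only; makefile_parser.py's actual parsing is presumably more complete, since it also handles targets, includes, and .PHONY lists.

```python
import re

# Toy Makefile text (made up for illustration).
makefile = """\
CC = gcc
CXX := g++
CFLAGS += -O2 -Wall
VERSION = 1.2.3
all: main.o
\tgcc -o app main.o
"""

# Match simple =, :=, and += assignments at the start of a line;
# rule lines ("all: main.o") and recipe lines (tab-indented) don't match.
assign = re.compile(r"^(\w+)\s*[:+?]?=\s*(.*)$")
variables = {}
for line in makefile.splitlines():
    m = assign.match(line)
    if m:
        variables[m.group(1)] = m.group(2)

print(variables)
```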

4.3.3 profiler_flag_parser.py

Purpose: Extract profiler-specific command-line flags from profiler source code.

Location: tools/profiler_flag_parser.py

Python Usage:

from tools.profiler_flag_parser import ProfilerFlagParser, extract_all_flags

# Parse a single profiler
parser = ProfilerFlagParser()
flags = parser.extract_flags('/path/to/profiler_dir')

print(f"Command Flags: {flags['command_flags']}")
print(f"Base Commands: {flags['base_commands']}")
print(f"Environment Variables: {flags['environment_variables']}")

# Extract flags from all profilers
all_flags = extract_all_flags('/path/to/profilers')

Extracted Information:

  • command_flags: Profiler-specific CLI flags

  • base_commands: Base command executables (e.g., drrun, ncu, perf)

  • configuration_options: Configuration settings

  • environment_variables: Required environment variables
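For a flavor of this kind of extraction, the sketch below scans a snippet of source text for --flag-style tokens. It is illustrative only, not how profiler_flag_parser.py is implemented, and the snippet being scanned is invented.

```python
import re

# Invented source snippet standing in for profiler code.
source = '''
cmd = ["drrun", "-t", "memtrace", "--trace-after", "1000", "--verbose"]
os.environ.setdefault("DR_HOME", "/opt/dynamorio")
'''

# Collect unique "--flag" tokens (letters, digits, underscores, hyphens).
flags = sorted(set(re.findall(r"--[\w-]+", source)))
print(flags)  # ['--trace-after', '--verbose']
```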

4.4 Quick Reference

  Tool                     Input                Output              Use Case
  trace_parser.py          .pb (protobuf)       JSON, CSV, summary  Parse memory traces
  timeseries_parser.py     .pb (protobuf)       JSON, CSV, summary  Parse WSS time-series
  timeparser_plot.py       .pb (protobuf)       PNG image           Visualize WSS trends
  reuse_distance.py        CSV or legacy trace  Text file           Cache behavior analysis
  environment_capture.py   System state         Python dict         Capture system metadata
  makefile_parser.py       Makefile             Python dict         Extract build config
  profiler_flag_parser.py  Profiler source      Python dict         Extract CLI flags

4.5 Common Workflows

4.5.1 Analyze Memory Behavior from Protobuf Output

This workflow demonstrates how to analyze memory behavior from DynamoRIO profiler output.

Step 1: Generate protobuf files from DynamoRIO

Run your workload with the DynamoRIO profiler to generate .pb output files.

Step 2: Parse time-series data for WSS trends

# Get summary statistics
python3 tools/timeseries_parser.py output/timeseries_myworkload_12345.pb

# Export to CSV for further analysis
python3 tools/timeseries_parser.py output/timeseries_myworkload_12345.pb \
    --format csv --output wss_data.csv

Step 3: Visualize WSS over time

python3 tools/timeparser_plot.py output/timeseries_myworkload_12345.pb \
    --output wss_plot.png

Step 4: Parse memory trace for detailed analysis

# Get trace summary
python3 tools/trace_parser.py output/memtrace_12345.pb

# Export first 10000 events to CSV
python3 tools/trace_parser.py output/memtrace_12345.pb \
    --format csv --limit 10000 --output trace_sample.csv

Step 5: Calculate reuse distance (optional)

python3 tools/reuse_distance.py trace_sample.csv --output reuse_analysis.txt

4.5.2 Generate Complete Metadata

This workflow shows how to combine metadata extraction tools.

from tools.environment_capture import EnvironmentCapture
from tools.makefile_parser import get_single_profiler_build_metadata
from tools.profiler_flag_parser import ProfilerFlagParser
import json

# Capture environment
env = EnvironmentCapture()

# Get build metadata
build_meta = get_single_profiler_build_metadata('profilers/dynamorio')

# Get profiler flags
parser = ProfilerFlagParser()
flags = parser.extract_flags('profilers/dynamorio')

# Combine into complete metadata
complete_metadata = {
    'environment': env.to_dict(),
    'build': build_meta,
    'profiler_flags': flags
}

# Save to JSON
with open('complete_metadata.json', 'w') as f:
    json.dump(complete_metadata, f, indent=2)

4.6 Installation Notes

Protobuf Parsers

The protobuf parsers (trace_parser.py and timeseries_parser.py) require the generated Python protobuf modules. These are auto-generated when building the common library:

cd profilers/common
make

This generates memory_trace_pb2.py and timeseries_metrics_pb2.py in the profilers/common/proto/ directory.

Visualization Tool

The timeparser_plot.py tool requires matplotlib:

pip3 install matplotlib

Metadata Tools

The metadata extraction tools (environment_capture.py, makefile_parser.py, profiler_flag_parser.py) have no additional dependencies beyond Python’s standard library.

For issues or feature requests, please refer to the project’s GitHub repository.