4. Tools
========

The MemSysExplorer application framework includes a collection of analysis and utility tools designed to parse, process, and visualize profiler output data. These tools enable researchers to extract insights from memory traces, compute cache behavior metrics, and manage profiling metadata.

.. note::

   Some of the tools in this framework are developed as part of ongoing research efforts. In particular, working set size estimation and memory characterization are active areas of our research. We explore different techniques for estimating memory bandwidth while minimizing storage overhead, including sampling-based approaches and compact trace representations. For contributors interested in extending these methods, please refer to our common library (see :doc:`profilers/common`).

4.1 Profiler Output Parsers
---------------------------

These tools parse binary protobuf (``.pb``) files generated by profilers and convert them to human-readable formats.

4.1.1 trace_parser.py
~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Parse binary protobuf memory trace files (``memtrace_.pb``) and convert them to JSON, CSV, or summary format.

**Location:** ``tools/trace_parser.py``

**Usage:**

.. code-block:: bash

   # Print summary to stdout (default)
   python3 tools/trace_parser.py memtrace_12345.pb

   # Export as JSON
   python3 tools/trace_parser.py memtrace_12345.pb --format json

   # Export to CSV file
   python3 tools/trace_parser.py memtrace_12345.pb --format csv --output trace.csv

   # Filter by thread ID
   python3 tools/trace_parser.py memtrace_12345.pb --thread 12345 --format csv

   # Limit number of events
   python3 tools/trace_parser.py memtrace_12345.pb --limit 1000 --format json
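A CSV export can then be post-processed with standard Python tooling. As one illustration, the sketch below computes the cache hit rate per operation type; it is a minimal example, not part of the tool itself, and it assumes the export carries a header row whose column names match the output fields documented for this tool (verify against your own export):

.. code-block:: python

   import csv
   from collections import Counter

   def hit_rate_by_op(csv_path):
       """Compute cache hit rate per operation type (READ/WRITE) from a trace CSV."""
       hits = Counter()
       totals = Counter()
       with open(csv_path, newline='') as f:
           for row in csv.DictReader(f):
               op = row['mem_op']
               totals[op] += 1
               if row['hit_miss'] == 'HIT':
                   hits[op] += 1
       # Hit rate = hits / total accesses, per operation type
       return {op: hits[op] / totals[op] for op in totals}

For example, ``hit_rate_by_op('trace.csv')`` on an export produced with ``--format csv`` returns a small dictionary such as ``{'READ': 0.93, 'WRITE': 0.88}``, which can feed directly into further analysis.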
**Command-Line Options:**

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Flag
     - Default
     - Description
   * - ``--format``
     - ``summary``
     - Output format: ``json``, ``csv``, or ``summary``
   * - ``--output``, ``-o``
     - stdout
     - Output file path
   * - ``--thread``
     - None
     - Filter events by thread ID
   * - ``--limit``
     - None
     - Maximum number of events to include
   * - ``--indent``
     - 2
     - JSON indentation level

**Output Fields:**

- ``timestamp``: Event timestamp
- ``thread_id``: Thread ID for the memory access
- ``address``: Memory address (hexadecimal)
- ``mem_op``: Operation type (``READ`` or ``WRITE``)
- ``hit_miss``: Cache result (``HIT`` or ``MISS``)

4.1.2 timeseries_parser.py
~~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Parse binary protobuf time-series WSS metrics files (``timeseries_.pb``) and convert them to JSON, CSV, or summary format.

**Location:** ``tools/timeseries_parser.py``

**Usage:**

.. code-block:: bash

   # Print summary to stdout (default)
   python3 tools/timeseries_parser.py timeseries_ls_12345.pb

   # Export as JSON
   python3 tools/timeseries_parser.py timeseries_ls_12345.pb --format json

   # Export to CSV file
   python3 tools/timeseries_parser.py timeseries_ls_12345.pb --format csv --output output.csv

   # Filter by thread ID
   python3 tools/timeseries_parser.py timeseries_ls_12345.pb --thread 12345 --format csv
**Command-Line Options:**

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Flag
     - Default
     - Description
   * - ``--format``
     - ``summary``
     - Output format: ``json``, ``csv``, or ``summary``
   * - ``--output``, ``-o``
     - stdout
     - Output file path
   * - ``--thread``
     - None
     - Filter samples by thread ID
   * - ``--indent``
     - 2
     - JSON indentation level

**Output Fields (per sample):**

- ``window_number``: Sampling window index
- ``thread_id``: Thread ID
- ``read_count``: Number of read operations in the window
- ``write_count``: Number of write operations in the window
- ``total_refs``: Total memory references
- ``wss_exact``: Exact working set size (unique addresses)
- ``wss_approx``: Approximate WSS (HyperLogLog estimate)
- ``timestamp``: Sample timestamp
- ``read_size_histogram``: Distribution of read sizes (1, 2, 4, 8, 16, 32, 64, or other bytes)
- ``write_size_histogram``: Distribution of write sizes

4.1.3 timeparser_plot.py
~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Generate visualization plots from time-series protobuf data.

**Location:** ``tools/timeparser_plot.py``

**Usage:**

.. code-block:: bash

   # Display plot interactively
   python3 tools/timeparser_plot.py timeseries_ls_12345.pb

   # Save plot to file
   python3 tools/timeparser_plot.py timeseries_ls_12345.pb --output my_plot.png

   # Filter by thread ID
   python3 tools/timeparser_plot.py timeseries_ls_12345.pb --thread 12345 --output thread_plot.png

**Command-Line Options:**

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Flag
     - Default
     - Description
   * - ``--output``, ``-o``
     - None
     - Output image file (if not provided, displays a GUI window)
   * - ``--thread``
     - None
     - Filter by thread ID

**Generated Plots:**

The tool creates a multi-grid figure with five subplots:

1. **Read Count** - Total reads with size breakdown (1B, 2B, 4B, 8B, etc.)
2. **Write Count** - Total writes with size breakdown
3. **WSS Exact** - Exact working set size over time
4. **WSS Approx** - Approximate WSS (HyperLogLog) over time
5. **WSS Absolute Error** - Difference between exact and approximate WSS
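The same error metric can also be computed without the plotting tool, directly from a ``timeseries_parser.py`` CSV export. The following is a minimal, hedged sketch (not part of the tool): it assumes a header row with the per-sample field names listed above, and it skips windows whose exact WSS is zero:

.. code-block:: python

   import csv

   def wss_error_stats(csv_path):
       """Summarize HyperLogLog WSS estimation error across sampling windows."""
       rel_errors = []
       with open(csv_path, newline='') as f:
           for row in csv.DictReader(f):
               exact = float(row['wss_exact'])
               approx = float(row['wss_approx'])
               if exact > 0:
                   # Relative error of the HyperLogLog estimate for this window
                   rel_errors.append(abs(approx - exact) / exact)
       if not rel_errors:
           return None
       return {
           'windows': len(rel_errors),
           'mean_rel_error': sum(rel_errors) / len(rel_errors),
           'max_rel_error': max(rel_errors),
       }

This kind of summary is a quick way to sanity-check whether the approximate WSS is accurate enough for a given workload before relying on it in larger experiments.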
4.2 Memory Analysis Tools
-------------------------

4.2.1 reuse_distance.py
~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Calculate reuse distances for cache behavior analysis. The reuse distance of a memory access is the number of unique addresses accessed between consecutive accesses to the same address; for the access sequence A B C A, the second access to A has a reuse distance of 2 (B and C). The tool is currently in legacy mode.

**Location:** ``tools/reuse_distance.py``

**Usage:**

.. code-block:: bash

   # Calculate reuse distance (output to reuse_.txt)
   python3 tools/reuse_distance.py trace.csv

   # Specify output file
   python3 tools/reuse_distance.py trace.csv --output reuse_results.txt

   # Use windowed tracking (memory-efficient for large traces)
   python3 tools/reuse_distance.py trace.csv --window-size 100000

**Command-Line Options:**

.. list-table::
   :header-rows: 1
   :widths: 20 20 60

   * - Flag
     - Default
     - Description
   * - ``--output``, ``-o``
     - ``reuse_.txt``
     - Output file for reuse distances
   * - ``--window-size``
     - -1 (unlimited)
     - Memory window size (-1 for unlimited, >0 for windowed tracking)

**Input Format:**

Two trace formats are supported:

- **CSV:** ``timestamp,addr,op,size``
- **Legacy:** ``timestamp address operation size`` (space-separated)

**Output Format:**

Each line contains an address and its list of reuse distances:

.. code-block:: text

   0x7fff5a3c1000: [5, 12, 3, 8]
   0x7fff5a3c1008: [2, 45, 7]

4.3 Metadata Extraction Tools
-----------------------------

These tools extract build and environment metadata for profiling context.

4.3.1 environment_capture
~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Capture environment variables and system information for profiling context. Provides both a C library and a Python wrapper.
**File Locations:**

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Component
     - Path
   * - C Header
     - ``profilers/common/include/environment_capture.h``
   * - C Source
     - ``profilers/common/src/environment_capture.c``
   * - Python Wrapper
     - ``tools/environment_capture.py``

C Library API
^^^^^^^^^^^^^

**Data Structure:**

.. code-block:: c

   typedef struct {
       char *hostname;           // Machine hostname
       char *os_name;            // Operating system name
       char *os_version;         // OS version string
       char *architecture;       // CPU architecture (e.g., "x86_64")
       char *working_directory;  // Current working directory
       char **env_names;         // Array of environment variable names
       char **env_values;        // Array of environment variable values
       size_t env_count;         // Number of captured variables
   } system_environment_t;

**Functions:**

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Function
     - Description
   * - ``environment_capture_create()``
     - Create and populate a system environment structure. Returns ``NULL`` on failure.
   * - ``environment_capture_destroy(env)``
     - Free the environment structure and all allocated memory.
   * - ``environment_capture_get_var(env, name)``
     - Get a specific environment variable value. Returns ``NULL`` if not found.
   * - ``environment_capture_print(env)``
     - Print environment information to stdout (for debugging).
   * - ``environment_capture_timestamp_ns()``
     - Get the current timestamp in nanoseconds.
   * - ``environment_capture_process_id()``
     - Get the current process ID.
**C Usage Example:**

.. code-block:: c

   #include "environment_capture.h"
   #include <stdio.h>

   int main() {
       system_environment_t* env = environment_capture_create();
       if (env == NULL) {
           fprintf(stderr, "Failed to capture environment\n");
           return 1;
       }

       printf("Hostname: %s\n", env->hostname);
       printf("OS: %s %s\n", env->os_name, env->os_version);
       printf("Architecture: %s\n", env->architecture);

       const char* user = environment_capture_get_var(env, "USER");
       if (user) {
           printf("User: %s\n", user);
       }

       environment_capture_destroy(env);
       return 0;
   }

Python Wrapper API
^^^^^^^^^^^^^^^^^^

**Properties:**

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Property
     - Description
   * - ``hostname``
     - System hostname
   * - ``os_name``
     - Operating system name (e.g., "Linux")
   * - ``os_version``
     - OS version/kernel release
   * - ``architecture``
     - CPU architecture (e.g., "x86_64")
   * - ``working_directory``
     - Current working directory at capture time
   * - ``process_id``
     - Current process ID
   * - ``timestamp_ns``
     - Capture timestamp in nanoseconds

**Methods:**

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - Method
     - Description
   * - ``get_variable(name)``
     - Get a single environment variable value. Returns ``None`` if not found.
   * - ``get_all_variables()``
     - Get a dictionary of all environment variables.
   * - ``to_dict()``
     - Convert the entire capture to a dictionary (for JSON serialization).
**Python Usage Example:**

.. code-block:: python

   from tools.environment_capture import EnvironmentCapture
   import json

   env = EnvironmentCapture()

   # Access system properties
   print(f"Hostname: {env.hostname}")
   print(f"OS: {env.os_name} {env.os_version}")
   print(f"Architecture: {env.architecture}")
   print(f"Working Directory: {env.working_directory}")

   # Get specific environment variables
   user = env.get_variable('USER')
   home = env.get_variable('HOME')

   # Get all environment variables
   all_vars = env.get_all_variables()

   # Export to dictionary (for JSON)
   metadata = env.to_dict()
   with open('environment_metadata.json', 'w') as f:
       json.dump(metadata, f, indent=2)

Integration with BaseMetadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``EnvironmentCapture`` class integrates with ``BaseMetadata.py`` to provide unified metadata collection across all profilers.

**Fields populated by EnvironmentCapture:**

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Metadata Field
     - Source
   * - ``hostname``
     - ``env.hostname``
   * - ``os_name``
     - ``env.os_name``
   * - ``os_version``
     - ``env.os_version``
   * - ``architecture``
     - ``env.architecture``
   * - ``working_directory``
     - ``env.working_directory``
   * - ``capture_timestamp``
     - ``env.timestamp_ns``

4.3.2 makefile_parser.py
~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Extract build metadata from Makefiles, including variables, targets, dependencies, and compiler settings.

**Location:** ``tools/makefile_parser.py``
**Python Usage:**

.. code-block:: python

   from tools.makefile_parser import MakefileParser, get_profiler_build_metadata

   # Parse a single Makefile
   parser = MakefileParser()
   metadata = parser.parse_makefile('/path/to/Makefile')

   print(f"Variables: {metadata['variables']}")
   print(f"Targets: {metadata['targets']}")
   print(f"Compiler Info: {metadata['compiler_info']}")
   print(f"Version Info: {metadata['version_info']}")

   # Parse all profiler Makefiles
   profiler_metadata = get_profiler_build_metadata('/path/to/profilers')

**Extracted Information:**

- ``variables``: Makefile variable assignments
- ``targets``: Build targets and their dependencies
- ``phony_targets``: List of ``.PHONY`` targets
- ``includes``: Included Makefile paths
- ``version_info``: Version-related variables
- ``compiler_info``: Compiler settings (``CC``, ``CXX``, etc.)
- ``paths``: Path-related variables

4.3.3 profiler_flag_parser.py
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Extract profiler-specific command-line flags from profiler source code.

**Location:** ``tools/profiler_flag_parser.py``

**Python Usage:**

.. code-block:: python

   from tools.profiler_flag_parser import ProfilerFlagParser, extract_all_flags

   # Parse a single profiler
   parser = ProfilerFlagParser()
   flags = parser.extract_flags('/path/to/profiler_dir')

   print(f"Command Flags: {flags['command_flags']}")
   print(f"Base Commands: {flags['base_commands']}")
   print(f"Environment Variables: {flags['environment_variables']}")

   # Extract flags from all profilers
   all_flags = extract_all_flags('/path/to/profilers')

**Extracted Information:**

- ``command_flags``: Profiler-specific CLI flags
- ``base_commands``: Base command executables (e.g., ``drrun``, ``ncu``, ``perf``)
- ``configuration_options``: Configuration settings
- ``environment_variables``: Required environment variables
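The extracted flag metadata can also drive simple preflight checks before launching a profiling run. The sketch below is a hedged illustration, not part of the tool: it assumes the ``environment_variables`` entry is an iterable of variable names, as described in the field list above, and reports which of them are unset:

.. code-block:: python

   import os

   def missing_env_vars(flags, environ=os.environ):
       """Return required environment variables (per extracted flags) that are unset."""
       required = flags.get('environment_variables', [])
       # Preserve the reported order so messages are stable
       return [name for name in required if name not in environ]

For example, ``missing_env_vars(parser.extract_flags('profilers/dynamorio'))`` could be called at the start of a profiling script, and a non-empty result would indicate variables to export before running the profiler.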
4.4 Quick Reference
-------------------

.. list-table::
   :header-rows: 1
   :widths: 25 25 25 25

   * - Tool
     - Input
     - Output
     - Use Case
   * - ``trace_parser.py``
     - ``.pb`` (protobuf)
     - JSON, CSV, summary
     - Parse memory traces
   * - ``timeseries_parser.py``
     - ``.pb`` (protobuf)
     - JSON, CSV, summary
     - Parse WSS time-series
   * - ``timeparser_plot.py``
     - ``.pb`` (protobuf)
     - PNG image
     - Visualize WSS trends
   * - ``reuse_distance.py``
     - CSV or legacy trace
     - Text file
     - Cache behavior analysis
   * - ``environment_capture.py``
     - System state
     - Python dict
     - Capture system metadata
   * - ``makefile_parser.py``
     - Makefile
     - Python dict
     - Extract build config
   * - ``profiler_flag_parser.py``
     - Profiler source
     - Python dict
     - Extract CLI flags

4.5 Common Workflows
--------------------

4.5.1 Analyze Memory Behavior from Protobuf Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This workflow demonstrates how to analyze memory behavior from DynamoRIO profiler output.

**Step 1: Generate protobuf files from DynamoRIO**

Run your workload with the DynamoRIO profiler to generate ``.pb`` output files.

**Step 2: Parse time-series data for WSS trends**

.. code-block:: bash

   # Get summary statistics
   python3 tools/timeseries_parser.py output/timeseries_myworkload_12345.pb

   # Export to CSV for further analysis
   python3 tools/timeseries_parser.py output/timeseries_myworkload_12345.pb \
       --format csv --output wss_data.csv

**Step 3: Visualize WSS over time**

.. code-block:: bash

   python3 tools/timeparser_plot.py output/timeseries_myworkload_12345.pb \
       --output wss_plot.png

**Step 4: Parse memory trace for detailed analysis**

.. code-block:: bash

   # Get trace summary
   python3 tools/trace_parser.py output/memtrace_12345.pb

   # Export first 10000 events to CSV
   python3 tools/trace_parser.py output/memtrace_12345.pb \
       --format csv --limit 10000 --output trace_sample.csv
**Step 5: Calculate reuse distance (optional)**

.. code-block:: bash

   python3 tools/reuse_distance.py trace_sample.csv --output reuse_analysis.txt

4.5.2 Generate Complete Metadata
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This workflow shows how to combine the metadata extraction tools.

.. code-block:: python

   from tools.environment_capture import EnvironmentCapture
   from tools.makefile_parser import get_single_profiler_build_metadata
   from tools.profiler_flag_parser import ProfilerFlagParser
   import json

   # Capture environment
   env = EnvironmentCapture()

   # Get build metadata
   build_meta = get_single_profiler_build_metadata('profilers/dynamorio')

   # Get profiler flags
   parser = ProfilerFlagParser()
   flags = parser.extract_flags('profilers/dynamorio')

   # Combine into complete metadata
   complete_metadata = {
       'environment': env.to_dict(),
       'build': build_meta,
       'profiler_flags': flags
   }

   # Save to JSON
   with open('complete_metadata.json', 'w') as f:
       json.dump(complete_metadata, f, indent=2)

4.6 Installation Notes
----------------------

**Protobuf Parsers**

The protobuf parsers (``trace_parser.py`` and ``timeseries_parser.py``) require the generated Python protobuf modules. These are auto-generated when building the common library:

.. code-block:: bash

   cd profilers/common
   make

This generates ``memory_trace_pb2.py`` and ``timeseries_metrics_pb2.py`` in the ``profilers/common/proto/`` directory.

**Visualization Tool**

The ``timeparser_plot.py`` tool requires matplotlib:

.. code-block:: bash

   pip3 install matplotlib

**Metadata Tools**

The metadata extraction tools (``environment_capture.py``, ``makefile_parser.py``, ``profiler_flag_parser.py``) have no dependencies beyond Python's standard library.

For issues or feature requests, please refer to the project's GitHub repository.