4. Tools
========

The MemSysExplorer application framework includes a collection of analysis and utility tools designed to parse, process, and visualize profiler output data. These tools enable researchers to extract insights from memory traces, compute cache behavior metrics, and manage profiling metadata.

.. note::

   Some of the tools in this framework are developed as part of ongoing research efforts. In particular, working set size estimation and memory characterization are active areas of our research. We explore different techniques for estimating memory bandwidth while minimizing storage overhead, including sampling-based approaches and compact trace representations. For contributors interested in extending these methods, please refer to our common library (see :doc:`profilers/common`).

4.1 Profiler Output Parsers
---------------------------

These tools parse binary protobuf (``.pb``) files generated by profilers and convert them to human-readable formats.

4.1.1 trace_parser.py
~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Parse binary protobuf memory trace files (``memtrace_.pb``) and convert them to JSON, CSV, or summary format.

**Location:** ``tools/trace_parser.py``

**Usage:**

.. code-block:: bash

   # Print summary to stdout (default)
   python3 tools/trace_parser.py memtrace_12345.pb

   # Export as JSON
   python3 tools/trace_parser.py memtrace_12345.pb --format json

   # Export to CSV file
   python3 tools/trace_parser.py memtrace_12345.pb --format csv --output trace.csv

   # Filter by thread ID
   python3 tools/trace_parser.py memtrace_12345.pb --thread 12345 --format csv

   # Limit number of events
   python3 tools/trace_parser.py memtrace_12345.pb --limit 1000 --format json
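A CSV export can then be post-processed with standard Python tooling. As one illustration, the sketch below computes the cache hit rate per operation type; it is a minimal example, not part of the tool itself, and it assumes the export carries a header row whose column names match the output fields documented for this tool (verify against your own export):

.. code-block:: python

   import csv
   from collections import Counter

   def hit_rate_by_op(csv_path):
       """Compute cache hit rate per operation type (READ/WRITE) from a trace CSV."""
       hits = Counter()
       totals = Counter()
       with open(csv_path, newline='') as f:
           for row in csv.DictReader(f):
               op = row['mem_op']
               totals[op] += 1
               if row['hit_miss'] == 'HIT':
                   hits[op] += 1
       # Hit rate = hits / total accesses, per operation type
       return {op: hits[op] / totals[op] for op in totals}

For example, ``hit_rate_by_op('trace.csv')`` on an export produced with ``--format csv`` returns a small dictionary such as ``{'READ': 0.93, 'WRITE': 0.88}``, which can feed directly into further analysis.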
**Command-Line Options:**

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Flag
     - Default
     - Description
   * - ``--format``
     - ``summary``
     - Output format: ``json``, ``csv``, or ``summary``
   * - ``--output``, ``-o``
     - stdout
     - Output file path
   * - ``--thread``
     - None
     - Filter events by thread ID
   * - ``--limit``
     - None
     - Maximum number of events to include
   * - ``--indent``
     - 2
     - JSON indentation level

**Output Fields:**

- ``timestamp``: Event timestamp
- ``thread_id``: Thread ID for the memory access
- ``address``: Memory address (hexadecimal)
- ``mem_op``: Operation type (``READ`` or ``WRITE``)
- ``hit_miss``: Cache result (``HIT`` or ``MISS``)

4.1.2 timeseries_parser.py
~~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Parse binary protobuf time-series WSS metrics files (``timeseries_.pb``) and convert them to JSON, CSV, or summary format.

**Location:** ``tools/timeseries_parser.py``

**Usage:**

.. code-block:: bash

   # Print summary to stdout (default)
   python3 tools/timeseries_parser.py timeseries_ls_12345.pb

   # Export as JSON
   python3 tools/timeseries_parser.py timeseries_ls_12345.pb --format json

   # Export to CSV file
   python3 tools/timeseries_parser.py timeseries_ls_12345.pb --format csv --output output.csv

   # Filter by thread ID
   python3 tools/timeseries_parser.py timeseries_ls_12345.pb --thread 12345 --format csv
**Command-Line Options:**

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Flag
     - Default
     - Description
   * - ``--format``
     - ``summary``
     - Output format: ``json``, ``csv``, or ``summary``
   * - ``--output``, ``-o``
     - stdout
     - Output file path
   * - ``--thread``
     - None
     - Filter samples by thread ID
   * - ``--indent``
     - 2
     - JSON indentation level

**Output Fields (per sample):**

- ``window_number``: Sampling window index
- ``thread_id``: Thread ID
- ``read_count``: Number of read operations in the window
- ``write_count``: Number of write operations in the window
- ``total_refs``: Total memory references
- ``wss_exact``: Exact working set size (unique addresses)
- ``wss_approx``: Approximate WSS (HyperLogLog estimate)
- ``timestamp``: Sample timestamp
- ``read_size_histogram``: Distribution of read sizes (1, 2, 4, 8, 16, 32, 64, or other bytes)
- ``write_size_histogram``: Distribution of write sizes

4.1.3 timeparser_plot.py
~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Generate visualization plots from time-series protobuf data.

**Location:** ``tools/timeparser_plot.py``

**Usage:**

.. code-block:: bash

   # Display plot interactively
   python3 tools/timeparser_plot.py timeseries_ls_12345.pb

   # Save plot to file
   python3 tools/timeparser_plot.py timeseries_ls_12345.pb --output my_plot.png

   # Filter by thread ID
   python3 tools/timeparser_plot.py timeseries_ls_12345.pb --thread 12345 --output thread_plot.png

**Command-Line Options:**

.. list-table::
   :header-rows: 1
   :widths: 20 15 65

   * - Flag
     - Default
     - Description
   * - ``--output``, ``-o``
     - None
     - Output image file (if not provided, displays a GUI window)
   * - ``--thread``
     - None
     - Filter by thread ID

**Generated Plots:**

The tool creates a multi-grid figure with five subplots:

1. **Read Count** - Total reads with size breakdown (1B, 2B, 4B, 8B, etc.)
2. **Write Count** - Total writes with size breakdown
3. **WSS Exact** - Exact working set size over time
4. **WSS Approx** - Approximate WSS (HyperLogLog) over time
5. **WSS Absolute Error** - Difference between exact and approximate WSS
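The same error metric can also be computed without the plotting tool, directly from a ``timeseries_parser.py`` CSV export. The following is a minimal, hedged sketch (not part of the tool): it assumes a header row with the per-sample field names listed above, and it skips windows whose exact WSS is zero:

.. code-block:: python

   import csv

   def wss_error_stats(csv_path):
       """Summarize HyperLogLog WSS estimation error across sampling windows."""
       rel_errors = []
       with open(csv_path, newline='') as f:
           for row in csv.DictReader(f):
               exact = float(row['wss_exact'])
               approx = float(row['wss_approx'])
               if exact > 0:
                   # Relative error of the HyperLogLog estimate for this window
                   rel_errors.append(abs(approx - exact) / exact)
       if not rel_errors:
           return None
       return {
           'windows': len(rel_errors),
           'mean_rel_error': sum(rel_errors) / len(rel_errors),
           'max_rel_error': max(rel_errors),
       }

This kind of summary is a quick way to sanity-check whether the approximate WSS is accurate enough for a given workload before relying on it in larger experiments.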
4.2 Memory Analysis Tools
-------------------------

4.2.1 reuse_distance.py
~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Calculate reuse distances for cache behavior analysis. The reuse distance of a memory access is the number of unique addresses accessed between consecutive accesses to the same address; for the access sequence A B C A, the second access to A has a reuse distance of 2 (B and C). The tool is currently in legacy mode.

**Location:** ``tools/reuse_distance.py``

**Usage:**

.. code-block:: bash

   # Calculate reuse distance (output to reuse_.txt)
   python3 tools/reuse_distance.py trace.csv

   # Specify output file
   python3 tools/reuse_distance.py trace.csv --output reuse_results.txt

   # Use windowed tracking (memory-efficient for large traces)
   python3 tools/reuse_distance.py trace.csv --window-size 100000

**Command-Line Options:**

.. list-table::
   :header-rows: 1
   :widths: 20 20 60

   * - Flag
     - Default
     - Description
   * - ``--output``, ``-o``
     - ``reuse_.txt``
     - Output file for reuse distances
   * - ``--window-size``
     - -1 (unlimited)
     - Memory window size (-1 for unlimited, >0 for windowed tracking)

**Input Format:**

Two trace formats are supported:

- **CSV:** ``timestamp,addr,op,size``
- **Legacy:** ``timestamp address operation size`` (space-separated)

**Output Format:**

Each line contains an address and its list of reuse distances:

.. code-block:: text

   0x7fff5a3c1000: [5, 12, 3, 8]
   0x7fff5a3c1008: [2, 45, 7]

4.3 Metadata Extraction Tools
-----------------------------

These tools extract build and environment metadata for profiling context.

4.3.1 environment_capture
~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Capture environment variables and system information for profiling context. Provides both a C library and a Python wrapper.
**File Locations:**

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Component
     - Path
   * - C Header
     - ``profilers/common/include/environment_capture.h``
   * - C Source
     - ``profilers/common/src/environment_capture.c``
   * - Python Wrapper
     - ``tools/environment_capture.py``

C Library API
^^^^^^^^^^^^^

**Data Structure:**

.. code-block:: c

   typedef struct {
       char *hostname;           // Machine hostname
       char *os_name;            // Operating system name
       char *os_version;         // OS version string
       char *architecture;       // CPU architecture (e.g., "x86_64")
       char *working_directory;  // Current working directory
       char **env_names;         // Array of environment variable names
       char **env_values;        // Array of environment variable values
       size_t env_count;         // Number of captured variables
   } system_environment_t;

**Functions:**

.. list-table::
   :header-rows: 1
   :widths: 40 60

   * - Function
     - Description
   * - ``environment_capture_create()``
     - Create and populate a system environment structure. Returns ``NULL`` on failure.
   * - ``environment_capture_destroy(env)``
     - Free the environment structure and all allocated memory.
   * - ``environment_capture_get_var(env, name)``
     - Get a specific environment variable value. Returns ``NULL`` if not found.
   * - ``environment_capture_print(env)``
     - Print environment information to stdout (for debugging).
   * - ``environment_capture_timestamp_ns()``
     - Get the current timestamp in nanoseconds.
   * - ``environment_capture_process_id()``
     - Get the current process ID.
**C Usage Example:**

.. code-block:: c

   #include "environment_capture.h"
   #include <stdio.h>

   int main() {
       system_environment_t* env = environment_capture_create();
       if (env == NULL) {
           fprintf(stderr, "Failed to capture environment\n");
           return 1;
       }

       printf("Hostname: %s\n", env->hostname);
       printf("OS: %s %s\n", env->os_name, env->os_version);
       printf("Architecture: %s\n", env->architecture);

       const char* user = environment_capture_get_var(env, "USER");
       if (user) {
           printf("User: %s\n", user);
       }

       environment_capture_destroy(env);
       return 0;
   }

Python Wrapper API
^^^^^^^^^^^^^^^^^^

**Properties:**

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Property
     - Description
   * - ``hostname``
     - System hostname
   * - ``os_name``
     - Operating system name (e.g., "Linux")
   * - ``os_version``
     - OS version/kernel release
   * - ``architecture``
     - CPU architecture (e.g., "x86_64")
   * - ``working_directory``
     - Current working directory at capture time
   * - ``process_id``
     - Current process ID
   * - ``timestamp_ns``
     - Capture timestamp in nanoseconds

**Methods:**

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - Method
     - Description
   * - ``get_variable(name)``
     - Get a single environment variable value. Returns ``None`` if not found.
   * - ``get_all_variables()``
     - Get a dictionary of all environment variables.
   * - ``to_dict()``
     - Convert the entire capture to a dictionary (for JSON serialization).
**Python Usage Example:**

.. code-block:: python

   from tools.environment_capture import EnvironmentCapture
   import json

   env = EnvironmentCapture()

   # Access system properties
   print(f"Hostname: {env.hostname}")
   print(f"OS: {env.os_name} {env.os_version}")
   print(f"Architecture: {env.architecture}")
   print(f"Working Directory: {env.working_directory}")

   # Get specific environment variables
   user = env.get_variable('USER')
   home = env.get_variable('HOME')

   # Get all environment variables
   all_vars = env.get_all_variables()

   # Export to dictionary (for JSON)
   metadata = env.to_dict()
   with open('environment_metadata.json', 'w') as f:
       json.dump(metadata, f, indent=2)

Integration with BaseMetadata
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``EnvironmentCapture`` class integrates with ``BaseMetadata.py`` to provide unified metadata collection across all profilers.

**Fields populated by EnvironmentCapture:**

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Metadata Field
     - Source
   * - ``hostname``
     - ``env.hostname``
   * - ``os_name``
     - ``env.os_name``
   * - ``os_version``
     - ``env.os_version``
   * - ``architecture``
     - ``env.architecture``
   * - ``working_directory``
     - ``env.working_directory``
   * - ``capture_timestamp``
     - ``env.timestamp_ns``

4.3.2 makefile_parser.py
~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Extract build metadata from Makefiles, including variables, targets, dependencies, and compiler settings.

**Location:** ``tools/makefile_parser.py``
**Python Usage:**

.. code-block:: python

   from tools.makefile_parser import MakefileParser, get_profiler_build_metadata

   # Parse a single Makefile
   parser = MakefileParser()
   metadata = parser.parse_makefile('/path/to/Makefile')

   print(f"Variables: {metadata['variables']}")
   print(f"Targets: {metadata['targets']}")
   print(f"Compiler Info: {metadata['compiler_info']}")
   print(f"Version Info: {metadata['version_info']}")

   # Parse all profiler Makefiles
   profiler_metadata = get_profiler_build_metadata('/path/to/profilers')

**Extracted Information:**

- ``variables``: Makefile variable assignments
- ``targets``: Build targets and their dependencies
- ``phony_targets``: List of ``.PHONY`` targets
- ``includes``: Included Makefile paths
- ``version_info``: Version-related variables
- ``compiler_info``: Compiler settings (``CC``, ``CXX``, etc.)
- ``paths``: Path-related variables

4.3.3 profiler_flag_parser.py
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Purpose:** Extract profiler-specific command-line flags from profiler source code.

**Location:** ``tools/profiler_flag_parser.py``

**Python Usage:**

.. code-block:: python

   from tools.profiler_flag_parser import ProfilerFlagParser, extract_all_flags

   # Parse a single profiler
   parser = ProfilerFlagParser()
   flags = parser.extract_flags('/path/to/profiler_dir')

   print(f"Command Flags: {flags['command_flags']}")
   print(f"Base Commands: {flags['base_commands']}")
   print(f"Environment Variables: {flags['environment_variables']}")

   # Extract flags from all profilers
   all_flags = extract_all_flags('/path/to/profilers')

**Extracted Information:**

- ``command_flags``: Profiler-specific CLI flags
- ``base_commands``: Base command executables (e.g., ``drrun``, ``ncu``, ``perf``)
- ``configuration_options``: Configuration settings
- ``environment_variables``: Required environment variables
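The extracted flag metadata can also drive simple preflight checks before launching a profiling run. The sketch below is a hedged illustration, not part of the tool: it assumes the ``environment_variables`` entry is an iterable of variable names, as described in the field list above, and reports which of them are unset:

.. code-block:: python

   import os

   def missing_env_vars(flags, environ=os.environ):
       """Return required environment variables (per extracted flags) that are unset."""
       required = flags.get('environment_variables', [])
       # Preserve the reported order so messages are stable
       return [name for name in required if name not in environ]

For example, ``missing_env_vars(parser.extract_flags('profilers/dynamorio'))`` could be called at the start of a profiling script, and a non-empty result would indicate variables to export before running the profiler.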
4.4 Quick Reference
-------------------

.. list-table::
   :header-rows: 1
   :widths: 25 25 25 25

   * - Tool
     - Input
     - Output
     - Use Case
   * - ``trace_parser.py``
     - ``.pb`` (protobuf)
     - JSON, CSV, summary
     - Parse memory traces
   * - ``timeseries_parser.py``
     - ``.pb`` (protobuf)
     - JSON, CSV, summary
     - Parse WSS time-series
   * - ``timeparser_plot.py``
     - ``.pb`` (protobuf)
     - PNG image
     - Visualize WSS trends
   * - ``reuse_distance.py``
     - CSV or legacy trace
     - Text file
     - Cache behavior analysis
   * - ``environment_capture.py``
     - System state
     - Python dict
     - Capture system metadata
   * - ``makefile_parser.py``
     - Makefile
     - Python dict
     - Extract build config
   * - ``profiler_flag_parser.py``
     - Profiler source
     - Python dict
     - Extract CLI flags

4.5 Common Workflows
--------------------

4.5.1 Analyze Memory Behavior from Protobuf Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This workflow demonstrates how to analyze memory behavior from DynamoRIO profiler output.

**Step 1: Generate protobuf files from DynamoRIO**

Run your workload with the DynamoRIO profiler to generate ``.pb`` output files.

**Step 2: Parse time-series data for WSS trends**

.. code-block:: bash

   # Get summary statistics
   python3 tools/timeseries_parser.py output/timeseries_myworkload_12345.pb

   # Export to CSV for further analysis
   python3 tools/timeseries_parser.py output/timeseries_myworkload_12345.pb \
       --format csv --output wss_data.csv

**Step 3: Visualize WSS over time**

.. code-block:: bash

   python3 tools/timeparser_plot.py output/timeseries_myworkload_12345.pb \
       --output wss_plot.png

**Step 4: Parse memory trace for detailed analysis**

.. code-block:: bash

   # Get trace summary
   python3 tools/trace_parser.py output/memtrace_12345.pb

   # Export first 10000 events to CSV
   python3 tools/trace_parser.py output/memtrace_12345.pb \
       --format csv --limit 10000 --output trace_sample.csv
**Step 5: Calculate reuse distance (optional)**

.. code-block:: bash

   python3 tools/reuse_distance.py trace_sample.csv --output reuse_analysis.txt

4.5.2 Generate Complete Metadata
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This workflow shows how to combine the metadata extraction tools.

.. code-block:: python

   from tools.environment_capture import EnvironmentCapture
   from tools.makefile_parser import get_single_profiler_build_metadata
   from tools.profiler_flag_parser import ProfilerFlagParser
   import json

   # Capture environment
   env = EnvironmentCapture()

   # Get build metadata
   build_meta = get_single_profiler_build_metadata('profilers/dynamorio')

   # Get profiler flags
   parser = ProfilerFlagParser()
   flags = parser.extract_flags('profilers/dynamorio')

   # Combine into complete metadata
   complete_metadata = {
       'environment': env.to_dict(),
       'build': build_meta,
       'profiler_flags': flags
   }

   # Save to JSON
   with open('complete_metadata.json', 'w') as f:
       json.dump(complete_metadata, f, indent=2)

4.6 Installation Notes
----------------------

**Protobuf Parsers**

The protobuf parsers (``trace_parser.py`` and ``timeseries_parser.py``) require the generated Python protobuf modules. These are auto-generated when building the common library:

.. code-block:: bash

   cd profilers/common
   make

This generates ``memory_trace_pb2.py`` and ``timeseries_metrics_pb2.py`` in the ``profilers/common/proto/`` directory.

**Visualization Tool**

The ``timeparser_plot.py`` tool requires matplotlib:

.. code-block:: bash

   pip3 install matplotlib

**Metadata Tools**

The metadata extraction tools (``environment_capture.py``, ``makefile_parser.py``, ``profiler_flag_parser.py``) have no dependencies beyond Python's standard library.

For issues or feature requests, please refer to the project's GitHub repository.