Common Profiler Library Documentation

The Common Profiler Library provides a collection of reusable, high-performance utilities that are shared across all profilers in the MemSysExplorer framework. This library eliminates code duplication and ensures consistent data structures and algorithms are used throughout the profiling infrastructure.

Important

MemSysExplorer Common Library

Refer to the codebase for the latest update: https://github.com/duca181/MemSysExplorer/tree/apps_dev/apps/profilers/common

To learn more about license terms and third-party attribution, refer to the Licensing and Attribution page.

Overview

The common library contains five core components that support memory profiling, cardinality estimation, hashing, metadata collection, and trace serialization:

  1. HyperLogLog (HLL) – Probabilistic cardinality estimation for working set size

  2. MurmurHash3 – Fast non-cryptographic hash function

  3. Working Set Tree Search – Exact working set tracking using tree-based data structures

  4. Memory Trace (Protobuf) – Compact binary trace format for memory access events

  5. Environment Capture – System and runtime metadata collection

Components

HyperLogLog (HLL)

A probabilistic data structure that estimates the cardinality (number of unique elements) in large datasets with minimal memory overhead.

Key Features:

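  • Memory Efficient: Uses a fixed number of small registers (2^p for p precision bits), so memory use does not grow with the number of elements counted

  • Configurable Precision: Supports 4 to 16 precision bits; the standard error of the estimate is roughly 1.04 / sqrt(2^p)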

  • Fast Updates: Constant-time element insertion

  • Merge Support: Combine multiple HLL structures for distributed counting

Use Cases:

  • Approximate working set size estimation

  • Unique cache line counting

  • Memory footprint analysis with low overhead

Files:

  • include/hll.h

  • src/hll.c
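
To make the register-based estimate concrete, here is a minimal, self-contained HyperLogLog sketch. It is not the hll.h API: every name (toy_hll, toy_hll_add, and so on) is hypothetical, and the hash is a stand-in 64-bit mixer rather than the library's hashing path. It also omits the small-range (linear counting) correction, so the estimate is only meaningful when the number of distinct items is well above 2.5 times the register count.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative only: hypothetical names, not the hll.h API. */
typedef struct {
    unsigned p;    /* precision bits (4..16) */
    uint32_t m;    /* register count, 2^p */
    uint8_t *reg;  /* reg[i] = largest rank seen in bucket i */
} toy_hll;

/* Stand-in 64-bit mixer (the fmix64 finalizer from MurmurHash3). */
uint64_t toy_mix64(uint64_t k) {
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

int toy_hll_init(toy_hll *h, unsigned p) {
    assert(p >= 4 && p <= 16);
    h->p = p;
    h->m = (uint32_t)1 << p;
    h->reg = calloc(h->m, sizeof *h->reg);
    return h->reg ? 0 : -1;
}

void toy_hll_add(toy_hll *h, uint64_t key) {
    uint64_t x = toy_mix64(key);
    uint32_t idx = (uint32_t)(x >> (64 - h->p)); /* top p bits select a register */
    uint64_t rest = x << h->p;                   /* remaining bits, left-aligned */
    uint8_t rank = 1;                            /* 1-based position of the first 1 bit */
    while (rank <= 64 - h->p && !(rest >> 63)) {
        rest <<= 1;
        rank++;
    }
    if (rank > h->reg[idx]) h->reg[idx] = rank;  /* constant-time update */
}

/* Raw HyperLogLog estimate: alpha * m^2 / sum(2^-reg[i]). */
double toy_hll_count(const toy_hll *h) {
    double m = (double)h->m;
    double alpha = (h->m == 16) ? 0.673
                 : (h->m == 32) ? 0.697
                 : (h->m == 64) ? 0.709
                 : 0.7213 / (1.0 + 1.079 / m);
    double sum = 0.0;
    for (uint32_t i = 0; i < h->m; i++)
        sum += 1.0 / (double)((uint64_t)1 << h->reg[i]);
    return alpha * m * m / sum;
}

/* Merging two sketches of equal precision is a register-wise maximum. */
void toy_hll_merge(toy_hll *dst, const toy_hll *src) {
    assert(dst->p == src->p);
    for (uint32_t i = 0; i < dst->m; i++)
        if (src->reg[i] > dst->reg[i]) dst->reg[i] = src->reg[i];
}

void toy_hll_free(toy_hll *h) {
    free(h->reg);
    h->reg = NULL;
}
```

Because each register only ever keeps a maximum, re-adding an element leaves the sketch unchanged, and merging per-thread sketches gives the same registers as feeding every element into a single sketch.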

MurmurHash3

A fast, non-cryptographic hash function optimized for hash table lookups and distributed hashing.

Key Features:

  • High-quality hash distribution

  • Platform-independent output

  • Optimized for x86/x64 architectures

Files:

  • include/MurmurHash3.h

  • src/MurmurHash3.c

Dependencies:

  • Used internally by HyperLogLog for element hashing
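
As a small illustration of the kind of mixing MurmurHash3 performs, the sketch below reproduces the 64-bit finalization step (fmix64) from the public-domain reference implementation and uses it to spread cache-line indices across buckets. The helper bucket_for_address and the 64-byte line size are assumptions made for this example; they are not part of MurmurHash3.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit finalization mix from the MurmurHash3 reference code. */
uint64_t fmix64(uint64_t k) {
    k ^= k >> 33;
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    k *= 0xc4ceb9fe1a85ec53ULL;
    k ^= k >> 33;
    return k;
}

/* Hypothetical helper: map an address to a bucket via its cache-line index. */
size_t bucket_for_address(uint64_t addr, size_t n_buckets) {
    assert(n_buckets > 0);
    uint64_t line = addr >> 6; /* assumes 64-byte cache lines */
    return (size_t)(fmix64(line) % n_buckets);
}
```

Every step in fmix64 is invertible, so distinct 64-bit inputs always produce distinct outputs; collisions only appear once the result is reduced to a bucket index.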

Working Set Tree Search (ws_tsearch)

An exact working set tracking data structure based on balanced binary trees. This structure maintains precise counts of unique memory accesses and provides statistics for memory footprint analysis.

Key Features:

  • Exact Counting: Tracks every unique access with perfect accuracy

  • Windowed Statistics: Reset and track working sets over time windows

  • Efficient Lookup: O(log n) insertion and search using the tsearch family from <search.h>, which the GNU C Library implements as a red-black tree

Tracked Metrics:

  • Total number of references

  • Distinct unique addresses

  • Single-access addresses (temporal locality indicator)

Files:

  • include/ws_tsearch.h

  • src/ws_tsearch.c
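
The three tracked metrics can be maintained incrementally on top of the POSIX tsearch family. The following is a minimal sketch under that assumption, not the ws_tsearch.h implementation: the toy_ws and toy_entry types and all function names are hypothetical. Resetting uses the portable tdelete loop instead of tdestroy so that the example does not depend on GNU extensions.

```c
#include <assert.h>
#include <search.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative only: hypothetical types, not the ws_tsearch.h API. */
typedef struct {
    uint64_t addr;
    uint64_t hits;
} toy_entry;

typedef struct {
    void *root;        /* tsearch tree root, NULL when empty */
    uint64_t total;    /* all recorded references */
    uint64_t distinct; /* unique addresses */
    uint64_t single;   /* addresses seen exactly once */
} toy_ws;

int toy_cmp(const void *a, const void *b) {
    uint64_t x = ((const toy_entry *)a)->addr;
    uint64_t y = ((const toy_entry *)b)->addr;
    return (x > y) - (x < y);
}

/* Record one access; returns 0 on success, -1 on allocation failure. */
int toy_ws_record(toy_ws *ws, uint64_t addr) {
    toy_entry key = { addr, 0 };
    toy_entry *e;
    void *slot = tfind(&key, &ws->root, toy_cmp);
    if (slot) {
        e = *(toy_entry **)slot; /* each tree node starts with the key pointer */
    } else {
        e = malloc(sizeof *e);
        if (!e) return -1;
        *e = key;
        if (!tsearch(e, &ws->root, toy_cmp)) {
            free(e);
            return -1;
        }
        ws->distinct++;
    }
    e->hits++;
    if (e->hits == 1) ws->single++;      /* first sighting */
    else if (e->hits == 2) ws->single--; /* no longer single-access */
    ws->total++;
    return 0;
}

/* Start a new window: free every entry and zero the counters. */
void toy_ws_reset(toy_ws *ws) {
    while (ws->root) {
        toy_entry *e = *(toy_entry **)ws->root;
        tdelete(e, &ws->root, toy_cmp);
        free(e);
    }
    ws->total = ws->distinct = ws->single = 0;
}
```

Updating the single-access counter at record time avoids a full tree walk when statistics are read, at the cost of storing a hit count per address.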

Memory Trace (Protobuf)

A Google Protocol Buffers-based serialization format for memory access traces. It provides a compact binary encoding with a structured schema for trace analysis.

Key Features:

  • Compact Format: Binary protobuf serialization reduces file size

  • Structured Schema: Well-defined message format for interoperability

  • Optional Dependency: Gracefully degrades if protobuf is unavailable

  • Fast I/O: Efficient buffered writes

Trace Fields:

  • timestamp_us – Microsecond-resolution timestamp

  • thread_id – Thread identifier

  • address – Memory address accessed

  • is_write – Read/write operation type

  • cache_hit – Hit/miss status (if available)

Files:

  • include/memory_trace.h

  • src/memory_trace.cpp

  • proto/memory_trace.proto
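
Much of the size reduction in a protobuf trace comes from base-128 varint encoding, which Protocol Buffers uses for integer fields declared with varint types such as uint64. The sketch below implements that wire encoding in plain C to show how few bytes typical timestamps and addresses need. Whether memory_trace.proto declares its fields with varint or fixed-width types is not stated here, so treat the field sizes as an assumption; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode v as a base-128 varint; returns the number of bytes written (1..10). */
size_t varint_encode(uint64_t v, uint8_t out[10]) {
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80); /* low 7 bits plus continuation bit */
        v >>= 7;
    }
    out[n++] = (uint8_t)v;              /* final byte, continuation bit clear */
    return n;
}

/* Decode a varint; returns bytes consumed, or 0 if the input is truncated. */
size_t varint_decode(const uint8_t *in, size_t len, uint64_t *v) {
    uint64_t result = 0;
    for (size_t i = 0; i < len && i < 10; i++) {
        result |= (uint64_t)(in[i] & 0x7F) << (7 * i);
        if (!(in[i] & 0x80)) {
            *v = result;
            return i + 1;
        }
    }
    return 0;
}
```

With this encoding, the address 0x400000 takes 4 bytes and a timestamp of 1000000 microseconds takes 3 bytes, compared with 8 bytes each as fixed-width 64-bit values.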

Environment Capture

A standalone library for capturing system and runtime metadata including hostname, OS information, architecture, and environment variables.

Key Features:

  • Static Capture: Captures environment at initialization time

  • Minimal Dependencies: Uses only standard C library

  • Python Wrapper: Integration with BaseMetadata.py for unified metadata collection

  • Comprehensive Coverage: Captures all environment variables and system info

Captured Metadata:

  • Hostname

  • Operating system name and version

  • System architecture (x86_64, ARM, etc.)

  • Current working directory

  • All environment variables (PATH, HOME, CUDA_HOME, etc.)

Files:

  • include/environment_capture.h

  • src/environment_capture.c

  • Python wrapper: tools/environment_capture.py
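
The metadata listed above is available through a handful of POSIX calls. As a rough sketch of one-shot capture at initialization time, the code below fills a snapshot using uname, getcwd, and the environ array. It is not the environment_capture.h implementation: the toy_env struct, its field sizes, and toy_env_capture are hypothetical, and it only counts environment variables rather than copying them.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>
#include <unistd.h>

extern char **environ; /* process environment, NULL-terminated */

/* Illustrative only: hypothetical snapshot type with arbitrary buffer sizes. */
typedef struct {
    char hostname[256];
    char os_name[128];
    char os_release[128];
    char arch[128];
    char cwd[4096];
    size_t env_count;
} toy_env;

/* Fill *out once; returns 0 on success, -1 if a system call fails. */
int toy_env_capture(toy_env *out) {
    struct utsname u;
    memset(out, 0, sizeof *out);
    if (uname(&u) != 0) return -1;
    snprintf(out->hostname, sizeof out->hostname, "%s", u.nodename);
    snprintf(out->os_name, sizeof out->os_name, "%s", u.sysname);
    snprintf(out->os_release, sizeof out->os_release, "%s", u.release);
    snprintf(out->arch, sizeof out->arch, "%s", u.machine);
    if (!getcwd(out->cwd, sizeof out->cwd)) return -1;
    for (char **e = environ; e && *e; e++)
        out->env_count++; /* each entry has the form NAME=value */
    return 0;
}
```

Copying the strings into the snapshot, rather than keeping pointers, means later changes to the working directory or environment do not alter what was captured.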

Usage in Profilers

To use the common library in your profiler implementation:

CMake Integration

Add the common library as a subdirectory in your CMakeLists.txt:

# Add common library
add_subdirectory(../common common)

# Link against the library
target_link_libraries(your_profiler profiler_common)

Header Inclusion

Include the necessary headers in your C/C++ source:

#include "hll.h"                 // HyperLogLog cardinality estimation
#include "MurmurHash3.h"         // Fast hashing
#include "ws_tsearch.h"          // Exact working set tracking
#include "memory_trace.h"        // Protobuf trace output (optional)
#include "environment_capture.h" // System metadata capture

Example: Working Set Tracking

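#include <stdint.h>  // uintptr_t
#include <stdio.h>   // printf, fprintf

#include "ws_tsearch.h"
#include "hll.h"

int main(void) {
    // Exact working set tracking
    ws_ctx_t* ws = ws_create();
    if (!ws) {
        fprintf(stderr, "ws_create failed\n");
        return 1;
    }
    ws_record(ws, 0x400000);  // Record cache line access
    ws_record(ws, 0x400040);  // Next 64-byte cache line

    ws_stats_t stats;
    ws_get_stats(ws, &stats);
    // Cast so the format specifier matches whatever integer type the field uses
    printf("Distinct accesses: %lu\n", (unsigned long)stats.distinct);
    ws_destroy(ws);

    // Approximate working set tracking with HLL
    HLL hll;
    hll_init(&hll, 8);  // 8 precision bits

    uintptr_t addr = 0x400000;
    hll_add(&hll, &addr, sizeof(addr));  // Hashes the bytes of the address value

    double estimated_unique = hll_count(&hll);
    printf("Estimated unique lines: %.0f\n", estimated_unique);

    hll_destroy(&hll);
    return 0;
}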

Example: Memory Trace Output

#include <stdio.h>  // fprintf

#include "memory_trace.h"

int main() {
    // Create trace writer
    memory_trace_writer_t* writer = memory_trace_create_writer();
    if (!writer) {
        fprintf(stderr, "Protobuf not available\n");
        return 1;
    }

    // Record memory events
    memory_trace_add_event(writer,
                           1000000,      // timestamp (us)
                           123,          // thread_id
                           0x400000,     // address
                           MEM_READ,     // read operation
                           CACHE_HIT);   // cache hit

    // Write to file
    memory_trace_write_to_file(writer, "trace.pb");
    memory_trace_destroy_writer(writer);

    return 0;
}

Example: Environment Metadata Capture

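#include <stdio.h>  // printf, fprintf

#include "environment_capture.h"

int main() {
    system_environment_t* env = environment_capture_create();
    if (!env) {
        fprintf(stderr, "Environment capture failed\n");
        return 1;
    }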

    printf("Hostname: %s\n", env->hostname);
    printf("OS: %s\n", env->os_name);
    printf("Architecture: %s\n", env->arch);
    printf("Working Directory: %s\n", env->cwd);

    // Get specific environment variable
    const char* cuda_home = environment_capture_get_var(env, "CUDA_HOME");
    if (cuda_home) {
        printf("CUDA_HOME: %s\n", cuda_home);
    }

    environment_capture_destroy(env);
    return 0;
}

Dependencies

Core Dependencies

  • Standard C Library (required)

  • GNU C Library (required on Linux): tsearch, tfind, and twalk are POSIX functions declared in <search.h>, while tdestroy is a GNU extension

Optional Dependencies

  • Google Protocol Buffers (libprotobuf-dev, protobuf-compiler)

    Required for memory trace functionality. If not available, the library will compile without protobuf support.

Installing Protocol Buffers

Ubuntu/Debian:

sudo apt-get install libprotobuf-dev protobuf-compiler

CentOS/RHEL:

sudo yum install protobuf-devel protobuf-compiler

macOS:

brew install protobuf

Build System

The common library uses CMake and automatically detects available dependencies:

cd profilers/common
mkdir build && cd build
cmake ..
make

The build system will:

  • Detect if Protocol Buffers is available

  • Compile protobuf-dependent features only if found

  • Provide graceful fallback for missing optional dependencies
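
On the C side, a compile-time fallback of this kind is usually expressed with a preprocessor guard that the build system defines only when the dependency is found. The sketch below shows the pattern with a hypothetical macro TOY_HAVE_PROTOBUF and a hypothetical toy_writer type; the macro and function names actually used by the common library may differ.

```c
#include <stdlib.h>

/* Illustrative only: hypothetical writer type and feature macro. */
typedef struct {
    int event_count;
} toy_writer;

/* Returns a writer when trace support is compiled in, NULL otherwise. */
toy_writer *toy_writer_create(void) {
#ifdef TOY_HAVE_PROTOBUF
    return calloc(1, sizeof(toy_writer));
#else
    return NULL; /* feature compiled out: callers must check for NULL */
#endif
}

/* Safe to call with NULL, so callers need no separate feature check. */
void toy_writer_destroy(toy_writer *w) {
    free(w);
}
```

Callers then test the returned pointer once and skip trace output when it is NULL, so the same profiler source builds with or without Protocol Buffers installed.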

Integration with Profilers

The following MemSysExplorer profilers use the common library:

  • DynamoRIO – Uses HLL and ws_tsearch for working set estimation

  • NVBit – Uses memory trace protobuf format for GPU memory access traces

  • Perf – Uses environment capture for metadata collection

  • Sniper – Uses working set tracking for cache simulation validation

Additional Notes

  • The core functionality has no third-party dependencies; it needs only the C library

  • Protocol Buffers is optional and only required for trace serialization

  • All data structures are thread-safe or explicitly documented as thread-unsafe

  • The library follows POSIX standards for maximum portability

Troubleshooting

Issue: Protobuf not found during build

Solution: Install Protocol Buffers development libraries or build without protobuf support. The library will automatically disable protobuf-dependent features.

Issue: tsearch/tfind functions not available

Solution: tsearch, tfind, and twalk are declared in <search.h> on POSIX systems, but tdestroy is a GNU extension. Build against the GNU C Library, or supply an alternative implementation on non-Linux systems.

Issue: Linker errors when using the library

Solution: Ensure profiler_common is properly linked in your CMakeLists.txt:

target_link_libraries(your_profiler profiler_common)