Purpose & Scope

StereumLabs provides neutral, hardware-backed measurements of Ethereum clients. This page explains what we measure, how we choose clients and environments, and what we publish so others can reproduce or critique our results.

Info: The lists on this page are not exhaustive.


Measurements

We measure client behavior under defined scenarios (e.g., initial sync, steady state, stress/load, long-horizon stability). Each run is a measurement episode with pinned versions, fixed configuration, and an environment profile.

  • Measurement domains

    • Resource: CPU, memory, disk I/O, network throughput/packets.
    • Protocol: subtopic throughput, message drop rate, reorg indicators (where applicable).
    • Connectivity: peer count/churn, peer scoring events, sync status.
    • Performance: block import time, execution latency, queue depths.
    • Reliability: error rates, crashes, restarts, uptime.
  • Run anatomy

    • clients: execution-client (EC) and consensus-client (CC) names and versions.
    • config: flags, files, and deviations from defaults.
    • environment: hardware/VM specs, kernel/OS, network topology, geolocation.
    • collection: exporters/log sources, sampling intervals, time sync strategy.
    • validation: sanity checks and acceptance criteria.
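
To make the run anatomy concrete, here is a minimal sketch of how a measurement episode could be described in code; the field names and values are illustrative, not the published manifest schema.

```python
# Illustrative run-manifest structure; field names are hypothetical,
# not the published schema.
from dataclasses import dataclass


@dataclass
class RunManifest:
    run_id: str
    clients: dict       # EC/CC names and versions
    config: dict        # flags, files, documented deviations from defaults
    environment: dict   # hardware/VM specs, kernel/OS, topology, geolocation
    collection: dict    # exporters/log sources, sampling intervals, time-sync strategy
    validation: dict    # sanity checks and acceptance criteria


manifest = RunManifest(
    run_id="sync-scenario-0001",                               # hypothetical ID
    clients={"execution": "geth", "consensus": "lighthouse"},  # versions would be pinned too
    config={"deviations": []},
    environment={"profile": "bare-metal"},
    collection={"sampling_interval_s": 10},
    validation={"max_missing_fraction": 0.01},
)
```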

Neutral

Our neutrality goal is comparability:

  • Equalized conditions: identical scenarios, synchronized start/end windows, equal resource budgets (CPU/RAM/storage/NIC) per test slot.
  • Config policy: start from client defaults; apply only documented, client-team-recommended flags that are required for correctness, stability, or the scenario; justify any other deviation.
  • Isolation: no co-tenancy with unrelated workloads; pin CPU sets and NUMA when relevant.
  • Clock discipline: NTP/Chrony enforced; reject runs with material skew (a minimal skew check is sketched after this list).
  • Reviewer notes: record trade-offs (e.g., different pruning modes) and mark any residual asymmetry.
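
A minimal sketch of such a pre-flight clock check, assuming a 50 ms skew budget and the third-party ntplib package (both assumptions, not stated policy):

```python
# Hypothetical pre-flight clock-skew check; the 50 ms budget is an assumption.
import ntplib  # third-party: pip install ntplib

MAX_SKEW_SECONDS = 0.05  # assumed acceptance threshold


def clock_skew_ok(server: str = "pool.ntp.org") -> bool:
    """Return True if the local clock offset against `server` is within budget."""
    offset = ntplib.NTPClient().request(server, version=3).offset
    return abs(offset) <= MAX_SKEW_SECONDS


if __name__ == "__main__":
    print("clock within budget:", clock_skew_ok())
```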

Hardware-backed

We maintain a bare-metal baseline to anchor measurements:

  • Declared specs: CPU model/stepping, core/thread counts, RAM size/type, storage model/RAID/filesystem, NIC model/firmware.
  • Kernel/OS: version, scheduler/IO tunables, hugepages, transparent hugepages, swap policy.
  • Power/thermal: profile and caps where applicable.
  • Repeatability: publish spec hashes and firmware versions to detect drift.
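
As an illustration of how spec drift could be detected, the sketch below hashes a canonical serialization of the declared specs; all field values are placeholders, not a real machine's inventory.

```python
# Sketch: hash a canonical serialization of the declared specs to detect drift.
import hashlib
import json

declared_spec = {
    "cpu": "AMD EPYC 9654P",
    "memory": "ECC DDR5",
    "storage": "ZFS raidz1, Solidigm D7-P5520",
    "nic": "2x Broadcom P225P",
    "kernel": "example-kernel-version",
    "firmware": {"nic": "example-fw", "bios": "example-fw"},
}

# Canonical JSON (sorted keys, no extra whitespace) makes the hash reproducible.
canonical = json.dumps(declared_spec, sort_keys=True, separators=(",", ":"))
spec_hash = hashlib.sha256(canonical.encode()).hexdigest()
print(spec_hash[:16])  # a short prefix could be published with each run
```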

Client selection

  • Version policy: prefer latest stable; include N-1 if it materially affects interpretation; prereleases only in experimental tracks.
  • Matrix strategy: cover the full EC/CC cross-product over time; prioritize pairs widely used in production; individual scenarios may run only a subset of pairs.
  • Exclusion criteria: known critical bugs affecting the scenario; lack of required exporter/telemetry; unresolvable reproducibility issues (documented when excluded).

Client coverage

  • Execution clients: Besu, Erigon, Ethrex, Geth, Nethermind, Reth
  • Consensus clients: Grandine, Lighthouse, Lodestar, Nimbus, Prysm, Teku
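
For reference, the EC/CC cross-product over the clients listed above can be enumerated mechanically; the sketch below only illustrates matrix generation, not which pairs are actually scheduled.

```python
# Sketch: enumerate the EC/CC cross-product from the coverage lists above.
from itertools import product

execution_clients = ["Besu", "Erigon", "Ethrex", "Geth", "Nethermind", "Reth"]
consensus_clients = ["Grandine", "Lighthouse", "Lodestar", "Nimbus", "Prysm", "Teku"]

pairs = list(product(execution_clients, consensus_clients))
print(len(pairs))  # 36 possible EC/CC combinations
for ec, cc in pairs[:3]:
    print(f"{ec} + {cc}")
```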

Publishing

We publish runs (or artifacts) so others can read, reproduce, and compare results.

  • Dashboards: time-aligned charts with scenario details.
  • Data exports: CSV/Parquet snapshots with schema and sampling metadata.
  • Run manifests: metadata (clients, versions, flags, env details, scenario, start/stop timestamps).
  • Method notes: rationale for flags, deviations, and known caveats.
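
As an illustration of how a published export and its manifest might be consumed together, here is a minimal sketch; the file names, column names, and manifest keys are assumptions rather than the actual published layout.

```python
# Hypothetical consumer of a published run: manifest (JSON) + data export (CSV).
import json

import pandas as pd

with open("run_manifest.json") as fh:                        # assumed file name
    manifest = json.load(fh)

df = pd.read_csv("metrics.csv", parse_dates=["timestamp"])   # assumed columns

# Cross-check: all samples should fall inside the declared run window.
start = pd.Timestamp(manifest["start"])                      # assumed manifest keys
stop = pd.Timestamp(manifest["stop"])
in_window = df["timestamp"].between(start, stop)
print(f"{in_window.mean():.1%} of samples inside the declared window")
```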

Definitions

Every charted metric maps to a definition that includes:

  • Name & unit, sampling method, aggregation (e.g., mean over 10s windows), data source (exporter/log).
  • Scope: per-process, per-host, or protocol-level.
  • Transformations: normalization (e.g., per core), rate derivations, smoothing.
  • Interpretation notes: what “good”/“bad” looks like and common pitfalls.
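
Such a definition could be carried as structured data alongside each chart; the sketch below mirrors the fields above, with illustrative values rather than published definitions.

```python
# Illustrative metric definition; values are examples, not published definitions.
from dataclasses import dataclass


@dataclass
class MetricDefinition:
    name: str
    unit: str
    sampling: str         # how the raw signal is sampled
    aggregation: str      # e.g. mean over 10 s windows
    source: str           # exporter or log
    scope: str            # per-process, per-host, or protocol-level
    transformations: str  # normalization, rate derivation, smoothing
    interpretation: str   # what "good"/"bad" looks like, common pitfalls


block_import_time = MetricDefinition(
    name="block_import_time",
    unit="seconds",
    sampling="per imported block",
    aggregation="mean over 10 s windows",
    source="client logs",
    scope="per-process",
    transformations="none",
    interpretation="lower is better; compare only within the same scenario",
)
```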

Procedures

  • Planning: select scenario → pin versions → pre-flight checks.
  • Execution: apply configs → cold/warm start policy → run for target duration.
  • Collection: start exporters first; label streams with run ID; ensure monotonic time.
  • Validation: missingness thresholds, outlier detection, comparison against historical bands (see the sketch after this list).
  • Aggregation: align series; compute derived metrics; emit manifest.
  • Publication: push dashboards/exports; attach links to definitions and manifests.
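
The validation step could be backed by small helpers like the sketch below; the missingness threshold and band check are illustrative, not our actual acceptance criteria.

```python
# Hypothetical validation helpers; thresholds are assumed, not stated policy.
import pandas as pd

MAX_MISSING_FRACTION = 0.01  # assumed acceptance criterion


def missingness_ok(series: pd.Series) -> bool:
    """Accept the series only if the fraction of missing samples is small."""
    return series.isna().mean() <= MAX_MISSING_FRACTION


def outside_band(series: pd.Series, low: float, high: float) -> pd.Series:
    """Return the samples that fall outside a historical [low, high] band."""
    return series[(series < low) | (series > high)]
```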

Limitations

  • Environment variance: even with controls, NIC/IO contention and kernel behavior can vary across updates.
  • Network randomness: peer set quality affects sync/throughput; we document peer discovery mode and limits.
  • Client heterogeneity: different pruning modes, caches, or defaults may not be functionally identical; any non-default is justified in notes.
  • Sampling bias: exporter cadence and jitter can skew short-interval metrics; we prefer windowed aggregates for comparisons (see the example after this list).
  • External change: network upgrades or client releases during longer runs can change baselines; affected segments are flagged.
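
To show what windowed aggregation looks like in practice, the sketch below resamples an irregularly sampled series into 10-second means; the window length and values are examples only.

```python
# Sketch: derive 10 s windowed means from an irregularly sampled series.
import pandas as pd

# Hypothetical raw samples with slight jitter in the timestamps.
raw = pd.Series(
    [41.0, 43.5, 39.8, 44.2],
    index=pd.to_datetime([
        "2024-06-01T00:00:01",
        "2024-06-01T00:00:09",
        "2024-06-01T00:00:12",
        "2024-06-01T00:00:21",
    ]),
)

windowed = raw.resample("10s").mean()  # one value per 10-second window
print(windowed)
```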

Environments

We use two categories of environments. Each published run declares its environment profile and a hash of key parameters that can be resolved back to the corresponding configuration values.

Bare-metal

  • Purpose: stable, low-noise anchor for long-term comparability.
  • Spec disclosure:
    • CPU: AMD EPYC 9654P
    • Memory: ECC DDR5
    • Storage: ZFS raidz1 with Solidigm D7-P5520
    • Network: 2x Broadcom P225P
    • OS/Kernel: Proxmox VE 8.4
  • Repeatability: re-image procedure; firmware freeze windows; change log for hardware swaps.

Cloud

  • Purpose: reflect common operator choices; quantify differences vs bare-metal.
  • Profiles: representative instance families per provider (e.g., general purpose vs compute-optimized), with pinned images and storage/network configs.
  • Profile disclosure may include:
    • Provider/region/zone; instance type; vCPU/mem; hypervisor generation.
    • Attached storage type/size/IOPS; network bandwidth class; placement policy (if any).
    • OS image ID; kernel; agent/metadata settings; clock sync configuration.
  • Controls & caveats:
    • Noisy neighbor risk and variable underlying hardware; runs may be repeated and averaged.
    • Cap burst features (CPU credits, network burst) where possible; document when unavoidable.
    • Affinity/anti-affinity and placement groups are disclosed if used.
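
Where cloud runs are repeated to average out noisy-neighbor effects, the aggregation itself is simple; a minimal sketch with made-up numbers:

```python
# Sketch: aggregate one metric across repeated cloud runs (values are made up).
from statistics import mean, stdev

block_import_p95_s = [0.42, 0.47, 0.44, 0.51, 0.43]  # one value per repeated run

print(f"mean={mean(block_import_p95_s):.3f}s  stdev={stdev(block_import_p95_s):.3f}s")
```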

Change control for this page: material edits will be logged in the global Changelog with a short rationale and effective date.