Where the slot goes: Nimbus and the execution timing it can't see

July 1, 2026 · 15 min read

Founder RockLogic

Artificial Intelligence

In the first edition of this series we measured, under Lighthouse, where a 12-second slot goes against the 4-second attestation deadline, and found that the execution client you pick is the lever: it shifts how often a node lands late enough to fail attestations by about 1.5x across the mainstream clients, and 3.7x once Erigon's disk-bound tail is counted. This edition runs the same six execution clients on the same bare-metal fleet, and asks the same question of Nimbus. The answer is the finding: Nimbus cannot tell you which execution client is costing you, because the one timing it reports is block arrival, and arrival is the part the execution client does not touch.

That is not a gap in our data. It is what Nimbus exposes. Where Lighthouse breaks the path to attestable into arrival, consensus verification and execution verification, Nimbus publishes a single histogram of block-arrival delay. The good news is that this histogram is complete, counting every block, not the once-a-minute sample Lighthouse's gauges gave us. The catch is that it sees only the network-and-proposer part of the slot, so the 3.7x spread that mattered under Lighthouse is simply not in the data Nimbus reports.

Read this first

Nimbus reports arrival, not execution. Its beacon_block_delay histogram measures the time from slot start to block arrival. It does not expose the execution-verification slice, a per-component breakdown, or a would-fail-attestations counter. Those are Lighthouse metrics, and they are empty for Nimbus on our fleet. So the per-client numbers below are arrival timing, which is set by the network and the proposer, not by your execution client.
The arrival distribution is complete, not sampled. Unlike Lighthouse's block-delay gauges, which were read about once a minute, this histogram counts every block Nimbus times, so the mean and the count past 4 seconds are exact, not a once-a-minute estimate. The buckets are coarse (2 s wide), so every quantile is interpolated within a bucket, including the median, the 90th percentile in the table below and the 99th percentile. Read the 99th percentile as "just under 4 s," not a precise figure. The count past the 4 s bucket boundary is the trustworthy tail number.
Reth is excluded. It was still back-filling throughout this window (v2.3.0), its sync log reporting Executing stage 4/14 after six days on this deployment while the tip was past 25.4M, so it was not a synced node and its arrival timing is not comparable. See the sync-speed census for why a node can look alive on metrics while a log check shows it is still syncing.
Window and versions. Three days, 2026-06-27 to 2026-06-30, on the current Nimbus deployment multiarch-v26.6.0 paired with Geth v1.17.3, Nethermind 1.38.1, Besu 26.6.1, Erigon v3.4.4, Ethrex 17.0.0, on the NDC2 bare-metal hosts in Vienna. For Besu, Erigon and Ethrex these are a step newer than the Lighthouse edition's window (the methodology lists both). Each execution client's sync state was confirmed at the tip from its own container logs across the window, not from metrics alone.

Two instruments, two views of the same slot

The slot is the same one Lighthouse showed us: a block is proposed elsewhere, gossiped across the network, observed locally somewhere around 2 seconds in, then verified by the consensus and execution layers before it can be attested to, all against the 4-second line. What changes between consensus clients is which parts of that path they put a number on.

Two instruments for the same slot: Lighthouse breaks the path into block arrival, consensus verification and execution verification with a separate would-fail counter, while Nimbus publishes a single histogram covering block arrival only

Lighthouse gives the granular gauges (beacon_block_delay_observed_slot_start, _consensus_verification_time, _execution_time, _attestable_slot_start) plus the head_slot_start_exceeded counter, sampled about once a minute. Nimbus gives one histogram, beacon_block_delay, of arrival delay, complete to the block. Neither is the full picture. Lighthouse shows you the execution slice but only in samples; Nimbus shows you arrival in full but stops there. The consensus client you run decides which half of the trade you get.

What Nimbus shows well: the arrival distribution

For the part it does measure, Nimbus measures it completely. Where Lighthouse sampled its block-delay gauges about once a minute, Nimbus's histogram counts every block it times, so the mean and the count past 4 seconds are exact rather than the conservative floors that sampling forced on us in the first edition. Over the window, per pairing:

Block-arrival timing under Nimbus, per execution client: mean arrival around 2 seconds, 90th percentile around 3.5 seconds, and a few tenths of a percent of blocks arriving after the 4-second deadline, with the five clients in a narrow band

Execution client	Mean arrival	90th pct	Blocks past 4 s (3 d)	Blocks timed
Besu	1.94 s	3.43 s	37	11,801
Erigon	1.97 s	3.45 s	40	14,019
Ethrex	2.02 s	3.50 s	80	17,751
Geth	2.05 s	3.54 s	86	18,299
Nethermind	2.05 s	3.54 s	101	18,814

The shape is healthy and close for everyone: blocks arrive around 2 seconds in on the mean, the 90th percentile sits near 3.5 seconds, and over three days each pairing logged 37 to 101 blocks past the 4-second line out of roughly twelve to nineteen thousand timed, which is about 0.3% to 0.5% of timed blocks. (A note on the percentiles: because well under 1% of blocks cross 4 seconds and the bucket below it is 2 seconds wide, an interpolated 99th percentile sits just under 4 seconds for every client, an artifact of bucket width, not a per-client reading. The mean and the past-4-second count are exact.) Those late-arriving blocks are a network-and-proposer story, not an execution-client one: a block that shows up at 4.5 seconds was slow to reach the node, and no execution layer makes it earlier.
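The share of late arrivals can be recomputed directly from the table. The sketch below is only arithmetic on the published counts; the dictionary layout and the helper name `late_share_pct` are ours, not part of any fleet tooling.

```python
# Counts copied from the table above: (blocks past 4 s, blocks timed) over 3 days.
ARRIVALS = {
    "Besu": (37, 11_801),
    "Erigon": (40, 14_019),
    "Ethrex": (80, 17_751),
    "Geth": (86, 18_299),
    "Nethermind": (101, 18_814),
}


def late_share_pct(late, timed):
    """Share of timed blocks that arrived after the 4 s line, in percent."""
    return 100.0 * late / timed


shares = {client: late_share_pct(*row) for client, row in ARRIVALS.items()}

# Print clients from the smallest to the largest late share.
for client, pct in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{client:<11} {pct:.2f}% of timed blocks past 4 s")
```

Dividing by blocks timed matters because the pairings did not time the same number of blocks, so raw counts and shares need not rank the clients identically.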

The blind spot: arrival cannot rank execution clients

Look down the table again. The five clients run from 1.94 to 2.05 seconds on the mean and from 37 to 101 blocks past 4 seconds, and that order does not match the execution ranking. The clearest tell is Erigon. Under Lighthouse, on the same NDC2 fleet and the same five execution clients (at slightly different versions, spelled out in the methodology), Erigon was the worst by a wide margin: the disk-bound newPayload tail that pushed its would-fail rate to 3.7x Geth's. Under Nimbus, Erigon has among the fewest late arrivals on the fleet, 40 in three days, second only to Besu. The execution outlier is unremarkable here, and the client with the most late arrivals, Nethermind, was middle of the pack on execution.

The blind spot: under Lighthouse the same five execution clients spread 3.7x apart on would-fail rate, driven by execution; under Nimbus they fall into a narrow arrival band where Erigon, the Lighthouse outlier, is unremarkable

That is the whole point. If Nimbus's histogram were picking up the execution cost, Erigon's heavy newPayload tail would put it near the top here, the way it topped the would-fail count under Lighthouse. Instead it sits near the bottom. There is a genuine spread in the arrival numbers, but it is a host-and-network spread, not an execution one, and the giveaway is that it is not stable: in the previous deployment of this fleet the same metric put Besu at the top of the late-arrival count, and here Besu is at the bottom. That pairing even moved to a different Nimbus host between the two deployments, and a patch bump to Besu cannot change when Nimbus first hears a block over gossip, so what moved the number is the machine and where it sits in the network, not the execution client. Arrival is decided by a host's network position, peer set and local load, before the execution client is asked to verify anything, so it cannot carry the execution client's signature.

So an operator running Nimbus who asks "which execution client is eating my attestation timing" gets no answer from Nimbus. The histogram will show all of them arriving around 2 seconds and will look reassuring, while a 3.7x difference in how often the timing budget blows sits entirely in the slice Nimbus does not report.

Where the execution slice went, and where to find it

Nimbus has not hidden the execution time out of carelessness; it records block delay at arrival and leaves engine-API timing to the execution layer. So the number you want is on the execution-layer side of the engine API, and the coverage there is itself uneven. On our fleet, Nethermind exposes nethermind_new_payload_execution_time and Reth exposes reth_consensus_engine_beacon_new_payload_latency, both timing the engine call directly. The others surface per-block execution time mainly in their logs rather than as a clean metric: Geth's import lines carry an elapsed field, Ethrex's [METRIC] BLOCK entries a per-block time. There is no single cross-client newPayload panel to read; you assemble it from a different source per client. That is the same instrumentation-coverage gap one layer down, and it is exactly the cut Lighthouse handed us for free by timing newPayload itself.

Some consensus clients do time the engine call themselves. Prysm, for one, publishes a per-execution-client newPayload latency histogram, the view Nimbus does not give, which a later edition will use. Nimbus does publish a second histogram, validator_monitor_beacon_block_delay_seconds, which is populated on our fleet, but it tracks block delay for monitored validators, an arrival-side measurement again, not the engine call, so it does not rescue the blind spot.
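For the clients whose per-block execution time lives in logs rather than in a metric, the first step is usually to pull a duration token out of each line. The sketch below assumes a token shaped like `elapsed=153.2ms`; the token shape, the helper name `elapsed_seconds` and the sample lines are illustrative assumptions, not verbatim client output, so check them against what your own client actually prints.

```python
import re

# Assumed token shape: "elapsed=<number><unit>", e.g. "elapsed=153.2ms".
# Compound durations such as "1m2.5s" are deliberately not handled here.
ELAPSED_RE = re.compile(r"\belapsed=(\d+(?:\.\d+)?)(µs|us|ms|s)\b")
UNIT_TO_SECONDS = {"µs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def elapsed_seconds(line):
    """Return the elapsed duration in seconds, or None if the line has no such token."""
    match = ELAPSED_RE.search(line)
    if match is None:
        return None
    return float(match.group(1)) * UNIT_TO_SECONDS[match.group(2)]


# Hypothetical stand-in lines, written for this example only.
SAMPLE_LINES = [
    "import block number=1 elapsed=153.2ms",
    "import block number=2 elapsed=1.2s",
    "peer connected id=abc",
]

durations = [d for d in map(elapsed_seconds, SAMPLE_LINES) if d is not None]
print(durations)
```

Collecting these per-block values over a window gives a series comparable in spirit to a newPayload latency metric, though each client's log timing may cover a slightly different span of work.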

What to measure on your own Nimbus nodes

Use the arrival histogram for what it is good at: proposer and network health. beacon_block_delay gives you a complete distribution of how late blocks reach this node. A drifting mean or a fattening 90th percentile is a peering or connectivity signal, and it is comparable across your pairings because arrival does not depend on the execution client. Our peering deep dive is where to take that.
Do not read the per-execution-client arrival spread as an execution-client ranking. As the Erigon inversion shows, the spread is host and network position, and it reshuffles when hosts change. If you switch execution client to chase a 0.1-second arrival difference, you will not get it, because you are not changing the thing that sets arrival.
For the execution slice, instrument the execution layer directly. Pull your execution client's own newPayload or block-processing timing and watch its tail, not its average. That is the lever the Lighthouse edition found, and on Nimbus it is the only place that lever is visible.
Watch the outcome, not only the inputs. Attestation inclusion distance and attestation effectiveness, which most operators already track through rated.network or beaconcha.in, are what a blown timing budget shows up as. If those stay clean, a slightly fat arrival tail or a slow newPayload is not yet costing you duties; if they slip, the arrival histogram and the execution-layer timing together tell you which half of the slot to chase.
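The advice to watch the tail rather than the average can be made concrete with a small sketch. The two sample series below are hypothetical, built so that their means are nearly identical while their 99th percentiles are not; the nearest-rank percentile is one simple definition among several.

```python
import math
import statistics


def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical per-block execution times in seconds (not fleet data).
steady = [0.30] * 100
spiky = [0.25] * 97 + [1.9, 2.0, 2.1]

for name, series in (("steady", steady), ("spiky", spiky)):
    mean = statistics.fmean(series)
    p99 = nearest_rank_percentile(series, 99)
    print(f"{name}: mean={mean:.3f}s p99={p99:.2f}s")
```

A dashboard that plots only the mean would show these two series as equivalent, while the second one is the kind that eats into the 4-second budget on its worst blocks.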

Coming next in the series

Two consensus clients down, and they already disagree on what a slot timing metric even is: Lighthouse times the execution slice but only samples it, Nimbus measures arrival in full but stops there. The next editions take the same question to Teku, Prysm, Lodestar and Grandine, and the pattern we are testing is that each one shows a different slice of the same slot. Where a client's instrumentation hides something its neighbours expose, that is the finding, and we will say so.

The comparison holds the hardware and the consensus client fixed and varies only the execution client. Lining clients up on equal footing across windows of uneven data quality, and catching when a node only looks synced, is an analysis in its own right rather than a lookup on a static panel, and it is what StereumLabs AI does on our fleet, on the measurement stack we described here. The same data lets you do this yourself: line specific clients up on matched versions and identical hardware, with StereumLabs AI as the analyst, instead of eyeballing two editions run weeks apart. If you run Ethereum infrastructure and want this lens on your own nodes, reach us at stereumlabs.com or contact@stereumlabs.com.

Methodology

Numbers come from Nimbus's beacon_block_delay histogram on our NDC2 deployment (Vienna), queried on the Prometheus-cold datasource (uid aez9ck4wz05q8e), with the fleet labels documented in build your own dashboards. Per-client values are for cc_client="nimbus-super", cc_version="multiarch-v26.6.0", deployment="NDC2" grouped by ec_client, over the three days from 2026-06-27T00:00:00Z to 2026-06-30T00:00:00Z, with increase(...[3d]) evaluated at the closing anchor so the counts are stable and reproducible rather than drifting with query time. This is the current deployment, which by this window had been running about six days, long enough for every mainstream execution client to be synced and steady throughout.
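As a sketch of how such anchored queries might be assembled, the snippet below builds the PromQL strings and an instant-query URL with the evaluation time pinned to the closing anchor. The label values come from the paragraph above; the `sum by (ec_client)` aggregation, the helper names and the base URL are assumptions for illustration, not the exact queries we ran.

```python
from urllib.parse import urlencode

WINDOW = "3d"
ANCHOR = "2026-06-30T00:00:00Z"  # closing anchor of the window
SELECTOR = 'cc_client="nimbus-super",cc_version="multiarch-v26.6.0",deployment="NDC2"'


def windowed_increase(series, extra_labels=""):
    """PromQL for the per-execution-client increase of one series over the window."""
    labels = SELECTOR + ("," + extra_labels if extra_labels else "")
    # Assumption: pairings are grouped with sum by (ec_client).
    return f"sum by (ec_client) (increase({series}{{{labels}}}[{WINDOW}]))"


sum_inc = windowed_increase("beacon_block_delay_sum")
count_inc = windowed_increase("beacon_block_delay_count")
inf_bucket = windowed_increase("beacon_block_delay_bucket", 'le="+Inf"')
le4_bucket = windowed_increase("beacon_block_delay_bucket", 'le="4.0"')

QUERIES = {
    "mean_arrival_s": f"{sum_inc} / {count_inc}",
    "blocks_past_4s": f"{inf_bucket} - {le4_bucket}",
}


def instant_query_url(base_url, promql):
    """Prometheus instant-query URL evaluated at the fixed anchor, not at 'now'."""
    params = urlencode({"query": promql, "time": ANCHOR})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


# Hypothetical base URL; substitute your own Prometheus endpoint.
print(instant_query_url("http://prometheus.example:9090", QUERIES["blocks_past_4s"]))
```

Pinning `time` is what keeps the counts reproducible: the same query evaluated a day later without it would cover a different three days.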

Versions differ between the two editions, and we state it. The Lighthouse edition's window ran Besu 26.6.0, Erigon v3.4.3 and Ethrex 16.0.0; this Nimbus window runs Besu 26.6.1, Erigon v3.4.4 and Ethrex 17.0.0, with Geth v1.17.3 and Nethermind 1.38.1 identical in both (all confirmed from the datasource). Arrival is consensus-side and fixed before the execution client runs, so these differences do not move the arrival numbers here; they would matter for a direct execution-time comparison. Comparing specific execution clients head-to-head therefore wants matched versions on identical hardware, which is the version-pinned alignment StereumLabs and StereumLabs AI do across windows of version churn, and which you can run on your own pairings with the same data.
The metric is Nimbus's beacon_block_delay histogram (_bucket/_sum/_count), with bucket boundaries at 2, 4, 6, 8, 10, 12 and 14 seconds. Nimbus records this delay when it first receives a block over gossip, before it hands the block to the execution layer to verify, so it is a block-arrival measurement. The cross-check that it is arrival and not a post-execution timestamp: its magnitude (around 2 s mean here) is in the range of Lighthouse's observed-arrival gauge on the same fleet, sits well below the execution-inclusive time-to-attestable, and does not move with execution-client speed.
The maths. The mean is increase(_sum[3d]) / increase(_count[3d]); the count past 4 seconds is increase(_bucket{le="+Inf"}[3d]) - increase(_bucket{le="4.0"}[3d]), exact because 4 seconds is a bucket boundary; quantiles are histogram_quantile over the same windowed buckets. Because the buckets are 2 seconds wide, every quantile (the median, the 90th percentile in the table and the 99th percentile) is bucket-interpolated and approximate; the mean and the past-4-seconds count are not.
What the histogram counts, and why the per-client tail is not an execution ranking. Nimbus records this delay for blocks it received and timed on gossip, so the per-pairing counts (around 12,000 to 19,000 over roughly 21,600 slots) are a subset of all slots, and the share past 4 seconds is conditional on a block being timed rather than a per-slot miss rate. The per-client tail does show a spread (37 to 101 blocks past 4 seconds), but it tracks the host, not the execution client: it does not match the execution ranking, and the order is not stable across deployments. In the previous deployment of this fleet the same metric put Besu at the top of the late-arrival count; here Besu is at the bottom. That reshuffle is the signature of network position, not a client property.
Arrival is execution-client-independent, and we checked it both ways. The mean spans only 1.94 to 2.05 s across the five synced pairings, and the ordering does not match the execution ranking we measured under Lighthouse: Erigon, the Lighthouse would-fail outlier at 3.7x, has among the fewest late arrivals here. That non-correlation, together with the cross-deployment reshuffle, is the evidence that the spread is host and network position, not the execution client.
Sync was verified from logs, not metrics. For each pairing we read the execution client's own container logs across the window and confirmed it was importing at the chain tip: Geth's Imported new potential chain segment and Chain head was updated at the tip block, Besu's canonical block updates, Erigon's head validated and Forkchoice Commit with initialCycle false, Ethrex's [METRIC] BLOCK lines, Nethermind's Valid. Result of New Block at the tip. Reth was still in staged sync (Executing stage 4/14 after six days on this deployment), so it is excluded; its Received new payload lines reach the tip number but only record that the consensus client offered the head, not that Reth had imported it. All execution clients run on identical 12-core hosts; consensus and validator processes run on separate machines.
Nimbus exposes no execution-side block-delay metric. The Lighthouse gauges beacon_block_delay_execution_time, _consensus_verification_time, _attestable_slot_start and the counter _head_slot_start_exceeded_total return no series for cc_client="nimbus-super". This is the basis for the claim that Nimbus cannot rank execution clients on timing from its own metrics; the execution slice has to come from the execution client's own newPayload instrumentation.
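The maths above can be reproduced offline from a set of cumulative bucket counts. The sketch below re-implements the linear interpolation that histogram_quantile applies to classic histograms. Only the total (11,801) and the past-4-seconds count (37) are taken from the Besu row; every other bucket value and the sum are illustrative assumptions, picked to land near the reported mean and 90th percentile, not the fleet's real bucket data.

```python
import math


def histogram_quantile(q, buckets):
    """Linearly interpolated quantile over cumulative (upper_bound, count) buckets."""
    ordered = sorted(buckets)
    rank = q * ordered[-1][1]
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in ordered:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the open-ended bucket: report the highest finite bound.
                return lower_bound
            if count == lower_count:
                return upper_bound
            width = upper_bound - lower_bound
            return lower_bound + width * (rank - lower_count) / (count - lower_count)
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Illustrative cumulative counts (see the note above on which values are real).
BUCKETS = [
    (2.0, 7_753), (4.0, 11_764), (6.0, 11_790), (8.0, 11_798),
    (10.0, 11_800), (12.0, 11_801), (14.0, 11_801), (math.inf, 11_801),
]
DELAY_SUM = 22_894.0  # illustrative _sum increase, in seconds

counts = dict(BUCKETS)
total = counts[math.inf]

mean = DELAY_SUM / total                # exact, given _sum and _count
past_4s = total - counts[4.0]           # exact, because 4 s is a bucket boundary
p90 = histogram_quantile(0.90, BUCKETS)  # interpolated inside the 2-4 s bucket
p99 = histogram_quantile(0.99, BUCKETS)  # interpolated inside the 2-4 s bucket

print(f"mean={mean:.2f}s past_4s={past_4s} p90={p90:.2f}s p99={p99:.2f}s")
```

Because fewer than 1% of blocks sit above the 4-second boundary, the 99th-percentile rank always falls inside the 2 to 4 second bucket, so the interpolated value can only land somewhere below 4 seconds regardless of the execution client.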
