Tracing a Besu memory leak to a one-line method
Six Besu nodes, same version, same hardware, same config. Five held a flat JVM heap of around 1.0 to 1.3 GB. The sixth grew by about 10 GB a day and was on track to be OOM-killed roughly 30 hours after a restart. The only difference was the consensus client on the other side of the Engine API.
This is a walkthrough of how StereumLabs AI, reading our fleet's metrics and logs, took that one anomalous node, traced its heap growth to a single method, and filed an issue upstream. Besu shipped a round of mitigations and closed the issue. A later devnet reproduction showed that the underlying layers still pile up, so the issue was reopened, and the fix that followed is now in review. The bug is operational: recoverable by a restart, with no consensus impact, no double-sign, and no state-root divergence. It is also the kind of cross-client interaction a single-node test will never surface, because it only appears when a live pairing lands in a specific state.
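
For readers who want to check a "hours until OOM" projection like the one above against their own metrics, here is a minimal sketch, assuming heap usage grows roughly linearly between restarts. It fits a straight line to (hours since restart, heap in GB) samples and extrapolates to a memory limit. The function names, the sample values, and the 13.7 GB limit are all hypothetical and chosen only so the numbers line up with the growth rate quoted above; the actual limit on the affected node is not stated here.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (hours since restart, heap used in GB)


def fit_line(samples: Sequence[Sample]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope in GB/hour, intercept in GB)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_h = sum(h for _, h in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (h - mean_h) for t, h in samples)
    slope = cov / var_t
    return slope, mean_h - slope * mean_t


def hours_until_limit(samples: Sequence[Sample], limit_gb: float) -> Optional[float]:
    """Hours from the last sample until the fitted line reaches limit_gb.

    Returns None when the fitted heap is flat or shrinking.
    """
    slope, intercept = fit_line(samples)
    if slope <= 0:
        return None
    last_t = samples[-1][0]
    fitted_now = slope * last_t + intercept
    return max(0.0, (limit_gb - fitted_now) / slope)


if __name__ == "__main__":
    # Hypothetical samples: about 10 GB/day of growth from a 1.2 GB baseline.
    leaking = [(0.0, 1.2), (6.0, 3.7), (12.0, 6.2)]
    # Hypothetical memory limit, not taken from the real node.
    print(hours_until_limit(leaking, limit_gb=13.7))
```

A linear fit is the simplest reasonable choice for a steady leak. Garbage-collection sawtooth patterns would call for fitting post-GC minimums rather than raw samples.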

