Description
We experienced an unexpected restart of our Nethermind node. System metrics do not indicate resource exhaustion as the cause of the restart. Upon inspecting the logs, we found the following error occurring shortly before the restart:
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
So far this has happened only once on our node, and we cannot reproduce it. Because this is a mainnet node, the restart caused downtime, which is a significant concern for us. We would appreciate guidance on:
- Understanding the root cause of this error.
- Steps to mitigate such incidents in the future.
- Recommendations for hardening our setup to improve resilience.
Steps to Reproduce
Unfortunately, we cannot provide a reproducible scenario for this issue:
- The node was running normally.
- The fatal error occurred, and the node restarted itself.
Actual behavior
The node unexpectedly restarted, resulting in downtime.
Expected behavior
If the error is non-critical, we would expect it to be handled gracefully, allowing the node to continue operating without a restart.
Desktop
- Operating System: NixOS 23.05
- Version: 1.31.1
- Installation Method: GitHub Release
- Consensus Client: Nimbus
Logs
2025-03-15 18:22:30.279 15 Mar 17:22:30 | Nethermind is starting up
2025-03-15 18:22:06.641 at System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
2025-03-15 18:22:06.641 at System.Threading.ThreadPoolWorkQueue.Dispatch()
2025-03-15 18:22:06.641 at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequests>d__238`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], Microsoft.AspNetCore.Server.Kestrel.Core, Version=9.0.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60]].MoveNext(System.Threading.Thread)
2025-03-15 18:22:06.641 at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread, System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
2025-03-15 18:22:06.641 at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequests>d__238`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
2025-03-15 18:22:06.641 at Microsoft.AspNetCore.Builder.Extensions.MapMiddleware.Invoke(Microsoft.AspNetCore.Http.HttpContext)
2025-03-15 18:22:06.641 at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Builder.Extensions.MapMiddleware+<InvokeCore>d__4, Microsoft.AspNetCore.Http.Abstractions, Version=9.0.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60]](<InvokeCore>d__4 ByRef)
2025-03-15 18:22:06.641 at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Prometheus.MetricServerMiddleware+<Invoke>d__7, Prometheus.AspNetCore, Version=8.0.0.0, Culture=neutral, PublicKeyToken=a243e9817ba9d559]](<Invoke>d__7 ByRef)
2025-03-15 18:22:06.641 at Microsoft.AspNetCore.Builder.Extensions.MapMiddleware+<InvokeCore>d__4.MoveNext()
2025-03-15 18:22:06.641 at Prometheus.MetricServerMiddleware+<Invoke>d__7.MoveNext()
2025-03-15 18:22:06.641 at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Prometheus.CollectorRegistry+<CollectAndSerializeAsync>d__21, Prometheus.NetStandard, Version=8.0.0.0, Culture=neutral, PublicKeyToken=a243e9817ba9d559]](<CollectAndSerializeAsync>d__21 ByRef)
2025-03-15 18:22:06.641 at Prometheus.CollectorRegistry+<CollectAndSerializeAsync>d__21.MoveNext()
2025-03-15 18:22:06.641 at Prometheus.CollectorRegistry+<RunBeforeCollectCallbacksAsync>d__22.MoveNext()
2025-03-15 18:22:06.641 at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Prometheus.CollectorRegistry+<RunBeforeCollectCallbacksAsync>d__22, Prometheus.NetStandard, Version=8.0.0.0, Culture=neutral, PublicKeyToken=a243e9817ba9d559]](<RunBeforeCollectCallbacksAsync>d__22 ByRef)
2025-03-15 18:22:06.641 at Prometheus.DotNetStats.UpdateMetrics()
2025-03-15 18:22:06.641 at System.Diagnostics.Process.EnsureHandleCountPopulated()
2025-03-15 18:22:06.641 at System.Collections.Generic.List`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>)
2025-03-15 18:22:06.641 at System.IO.Enumeration.FileSystemEnumerator`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
2025-03-15 18:22:06.641 at System.IO.Enumeration.FileSystemEntry.Initialize(System.IO.Enumeration.FileSystemEntry ByRef, DirectoryEntry, System.ReadOnlySpan`1<Char>, System.ReadOnlySpan`1<Char>, System.ReadOnlySpan`1<Char>, System.Span`1<Char>)
2025-03-15 18:22:06.641 at System.SpanHelpers.NonPackedIndexOfValueType[[System.Byte, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.SpanHelpers+DontNegate`1[[System.Byte, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]](Byte ByRef, Byte, Int32)
2025-03-15 18:22:06.641 Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
2025-03-15 18:22:06.560 15 Mar 17:22:06 | Attempt to request ENR before bonding
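The excerpt above is in the order our log viewer exported it, which appears to be newest-first: the restart message comes first and the last line before the crash comes last. Below is a minimal sketch for reading it oldest-first. It assumes every line begins with a 23-character `YYYY-MM-DD HH:MM:SS.mmm` timestamp added by the log collector. The function name `chronological` is ours and purely illustrative. Stack frames that share a timestamp may still be slightly out of order after sorting, because the collector gives them identical times.

```python
from datetime import datetime


def chronological(lines):
    """Return log lines oldest-first (hypothetical helper, not part of Nethermind)."""

    def stamp(line):
        # Assumption: the first 23 characters are 'YYYY-MM-DD HH:MM:SS.mmm'.
        return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")

    # Reverse before the stable sort so lines with equal timestamps
    # end up in reversed export order rather than export order.
    return sorted(reversed(lines), key=stamp)


# A few lines copied from the excerpt above, in exported (newest-first) order.
excerpt = [
    "2025-03-15 18:22:30.279 15 Mar 17:22:30 | Nethermind is starting up",
    "2025-03-15 18:22:06.641 at System.Diagnostics.Process.EnsureHandleCountPopulated()",
    "2025-03-15 18:22:06.641 Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.",
    "2025-03-15 18:22:06.560 15 Mar 17:22:06 | Attempt to request ENR before bonding",
]

for line in chronological(excerpt):
    print(line)
```

Read this way, the sequence is: the last regular log message, the fatal error, the stack frames starting from the innermost one, and finally the restart.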
Additional Context
We discovered that similar issues have been reported previously:
- Issue 1702 addressed by PR 1741
- Issue 1214
These reports suggest that this kind of crash is not unique to our node. The recurrence raises the question of whether the underlying problem was fully resolved or whether additional safeguards are needed.