Conversation

@mbrandenburger mbrandenburger commented Aug 28, 2025

The current stack suffers from high resource utilization (a high number of active goroutines and steadily increasing memory usage) when invoking views through the p2p comm layer.

This PR tames the resource consumption.

  • set a reasonable read buffer size (see the sketch below)
  • clean up subconns when they are closed
  • more efficient sessionID computation
  • remove the delayed subconn close
  • remove the delayed context deletion after view execution
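
For the read-buffer change, here is a minimal sketch of what a bounded per-stream read buffer could look like; the package name, function name, and the 32 KiB size are illustrative assumptions, not the actual PR code:

```go
package comm // illustrative placement, not the real package layout

import (
	"bufio"
	"io"
)

// readBufferSize is an assumed bound; the value chosen in the PR may differ.
const readBufferSize = 32 * 1024 // 32 KiB

// newStreamReader wraps a stream with a fixed-size read buffer so that each
// sub-connection holds a predictable, bounded amount of memory instead of an
// oversized buffer per stream.
func newStreamReader(s io.Reader) *bufio.Reader {
	return bufio.NewReaderSize(s, readBufferSize)
}
```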

@mbrandenburger (Member Author)

This screenshot shows the memory consumption (memory leak) and exploding number of goroutines with and without the fix in this PR.

Screenshot 2025-08-29 at 09 07 10

@adecaro (Contributor) commented Aug 29, 2025

@mbrandenburger , wow, super interesting. Only the rate of objects allocated got worse. Do I read it correctly?

@mbrandenburger (Member Author)

> @mbrandenburger , wow, super interesting. Only the rate of objects allocated got worse. Do I read it correctly?

No no, not worse. It increased because, without the fix, the FSC node was not able to process at a higher rate. Here is a screenshot of TPS/latency for the view invocation.

You can see that with the fix in this PR, the TPS roughly doubled while the latency remains stable. Thus, since the node is processing more invocations, the object allocation rate is higher.

Screenshot 2025-08-29 at 09 07 00

@mbrandenburger (Member Author)

I am running this fix now with the TokenSDK hyperledger-labs/fabric-token-sdk#1217 🤞

@mbrandenburger mbrandenburger requested review from adecaro, ale-linux, alexandrosfilios and arner and removed request for alexandrosfilios August 29, 2025 08:53
@mbrandenburger mbrandenburger added this to the Q3 milestone Aug 29, 2025
@mbrandenburger mbrandenburger marked this pull request as ready for review August 29, 2025 08:54
@adecaro (Contributor) left a comment

LGTM

@mbrandenburger mbrandenburger changed the title Fix comm stack resource consumtion Fix comm stack resource consumption Sep 1, 2025
@ale-linux (Contributor) left a comment

Awesome stuff @mbrandenburger! The only issue I see is that getting rid of the debug printouts might bite us in the ass later... how about we leave them encased in an if that checks whether the debug level is enabled?

_ = s.Close()
return
}
logger.Debugf("Read message of length [%d] on [%s]", len(msg), s.Hash())
Contributor

Might it not be useful to have such a debug log?

Contributor

If it is, we could put it within a logger.IsEnabledFor(zapcore.DebugLevel) if statement.

Member Author

Good catch! I restored the log here and put it inside an if clause using logger.IsEnabledFor(zapcore.DebugLevel). Thanks, guys.
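
For reference, a sketch of the guard, assuming the logger exposes IsEnabledFor (as zap/flogging-style loggers do) and that msg and s are as in the snippet above:

```go
// Guard the call so the message formatting and the s.Hash() computation are
// skipped entirely when debug logging is disabled.
if logger.IsEnabledFor(zapcore.DebugLevel) {
	logger.Debugf("Read message of length [%d] on [%s]", len(msg), s.Hash())
}
```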

@arner (Contributor) left a comment

Nice, looks like a big improvement!


output := make([]*comm.ViewPacket, 0, len(input))
m := sync.RWMutex{}
go func() {
Contributor

Maybe I'm missing something, but does this have to be in a goroutine + waitgroup, or can it just as well be inlined?

Member Author

Given that the read and write channels are buffered, we could indeed first write all messages to the stream and then read them back without spawning the goroutine. I don't have strong preferences here.
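
As a toy illustration of why the buffering makes the inlined version safe (this is not the actual test code; the channel and payloads are made up):

```go
package main

import "fmt"

func main() {
	input := []string{"a", "b", "c"}
	ch := make(chan string, len(input)) // buffered: sends complete without an active reader

	// Write everything first, no goroutine/WaitGroup required...
	for _, msg := range input {
		ch <- msg
	}
	// ...then read it all back.
	output := make([]string, 0, len(input))
	for range input {
		output = append(output, <-ch)
	}
	fmt.Println(output) // [a b c]
}
```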

- set reasonable read buffer size
- cleanup subcon when they are closed
- more efficient sessionID computation
- remove delayed subcon close
- remove delayed context deletion after view execution

Signed-off-by: Marcus Brandenburger <bur@zurich.ibm.com>
@mbrandenburger mbrandenburger merged commit 0af30f4 into hyperledger-labs:main Sep 2, 2025
18 checks passed
@mbrandenburger mbrandenburger deleted the comm-perf-fixes branch September 2, 2025 13:26