
Add BufferChannel for fsm incoming/outgoing channels. #2869


Open · wants to merge 1 commit into master

Conversation

@taitov (Author) commented Jan 15, 2025

Added the ability to select the type of the fsm incoming/outgoing channels:

  1. InfiniteChannel (old behavior, default)
  2. BufferChannel

BufferChannel accumulates and combines paths in a separate goroutine for the case when the reading side (for example, BgpServer.Serve) is busy.

A map keyed by NLRI (bufferChannel.pathIdxs) is also used to combine multiple changes to one path into a single update.

BufferChannel prevents unbounded memory consumption because its maximum size is comparable to the size of the adj-in (or the adj-out for the fsm outgoing channel).
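The combining idea can be sketched as follows. This is a simplified illustration, not the PR's actual code: `pathUpdate` and its fields are hypothetical stand-ins for gobgp's internal path types, and only the NLRI-keyed deduplication is modeled:

```go
package main

import "fmt"

// pathUpdate is a hypothetical stand-in for a gobgp path change;
// only the NLRI key and an attribute revision are modeled here.
type pathUpdate struct {
	nlri string // e.g. "10.0.0.0/24"
	attr int    // increasing attribute revision
}

// bufferChannel sketches the PR's idea: a queue plus an index keyed
// by NLRI, so repeated updates to the same prefix overwrite each
// other instead of growing the queue.
type bufferChannel struct {
	updates  []pathUpdate
	pathIdxs map[string]int // NLRI -> position in updates
}

func newBufferChannel() *bufferChannel {
	return &bufferChannel{pathIdxs: make(map[string]int)}
}

// push merges an update: if the NLRI is already queued, replace it
// in place (discarding the outdated attributes); otherwise append.
// The queue is therefore bounded by the number of distinct prefixes,
// which is comparable to the adj-in size.
func (b *bufferChannel) push(u pathUpdate) {
	if i, ok := b.pathIdxs[u.nlri]; ok {
		b.updates[i] = u
		return
	}
	b.pathIdxs[u.nlri] = len(b.updates)
	b.updates = append(b.updates, u)
}

func main() {
	b := newBufferChannel()
	// A storm of attribute updates for the same prefix...
	for rev := 1; rev <= 1000; rev++ {
		b.push(pathUpdate{nlri: "10.0.0.0/24", attr: rev})
	}
	b.push(pathUpdate{nlri: "10.0.1.0/24", attr: 1})
	// ...collapses to one queued entry per prefix, holding the
	// latest attributes.
	fmt.Println(len(b.updates), b.updates[0].attr) // 2 1000
}
```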

@taitov (Author) commented Jan 15, 2025

cc #2862

@bayrinat (Contributor)

@fujita could you please take a look at it?

@fujita (Member) commented Jan 17, 2025

I don't think it's a good idea for GoBGP's most core code to have two different implementations.
What's the maximum buffer size of Go's buffered channel?
If it's large, then we could use a buffered channel instead of an infinite channel, like #2862. The buffer size could be specified by a command-line option or such.

@fujita (Member) commented Jan 17, 2025

btw, what kind of use case do you have that puts such a heavy load on GoBGP?

@taitov (Author) commented Jan 17, 2025

> I don't think it's a good idea for GoBGP's most core code to have two different implementations. What's the maximum buffer size of Go's buffered channel? If it's large, then we could use a buffered channel instead of an infinite channel, like #2862. The buffer size could be specified by a command-line option or such.

A regular channel with a limited size will not work: its limit can still be reached, which may lead to a disconnection from the peer.

BufferChannel can combine multiple updates into one, discarding outdated path attributes. This channel is most effective when a peer sends a storm of attribute updates for a single path.

@taitov (Author) commented Jan 17, 2025

Configuration example:

peer-groups:
  - config:
      peer-group-name: group1
      peer-as: 12345
      local-as: 54321
      incoming-channel-timeout: 1
      incoming-channel:
        channel-type: buffer
        size: 64
      outgoing-channel:
        channel-type: buffer
        size: 64

neighbors:
  - config:
      neighbor-address: 10.0.0.1
      peer-group: group1

or

neighbors:
  - config:
      neighbor-address: 10.0.0.1
      incoming-channel:
        channel-type: buffer
        size: 64
      outgoing-channel:
        channel-type: buffer
        size: 64
  - config:
      neighbor-address: 10.0.0.2
      incoming-channel:
        channel-type: buffer
        size: 256

@taitov (Author) commented Jan 17, 2025

> btw, what kind of use case do you have that puts such a heavy load on GoBGP?

Paths are being cyclically updated from peers due to attribute updates, while the main thread (BgpServer.Serve()) is busy (for example, processing a mgmtOperation, or running propagateUpdate for paths when a peer disconnects). In this scenario, InfiniteChannel begins to fill up quickly.

@taitov (Author) commented Jan 28, 2025

@fujita gentle ping

@taitov (Author) commented Feb 19, 2025

@fujita gentle ping

@fujita (Member) commented Feb 24, 2025

Why not just remove the incoming queues?

@taitov (Author) commented Feb 24, 2025

> Why not just remove the incoming queues?

  • Then the peers can start to flap (hold timer expired).
  • OOM on the remote peer (overflow of the outgoing fsm channel).

@taitov (Author) commented Feb 24, 2025

> Why not just remove the incoming queues?

  • Then the peers can start to flap (hold timer expired).
  • OOM on the remote peer (overflow of the outgoing fsm channel).

But with this PR, we can add another type of fsm incoming/outgoing channel that disables the queues.

@fujita (Member) commented Feb 24, 2025

I meant: why not remove the incoming channels?

@taitov (Author) commented Feb 24, 2025

> I meant: why not remove the incoming channels?

I understood you :) And my answer: if we remove the incoming channels, we get potential problems: peer flaps and OOM on the remote gobgp server.

@fujita (Member) commented Feb 24, 2025

Hmm, why? A goroutine reads from a socket, parses a BGP message, and updates the table with a lock, subject to the PrefixLimit config. I don't see the difference with regard to OOM.

@mpeim commented Apr 10, 2025

Hey!
Do you know where we stand on this PR?
Couldn't we do it in two steps: first solve the inbound channels, and afterwards the others?

@taitov (Author) commented May 13, 2025

Test result with 10k updates per second:

Green line: memory usage.
Blue line: CPU usage.

InfiniteChannel (old behavior)
[graph: memory (green) and CPU (blue) usage]

BufferChannel
[graph: memory (green) and CPU (blue) usage]

InfiniteChannel used 4 GB of memory vs. 16 MB for BufferChannel under the same BGP update pattern.

@fujita (Member) commented May 16, 2025

Nice graphs! Really clear and informative.

Is there a way to reproduce a similar load in my environment (like MRT or something)?
