@markgoddard

The SPIRE agent periodically syncs entries, federated bundles, and SVIDs from
the SPIRE server. This happens every 5 seconds with a backoff, and the whole
process has a 30-second timeout. If the entries allowed for the agent federate
with many trust domains, this timeout can be exceeded.

This change fetches federated bundles concurrently to reduce the chance of
hitting the timeout.

Fixes: #6490

Collaborator

@sorindumitru left a comment

Thanks @markgoddard for providing a fix to this issue. This is looking good. I think we have some tests for fetching the federated bundles, but could you add one with more bundles than the number of workers to make sure that keeps working?
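
One way such a check could look, as a rough sketch rather than a test for the actual client: `runWithWorkers` is a hypothetical stand-in for the code under test (it assumes `workers` is at least 1), and `checkMoreItemsThanWorkers` counts how many items were processed and the peak number in flight at once, so a test can assert that every item is handled even when there are more items than workers.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// runWithWorkers is a hypothetical stand-in for the code under test. It calls
// fn(i) for every i in [0, n) using exactly `workers` goroutines (workers >= 1).
func runWithWorkers(n, workers int, fn func(i int)) {
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				fn(i)
			}
		}()
	}
	for i := 0; i < n; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
}

// checkMoreItemsThanWorkers reports how many items were processed and the
// highest number of items that were being processed at the same time.
func checkMoreItemsThanWorkers(n, workers int) (processed, peak int64) {
	var inFlight, maxInFlight, done atomic.Int64
	runWithWorkers(n, workers, func(int) {
		cur := inFlight.Add(1)
		// Record a new peak if this call raised the in-flight count.
		for {
			old := maxInFlight.Load()
			if cur <= old || maxInFlight.CompareAndSwap(old, cur) {
				break
			}
		}
		done.Add(1)
		inFlight.Add(-1)
	})
	return done.Load(), maxInFlight.Load()
}
```

A test would then call, for example, `checkMoreItemsThanWorkers(25, 4)` and assert that 25 items were processed and that the peak never exceeded 4.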

}

// fetchBundle fetches a single federated bundle from SPIRE server.
func (c *client) fetchBundle(ctx context.Context, bundleClient bundlev1.BundleClient, trustDomain string) (*types.Bundle, error) {

Could we include "federated" in the names of these functions, since they only deal with federated bundles?

Suggested change
- func (c *client) fetchBundle(ctx context.Context, bundleClient bundlev1.BundleClient, trustDomain string) (*types.Bundle, error) {
+ func (c *client) fetchFederatedBundle(ctx context.Context, bundleClient bundlev1.BundleClient, trustDomain string) (*types.Bundle, error) {

// fetchBundlesConcurrently fetches federated bundles concurrently.
// This is done to improve sync times when there are many federations. This should ensure that the
// sync does not exceed rpcTimeout.
func (c *client) fetchBundlesConcurrently(ctx context.Context, bundleClient bundlev1.BundleClient, trustDomains []string, bundles []*types.Bundle) ([]*types.Bundle, error) {

Suggested change
- func (c *client) fetchBundlesConcurrently(ctx context.Context, bundleClient bundlev1.BundleClient, trustDomains []string, bundles []*types.Bundle) ([]*types.Bundle, error) {
+ func (c *client) fetchFederatedBundlesConcurrently(ctx context.Context, bundleClient bundlev1.BundleClient, trustDomains []string, bundles []*types.Bundle) ([]*types.Bundle, error) {

Comment on lines +642 to +644
wg.Add(1)
go func() {
defer wg.Done()

I think you can now do something like this to avoid the Add() and defer:

Suggested change
- wg.Add(1)
- go func() {
- 	defer wg.Done()
+ wg.Go(func() {

Successfully merging this pull request may close these issues.

SPIRE agent can exceed the RPC timeout when synchronising bundles with many federations
