@vgonkivs
Member

@vgonkivs vgonkivs commented May 31, 2025

What was done:

  • Implemented 3 new shrex operations: GetSamples, GetRow, and GetRangeNamespaceData (previously returned ErrOperationNotSupported)
  • Unified metrics system: Replaced separate EDS/namespace metrics with a single getters_shrex_attempts_per_request metric that uses request_type labels
  • Refactored request handling: Introduced a unified executeRequest method that handles retry logic, peer management, and verification for all request types.

@vgonkivs vgonkivs self-assigned this May 31, 2025
@vgonkivs vgonkivs changed the base branch from main to shrex-unify May 31, 2025 11:28
@github-actions github-actions bot added the kind:break! Attached to breaking PRs label May 31, 2025
@vgonkivs vgonkivs added kind:feat Attached to feature PRs and removed kind:break! Attached to breaking PRs labels May 31, 2025
@vgonkivs vgonkivs force-pushed the enable_endpoints_in_shrex branch 3 times, most recently from 52d1c79 to 079288a Compare June 2, 2025 10:29
@vgonkivs vgonkivs force-pushed the shrex-unify branch 3 times, most recently from a04fc0f to 098e1f8 Compare July 23, 2025 11:04
@vgonkivs vgonkivs force-pushed the shrex-unify branch 2 times, most recently from 8f00af9 to b91fa09 Compare August 6, 2025 11:53
@vgonkivs vgonkivs force-pushed the enable_endpoints_in_shrex branch 2 times, most recently from 75cd1cc to 24a414f Compare August 13, 2025 12:47
Base automatically changed from shrex-unify to main September 1, 2025 14:29
@vgonkivs vgonkivs force-pushed the enable_endpoints_in_shrex branch 2 times, most recently from 031a56b to 1777240 Compare September 8, 2025 12:11
@vgonkivs vgonkivs force-pushed the enable_endpoints_in_shrex branch 3 times, most recently from 11d56ef to d7e3475 Compare September 15, 2025 15:09
@vgonkivs vgonkivs marked this pull request as ready for review September 15, 2025 15:21
@vgonkivs vgonkivs force-pushed the enable_endpoints_in_shrex branch from d7e3475 to 92d502f Compare September 15, 2025 15:30
@vgonkivs vgonkivs force-pushed the enable_endpoints_in_shrex branch from 92d502f to 5ef2686 Compare October 13, 2025 09:00
@vgonkivs
Member Author

blocked on #4696

Comment on lines 318 to 331
_, err := getter.GetSamples(ctx, eh, coords)
require.Error(t, err)
Member

Need to assert whether this is a partial response.

Member Author

@vgonkivs vgonkivs Nov 20, 2025

The current implementation does not support returning both samples and an error. I think such responses are very confusing for users. Also, an error means that we have requested ALL (or almost all) peers from the peer manager and they have responded with an error.

I made a mistake when naming this test case. My original idea was to cover the case where the request contains both valid and invalid coords, so the getter returns only an error.

Member

You retry all samples at once, and if one fails, the attempt retries all of them again. Instead, the client should retry only the failed sample, and if one sample is unavailable, return those that are available so that they are no longer requested on subsequent attempts.

Member Author

@vgonkivs vgonkivs Dec 8, 2025

> You retry all samples at once and if one failed the attempt will retry all again

If you are talking about the client, then no, because executeRequest contains an infinite for loop that keeps trying to request a sample from an available peer in PeerManager.
So the request (for a single sample) fails only if the context was canceled or the deadline was exceeded. Do you want me to return a partial response in this case?

Member

Yes. We need to return the successful samples, similar to the existing bitswap getter implementation.

return nil, fmt.Errorf("getting rngdata from accessor: %w", err)
}

// TODO(@vgonkivs): This is a temporary solution that will be reworked
Member

Reworked to what? What makes it temporary?

Member Author

@vgonkivs vgonkivs Nov 20, 2025

A custom reader implementation for each container instead of a buffer.

Member

Maybe rewrite this comment or remove it, because it does not provide any information to a reader who lacks the context.

@vgonkivs vgonkivs force-pushed the enable_endpoints_in_shrex branch from c73aa81 to ae9632a Compare November 20, 2025 10:08
@vgonkivs vgonkivs added the kind:break! Attached to breaking PRs label Nov 20, 2025
renaynay previously approved these changes Nov 20, 2025
renaynay previously approved these changes Dec 1, 2025
Comment on lines +294 to +301
coords := []shwap.SampleCoords{
	{Row: 0, Col: 10},
}
Member

nit: check that requesting samples with ods_size+1 returns an error. It will help capture edge cases in the test.

Member Author

I don't think we should limit users to requesting ODS shares only. The SampleID constructor was designed to verify against the EDS, so I'd suggest keeping it.

Member

You are right. I meant eds_size+1, which is the edge case.
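
The edge case can be illustrated with a small bounds check. This is only a sketch: `validateCoords` is a hypothetical helper, not the SampleID constructor, and it assumes the EDS is twice as wide as the ODS, so valid indices run from 0 to 2*odsSize-1:

```go
package main

import "fmt"

// SampleCoords is a simplified stand-in for shwap.SampleCoords.
type SampleCoords struct{ Row, Col int }

// validateCoords accepts any coordinate inside the extended square,
// including the parity quadrants, and rejects everything outside it.
func validateCoords(c SampleCoords, odsSize int) error {
	edsSize := 2 * odsSize // assumption: the EDS doubles the ODS in each dimension
	if c.Row < 0 || c.Row >= edsSize {
		return fmt.Errorf("row %d out of range [0, %d)", c.Row, edsSize)
	}
	if c.Col < 0 || c.Col >= edsSize {
		return fmt.Errorf("col %d out of range [0, %d)", c.Col, edsSize)
	}
	return nil
}

func main() {
	const odsSize = 8 // illustrative size, EDS is 16x16
	for _, c := range []SampleCoords{{0, 10}, {0, 16}, {17, 0}} {
		fmt.Println(c, validateCoords(c, odsSize))
	}
}
```

Under these assumptions, a coordinate beyond the ODS but inside the EDS is accepted, while eds_size and eds_size+1 are rejected.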

@vgonkivs vgonkivs force-pushed the enable_endpoints_in_shrex branch from 7efeaa5 to 6f336f8 Compare December 8, 2025 09:08
Comment on lines 476 to 485
func (rngdata *RangeNamespaceData) WriteTo(writer io.Writer) (int64, error) {
	pbrngData := rngdata.ToProto()

	b, err := pbrngData.Marshal()
	if err != nil {
		return 0, err
	}
	l, err := writer.Write(b)
	return int64(l), err
}
Member

From private discussion: need to split into rows and send them one by one, similar to ND data.

Member Author

@vgonkivs vgonkivs Dec 10, 2025

It requires a change in the proto definition that will break the protocol. The current approach also works. Also, NS data is a []RowNsData, while RangeNsData contains a set of shares + 2 proofs (that can be empty).


// AccessorGetter abstracts storage system that indexes and manages multiple eds.AccessorGetter by
// network height.
type AccessorGetter interface {
Member

I don't love this iface for one implementation, but I'm also not gonna die on this hill.

Member Author

I know, but I've added it to be able to properly test some edge cases.

Labels

kind:break! Attached to breaking PRs kind:feat Attached to feature PRs

4 participants