feat(full/availability): adding different withholding options to tests #2697
base: main
Conversation
…s and extending test cases
I had a good "in person" review of this that was convincing. I'm curious about the effectiveness of the "2 light node subnet split brain" test, based on how the removed blocks are provisioned from Unrecoverable, but maybe I'm imagining a test that doesn't exist yet (i.e. a full node transitions from failing to reconstruct from 1 subnet to being able to with 2 subnets). This might just be my limited grasp.
```go
	}

	err := errg.Wait()
	require.NoError(t, err)
}

// TestShareAvailable_DisconnectedFullNodes asserts that two disconnected full
```
Random thought: just curious whether it would still work the same for a similar topology, but with more than 2 full nodes.
It should, yes
Good thought, we need to park this in an issue. @musalbas requested the ability to scale the test in such a way.
Codecov Report
```
@@           Coverage Diff            @@
##            main    #2697     +/-  ##
========================================
- Coverage  51.37%   51.33%   -0.05%
========================================
  Files        159      159
  Lines      10699    10709      +10
========================================
  Hits        5497     5497
- Misses      4723     4734      +11
+ Partials     479      478       -1
```
```go
// false, otherwise it acts the same but keeps the node at index (k + 1, k + 1).
func FillWithheldBS(t *testing.T, n int, bServ blockservice.BlockService, recoverable bool) *share.Root {
	shares := sharetest.RandShares(t, n*n)
	eds, err := AddAndWithholdShares(context.TODO(), shares, bServ, recoverable)
```
should we make this a context.WithCancel and then t.Cleanup the cancel
```go
{
	name:           "barely recoverable",
	origSquareSize: 16,
	lightNodes:     230, // 99% chance of recoverability
```
[question]
Where is 230 coming from?
Again, this case is not covered in the paper. Refer to my description, and I can send you the script so you can verify it for yourself
The description didn't describe where 230 comes from, hence the question.

> Again, this case is not covered in the paper

Okay, then I missed some discussion around it, because I don't remember such a fact. In my memory, the figures from the paper assume a withholding attack, as it is extensively described in the theorems in the same sections as all the plots I am referring to.
Please share the script; I am curious about how it's constructed and would like to play with it and verify it. The number 230 feels too high to me when 23 is enough to reconstruct the whole square when data isn't withheld. Did you verify the script with @musalbas?
23 is not enough to reconstruct the whole square with high probability in this case, and that matches my simulations, which you can verify by running them yourself. Will send the script over Slack.
I meant 23/24 when data is not withheld, updated the comment.
https://gist.github.com/distractedm1nd/d2b306da13581b97bbbcefaa5b050738
I apologize for the shitty code, didnt expect it to be read by others :)
Main loop flow in the Monte Carlo script looks familiar 🙂
If `ds.sample(s)` would act as a noop (for example, for a share that is withheld), it would mean that the given share is not sampled, but you count it as sampled here:

```go
for len(uniqueSamples) < shares {
	s := randomSample(k)
	uniqueSamples[s]++
	ds.sample(s)
}
```

Could you please clarify why you count it as sampled when it is not? It looks like it could lead to fewer than 20 samples being assigned to each light node and make the required amount of light nodes higher than it needs to be.
I have tried to modify the code as follows, to ensure 20 samples per light node:

```go
for len(uniqueSamples) < shares {
	s := randomSample(k)
	if ds.sample(s) {
		uniqueSamples[s]++
	}
}
```
With the adjusted calculation, the probability graph shows a decent chance of reconstruction in the withholding case with fewer than 230 light nodes:
Yes - I tried to match it with the current implementation that does not change the coordinates to sample after failure but just retries. Or is this incorrect?
In any case, seeing how much the numbers improve after this change indicates it should be what we are doing both in IPLD retrieval as well as here in the test. Thank you for looking deeper into this!
I've checked light availability, and it seems like it is acting like neither of the simulations. 😆 I'll drop my findings into a new issue to discuss further.
```go
	if err != nil {
		return nil, err
	}
	// commit the batch to ipfs
```
[uber nit]
Suggested change:
```diff
-// commit the batch to ipfs
+// commit the batch to ipld
```
Actually, this looks like a copy of `AddShares`. I'm curious why we copy, as the delete functionality in `AddAndWithholdShares` looks like it could be composed over the `AddShares` func.
It was only copied in the first case because the intention was to use some internal state, but the most recent version just iterates over the adder and ipld Translates the DAH.
So I will see if I can just call AddShares and do the iteration after
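The composition being discussed can be sketched abstractly with a toy in-memory store (all names here are hypothetical; the real helpers operate on a blockservice and an EDS):

```go
package main

import "fmt"

// store is a toy stand-in for the blockservice: keys map to share bytes.
type store map[string][]byte

// add plays the role of AddShares: it makes every share available.
func add(s store, shares map[string][]byte) {
	for k, v := range shares {
		s[k] = v
	}
}

// addAndWithhold composes over add: store everything first, then
// delete the withheld keys afterwards, instead of duplicating add's logic.
func addAndWithhold(s store, shares map[string][]byte, withheldKeys []string) {
	add(s, shares) // reuse the plain add...
	for _, k := range withheldKeys {
		delete(s, k) // ...then remove the withheld shares
	}
}

func main() {
	s := store{}
	addAndWithhold(s, map[string][]byte{"a": {1}, "b": {2}}, []string{"a"})
	fmt.Println(len(s)) // 1
}
```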
```go
	// FullyRecoverable makes all EDS shares available when filling the
	// blockservice.
	FullyRecoverable Recoverability = iota
	// BarelyRecoverable withholds the (k + 1)^2 subsquare of the 2k^2 EDS,
```
[doc uber nit]
re: subsquare. We call em quadrants and just to decrease confusion we should continue to call em so.
I do not mean quadrants; look at the image in the description to see what I am referring to.
```go
@@ -26,10 +26,39 @@ func GetterWithRandSquare(t *testing.T, n int) (share.Getter, *share.Root) {
	return getter, availability_test.RandFillBS(t, n, bServ)
}

type Recoverability int64
```
[discussion]
Dropping the alternative naming I have in mind. There is nothing actionable here.
DataWithholding (FullyWithheld, PartiallyWithheld, UnWithheld)
Agree with this
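For reference, the floated naming could look like this (hypothetical identifiers; the PR itself ships `Recoverability` with `FullyRecoverable`, `BarelyRecoverable`, and `Unrecoverable`):

```go
package main

import "fmt"

// DataWithholding is the alternative name proposed for the mode enum.
type DataWithholding int64

const (
	UnWithheld DataWithholding = iota
	PartiallyWithheld
	FullyWithheld
)

// String gives readable names for test output.
func (d DataWithholding) String() string {
	switch d {
	case UnWithheld:
		return "UnWithheld"
	case PartiallyWithheld:
		return "PartiallyWithheld"
	default:
		return "FullyWithheld"
	}
}

func main() {
	fmt.Println(UnWithheld, PartiallyWithheld, FullyWithheld)
}
```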
```go
		net.Connect(lights2[i].ID(), full2.ID())
		net.Disconnect(lights2[i].ID(), full1.ID())
	}
	// start connection lights with sources
```
Suggested change:
```diff
-// start connection lights with sources
+// connect lights with sources
```
```go
	sampleAmount:    20,
	recoverability:  BarelyRecoverable,
	expectedFailure: true,
},
```
Why do we omit unrecoverable case here?
```go
// NOTE: This test contains cases for barely recoverable but
// DisconnectedFullNodes does not. The reasoning for this is that
// DisconnectedFullNodes has the additional contstraint that the data
// should not be reconstructable from a single subnetwork, while this
// test only tests that the data is reconstructable once the subnetworks
// are connected.
```
Suggested change:
```diff
-// DisconnectedFullNodes has the additional contstraint that the data
-// should not be reconstructable from a single subnetwork, while this
-// test only tests that the data is reconstructable once the subnetworks
+// DisconnectedFullNodes has the additional constraint that the data
+// should not be reconstructible from a single subnetwork, while this
+// test only tests that the data is reconstructible once the subnetworks
```
As per Grammarly
I need help understanding the reasoning here. Why would `DisconnectedFullNodes` not have a `BarelyRecoverable` case? `BarelyReconstructible` is the worst-case success scenario that, IMO, we should test: two subnetworks sample the available minimum, and the fulls collectively reconstruct.
Please refer to my description again to see - we also discussed this internally in slack. In order to have 99% confidence with BarelyRecoverable in DisconnectedFullNodes, you cannot use 2 subnetworks, as each subnetwork has a too high probability of the data being reconstructible. This is fine, as it is exactly what ConnectedFullNodes intends to test - that the data is available once connecting the full nodes, WITHOUT the assumption that it is unrecoverable from the independent subnetworks.
Sorry if this is being repeated; I still would like to get to the bottom of this.

> In order to have 99% confidence with BarelyRecoverable in DisconnectedFullNodes, you cannot use 2 subnetworks, as each subnetwork has a too high probability of the data being reconstructible

Isn't that a sign that 230 is too big a number, if the 115-node subnetwork has a very high chance of reconstructing? In other words, is there a number that would give ~99% for the whole network but ~0% for a single subnetwork? Is there a way to modify the script to support that? Note that my intuition for probabilities is not good.
```go
{
	name:           "fully recoverable",
	origSquareSize: 16,
	lightNodes:     24, // ~99% chance of recoverability
```
So those tests are flaky by nature, which is not good for CI. I did quick math with the current suite, and it seems those tests combined have a ~3% chance to fail: 0.99 * 0.99 * (1 - 0.007) * (1 - 0.003) ≈ 0.9703.
We should do something about it, but I also think it could be resolved in a separate issue. One way to deal with it would be to separate all flaky tests into a separate CI runner that is not blocking merge.
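The quick math above can be checked in a few lines (the per-test success rates are the ones quoted in this thread):

```go
package main

import "fmt"

func main() {
	// Probability that all four tests pass in one CI run, assuming
	// independence: two tests at 99%, one at 99.3%, one at 99.7%.
	pass := 0.99 * 0.99 * (1 - 0.007) * (1 - 0.003)
	fmt.Printf("%.4f\n", pass) // 0.9703
}
```

So roughly 1 in 33 CI runs would fail on these tests alone, which supports moving them to a non-blocking runner or lowering the light-node counts further from the threshold.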
Good point, and we can adjust the LN count to be less at the threshold to decrease flakiness further
Closes #2592 by adding three different modes to full availability test nodes: FullyRecoverable, BarelyRecoverable, and Unrecoverable.
The BarelyRecoverable test case requires 230 light nodes for a 99% chance of recoverability:
[graph omitted: probability of recoverability vs. light node count]
BarelyRecoverable is not used in DisconnectedFullNodes, only in ConnectedFullNodes, because, as the graph shows, splitting the 230 node count in two does not guarantee that recoverability fails within each subnetwork.