Skip to content

Conversation

@rach-id
Copy link
Member

@rach-id rach-id commented Oct 16, 2025

Closes: #2573

rootulp and others added 8 commits October 14, 2025 15:20
And remove an unnecessary mutex lock / unlock in Size.

This is a prerequisite to #2567
* add VaidateBasic to BitArray to ensure Bits and len(Elems) are valid

* call ValidateBasic on BitArrays when receiving as a msg from exteranl nodes

* enfore SetIndex is not setting out of bounds

* add guard to getNumTrueIndices

getNumTrueIndices will index out of bounds if Bits and Elems have a
mismatch where len(elems) != (bits+63)/64, this guard makes it simply
return 0 if this mismatch is present

* changelog

* fix missing import for v0.38.x

* update changelog for release of v0.38.19

* remove duplicate bug fixes from unreleased

* fix changelog date

* fix lint

* fix expected error string in test
@rach-id rach-id self-assigned this Oct 16, 2025
@rach-id rach-id marked this pull request as draft October 16, 2025 06:57
@rach-id rach-id marked this pull request as ready for review October 16, 2025 07:48
@rach-id rach-id merged commit f38454d into main Oct 16, 2025
21 checks passed
@rach-id rach-id deleted the rachid/bitarray-vuln-fix branch October 16, 2025 11:40
}

func (bA *BitArray) getNumTrueIndices() int {
if bA == nil || bA.Bits == 0 || len(bA.Elems) == 0 || len(bA.Elems) != numElements(bA.Bits) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh so the solution to avoiding the deadlock was to modify this line to avoid calling Size()

Ref: cometbft/cometbft@be5677c#diff-204903b7f426357ddfc87a0d0a8d49b3ec22f4a48a8bb3a7ce44f821d5ae9d96R285

@odeke-em
Copy link
Contributor

odeke-em commented Oct 17, 2025

@rach-id @rootulp @liamsi this test can serve as a regression to prevent future failures. I had spent some 6+ hours crafting things up to submit a bug bounty entry but HackenProof requires some certain KYC of which I am not in the right place to get done so I missed out on this for bug bounty since it also became public, but here is some code that you can adapt inside consensus/vuln_test.go

package consensus

import (
	"context"
	"errors"
	"fmt"
	"testing"

	"github.com/cometbft/cometbft/libs/bits"
	"github.com/cometbft/cometbft/libs/log"
	"github.com/cometbft/cometbft/p2p"
	p2pmock "github.com/cometbft/cometbft/p2p/mock"
	cmtcons "github.com/cometbft/cometbft/proto/tendermint/consensus"
	protobits "github.com/cometbft/cometbft/proto/tendermint/libs/bits"
	cmtypes "github.com/cometbft/cometbft/proto/tendermint/types"
	"github.com/cometbft/cometbft/types"
	"github.com/stretchr/testify/assert"
)

// This test is a minimalized vulnerability check against malformed VoteSetBitsMessage
// being injected by malicious peers who can gossip different variations
// to different peers and then cause consensus failures all around instead
// of making a single node panic.
func TestReactorVulnerabilityOnMalformedBitArraysInNewValidBlockMessage(t *testing.T) {
	// 0. Setup the reactor and for the simplest case, just 1 peer.
	N := 1
	css, cleanup := randConsensusNet(t, N, "consensus_reactor_test", newMockTickerFunc(true), newKVStore)
	defer cleanup()
	reactors, _, eventBuses := startConsensusNet(t, css, N)
	logger := log.TestingLogger()
	defer stopConsensusNet(logger, reactors, eventBuses)
	reactor := reactors[0]
	maliciousPeer := p2pmock.NewPeer(nil)
	reactor.InitPeer(maliciousPeer)
	reactor.AddPeer(maliciousPeer)

	// 1. Attain the events channel to assert against disconnection events
	// ideally when the malicious message is caught, when this bug is fixed.
	ctx := context.Background()
	eventBus := eventBuses[0]
	subscription, err := eventBus.Subscribe(ctx, testSubscriber, types.EventQueryValidBlock)
	assert.Nil(t, err)

	// 2. Craft the malicious message built from inconsistent state and that
	// spread it all around to the network with different variations
	// and if the network's votes are inconsistent they cannot come to
	// consensus and hence a halt.
	nBlockParts := 10
	maliciousBitArray := bits.NewBitArray(nBlockParts)
	// 2.5. Intentionally cause .Elems to be nil and hence an obvious
	// invalid state whereby the bitArray's reported bits and elements
	// do not match.
	maliciousBitArray.Elems = nil
	ps, ok := maliciousPeer.Get(types.PeerStateKey).(*PeerState)
	assert.True(t, ok)
	_ = ps
	msg := &cmtcons.NewValidBlock{
		Height: ps.PRS.Height,
		Round:  1,
		BlockParts: &protobits.BitArray{
			Elems: maliciousBitArray.Elems,
			Bits:  int64(maliciousBitArray.Bits),
		},
		IsCommit: true,
		BlockPartSetHeader: cmtypes.PartSetHeader{
			Total: uint32(nBlockParts),
		},
	}

	// 3. Inject into the reactor the malicious message from the malicious peer.
	reactor.Receive(p2p.Envelope{
		ChannelID: StateChannel,
		Src:       maliciousPeer,
		Message:   msg,
	})

	// 4. When fixed and no longer vulnerabile, we must expect that the malicious peer was
	// disconnected due to an invalid message sent in to the reactor and assert on it.
	// TODO: Retrofit this code for proper event subscription checks as it has diverged quite a lot.
	/*
		select {
		case event := <-subscription.Out():
			// 4.5. We MUST have a PeerDisconnectedEvent with the error
			// that the peer sent us a malformed message.
			gotEvent := event.Data() // .(p2pevents.PeerDisconnectedEvent)
			// 4.6. Assert that it was the same malicious peer who just got disconnected by ID
			// assert.Equal(t, gotEvent.PeerID, maliciousPeer.ID(), "same peerID")
			// 4.7. Assert that it was the same malicious peer who just got disconnected by IP address.
			// assert.Equal(t, gotEvent.Address.(*net.TCPAddr).IP, maliciousPeer.RemoteIP(), "same address")
			// 4.8. Finally assert that we caught the invalid BitArray.
			fmt.Printf("gotEvent: %+v\n", gotEvent)
			_ = errors.New
			// gotErr := errors.Unwrap(gotEvent.Reason)
			// assert.Contains(t, gotErr.Error(), "mismatch between specified number of bits")

		default:
			t.Fatal("Vulnerable! The malformed message was accepted yet should have been disconnected by the reactor.")
		}
	*/

	// 5. We expect that the caller for DumpConsensusState won't panic inside the rpc/consensus.go code.
	peerStateJSON, err := ps.MarshalJSON()
	assert.Nil(t, err, "No error expected")
	_ = peerStateJSON
}

and really you should be checking to affirm that the respective 3 message types with BitArrays don't accept malformed bit arrays and reject them then confirm with the events subscription.

@rach-id
Copy link
Member Author

rach-id commented Oct 17, 2025

Hello @odeke-em,
Thanks for the test. However, I think the simple tests we added in bit array tests are enough to catch this issue. Adding such a complicated test, which might even be flaky, will only complicate maintainer's work.

If you think that the tests we added are not enough, and this one is catching an extra case, please let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix invalid bit array handling on main

6 participants