@alexander-e1off
Collaborator

Currently, if the partition config is wrong, brokers can abort when the actual file size is greater than the configured value. The same problem can occur if one node is configured with a smaller value than the other nodes use.
This PR adds a feature that synchronizes partition max file sizes by finding the highest value, in the same way the highest sequence number is found.
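
As a rough illustration of "find the highest", the sketch below takes the max file sizes reported by each node and keeps the largest value per file type. The `MaxFileSizes` struct, its field names, and `highestMaxFileSizes` are hypothetical and are not taken from the PR code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the sizes each node reports for a partition;
// the field names are illustrative, not taken from the PR.
struct MaxFileSizes {
    std::uint64_t journal;
    std::uint64_t data;
    std::uint64_t qlist;
};

// Return the highest size per file type across all reported values.
MaxFileSizes highestMaxFileSizes(const std::vector<MaxFileSizes>& reported)
{
    assert(!reported.empty());  // at least the local node's own sizes

    MaxFileSizes highest = reported.front();
    for (const MaxFileSizes& sizes : reported) {
        highest.journal = std::max(highest.journal, sizes.journal);
        highest.data    = std::max(highest.data, sizes.data);
        highest.qlist   = std::max(highest.qlist, sizes.qlist);
    }
    return highest;
}
```

Each file type is compared independently, so the result may combine values reported by different nodes.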

  1. The partition file format is extended with a MaxFileSize field in the file header, which holds the current max file size.
  2. The following new parameters are added to the cluster config:
     • FileGrowLimit for each file type;
     • growStepPercent and minAvailSpacePercent.
  3. The partition FSM is extended with a new RESIZE event.

Signed-off-by: Aleksandr Ivanov <[email protected]>
…luster in case of misconfig

@alexander-e1off changed the title from "WIP: Feat[mqbc, mqbs] FSM: synchronize partition max files at cluster start up" to "Feat[mqbc, mqbs] FSM: synchronize partition max file sizes at cluster start up" on Dec 4, 2025
Copilot AI left a comment

Pull request overview

This PR implements a feature to synchronize partition maximum file sizes across cluster nodes at startup to prevent broker aborts caused by misconfigured file size limits. The implementation follows a similar pattern to the existing highest sequence number synchronization mechanism.

Key Changes

  • File Format Enhancement: Added MaxFileSize field to partition file headers (journal, data, qlist) to persist the configured maximum file size alongside the file content
  • Configuration Extension: Introduced growth limit parameters (dataFileGrowLimit, journalFileGrowLimit, qlistFileGrowLimit) and rollover criteria (growStepPercent, minAvailSpacePercent) to control file size expansion during synchronization
  • FSM Extension: Added new RESIZE events and state transitions to the partition finite state machine to handle file size synchronization and resizing operations between primary and replica nodes
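
The summary does not spell out how a grow limit is applied. One plausible reading, shown here purely as an assumption, is that the limit caps the size a file may be resized to when a node adopts the cluster-wide highest value. The function name and parameters below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the grow limit is the largest size a file of a given type may
// be resized to on this node. The PR text does not state the exact rule.
bool canAdoptMaxFileSize(std::uint64_t localMaxFileSize,
                         std::uint64_t clusterMaxFileSize,
                         std::uint64_t fileGrowLimit)
{
    assert(localMaxFileSize <= fileGrowLimit);  // precondition of this sketch

    if (clusterMaxFileSize <= localMaxFileSize) {
        return true;  // the local file is already large enough
    }
    return clusterMaxFileSize <= fileGrowLimit;
}
```

Under this reading, a node whose limit is below the synchronized size would have to reject the resize instead of growing the file.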

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| mqbs_filestoreprotocol.h/cpp | Enhanced FileHeader structure with d_maxFileSizeUpperBits and d_maxFileSizeLowerBits fields to store 64-bit max file size |
| mqbs_filestore.h/cpp | Added partition max file size tracking, override mechanism, and integration with recovery logic to read/write file sizes from headers |
| mqbc_storagemanager.h/cpp | Implemented max file size synchronization logic parallel to sequence number sync, including quorum checks and highest value selection |
| mqbc_recoverymanager.h/cpp | Added recovery of max file sizes from partition file headers during startup |
| mqbc_partitionfsm.h | Extended PartitionFSMEventData to carry partition max file sizes through FSM events |
| mqbc_partitionstatetable.h | Added RESIZE-related events and updated state transition table to handle file size synchronization scenarios |
| bmqp_ctrlmsg.xsd | Extended control messages (ReplicaStateRequest/Response, PrimaryStateRequest/Response, ReplicaDataRequest/Response) with PartitionMaxFileSizes field |
| mqbcfg.xsd | Added new configuration parameters for file growth limits and rollover criteria to PartitionConfig |
| test_fsm_partition_sync.py | Added comprehensive integration tests covering primary sync, replica sync, and combined sync scenarios |
| Configuration files | Updated default cluster configurations with new growth limit parameters |
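
The FileHeader change stores a 64-bit size in two 32-bit fields. A minimal sketch of that split is shown below. Only the two field names come from the summary above. The `FileHeaderSketch` struct and the helper functions are hypothetical, and the sketch ignores on-disk byte order and the rest of the real header layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical header fragment: only the two field names come from the PR
// summary. The real FileHeader has more fields and its own integer types.
struct FileHeaderSketch {
    std::uint32_t d_maxFileSizeUpperBits;
    std::uint32_t d_maxFileSizeLowerBits;
};

// Split a 64-bit size into its upper and lower 32-bit halves.
void setMaxFileSize(FileHeaderSketch& header, std::uint64_t value)
{
    header.d_maxFileSizeUpperBits = static_cast<std::uint32_t>(value >> 32);
    header.d_maxFileSizeLowerBits =
        static_cast<std::uint32_t>(value & 0xFFFFFFFFULL);
}

// Recombine the two halves into the original 64-bit size.
std::uint64_t maxFileSize(const FileHeaderSketch& header)
{
    return (static_cast<std::uint64_t>(header.d_maxFileSizeUpperBits) << 32) |
           header.d_maxFileSizeLowerBits;
}

// Usage: a size above 4 GiB survives the split and recombination.
void checkRoundTrip()
{
    FileHeaderSketch header{};
    setMaxFileSize(header, 0x0000000500000010ULL);
    assert(header.d_maxFileSizeUpperBits == 5U);
    assert(header.d_maxFileSizeLowerBits == 0x10U);
    assert(maxFileSize(header) == 0x0000000500000010ULL);
}
```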
