Enhancement Proposal: Distributed Management, Operational Resilience (#1060)
# Liquid Metal Governance Enhancement Proposal

## Enhancement Proposal: Flintlock Distributed Management and Operational Improvements

**Proposal by:** Microscaler

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [Motivation and Background](#motivation-and-background)
3. [Microscaler's Requirements and Objectives](#microscalers-requirements-and-objectives)
4. [Detailed Technical Proposals](#detailed-technical-proposals)
5. [Technical and Community Value](#technical-and-community-value)
6. [Benefits to the Broader Community](#benefits-to-the-broader-community)
7. [Implementation Plan and Collaboration](#implementation-plan-and-collaboration)
8. [Governance and Community Engagement](#governance-and-community-engagement)
9. [References and Supporting Documentation](#references-and-supporting-documentation)

---

## Executive Summary

Microscaler proposes comprehensive enhancements to Flintlock that significantly advance its distributed management capabilities, operational resilience, and scalability. Key benefits include improved reliability, reduced downtime, enhanced security, and simplified operational practices. Microscaler seeks active collaboration with the Liquid Metal governance board and the broader community to realise these improvements promptly and effectively.

---

## Motivation and Background

Flintlock currently faces limitations in scalability, robustness, and operational clarity that hinder effective deployment at scale. Real-world scenarios, such as managing large-scale distributed deployments and recovering from critical host failures, underscore the importance of these enhancements. Addressing these areas ensures Flintlock can support complex, real-world operational needs.

---

## Microscaler's Requirements and Objectives

Microscaler seeks these enhancements to manage distributed workloads efficiently at scale, minimise operational risk, enhance system resilience, and reduce administrative overhead. Clear objectives include reduced downtime, improved resource utilisation, and robust recovery mechanisms, benefiting both Microscaler and the broader community.

---

## Detailed Technical Proposals

Please follow the links to the detailed documents:

| Enhancement | Description | Link |
| --- | --- | --- |
| Raft Consensus Integration | Integrate Raft for distributed consensus and reliability, ensuring consistent state replication and leader election across nodes. | [Details](./docs/01-Raft_Consensus_Integration.md) |
| Distributed Scheduling and Bidding Mechanism | Enable distributed workload scheduling and resource bidding, allowing dynamic allocation of resources based on demand and availability. | [Details](./docs/02-Distributed_Scheduling_and_Bidding_Mechanism.md) |
| Unified API Interface and Proxy Routing | Provide a unified API and proxy routing for seamless operations, simplifying client interactions and enabling transparent request forwarding. | [Details](./docs/03-Unified_API_Interface_and_Proxy_Routing.md) |
| Host Failure Handling | Improve detection and recovery from host failures, including automated failover and state reconciliation to minimize service disruption. | [Details](./docs/04-Host_Failure_Handling.md) |
| Detached Host Garbage Collection | Automate cleanup of detached or orphaned hosts, reclaiming resources and maintaining cluster hygiene without manual intervention. | [Details](./docs/05-Detached_Host_Garbage_Collection.md) |
| Host Regenesis and PXE-Based Provisioning | Support host re-provisioning using PXE, enabling rapid recovery and scaling by automating bare-metal host setup and configuration. | [Details](./docs/06-Host_Regenesis_and_PXE_Provisioning.md) |
| VM Regeneration, Persistence and Recovery | Ensure VM state can be persisted and recovered, allowing restoration of workloads after failures or migrations with minimal data loss. | [Details](./docs/07-VM_State_Persistence_and_Recovery.md) |
| Network Partition Handling & Split-Brain Scenarios | Address network partitions and split-brain issues, implementing safeguards to maintain data consistency and prevent conflicting operations. | [Details](./docs/08-Network_Partition_Handling_and_Split-Brain_Scenarios.md) |
| Leader Scheduling Bottleneck | Mitigate leader scheduling bottlenecks by distributing scheduling responsibilities and optimizing leader election processes for scalability. | [Details](./docs/09-Leader_Scheduling_Bottleneck.md) |
| Host Rejoining and State Reconciliation | Enable hosts to rejoin and reconcile state, ensuring that returning nodes synchronize with the cluster and recover their workloads safely. | [Details](./docs/10-Host_Rejoining_and_State_Reconciliation.md) |
| Raft Log Scalability and Snapshotting | Improve Raft log scalability and add snapshotting, reducing storage overhead and speeding up recovery by periodically compacting logs. | [Details](./docs/11-Raft_Log_Scalability_and_Snapshotting.md) |
| Security and Authorization | Enhance security and authorization mechanisms, introducing fine-grained access controls and robust authentication for all operations. | [Details](./docs/12-Security_and_Authorization.md) |
| Garbage Collection Policy | Define and enforce garbage collection policies, specifying criteria and schedules for resource cleanup to optimize system performance. | [Details](./docs/13-Garbage_Collection_Policy.md) |
| Observability, Metrics, and Tracing | Add observability, metrics, and tracing support, enabling real-time monitoring, troubleshooting, and performance analysis of distributed components. | [Details](./docs/14-Observability_Metrics_and_Tracing.md) |
| Graceful VM Migration Support | Support graceful migration of VMs, allowing live or planned movement of workloads between hosts with minimal downtime and service impact. | [Details](./docs/15-Graceful_VM_Migration_Support.md) |
| Configuration and Operational Clarity | Improve configuration and operational transparency, providing clear documentation, validation, and tooling for easier management and troubleshooting. | [Details](./docs/16-Configuration_and_Operational_Clarity.md) |

---

## Technical and Community Value

These enhancements address key technical challenges facing Flintlock, providing scalable, resilient, secure, and operationally efficient solutions. By collectively addressing these gaps, the community can significantly accelerate Flintlock's adoption and ensure its long-term innovation and viability.

---

## Benefits to the Broader Community

* **Quantified Operational Benefits:**
  * Potential reduction in downtime by up to 50%.
  * Operational cost savings through reduced administrative overhead.
  * Increased scalability supporting deployments exceeding current capacities.
* **Community Growth Opportunities:**
  * Google Summer of Code mentorships, attracting new contributors.
  * Enhanced onboarding, facilitating community adoption.
  * Robust knowledge-sharing and innovation opportunities across community members.

---

## Implementation Plan and Collaboration

The Liquid Metal governance team will provide leadership and oversight, with Microscaler actively contributing through development, mentorship, and collaboration. Microscaler commits to supporting the governance team's vision and aligning contributions with the broader community's collective goals. Community contributors will be engaged through incentivised programs, mentorship opportunities, and clear pathways for participation.

---

## Governance and Community Engagement

Microscaler commits to transparent and open engagement with the governance board and community stakeholders. This includes regular communication, transparent decision-making processes, clear conflict resolution mechanisms, and continuous integration of community feedback, ensuring alignment with community values and project goals.

---

## References and Supporting Documentation

Comprehensive technical documentation is available through the linked documents, supporting detailed review and validation by the governance board and community.

---

Microscaler respectfully encourages the Liquid Metal governance board to adopt this proposal, fostering active community collaboration to enhance Flintlock for mutual benefit.
---
Charles Sibbald<br>
Founder & Lead Engineer, Microscaler<br>
Lead Engineer, Tinkerbell Project<br>
26 June 2025

Liquid Metal Governance Board<br>
[Address or Contact Information]

Dear Members of the Liquid Metal Governance Board,

I am pleased to submit this Enhancement Proposal on behalf of Microscaler for your consideration. At Microscaler, we are committed to advancing open-source technologies that empower scalable and robust distributed infrastructure. As the lead engineer for the Tinkerbell Project and founder of Microscaler, I firmly believe in collaborative innovation that delivers substantial and lasting benefits to our community.

The attached proposal outlines critical enhancements to Flintlock, focusing on comprehensive improvements to distributed management, operational resilience, and scalability. If implemented successfully, these enhancements would position Flintlock as a leading system in distributed infrastructure management, significantly ahead of contemporary solutions in robustness, operational clarity, and community-driven innovation.

In preparing this proposal, we consciously chose not to fork Flintlock or fragment its potential user base. Instead, we decided to collaborate actively with the existing community, enhancing Flintlock's capabilities to create a more compelling, unified solution. This collaborative approach not only preserves but actively grows Flintlock's user base, fostering broader adoption and deeper community engagement.

We have structured the proposal with an inclusive vision, inviting collaboration and contribution from the broader community. This strategic partnership will ensure the enhancements meet diverse operational requirements while maintaining alignment with our shared goals.

Your support and leadership will be instrumental in realising the full potential of Flintlock. Together, we can set new standards in operational excellence and community collaboration, establishing Flintlock as the benchmark in its domain.

Thank you for considering this forward-thinking proposal. I look forward to your feedback and to continued collaboration that advances our collective vision.

Sincerely,

Charles Sibbald<br>
Founder, Microscaler (Tinkerbell Project)
---
## Raft Consensus Integration

### Gap Definition and Improvement Objectives

Currently, Flintlock operates with isolated state per host, lacking a unified, cluster-wide coordination mechanism. Integrating Raft consensus addresses this gap by ensuring reliable leader election, consistent log replication, and synchronized state management across the cluster.

> **Reviewer comment (maintainer):** Raft could be used on the layer that does the distributed scheduling, but not in flintlock itself. One option would be to just use etcd for storage and leader election. Might be easier than using a Raft package (like https://github.com/etcd-io/raft) directly.
**Objectives:**

* Reliable leader election to ensure continuity
* Consistent log replication across hosts
* Robust global state synchronization for VM management

### Technical Implementation and Detailed Architecture

* **Raft Library:** Leverage a well-established Raft implementation such as HashiCorp Raft or etcd Raft.
* **Leader Election:** Implement leader election protocols ensuring rapid detection of failures and quick election of a new leader.
* **Log Replication:** Define structured logs capturing critical VM lifecycle events (creation, updates, deletion).
* **Cluster-wide State Machine:** Develop a state machine that consistently applies VM lifecycle operations from replicated logs.
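The state-machine idea above can be sketched in a few lines. The sketch below is a hypothetical, language-agnostic illustration (Flintlock itself is written in Go, and the entry shape and class names here are assumptions, not Flintlock APIs): every node applies committed log entries in the same order, so all replicas converge on the same view of VM placement.

```python
# Hypothetical sketch: a deterministic state machine that applies VM
# lifecycle operations from a replicated (already-committed) Raft log.
# Entry shape, field names, and ops are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class LogEntry:
    op: str        # "create" | "update" | "delete"
    vm_id: str
    host: str = ""
    spec: dict = field(default_factory=dict)

class VMStateMachine:
    """Applies VM lifecycle operations from the replicated log, in order."""
    def __init__(self):
        self.vms = {}  # vm_id -> {"host": ..., "spec": ...}

    def apply(self, entry: LogEntry):
        if entry.op == "create":
            self.vms[entry.vm_id] = {"host": entry.host, "spec": dict(entry.spec)}
        elif entry.op == "update":
            self.vms[entry.vm_id]["spec"].update(entry.spec)
        elif entry.op == "delete":
            self.vms.pop(entry.vm_id, None)
        else:
            raise ValueError(f"unknown op: {entry.op}")

# Two replicas applying the same committed log reach identical state.
log = [
    LogEntry("create", "vm-1", host="host-a", spec={"cpus": 2}),
    LogEntry("update", "vm-1", spec={"cpus": 4}),
    LogEntry("create", "vm-2", host="host-b"),
    LogEntry("delete", "vm-2"),
]
a, b = VMStateMachine(), VMStateMachine()
for entry in log:
    a.apply(entry)
    b.apply(entry)
assert a.vms == b.vms  # consistency follows from determinism + shared log order
```

Determinism of `apply` is the key property: given Raft's guarantee that all nodes see the same log order, it is what makes cluster-wide state consistency follow for free.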

### Trade-offs and Risks

* **Complexity:** Increased system complexity, balanced by significant reliability improvements.
* **Performance Overhead:** Slight overhead from log replication and consensus coordination, which must be monitored and optimized.

### Operational Impacts and User Considerations

* **Transparency:** The integration should remain transparent to end-users, requiring no changes to current workflows.
* **Reliability:** Improved operational reliability and simplified management for system operators.

### Validation and Testing Strategies

* **Leader Election Tests:** Comprehensive tests to validate rapid leader election and failover.
* **Log Replication Tests:** Validate accuracy and performance of log replication across nodes.
* **State Consistency Tests:** Continuously ensure the cluster maintains a consistent view of the global state.

### Visualizations and Diagrams

* **High-Level Design (HLD) Diagram:** Illustrates the integration of Raft within Flintlock.
* **Sequence Diagram:** Demonstrates the leader election, log replication, and state synchronization processes.

### Summary for Enhancement Proposal

Integrating Raft consensus into Flintlock significantly enhances cluster reliability, consistency, and operational resilience. This structured approach ensures minimal operational overhead while providing robust coordination capabilities, preparing Flintlock for highly available, distributed deployments.
---
## Distributed Scheduling and Bidding Mechanism

### Gap Definition and Improvement Objectives

Flintlock currently lacks a distributed scheduling system, relying instead on manual workload allocation per host. Implementing a distributed scheduling mechanism using a bidding process will ensure balanced resource utilization and improved VM provisioning speed.

> **Reviewer comment (maintainer):** This was never going to be part of Flintlock itself, but a part of the wider Liquid Metal. A reverse bidding process would make the scheduling simpler.
**Objectives:**

* Balanced resource allocation across all hosts
* Reduced VM boot latency
* Automated and transparent workload distribution

### Technical Implementation and Detailed Architecture

* **Resource Broadcasting:** Hosts periodically broadcast current resource metrics (CPU, memory, VM count).
* **Leader Coordination:** The leader initiates VM scheduling by broadcasting requests to hosts.
* **Bid Calculation:** Hosts compute utilization scores based on available resources and VM requirements, responding with bids.
* **Scheduling Decision:** The leader selects the host with the lowest utilization score (best bid), updating the global state through consensus.
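The bid-and-select steps above can be sketched as follows. This is a hypothetical illustration: the scoring formula, its weights, and all field names are assumptions for this proposal, not an existing Flintlock API; the essential ideas are that a host declines to bid when the VM would not fit, and that the leader picks the lowest score.

```python
# Hypothetical sketch of bid calculation (host side) and bid selection
# (leader side). Weights and field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class HostMetrics:
    name: str
    cpu_used: float   # fraction of capacity in use, 0..1
    mem_used: float   # fraction of capacity in use, 0..1
    vm_count: int

def bid(m: HostMetrics, vm_cpu: float, vm_mem: float):
    """Return a utilization score (lower is better), or None to decline."""
    if m.cpu_used + vm_cpu > 1.0 or m.mem_used + vm_mem > 1.0:
        return None  # host cannot fit the VM, so it does not bid
    # Weighted blend of projected utilization plus a small VM-count term.
    return (0.5 * (m.cpu_used + vm_cpu)
            + 0.4 * (m.mem_used + vm_mem)
            + 0.1 * m.vm_count / 100)

def select_host(bids: dict) -> str:
    """Leader picks the host with the best (lowest) bid."""
    valid = {h: s for h, s in bids.items() if s is not None}
    if not valid:
        raise RuntimeError("no host can schedule this VM")
    return min(valid, key=valid.get)

hosts = [
    HostMetrics("host-1", cpu_used=0.70, mem_used=0.60, vm_count=12),
    HostMetrics("host-2", cpu_used=0.20, mem_used=0.30, vm_count=3),
    HostMetrics("host-3", cpu_used=0.95, mem_used=0.50, vm_count=20),
]
bids = {h.name: bid(h, vm_cpu=0.10, vm_mem=0.10) for h in hosts}
print(select_host(bids))  # prints "host-2": host-3 declines, host-2 is least loaded
```

A reverse-bidding variant, as suggested in review, would invert the flow: hosts proactively offer capacity and the leader merely accepts the best standing offer, removing one round-trip from the scheduling path.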

### Trade-offs and Risks

* **Complexity:** Additional complexity due to broadcast and bidding logic.
* **Latency:** Slight communication overhead for bid requests and responses.

### Operational Impacts and User Considerations

* **Transparency:** Users experience automated and balanced VM scheduling without manual intervention.
* **Operational Simplicity:** Reduced administrative overhead and improved cluster scalability.

### Validation and Testing Strategies

* **Bid Accuracy Tests:** Ensure host bids accurately reflect resource availability.
* **Scheduling Fairness Tests:** Verify balanced workload distribution across hosts.
* **Performance Benchmarks:** Assess scheduling latency and efficiency under various loads.

### Visualizations and Diagrams

* **High-Level Design (HLD) Diagram:**

```mermaid
graph TD
    Leader["Raft Leader"]
    Host1["Flintlock Host 1"]
    Host2["Flintlock Host 2"]
    HostN["Flintlock Host N"]

    Leader -->|Broadcast VM Request| Host1
    Leader -->|Broadcast VM Request| Host2
    Leader -->|Broadcast VM Request| HostN

    Host1 -->|Bid Response| Leader
    Host2 -->|Bid Response| Leader
    HostN -->|Bid Response| Leader

    Leader -->|Scheduling Decision| Host1
    Leader -->|State Update via Raft| Host2
    Leader -->|State Update via Raft| HostN
```

* **Sequence Diagram:**

```mermaid
sequenceDiagram
    participant Leader
    participant Host1
    participant Host2

    Leader->>Host1: Broadcast VM scheduling request
    Leader->>Host2: Broadcast VM scheduling request
    Host1->>Leader: Bid (utilization score)
    Host2->>Leader: Bid (utilization score)
    Leader->>Leader: Evaluate best bid
    Leader->>Host1: Scheduling decision
    Leader->>Host2: State update via Raft
```

### Summary for Enhancement Proposal

Introducing a distributed scheduling and bidding mechanism significantly enhances Flintlock's ability to distribute workloads evenly and minimize VM provisioning times. This approach improves cluster resource utilization and operational transparency, setting the foundation for robust scalability and responsiveness.
---
## Unified API Interface and Proxy Routing

### Gap Definition and Improvement Objectives

Currently, Flintlock APIs are isolated per host, causing inconsistent and fragmented state queries. Introducing a unified API interface with proxy routing will resolve these inconsistencies and enable accurate VM state reporting from the authoritative host.

**Objectives:**

* Preserve compatibility with existing Flintlock APIs
* Provide enhanced global state querying through new `/api/v2` endpoints
* Implement proxy routing for authoritative, real-time VM status

> **Reviewer comment (maintainer):** We wouldn't need a v2 if the changes were made outside of flintlock.
### Technical Implementation and Detailed Architecture

* **API Versioning:** Retain current `/api/v1` APIs for backward compatibility, adding new `/api/v2` endpoints.
* **Global State Registry:** Maintain minimal global VM metadata (host location, VM ID).
* **Proxy Routing:** Route detailed state queries to the actual host running the VM, providing accurate real-time metrics.
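The routing decision can be sketched in a few lines. This is a hypothetical illustration of the architecture described above, not Flintlock code: the registry shape, function names, and the `/api/v2` handler signature are assumptions, and the real implementation would forward over HTTP/gRPC rather than call a local function.

```python
# Hypothetical sketch of the proxy-routing decision behind
# GET /api/v2/vm/{vm_id}/status: the receiving host consults a minimal
# replicated registry (vm_id -> owning host) and either answers locally
# or forwards to the authoritative host. All names are illustrative.

GLOBAL_REGISTRY = {"vm-42": "host-b", "vm-7": "host-a"}  # replicated metadata

def local_status(host: str, vm_id: str) -> dict:
    # Stand-in for the host's real-time microVM status query.
    return {"vm": vm_id, "host": host, "state": "running"}

def handle_status_request(receiving_host: str, vm_id: str) -> dict:
    """Serve a VM status query arriving at any host in the cluster."""
    owner = GLOBAL_REGISTRY.get(vm_id)
    if owner is None:
        return {"error": f"unknown vm: {vm_id}"}
    if owner == receiving_host:
        return local_status(receiving_host, vm_id)  # authoritative locally
    # Proxy path: in a real deployment this is an HTTP/gRPC call to `owner`.
    resp = local_status(owner, vm_id)
    resp["proxied_via"] = receiving_host
    return resp

print(handle_status_request("host-a", "vm-42"))
# the status comes from host-b, the authoritative host, via host-a
```

Keeping only `vm_id -> host` in the global registry keeps replicated state small; detailed, fast-changing metrics stay on the owning host and are fetched on demand.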

### Trade-offs and Risks

* **Latency:** Slight latency increase from proxy-routing requests to authoritative hosts.
* **Complexity:** Increased API routing logic to handle proxy queries.

### Operational Impacts and User Considerations

* **Transparency:** Users experience transparent and accurate VM state querying without changing existing workflows.
* **Improved Observability:** Enhanced visibility into VM states and metrics.

### Validation and Testing Strategies

* **API Compatibility Tests:** Ensure backward compatibility with existing endpoints.
* **Proxy Routing Accuracy Tests:** Verify accuracy and responsiveness of proxy-routed queries.
* **Real-time Metrics Validation:** Continuous validation of real-time metrics accuracy.

### Visualizations and Diagrams

* **High-Level Design (HLD) Diagram:**

```mermaid
graph TD
    Client["API Client"]
    HostA["Flintlock Host A (Leader)"]
    HostB["Flintlock Host B"]
    HostC["Flintlock Host C"]

    Client -->|Query VM Status| HostA
    HostA -->|Lookup VM location| HostB
    HostB -->|Real-time VM status| HostA
    HostA -->|Response| Client
    HostA --- HostC
    HostB --- HostC
```

* **Sequence Diagram:**

```mermaid
sequenceDiagram
    actor Client
    participant HostA
    participant HostB
    participant HostC

    Client->>HostA: GET /api/v2/vm/{vm_id}/status
    HostA->>HostA: Lookup VM location
    HostA->>HostB: Proxy GET /api/v2/vm/{vm_id}/status
    HostB->>HostA: Real-time VM metrics
    HostA->>Client: Forward VM status
```

### Summary for Enhancement Proposal

Implementing a unified API interface with proxy routing significantly improves the consistency and accuracy of VM state queries in Flintlock. This enhancement provides transparent compatibility, real-time metrics accuracy, and enhanced operational visibility, preparing Flintlock for effective distributed operations.
> **Reviewer comment (maintainer):** Flintlock was only ever designed for a single host. It was always envisioned that any distributed scheduling/clustering would be done at a layer above. I'd be keen to maintain this separation and keep flintlock solely focused on interacting with microvms on a single machine.

> **Reviewer comment (maintainer):** The layer above is something I've thought of as being called brigade: https://github.com/liquidmetal-dev/brigade. The idea is that it would be API compatible with flintlock, so that consumers, like CAPMVM, could switch to distributed scheduling across a number of flintlock hosts without any changes.