2 changes: 1 addition & 1 deletion .github/workflows/pr_type.yml
@@ -16,5 +16,5 @@ jobs:
steps:
- uses: docker://index.docker.io/agilepathway/pull-request-label-checker:latest@sha256:50540ac95f572ef27f2181130edd273f9ed75304f602fb43a8dd7e8ebf65fcca # latest
with:
-          one_of: kind/bug,kind/documentation,kind/feature,kind/regression,kind/refactor,kind/cleanup,kind/chore
+          one_of: kind/bug,kind/documentation,kind/feature,kind/regression,kind/refactor,kind/cleanup,kind/chore,kind/proposal
repo_token: ${{ secrets.GITHUB_TOKEN }}
104 changes: 104 additions & 0 deletions docs/proposals/distributed_architecture/README.md
@@ -0,0 +1,104 @@
# Liquid Metal Governance Enhancement Proposal

## Enhancement Proposal: Flintlock Distributed Management and Operational Improvements

**Proposal by:** Microscaler

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [Motivation and Background](#motivation-and-background)
3. [Microscaler's Requirements and Objectives](#microscalers-requirements-and-objectives)
4. [Detailed Technical Proposal](#detailed-technical-proposal)
5. [Technical and Community Value](#technical-and-community-value)
6. [Benefits to the Broader Community](#benefits-to-the-broader-community)
7. [Implementation Plan and Collaboration](#implementation-plan-and-collaboration)
8. [Governance and Community Engagement](#governance-and-community-engagement)
9. [References and Supporting Documentation](#references-and-supporting-documentation)

---

## Executive Summary

Microscaler proposes comprehensive enhancements to Flintlock, significantly advancing distributed management capabilities, operational resilience, and scalability. Key benefits include improved reliability, reduced downtime, enhanced security, and simplified operational practices. Microscaler seeks active collaboration from the Liquid Metal governance board and the broader community to realise these improvements promptly and effectively.

---

## Motivation and Background

Flintlock currently faces scalability, robustness, and operational clarity limitations that hinder effective deployment at scale. Real-world scenarios, such as managing large-scale distributed deployments and recovering from critical host failures, underscore the importance of these enhancements. Addressing these areas ensures Flintlock can effectively support complex, real-world operational needs.
> **Review comment (Member):** Flintlock was only ever designed for a single host. It was always envisioned that any distributed scheduling/clustering would be done at a layer above.
>
> I'd be keen to maintain this separation and keep flintlock solely focused on interacting with microvms on a single machine.

> **Review comment (Member):** The layer above is something I've thought of as being called brigade: https://github.com/liquidmetal-dev/brigade
>
> The idea is that it would be API compatible with flintlock, so that consumers, like CAPMVM, could switch to distributed scheduling across a number of flintlock hosts without any changes.


---

## Microscaler's Requirements and Objectives

Microscaler seeks these enhancements to efficiently manage distributed workloads at scale, minimise operational risks, enhance system resilience, and reduce administrative overhead. Clear objectives include reduced downtime, improved resource utilisation, and robust recovery mechanisms, benefiting both Microscaler and the broader community.

---

## Detailed Technical Proposal

Please follow the links to detailed documents:

| Enhancement | Description | Link |
|----------------------------------------------------| --- | --- |
| Raft Consensus Integration | Integrate Raft for distributed consensus and reliability, ensuring consistent state replication and leader election across nodes. | [Details](./docs/01-Raft_Consensus_Integration.md) |
| Distributed Scheduling and Bidding Mechanism | Enable distributed workload scheduling and resource bidding, allowing dynamic allocation of resources based on demand and availability. | [Details](./docs/02-Distributed_Scheduling_and_Bidding_Mechanism.md) |
| Unified API Interface and Proxy Routing | Provide a unified API and proxy routing for seamless operations, simplifying client interactions and enabling transparent request forwarding. | [Details](./docs/03-Unified_API_Interface_and_Proxy_Routing.md) |
| Host Failure Handling | Improve detection and recovery from host failures, including automated failover and state reconciliation to minimize service disruption. | [Details](./docs/04-Host_Failure_Handling.md) |
| Detached Host Garbage Collection | Automate cleanup of detached or orphaned hosts, reclaiming resources and maintaining cluster hygiene without manual intervention. | [Details](./docs/05-Detached_Host_Garbage_Collection.md) |
| Host Regenesis and PXE-Based Provisioning | Support host re-provisioning using PXE, enabling rapid recovery and scaling by automating bare-metal host setup and configuration. | [Details](./docs/06-Host_Regenesis_and_PXE_Provisioning.md) |
| VM Regeneration, Persistence and Recovery | Ensure VM state can be persisted and recovered, allowing restoration of workloads after failures or migrations with minimal data loss. | [Details](./docs/07-VM_State_Persistence_and_Recovery.md) |
| Network Partition Handling & Split-Brain Scenarios | Address network partitions and split-brain issues, implementing safeguards to maintain data consistency and prevent conflicting operations. | [Details](./docs/08-Network_Partition_Handling_and_Split-Brain_Scenarios.md) |
| Leader Scheduling Bottleneck | Mitigate leader scheduling bottlenecks by distributing scheduling responsibilities and optimizing leader election processes for scalability. | [Details](./docs/09-Leader_Scheduling_Bottleneck.md) |
| Host Rejoining and State Reconciliation | Enable hosts to rejoin and reconcile state, ensuring that returning nodes synchronize with the cluster and recover their workloads safely. | [Details](./docs/10-Host_Rejoining_and_State_Reconciliation.md) |
| Raft Log Scalability and Snapshotting | Improve Raft log scalability and add snapshotting, reducing storage overhead and speeding up recovery by periodically compacting logs. | [Details](./docs/11-Raft_Log_Scalability_and_Snapshotting.md) |
| Security and Authorization | Enhance security and authorization mechanisms, introducing fine-grained access controls and robust authentication for all operations. | [Details](./docs/12-Security_and_Authorization.md) |
| Garbage Collection Policy | Define and enforce garbage collection policies, specifying criteria and schedules for resource cleanup to optimize system performance. | [Details](./docs/13-Garbage_Collection_Policy.md) |
| Observability, Metrics, and Tracing | Add observability, metrics, and tracing support, enabling real-time monitoring, troubleshooting, and performance analysis of distributed components. | [Details](./docs/14-Observability_Metrics_and_Tracing.md) |
| Graceful VM Migration Support | Support graceful migration of VMs, allowing live or planned movement of workloads between hosts with minimal downtime and service impact. | [Details](./docs/15-Graceful_VM_Migration_Support.md) |
| Configuration and Operational Clarity | Improve configuration and operational transparency, providing clear documentation, validation, and tooling for easier management and troubleshooting. | [Details](./docs/16-Configuration_and_Operational_Clarity.md) |

---

## Technical and Community Value

These enhancements address key technical challenges facing Flintlock, providing scalable, resilient, secure, and operationally efficient solutions. By addressing these gaps collectively, the community can significantly accelerate Flintlock's adoption and secure its long-term viability and continued innovation.

---

## Benefits to the Broader Community

* **Quantified Operational Benefits:**
  * Potential reduction in downtime by up to 50%.
  * Operational cost savings through reduced administrative overhead.
  * Increased scalability supporting deployments exceeding current capacities.
* **Community Growth Opportunities:**
  * Google Summer of Code mentorships, attracting new contributors.
  * Enhanced onboarding, facilitating community adoption.
  * Robust knowledge-sharing and innovation opportunities across community members.

---

## Implementation Plan and Collaboration

The Liquid Metal governance team will provide leadership and oversight, with Microscaler actively contributing through development, mentorship, and collaboration. Microscaler commits to supporting the governance team's vision and aligning contributions to benefit the broader community's collective goals. Community contributors will be engaged through incentivised programs, mentorship opportunities, and clear pathways for participation.

---

## Governance and Community Engagement

Microscaler commits to transparent and open engagement with the governance board and community stakeholders. This includes regular communication, transparent decision-making processes, clear conflict resolution mechanisms, and continuous integration of community feedback, ensuring alignment with community values and project goals.

---

## References and Supporting Documentation

Comprehensive technical documentation is available through linked documents, supporting detailed review and validation by the governance board and community.

---

Microscaler respectfully encourages the Liquid Metal governance board to adopt this proposal, fostering active community collaboration to enhance Flintlock for mutual benefit.
43 changes: 43 additions & 0 deletions docs/proposals/distributed_architecture/coverletter.md
@@ -0,0 +1,43 @@
Charles Sibbald<br>
Founder & Lead Engineer, Microscaler<br>
Lead Engineer, Tinkerbell Project<br>
26 June 2025

Liquid Metal Governance Board<br>
[Address or Contact Information]

Dear Members of the Liquid Metal Governance Board,

I am pleased to submit this Enhancement Proposal on behalf of Microscaler for your consideration.
At Microscaler, we are committed to advancing open-source technologies that empower scalable and robust
distributed infrastructure.
As the lead engineer for the Tinkerbell Project and founder of Microscaler, I firmly believe in
collaborative innovation that delivers substantial and lasting benefits to our community.

The attached proposal outlines critical enhancements to Flintlock, focusing on comprehensive improvements
to distributed management, operational resilience, and scalability. If implemented successfully, these
enhancements will position Flintlock as a leading system in distributed infrastructure management, significantly
ahead of contemporary solutions in robustness, operational clarity, and community-driven innovation.

In preparing this proposal, we consciously chose not to fork Flintlock or fragment its potential user base.
Instead, we chose to collaborate actively with the existing community, enhancing Flintlock's capabilities to
create a more compelling, unified solution. This collaborative approach not only preserves but actively
grows Flintlock's user base, fostering broader adoption and deeper community engagement.

We have structured the proposal with an inclusive vision, inviting collaboration and contribution from the
broader community. This strategic partnership will ensure the enhancements meet diverse operational
requirements while maintaining alignment with our shared goals.

Your support and leadership will be instrumental in realising the full potential of Flintlock. Together, we
can set new standards in operational excellence and community collaboration, establishing Flintlock
as the benchmark in its domain.

Thank you for considering this forward-thinking proposal. I look forward to your feedback and to
continued collaboration that advances our collective vision.

Sincerely,

Charles Sibbald<br>
Founder, Microscaler (Tinkerbell Project)
43 changes: 43 additions & 0 deletions docs/proposals/distributed_architecture/docs/01-Raft_Consensus_Integration.md
@@ -0,0 +1,43 @@
## Raft Consensus Integration

### Gap Definition and Improvement Objectives

Currently, Flintlock operates with isolated state per host, lacking a unified, cluster-wide coordination mechanism. Integrating Raft consensus addresses this gap by ensuring reliable leader election, consistent log replication, and synchronized state management across the cluster.
> **Review comment (Member):** Raft could be used on the layer that does the distributed scheduling, but not in flintlock itself.
>
> One option would be to just use etcd for storage and leader election. Might be easier than using a Raft package (like https://github.com/etcd-io/raft) directly.


**Objectives:**

* Reliable leader election to ensure continuity
* Consistent log replication across hosts
* Robust global state synchronization for VM management

### Technical Implementation and Detailed Architecture

* **Raft Library:** Leverage a well-established Raft implementation such as HashiCorp Raft or etcd Raft.
* **Leader Election:** Implement leader election protocols ensuring rapid detection of failures and quick election of a new leader.
* **Log Replication:** Define structured logs capturing critical VM lifecycle events (creation, updates, deletion).
* **Cluster-wide State Machine:** Develop a state machine that consistently applies VM lifecycle operations from replicated logs.
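
To make the cluster-wide state machine concrete, here is a minimal sketch of what a replicated VM state machine could look like on top of the hashicorp/raft library. The `vmEvent` and `VMStateFSM` names are illustrative assumptions for this proposal, not existing Flintlock code:

```go
package consensus

import (
	"encoding/json"
	"errors"
	"io"
	"sync"

	"github.com/hashicorp/raft"
)

// vmEvent is a hypothetical replicated log entry describing a VM lifecycle change.
type vmEvent struct {
	Op   string `json:"op"` // "create", "update", or "delete"
	VMID string `json:"vm_id"`
	Host string `json:"host"`
}

// VMStateFSM applies committed log entries to an in-memory view of the
// cluster. It implements the raft.FSM interface.
type VMStateFSM struct {
	mu  sync.Mutex
	vms map[string]vmEvent // VM ID -> last applied event
}

func newVMStateFSM() *VMStateFSM {
	return &VMStateFSM{vms: make(map[string]vmEvent)}
}

// Apply is invoked by Raft once a log entry is committed on a quorum of
// nodes, so every node converges on the same VM placement state.
func (f *VMStateFSM) Apply(l *raft.Log) interface{} {
	var ev vmEvent
	if err := json.Unmarshal(l.Data, &ev); err != nil {
		return err
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	if ev.Op == "delete" {
		delete(f.vms, ev.VMID)
	} else {
		f.vms[ev.VMID] = ev
	}
	return nil
}

// Snapshot and Restore complete the raft.FSM interface; snapshotting is
// covered separately in the Raft Log Scalability and Snapshotting document.
func (f *VMStateFSM) Snapshot() (raft.FSMSnapshot, error) {
	return nil, errors.New("snapshotting addressed in a separate document")
}

func (f *VMStateFSM) Restore(rc io.ReadCloser) error { return rc.Close() }
```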

### Trade-offs and Risks

* **Complexity:** Increased system complexity balanced by significant reliability improvements.
* **Performance Overhead:** Slight overhead from log replication and consensus coordination, which must be monitored and optimized.

### Operational Impacts and User Considerations

* **Transparency:** The integration should remain transparent to end-users, requiring no changes in current workflows.
* **Reliability:** Improved operational reliability and simplified management for system operators.

### Validation and Testing Strategies

* **Leader Election Tests:** Comprehensive tests to validate rapid leader election and failover.
* **Log Replication Tests:** Validate accuracy and performance of log replication across nodes.
* **State Consistency Tests:** Continuously ensure the cluster maintains a consistent view of the global state.

### Visualizations and Diagrams

* **High-Level Design (HLD) Diagram:** Illustrates the integration of Raft within Flintlock.
* **Sequence Diagram:** Shows the leader election, log replication, and state synchronization flows.

### Summary for Enhancement Proposal

Integrating Raft consensus into Flintlock significantly enhances cluster reliability, consistency, and operational resilience. This structured approach ensures minimal operational overhead while providing robust coordination capabilities, preparing Flintlock for highly available, distributed deployments.
79 changes: 79 additions & 0 deletions docs/proposals/distributed_architecture/docs/02-Distributed_Scheduling_and_Bidding_Mechanism.md
@@ -0,0 +1,79 @@
## Distributed Scheduling and Bidding Mechanism

### Gap Definition and Improvement Objectives

Flintlock currently lacks a distributed scheduling system, relying instead on manual workload allocation per host. Implementing a distributed scheduling mechanism using a bidding process will ensure balanced resource utilization and improved VM provisioning speed.
> **Review comment (Member):** This was never going to be part of Flintlock itself, but a part of the wider Liquid Metal.
>
> A reverse bidding process would make the scheduling simpler.


**Objectives:**

* Balanced resource allocation across all hosts
* Reduced VM boot latency
* Automated and transparent workload distribution

### Technical Implementation and Detailed Architecture

* **Resource Broadcasting:** Hosts periodically broadcast current resource metrics (CPU, memory, VM count).
* **Leader Coordination:** Leader initiates VM scheduling by broadcasting requests to hosts.
* **Bid Calculation:** Hosts compute utilization scores based on available resources and VM requirements, responding with bids.
* **Scheduling Decision:** Leader selects the host with the lowest utilization score (best bid), updating the global state through consensus.
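
As a concrete illustration of the bid calculation above, the sketch below scores a host against an incoming VM request. The weights and type names are assumptions for discussion; the proposal does not prescribe a specific formula:

```go
package scheduling

// hostMetrics mirrors the resource metrics each host would broadcast.
type hostMetrics struct {
	CPUFree   float64 // fraction of CPU currently free, in [0, 1]
	MemFreeMB uint64
	VMCount   int
}

// vmRequest describes the resources an incoming VM needs.
type vmRequest struct {
	VCPUs int
	MemMB uint64
}

// utilizationScore returns a bid where lower is better; a negative value
// means the host cannot fit the VM and declines to bid.
func utilizationScore(m hostMetrics, req vmRequest) float64 {
	if m.MemFreeMB < req.MemMB {
		return -1 // insufficient memory: no bid
	}
	// Blend CPU pressure, post-placement memory pressure, and VM density.
	// Equal weights are a placeholder; real weights would be tuned empirically.
	cpuPressure := 1 - m.CPUFree
	memPressure := float64(req.MemMB) / float64(m.MemFreeMB)
	density := float64(m.VMCount) / 100 // normalized against a soft cap of 100 VMs
	return (cpuPressure + memPressure + density) / 3
}
```

The leader would collect these scores from all responding hosts, place the VM on the lowest non-negative bid, and commit the decision through Raft as described above.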

### Trade-offs and Risks

* **Complexity:** Additional complexity due to broadcast and bidding logic.
* **Latency:** Slight communication overhead for bid requests and responses.

### Operational Impacts and User Considerations

* **Transparency:** Users experience automated and balanced VM scheduling without manual intervention.
* **Operational Simplicity:** Reduced administrative overhead and improved cluster scalability.

### Validation and Testing Strategies

* **Bid Accuracy Tests:** Ensure host bids accurately reflect resource availability.
* **Scheduling Fairness Tests:** Verify balanced workload distribution across hosts.
* **Performance Benchmarks:** Assess scheduling latency and efficiency under various loads.

### Visualizations and Diagrams

* **High-Level Design (HLD) Diagram:**

```mermaid
graph TD
Leader["Raft Leader"]
Host1["Flintlock Host 1"]
Host2["Flintlock Host 2"]
HostN["Flintlock Host N"]

Leader -->|Broadcast VM Request| Host1
Leader -->|Broadcast VM Request| Host2
Leader -->|Broadcast VM Request| HostN

Host1 -->|Bid Response| Leader
Host2 -->|Bid Response| Leader
HostN -->|Bid Response| Leader

Leader -->|Scheduling Decision| Host1
Leader -->|State Update via Raft| Host2
Leader -->|State Update via Raft| HostN
```

* **Sequence Diagram:**

```mermaid
sequenceDiagram
participant Leader
participant Host1
participant Host2

Leader->>Host1: Broadcast VM scheduling request
Leader->>Host2: Broadcast VM scheduling request
Host1->>Leader: Bid (utilization score)
Host2->>Leader: Bid (utilization score)
Leader->>Leader: Evaluate best bid
Leader->>Host1: Scheduling decision
Leader->>Host2: State update via Raft
```

### Summary for Enhancement Proposal

Introducing a distributed scheduling and bidding mechanism significantly enhances Flintlock's ability to evenly distribute workloads and minimize VM provisioning times. This approach improves cluster resource utilization efficiency and operational transparency, setting the foundation for robust scalability and responsiveness.
72 changes: 72 additions & 0 deletions docs/proposals/distributed_architecture/docs/03-Unified_API_Interface_and_Proxy_Routing.md
@@ -0,0 +1,72 @@
## Unified API Interface and Proxy Routing

### Gap Definition and Improvement Objectives

Currently, Flintlock APIs are isolated per host, causing inconsistent and fragmented state queries. Introducing a unified API interface with proxy routing will resolve these inconsistencies and enable accurate VM state reporting from the authoritative host.

**Objectives:**

* Preserve compatibility with existing Flintlock APIs
* Provide enhanced global state querying through new `/api/v2` endpoints
* Implement proxy routing for authoritative, real-time VM status

> **Review comment (Member):** We wouldn't need a v2 if the changes were made outside of flintlock.

### Technical Implementation and Detailed Architecture

* **API Versioning:** Retain current `/api/v1` APIs for backward compatibility, adding new `/api/v2` endpoints.
* **Global State Registry:** Maintain minimal global VM metadata (host location, VM ID).
* **Proxy Routing:** Route detailed state queries to the actual host running the VM, providing accurate real-time metrics.
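
Below is a minimal sketch of the proxy-routing handler, assuming a hypothetical registry that maps VM IDs to their authoritative host. The `/api/v2` route and `Registry` interface are proposal-level assumptions, not existing Flintlock APIs:

```go
package api

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// Registry is a hypothetical global lookup from VM ID to the address of the
// host currently running that VM, e.g. "10.0.0.12:9090".
type Registry interface {
	HostFor(vmID string) (addr string, ok bool)
}

// vmStatusHandler forwards status queries to the authoritative host so the
// caller always receives real-time metrics rather than stale local state.
func vmStatusHandler(reg Registry) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		vmID := r.PathValue("vm_id") // Go 1.22+ pattern-based routing
		host, ok := reg.HostFor(vmID)
		if !ok {
			http.Error(w, "unknown VM", http.StatusNotFound)
			return
		}
		// Reverse-proxy the request to the owning host and relay its response.
		proxy := httputil.NewSingleHostReverseProxy(&url.URL{Scheme: "http", Host: host})
		proxy.ServeHTTP(w, r)
	}
}

// Wiring, for illustration:
//   mux := http.NewServeMux()
//   mux.HandleFunc("GET /api/v2/vm/{vm_id}/status", vmStatusHandler(reg))
```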

### Trade-offs and Risks

* **Latency:** Slight latency increase from proxy routing requests to authoritative hosts.
* **Complexity:** Increased API routing logic to handle proxy queries.

### Operational Impacts and User Considerations

* **Transparency:** Users experience transparent and accurate VM state querying without changing existing workflows.
* **Improved Observability:** Enhanced visibility into VM states and metrics.

### Validation and Testing Strategies

* **API Compatibility Tests:** Ensure backward compatibility with existing endpoints.
* **Proxy Routing Accuracy Tests:** Verify accuracy and responsiveness of proxy-routed queries.
* **Real-time Metrics Validation:** Continuous validation of real-time metrics accuracy.

### Visualizations and Diagrams

* **High-Level Design (HLD) Diagram:**

```mermaid
graph TD
Client["API Client"]
HostA["Flintlock Host A (Leader)"]
HostB["Flintlock Host B"]
HostC["Flintlock Host C"]

Client -->|Query VM Status| HostA
HostA -->|Lookup VM location| HostB
HostB -->|Real-time VM status| HostA
HostA -->|Response| Client
HostA --- HostC
HostB --- HostC
```

* **Sequence Diagram:**

```mermaid
sequenceDiagram
actor Client
participant HostA
participant HostB
participant HostC

Client->>HostA: GET /api/v2/vm/{vm_id}/status
HostA->>HostA: Lookup VM location
HostA->>HostB: Proxy GET /api/v2/vm/{vm_id}/status
HostB->>HostA: Real-time VM metrics
HostA->>Client: Forward VM status
```

### Summary for Enhancement Proposal

Implementing a unified API interface with proxy routing significantly improves the consistency and accuracy of VM state queries in Flintlock. This enhancement provides transparent compatibility, real-time metrics accuracy, and enhanced operational visibility, preparing Flintlock for effective distributed operations.