[RFC] Add FlagOS Backend Support for Multi-Chip AI Accelerators #8786

@GWinfinity

Summary

This RFC proposes adding support for FlagOS (a unified open-source AI system software stack) to Taichi, enabling Taichi programs to run on various domestic AI chips including MLU (Cambricon), Ascend (Huawei), DCU (Hygon), and GCU (Enflame) through FlagOS's unified compiler infrastructure.

Motivation

Current Situation

  • Taichi currently supports NVIDIA GPUs (CUDA), AMD GPUs, and other backends
  • Domestic AI chips are widely used in China but lack Taichi support
  • Each chip requires individual backend development effort

Proposed Solution

Integrate with FlagOS, which provides:

  • Unified Compiler (FlagTree): Single compiler targeting multiple AI chip architectures
  • Multi-chip Support: MLU, Ascend, DCU, GCU, and more
  • Mature Ecosystem: FlagGems (operators), FlagCX (communication), FlagScale (training)

Benefits

  1. Expand Taichi's Hardware Support: Access to domestic AI chip market
  2. Reduce Development Effort: One backend for multiple chips
  3. Ecosystem Integration: Connect Taichi with FlagOS ecosystem

Proposed Design

Architecture Overview

Taichi DSL → LLVM IR → FlagTree Compiler → AI Chip Binary
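The pipeline above can be modeled as a chain of stages in which only the last one depends on the target chip. The sketch below is purely illustrative: the stage names and the string "artifacts" are hypothetical stand-ins, not Taichi or FlagTree APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[str], str]


def build_pipeline(chip: str) -> List[Stage]:
    # Hypothetical stand-ins: each stage only wraps its input in a label,
    # so the order of the stages is visible in the final string.
    return [
        Stage("taichi_frontend", lambda src: f"llvm_ir({src})"),
        # Only this stage depends on the chip, mirroring the idea of one
        # backend that serves several targets.
        Stage("flagtree_compile", lambda ir: f"binary[{chip}]({ir})"),
    ]


def compile_for_chip(source: str, chip: str) -> str:
    artifact = source
    for stage in build_pipeline(chip):
        artifact = stage.run(artifact)
    return artifact


print(compile_for_chip("kernel", "mlu370"))
```

Switching chips changes only the argument passed to the final stage; the front end that lowers the DSL to LLVM IR is shared.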

Key Components

  1. RHI Device Layer (taichi/rhi/flagos/)

    • Memory management via FlagOS runtime
    • Kernel launch interface
    • Multi-chip support (MLU370, Ascend910, DCU, GCU)
  2. Code Generation Layer (taichi/codegen/flagos/)

    • LLVM IR generation for AI chips
    • SPMD execution model
    • Chip-specific optimizations
  3. Program Implementation (taichi/runtime/program_impls/flagos/)

    • Kernel compilation via FlagTree
    • Kernel launch management

Python API

  import taichi as ti

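  # Initialize the proposed FlagOS backend; `flagos_chip` selects the target chip
  ti.init(arch=ti.flagos, flagos_chip="mlu370")

  @ti.kernel
  def compute():
      # Placeholder body; Taichi parallelizes the outermost for loop of a kernel
      for i in range(1000000):
          pass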

  compute()
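Because `ti.flagos` and `flagos_chip` exist only in the proposed fork, user code that should also run on a stock Taichi build may want to fall back to the CPU backend. The helper below is a minimal sketch under that assumption: `flagos_init_kwargs` is a hypothetical name, and it passes `flagos_chip` only when the `flagos` arch is present, since a stock build would presumably not recognize that keyword.

```python
from types import SimpleNamespace


def flagos_init_kwargs(ti_module, chip: str) -> dict:
    """Build keyword arguments for ti.init, falling back to the CPU backend.

    Hypothetical helper: `flagos` and `flagos_chip` come from this RFC and
    are not part of any released Taichi build.
    """
    flagos_arch = getattr(ti_module, "flagos", None)
    if flagos_arch is None:
        # Stock build: do not pass the proposed keyword at all.
        return {"arch": ti_module.cpu}
    return {"arch": flagos_arch, "flagos_chip": chip}


# Demonstration with stand-in modules so the sketch runs without Taichi.
stock_build = SimpleNamespace(cpu="cpu")
flagos_build = SimpleNamespace(cpu="cpu", flagos="flagos")
print(flagos_init_kwargs(stock_build, "mlu370"))
print(flagos_init_kwargs(flagos_build, "mlu370"))
```

With a real installation the call would be `ti.init(**flagos_init_kwargs(ti, "mlu370"))`.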

Implementation Plan

Phase     Timeline    Deliverables
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Phase 1   1-2 weeks   Infrastructure (current PR ready)
Phase 2   2-4 weeks   FlagTree SDK integration
Phase 3   4-8 weeks   Advanced features & optimization

Current Status

✅ Completed:

• Core architecture (27 files, +2900 lines)
• RHI Device layer
• Code generation layer
• Program implementation
• Build system integration
• Example programs
• Documentation

🔄 Pending (requires FlagOS SDK):

• FlagTree compiler integration
• Kernel binary generation
• Hardware-specific optimizations

Testing Strategy

  1. Stub Testing: Use a generic stub for API testing
  2. Mock Testing: Mock the FlagTree compiler for unit tests
  3. CI Integration: GitHub Actions builds with TI_WITH_FLAGOS=ON
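The mock-testing item could look roughly like the following, using Python's standard `unittest.mock`. Everything FlagOS-specific here is an assumption made for illustration: `compile_kernel` is a hypothetical wrapper, and the `compile(ir, target=...)` method on the compiler object is an invented interface, not the FlagTree API.

```python
from unittest import mock


def compile_kernel(llvm_ir: str, chip: str, compiler) -> bytes:
    """Hypothetical wrapper that hands LLVM IR to a FlagTree-like compiler."""
    if not llvm_ir:
        raise ValueError("empty LLVM IR")
    # Assumed interface: compile(ir, target=<chip name>) returns a binary blob.
    return compiler.compile(llvm_ir, target=chip)


def test_forwards_ir_and_chip():
    fake = mock.Mock()
    fake.compile.return_value = b"\x00stub"
    ir = "define void @k() { ret void }"
    out = compile_kernel(ir, "mlu370", fake)
    # The wrapper must pass the IR and the chip through unchanged.
    fake.compile.assert_called_once_with(ir, target="mlu370")
    assert out == b"\x00stub"


def test_rejects_empty_ir():
    fake = mock.Mock()
    try:
        compile_kernel("", "mlu370", fake)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty IR")
    # The compiler must not be invoked when validation fails.
    fake.compile.assert_not_called()


test_forwards_ir_and_chip()
test_rejects_empty_ir()
```

Tests of this shape check the glue code around the compiler without needing the SDK or hardware. They cannot check that the generated binaries are correct, which bears on the second discussion question below.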

Questions for Discussion

  1. Should FlagosProgramImpl inherit from LlvmProgramImpl or be separate?
  2. Is stub/mock testing acceptable for initial merge without hardware?
  3. Who will maintain this backend long-term?

Related Links

• FlagOS: https://github.com/flagos-ai
• FlagTree: https://github.com/flagos-ai/flagtree
• Taichi FlagOS Fork: https://github.com/GWinfinity/taichi

Checklist

• [x] RFC created
• [ ] Community feedback incorporated
• [ ] Core team approval
• [ ] PR submitted

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
/cc @ailzhang @k-ye @bobcao3

Would love to hear your thoughts on this proposal!
