Home
Welcome to the PCIe wiki!
This project aims to enhance popular open-source AI tools, such as run:ai, Parallax, vLLM, and Petals, so that they natively support AI workloads running on PCIe-attached AI accelerator cards, covering both inference and training, and enabling users to build hardware-agnostic AI platforms on their own accelerator hardware. The goal is to democratize access to large language models and prove that accessible, high-performance LLM inference is achievable beyond centralized data centers, on hardware people already own, making Software-Defined AI Factories (SDAF) ubiquitous for inference, agentic AI workflows, and other edge use cases.
Problem
Many open-source AI platforms and workflows lack streamlined support for AI acceleration via PCIe-connected hardware, limiting performance and flexibility on modern hardware setups.
Solution
Extend the codebases of tools such as run:ai (and similar FOSS projects) to integrate with PCIe-based AI accelerator cards, providing efficient inference and training capabilities through hardware- and software-level enhancements, guided by NIST's AI Risk Management Framework. A minimal device-discovery sketch follows.
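Before any integration work, a tool needs to discover which PCIe accelerators are present on the host. The sketch below is a minimal, hypothetical example that shells out to lspci on Linux; the vendor IDs and device-class strings are illustrative only and would need to be extended for the cards a deployment actually targets.

```python
import subprocess

# Illustrative PCI vendor IDs; extend for the accelerators you target.
ACCEL_VENDORS = {
    "10de": "NVIDIA",
    "1002": "AMD",
    "8086": "Intel",
}


def find_accelerators() -> list[str]:
    """Return lspci lines that look like PCIe AI accelerator devices."""
    out = subprocess.run(
        ["lspci", "-nn"], capture_output=True, text=True, check=True
    ).stdout
    hits = []
    for line in out.splitlines():
        # `lspci -nn` prints numeric "[vendor:device]" IDs; match known
        # vendors and the device classes accelerators typically report.
        vendor_match = any(f"[{vid}:" in line for vid in ACCEL_VENDORS)
        class_match = (
            "3D controller" in line
            or "Processing accelerators" in line
            or "VGA" in line
        )
        if vendor_match and class_match:
            hits.append(line)
    return hits


if __name__ == "__main__":
    for device in find_accelerators():
        print(device)
```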
Key Benefits
- Hardware-agnostic compatibility: leverages the high bandwidth and low latency of PCIe to improve performance across a variety of AI card platforms.
- Open ecosystem enhancement: enriches popular ML tools and multi-modal LLM stacks with support for cutting-edge AI accelerator integration, benefiting the broader community.
- User empowerment: enables users to build their own hardware-agnostic AI platforms on top of PCIe accelerators.
Approach
- Identify candidate tools (e.g., run:ai, drivers, MPI-based frameworks) for PCIe integration.
- Design new modules or adapters that bridge existing workflows to PCIe accelerator APIs (see the adapter sketch after this list).
- Prototype with representative workloads (inference and training), testing performance and compatibility across different accelerator cards.
- Document integration steps and best practices for end users, including monitoring and observability features (a telemetry sketch also follows).
- Provide documentation that supports transparency and accountability.
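The adapter layer mentioned above could take the shape of a small abstract interface that each accelerator backend implements. This is a hypothetical sketch, not an existing run:ai or vLLM API; all class and method names are illustrative.

```python
from abc import ABC, abstractmethod
from typing import Any


class PCIeAcceleratorAdapter(ABC):
    """Hypothetical bridge between an AI framework and a PCIe accelerator API.

    Concrete subclasses would wrap a vendor runtime (CUDA, ROCm, a
    vendor-specific SDK, etc.) behind this common surface.
    """

    @abstractmethod
    def discover(self) -> list[str]:
        """Return identifiers for the PCIe accelerator devices found."""

    @abstractmethod
    def load_model(self, model_path: str) -> Any:
        """Load a model artifact onto the accelerator and return a handle."""

    @abstractmethod
    def infer(self, model: Any, inputs: Any) -> Any:
        """Run one inference pass and return the outputs."""

    @abstractmethod
    def train_step(self, model: Any, batch: Any) -> float:
        """Run one training step and return the loss."""
```

Keeping the surface this small makes it easier to add backends for new cards without touching the host framework, which is also the modular-architecture mitigation noted under the risks below.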
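For the monitoring and observability item, one concrete (NVIDIA-specific) option is polling NVML via the nvidia-ml-py package; a vendor-neutral version would need one such backend per accelerator family, and the samples could be exported through DCGM or OpenTelemetry (both linked in the references). A minimal sampler sketch:

```python
import pynvml  # pip install nvidia-ml-py


def sample_gpu_telemetry() -> list[dict]:
    """Poll utilization and memory use for each NVML-visible device."""
    pynvml.nvmlInit()
    try:
        samples = []
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            samples.append({
                "device": i,
                "gpu_util_pct": util.gpu,
                "mem_used_mb": mem.used // (1024 * 1024),
                "mem_total_mb": mem.total // (1024 * 1024),
            })
        return samples
    finally:
        pynvml.nvmlShutdown()
```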
Resource Needs
A typical resource breakdown will include:
- Development hours for adapter design and coding
- Testing infrastructure (access to various PCIe accelerator hardware)
- Documentation and community support time
Risks and Mitigations
- Hardware diversity: PCIe AI cards differ in interfaces and requirements.
  - Mitigation: start with a few mainstream accelerator models.
- Complex integrations: code modifications may introduce instability.
  - Mitigation: employ rigorous testing and a modular architecture.
- Project scope creep: too many tool integrations could dilute focus.
  - Mitigation: prioritize integration targets based on impact and feasibility.
Frameworks
- NIST AI Risk Management Framework (AI RMF) and the AI TRiSM (Trust, Risk, and Security Management) framework.
Success Metrics
- Number of AI frameworks successfully extended for PCIe.
- Performance gains in inference/training benchmarks (a benchmark-harness sketch follows this list).
- Adoption by open-source communities (stars, forks, contributions).
- Positive feedback from early adopters.
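Performance gains only count if they are measured consistently across cards. Below is a minimal latency/throughput harness; infer_fn stands in for whatever inference call a backend exposes (a hypothetical name, matching the adapter sketch above).

```python
import statistics
import time
from typing import Any, Callable


def benchmark(infer_fn: Callable[[Any], Any], payload: Any,
              warmup: int = 5, iters: int = 50) -> dict:
    """Time repeated inference calls and report simple latency stats."""
    for _ in range(warmup):  # warm caches, JITs, and device queues
        infer_fn(payload)

    latencies = []
    for _ in range(iters):
        start = time.perf_counter()
        infer_fn(payload)
        latencies.append(time.perf_counter() - start)

    return {
        "p50_ms": statistics.median(latencies) * 1e3,
        # quantiles(n=20) yields 19 cut points; index 18 is the 95th percentile
        "p95_ms": statistics.quantiles(latencies, n=20)[18] * 1e3,
        "throughput_rps": iters / sum(latencies),
    }
```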
Next Steps
- Define supported PCIe accelerator platforms and target AI frameworks.
- Create a development roadmap and prioritize integration effort.
- Develop an initial proof-of-concept adapter for one combination (e.g., run:ai + a specific card) and build out a benchmarking suite for performance validation (a combined usage sketch follows this list).
- Expand documentation and encourage community contributions.
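Putting the pieces together, a proof of concept might wire a trivial backend into the harness like this. DummyAdapter and everything else here is a hypothetical placeholder, assuming the PCIeAcceleratorAdapter and benchmark definitions from the sketches above are in scope.

```python
# Hypothetical proof-of-concept wiring: a trivial adapter plus the
# benchmark harness from the sketch above.
class DummyAdapter(PCIeAcceleratorAdapter):
    def discover(self):
        return ["dummy:0"]

    def load_model(self, model_path):
        return {"path": model_path}  # stand-in for a device-side handle

    def infer(self, model, inputs):
        return inputs  # identity "model", useful for plumbing tests

    def train_step(self, model, batch):
        return 0.0


adapter = DummyAdapter()
model = adapter.load_model("model.onnx")
stats = benchmark(lambda x: adapter.infer(model, x), payload=[0.0] * 1024)
print(adapter.discover(), stats)
```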
Project Metadata
- Languages used: Python, C/C++, and others
- Related topics/tags: cuda, k8s, k3s, mpi4py, runai, cxl, onnxoptimizer, vllm, opentelemetry-ebpf-profiler, mpio, DisTrO, cxl-mem, photonics-computing, llamacpp, llm-d, paxos-cluster, Triton, TensorRT, Petals, Parallax, SGLang, ray, and others
- External references (Medium articles comparing AI/ML hardware performance and on DIY AI infrastructure, plus related papers, docs, and product pages):
- https://medium.com/@maneeshsharma_68969/comparing-performance-of-ai-ml-hardware-a0d18cf657a0
- https://medium.com/@maneeshsharma_68969/diy-ai-infrastructure-a7a1ecf8d688
- https://gradient.network/parallax.pdf
- https://arxiv.org/abs/2209.01188
- https://arxiv.org/abs/2509.26182
- https://arxiv.org/abs/2309.06180
- https://arxiv.org/abs/1706.01160
- https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
- https://www.jeffgeerling.com/blog/2025/all-intel-gpus-run-on-raspberry-pi-and-risc-v
- https://developer.nvidia.com/dcgm
- https://docs.ray.io/en/latest/cluster/getting-started.html
- https://github.com/NVIDIA/TensorRT
- https://catalog.ngc.nvidia.com/
- https://www.amd.com/content/dam/amd/en/documents/pensando-technical-docs/product-briefs/pollara-product-brief.pdf
- https://www.gigabyte.com/PC-Accessory/AI-TOP-CXL-R5X4
- https://csrc.nist.gov/projects/post-quantum-cryptography
- https://falcon-sign.info/falcon.pdf
- https://cdi.liqid.com/hubfs/Liqid-CXL%20HBA-102725.pdf
- https://cdi.liqid.com/hubfs/Liqid-CXL%202.0%20Fabric_072125.pdf
- https://www.broadcom.com/products/ethernet-connectivity/network-adapters/n1800go
- https://arxiv.org/pdf/2511.15950
- https://www.qualcomm.com/news/releases/2025/10/qualcomm-unveils-ai200-and-ai250-redefining-rack-scale-data-cent
- https://www.mobilint.com/aries/mla100
- https://www.qualcomm.com/internet-of-things/solutions/ai-on-prem-appliance
- https://www.qualcomm.com/developer/software/qualcomm-ai-inference-suite
- https://www.qualcomm.com/content/dam/qcomm-martech/dm-assets/documents/Prod_Brief_QCOM_Cloud_AI_100_Ultra.pdf
- https://store.axelera.ai/products/metis-pcie-card-unmatched-performance-for-edge-ai-applications
- https://www.gigabyte.com/Motherboard/TRX50-AERO-D-rev-12
- https://github.com/exo-explore/exo
- https://arxiv.org/pdf/2503.01861v3
- https://huggingface.co/blog/ibm-research/cuga-on-hugging-face
- https://huggingface.co/collections/nvidia/nvidia-nemotron-v3
- https://mikrotik.com/product/ccr2004_1g_2xs_pcie
- https://www.asus.com/networking-iot-servers/wired-networking/all-series/xg-c100c/
- https://www.tp-link.com/in/home-networking/pci-adapter/tx401/
- https://github.com/ml-explore/mlx
- https://aaif.io/
- https://www.kolosal.ai/
- https://www.foundrylocal.ai/models
- https://plugable.com/blogs/news/plugable-introduces-tbt5-ai-at-ces-secure-local-ai-powered-by-thunderbolt-5