Maneesh edited this page Jan 9, 2026 · 75 revisions

Welcome to the PCIe wiki!

Objective

To extend popular open-source AI tools (such as run:ai, Parallax, vLLM, and Petals) to natively support AI workloads on PCIe-attached accelerator cards, enabling both inference and training so that users can build AI platforms that are agnostic to the underlying accelerator hardware. The broader aim is to democratize access to large language models by proving that accessible, high-performance LLM inference is achievable beyond centralized data centers, on hardware people already own. This would make Software-Defined AI Factories (SDAF) ubiquitous for inference, agentic AI workflows, and other edge use cases.

Problem Statement

Many open-source AI platforms and workflows lack streamlined support for AI acceleration via PCIe-connected hardware, limiting performance and flexibility for modern hardware setups.

Proposed Solution

Extend the codebases of tools such as run:ai (and similar FOSS projects) to integrate with PCIe-based AI accelerator cards, providing efficient inference and training through hardware- and software-level enhancements, guided by NIST's AI Risk Management Framework.

Value Proposition

  • Hardware-agnostic compatibility: leverages fast PCIe communication to improve performance across a variety of AI accelerator cards.
  • Open ecosystem enhancement: enriches popular ML tools and multi-modal LLM stacks with cutting-edge accelerator integration, benefiting the broader community.
  • Empowers users to build their own hardware-agnostic AI platforms on the accelerators of their choice.
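Hardware-agnostic discovery is the first step toward the compatibility goals above. A minimal sketch, assuming a Linux host: PCIe devices can be enumerated through sysfs, and candidate accelerators identified by PCI class code (0x12 is the PCI-SIG "Processing Accelerator" class; the exact set of class prefixes to match is an assumption for illustration).

```python
from pathlib import Path

# PCI class-code prefixes that plausibly indicate compute accelerators.
# 0x1200 = Processing Accelerator; 0x0302 / 0x0380 = GPU-style 3D/other
# display controllers. This set is an illustrative assumption, not exhaustive.
ACCEL_CLASS_PREFIXES = ("0x1200", "0x0302", "0x0380")

def find_pcie_accelerators(sysfs_root: str = "/sys/bus/pci/devices") -> list[dict]:
    """Enumerate PCIe devices whose class code suggests an AI accelerator."""
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():  # non-Linux hosts (or a bad path): nothing to scan
        return found
    for dev in sorted(root.iterdir()):
        try:
            dev_class = (dev / "class").read_text().strip()
            vendor = (dev / "vendor").read_text().strip()
            device = (dev / "device").read_text().strip()
        except OSError:
            continue  # device vanished or attribute missing; skip it
        if any(dev_class.startswith(p) for p in ACCEL_CLASS_PREFIXES):
            found.append({"address": dev.name, "class": dev_class,
                          "vendor": vendor, "device": device})
    return found
```

An adapter layer could use such a scan to decide at startup which vendor backend (CUDA, ROCm, a vendor SDK) to load for each discovered card.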

Implementation Plan

  • Identify candidate tools (e.g., run:ai, drivers, MPI-based frameworks) for PCIe integration.
  • Design new modules or adapters that bridge existing workflows to PCIe accelerator APIs.
  • Prototype with representative workloads (inference and training). Test performance and compatibility across different accelerator cards.
  • Document integration steps and best practices for end users, including monitoring and observability features.
  • Maintain documentation that supports transparency and accountability.
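The adapter step in the plan above can be sketched as a small interface. This is a hypothetical design, not an API from run:ai or any vendor SDK: an abstract base class captures the operations a framework needs from a PCIe accelerator (allocate, transfer, launch, synchronize), and a CPU-only loopback implementation exercises the contract in tests before real hardware backends exist.

```python
from abc import ABC, abstractmethod
from typing import Any

class AcceleratorAdapter(ABC):
    """Hypothetical bridge between a framework and one PCIe accelerator API."""

    @abstractmethod
    def allocate(self, nbytes: int) -> Any:
        """Reserve nbytes of device memory; return an opaque buffer handle."""

    @abstractmethod
    def transfer_to_device(self, host_buf: bytes, dev_buf: Any) -> None:
        """Copy host bytes into a previously allocated device buffer."""

    @abstractmethod
    def launch(self, kernel: str, args: tuple) -> None:
        """Enqueue a named kernel with its arguments."""

    @abstractmethod
    def synchronize(self) -> None:
        """Block until all enqueued work has completed."""

class LoopbackAdapter(AcceleratorAdapter):
    """CPU-only stand-in used to exercise the interface without hardware."""

    def __init__(self) -> None:
        self.buffers: dict[int, bytearray] = {}
        self._next_handle = 0

    def allocate(self, nbytes: int) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self.buffers[handle] = bytearray(nbytes)
        return handle

    def transfer_to_device(self, host_buf: bytes, dev_buf: int) -> None:
        self.buffers[dev_buf][:len(host_buf)] = host_buf

    def launch(self, kernel: str, args: tuple) -> None:
        pass  # no real kernels on the loopback backend

    def synchronize(self) -> None:
        pass  # loopback work is synchronous
```

Per-vendor backends would subclass `AcceleratorAdapter`, so the frameworks above depend only on the interface, not on any one card's SDK.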

Budget Estimates

A typical resource breakdown includes:

  • Development hours for adapter design and coding
  • Testing infrastructure (access to various PCIe accelerator hardware)
  • Documentation and community support time

Risks and Mitigation

  • Hardware diversity: PCIe AI cards differ in interface and requirements.
  • Mitigation: Start with a few mainstream accelerator models.
  • Complex integrations: Code modifications may introduce instability.
  • Mitigation: Employ rigorous testing and modular architecture.
  • Project scope creep: Too many tool integrations could dilute focus.
  • Mitigation: Prioritize integration targets based on impact and feasibility.
  • Governance: apply NIST's AI Risk Management Framework and the AI TRiSM framework to manage AI-specific risks.

Success Metrics

  • Number of AI frameworks successfully extended for PCIe.
  • Performance gains in inference/training benchmarks.
  • Adoption by open-source communities (stars, forks, contributions).
  • Positive feedback from early adopters.

Next Steps

  • Define supported PCIe accelerator platforms and target AI frameworks.
  • Create a development roadmap and prioritize integration effort.
  • Develop an initial proof-of-concept adapter for one combination (e.g., run:ai plus a specific card), and build a benchmarking suite for performance validation.
  • Expand documentation and encourage community contributions.
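The benchmarking suite mentioned above could start from a harness like the following. It is a minimal sketch: `infer` and `batch` are placeholders for whatever framework call and input the proof-of-concept adapter exposes; the harness just reports median latency and derived throughput after a warmup.

```python
import time
from statistics import median

def benchmark(infer, batch, warmup: int = 3, iters: int = 10) -> dict:
    """Measure a single-device inference callable.

    `infer` is any callable taking one batch (a placeholder for a real
    framework entry point); `batch` is a sized input. Returns median
    latency in seconds and items-per-second throughput.
    """
    for _ in range(warmup):      # warm caches, JITs, device queues
        infer(batch)
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        infer(batch)
        times.append(time.perf_counter() - t0)
    lat = median(times)
    return {"median_latency_s": lat,
            "throughput_items_per_s": len(batch) / lat}
```

Running the same harness against a CPU baseline and an accelerator-backed adapter gives the performance-gain numbers called out under Success Metrics.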

Appendices

  • Languages used: Python, C/C++, and others
  • Related topics/tags: cuda, k8s, k3s, mpi4py, runai, cxl, onnxoptimizer, vllm, opentelemetry-ebpf-profiler, mpio, DisTrO, cxl-mem, photonics-computing, llamacpp, llm-d, paxos-cluster, Triton, TensorRT, Petals, Parallax, SGLang, ray, and others
  • External references: Medium articles comparing AI/ML hardware performance and DIY AI hardware, plus related papers and product documentation
  1. https://medium.com/@maneeshsharma_68969/comparing-performance-of-ai-ml-hardware-a0d18cf657a0
  2. https://medium.com/@maneeshsharma_68969/diy-ai-infrastructure-a7a1ecf8d688
  3. https://gradient.network/parallax.pdf
  4. https://arxiv.org/abs/2209.01188
  5. https://arxiv.org/abs/2509.26182
  6. https://arxiv.org/abs/2309.06180
  7. https://arxiv.org/abs/1706.01160
  8. https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
  9. https://www.jeffgeerling.com/blog/2025/all-intel-gpus-run-on-raspberry-pi-and-risc-v
  10. https://developer.nvidia.com/dcgm
  11. https://docs.ray.io/en/latest/cluster/getting-started.html
  12. https://github.com/NVIDIA/TensorRT
  13. https://catalog.ngc.nvidia.com/
  14. https://www.amd.com/content/dam/amd/en/documents/pensando-technical-docs/product-briefs/pollara-product-brief.pdf
  15. https://www.gigabyte.com/PC-Accessory/AI-TOP-CXL-R5X4
  16. https://csrc.nist.gov/projects/post-quantum-cryptography
  17. https://falcon-sign.info/falcon.pdf
  18. https://cdi.liqid.com/hubfs/Liqid-CXL%20HBA-102725.pdf
  19. https://cdi.liqid.com/hubfs/Liqid-CXL%202.0%20Fabric_072125.pdf
  20. https://www.broadcom.com/products/ethernet-connectivity/network-adapters/n1800go
  21. https://arxiv.org/pdf/2511.15950
  22. https://www.qualcomm.com/news/releases/2025/10/qualcomm-unveils-ai200-and-ai250-redefining-rack-scale-data-cent
  23. https://www.mobilint.com/aries/mla100
  24. https://www.qualcomm.com/internet-of-things/solutions/ai-on-prem-appliance
  25. https://www.qualcomm.com/developer/software/qualcomm-ai-inference-suite
  26. https://www.qualcomm.com/content/dam/qcomm-martech/dm-assets/documents/Prod_Brief_QCOM_Cloud_AI_100_Ultra.pdf
  27. https://store.axelera.ai/products/metis-pcie-card-unmatched-performance-for-edge-ai-applications
  28. https://www.gigabyte.com/Motherboard/TRX50-AERO-D-rev-12
  29. https://github.com/exo-explore/exo
  30. https://arxiv.org/pdf/2503.01861v3
  31. https://huggingface.co/blog/ibm-research/cuga-on-hugging-face
  32. https://huggingface.co/collections/nvidia/nvidia-nemotron-v3
  33. https://mikrotik.com/product/ccr2004_1g_2xs_pcie
  34. https://www.asus.com/networking-iot-servers/wired-networking/all-series/xg-c100c/
  35. https://www.tp-link.com/in/home-networking/pci-adapter/tx401/
  36. https://github.com/ml-explore/mlx
  37. https://aaif.io/
  38. https://www.kolosal.ai/
  39. https://www.foundrylocal.ai/models
  40. https://plugable.com/blogs/news/plugable-introduces-tbt5-ai-at-ces-secure-local-ai-powered-by-thunderbolt-5