Summary
This RFC aims to discuss and gather community input on refactoring the bitsandbytes/cuda_setup module. The goal is to enhance its functionality, simplify the user experience across different hardware and operating systems, and prepare it for upcoming device support expansions.
Background
bitsandbytes has become instrumental in democratizing AI, thanks to its deep integration with hardware. Despite millions of monthly downloads, a fraction of users encounter issues, such as those detailed in #914. Our objective is to keep bitsandbytes as easy to use as possible (ideally no more than pip install bitsandbytes and load_in_4bit=True) by hiding most of the complexity of the software-hardware boundary under the hood, while improving error reporting and handling.
Current Challenges
Setup Module Issues
- Bug Reports from python -m bitsandbytes: This feature, intended to simplify debugging, sometimes presents similar tracebacks for different underlying issues, causing confusion in the issue threads.
- CUDA Install and Environment Challenges: Many problems arise not from bitsandbytes itself, but from user-side issues with CUDA installations, environment settings (e.g. LD_LIBRARY_PATH), or hardware configurations.
- Perceived Reliability: The setup code has known issues that need to be fixed, and its overall code quality could be much better.
Diverse Hardware Landscape
- GPU-Nvidia Variability: Different generations and capabilities (e.g., Compute Capability, tensor cores, data types).
- Emerging GPU-AMD Support: Efforts like PR #756 are in progress to integrate AMD hardware.
- Apple Silicon Requests: Interest shown in PR #257 and other contributions.
- Intel GPU+CPU Quantization: The initiative by Intel with PR #898 for device abstraction.
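One way to think about supporting this range of hardware is a small backend registry that probes devices in priority order. The sketch below is purely illustrative: the Backend type, select_backend, and the probe functions are hypothetical names, not taken from bitsandbytes or from any of the PRs mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Backend:
    """A hypothetical device backend with a cheap availability probe."""
    name: str
    is_available: Callable[[], bool]


def select_backend(backends: Sequence[Backend]) -> Optional[Backend]:
    """Return the first backend whose probe succeeds, or None if none do."""
    for backend in backends:
        if backend.is_available():
            return backend
    return None


if __name__ == "__main__":
    # The probes are stubbed out here; real probes would query drivers or libraries.
    candidates = [
        Backend("cuda", lambda: False),
        Backend("cpu", lambda: True),
    ]
    chosen = select_backend(candidates)
    print(chosen.name if chosen else "no supported backend found")
```

Keeping the probes separate from the selection logic means each vendor-specific check can be tested and reported on individually.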
Operating System Variability
- Linux as Primary OS: Continued focus on Linux support.
- Windows and Apple Support: Evaluating and integrating community contributions for broader OS support:
  - Windows: Limited support at the moment, but high impact. We need to evaluate the status quo and community contributions, then come up with a roadmap.
  - Apple: Currently no support. There are ongoing discussions, community contributions that have not been accepted so far, and considerable interest. We need to further evaluate the status quo and community contributions, then come up with a roadmap.
Proposed Improvements
- Refactoring cuda_setup: Enhancing code quality and clarity to better handle the diverse hardware and OS scenarios.
- Error Reporting Enhancement: Develop a more nuanced error reporting mechanism that reflects the source of each issue as accurately as possible and provides distinct traces, so that different issues are harder to conflate.
- Community Engagement: Actively seeking community input, especially for cross-platform compatibility and new device support.
- CI/CD Strategies: Discussing and implementing robust CI/CD processes to facilitate testing across various platforms and hardware.
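As a starting point for discussing the error reporting item above, distinct failure sources could map to distinct exception types, each carrying its own hint. This is a sketch under assumptions: none of these class names exist in bitsandbytes, and the hint texts are placeholders.

```python
class SetupError(RuntimeError):
    """Hypothetical base class for setup failures (not part of bitsandbytes)."""
    hint = "Run python -m bitsandbytes and attach the full output to your report."


class CudaRuntimeNotFoundError(SetupError):
    """No CUDA runtime library could be located."""
    hint = "Check your CUDA installation and LD_LIBRARY_PATH."


class NativeLibraryLoadError(SetupError):
    """A native library was found but could not be loaded."""
    hint = "The library may target a different CUDA version or platform."


class UnsupportedDeviceError(SetupError):
    """The detected device is not supported by the requested feature."""
    hint = "Check which devices and data types the feature supports."


def format_report(error: SetupError) -> str:
    """Render an error so that its category is visible on the first line."""
    return f"[{type(error).__name__}] {error}\nHint: {error.hint}"


if __name__ == "__main__":
    try:
        raise CudaRuntimeNotFoundError("no libcudart found on the library path")
    except SetupError as exc:
        print(format_report(exc))
```

Because the category appears in the first line of the report, two users with different root causes would no longer post near-identical tracebacks.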
Call to Action
We invite the community to provide feedback and suggestions on the following:
- Improvements to the cuda_setup module.
- Strategies for handling diverse hardware and operating systems.
- Ideas for an effective CI/CD setup - we'll provide a separate RFC for that, but feel free to mention initial thoughts here as well.
- Any other relevant insights or experiences.
Timeline and Milestones
We'll take an incremental approach to improving the setup module. The more actionable and widely agreed upon a proposal is, the quicker we can implement it.
Contribution and Feedback Mechanism
Please share your thoughts, suggestions, and feedback in the thread below.
Summary and Next Steps
This RFC serves as a starting point to get feedback and coordinate the collaborative effort to refine bitsandbytes's setup process. We aim to address the current challenges, embrace the diversity of hardware and operating systems, and build a robust, user-friendly setup. Your participation in this process is crucial, and we look forward to your valuable input.