Releases: nareshis21/Truelarge-RT
v0.1.0-beta
TrueLarge-RT v1.0-beta: The Pipelining Update
This release marks a significant milestone in mobile LLM inference. We have moved from sequential layer loading to a Deep-Pipelined Architecture that overlaps storage reads with computation, so that large models (32B to 70B) run far more smoothly on Android devices.
Major Highlights
1. Deep-Pipelined Layer Execution (Gen 2)
This generation targets the I/O bottleneck: layers are prefetched ahead of use, so the compute thread should rarely have to wait for weights to load from storage.
- Eager Prefetch Queue: A thread-safe std::deque-based system that peeks 3-5 layers ahead, keeping the storage pipeline (UFS 3.1/4.0) at 100% saturation.
- Asynchronous Memory Touching: Background "touch" loops force physical page-ins, ensuring data is in RAM before the computation thread reaches it.
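The two mechanisms above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: the class name PrefetchQueue, its methods, the touch_pages helper, and the 4096-byte default page size are all assumptions made for the example. The only details taken from the notes are the use of a mutex-guarded std::deque and the idea of reading ahead by a fixed number of layers.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical thread-safe queue of layer indices waiting to be prefetched.
class PrefetchQueue {
public:
    explicit PrefetchQueue(int lookahead) : lookahead_(lookahead) {
        assert(lookahead > 0);
    }

    // Compute thread: on entering layer `current`, enqueue the next
    // `lookahead_` layers that have not been scheduled yet.
    void schedule_ahead(int current, int num_layers) {
        std::lock_guard<std::mutex> lock(mu_);
        for (int i = 1; i <= lookahead_; ++i) {
            const int target = current + i;
            if (target >= num_layers) break;
            if (target > last_scheduled_) {
                queue_.push_back(target);
                last_scheduled_ = target;
            }
        }
    }

    // I/O thread: take the next layer to load, if any (non-blocking).
    bool try_pop(int& layer_out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (queue_.empty()) return false;
        layer_out = queue_.front();
        queue_.pop_front();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mu_);
        return queue_.size();
    }

private:
    mutable std::mutex mu_;
    std::deque<int> queue_;
    int lookahead_;
    int last_scheduled_ = -1;
};

// Reads one byte from every page of a mapped region so the kernel has to
// fault each page into RAM. Returns the number of pages touched.
std::size_t touch_pages(const unsigned char* data, std::size_t len,
                        std::size_t page_size = 4096) {
    volatile unsigned char sink = 0;  // volatile keeps the reads from being optimized out
    std::size_t touched = 0;
    for (std::size_t off = 0; off < len; off += page_size) {
        sink = data[off];
        ++touched;
    }
    (void)sink;
    return touched;
}
```

In a design like this, the compute thread would call schedule_ahead at the start of each layer, while a background thread drains the queue with try_pop and runs touch_pages over each layer's mapped bytes.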
2. "Greedy" RAM Window (Gen 3)
Maximize your flagship hardware. We've shifted from "minimum RAM usage" to "high-performance hybrid caching."
- Expanded Sliding Window: Increased the layer cap from 10 to 80 layers.
- Aggressive Budgeting: Reduced the OS safety buffer to 500MB, allowing the model to occupy nearly all available RAM for maximum speed.
- Inter-Token Pipelining: Prefetches Layers 0-8 of the next token immediately while the current one is still being sampled.
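As a rough illustration of the budgeting described above, the number of layers kept resident could be derived from available RAM, the safety buffer, and the layer cap. The function name window_layers, the assumption that every layer has the same size, and the reading of "500MB" as 500 MiB are choices made for this sketch only; the release notes do not specify how the engine computes its window.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;
// Values taken from the notes; interpreting "500MB" as MiB is an assumption.
constexpr std::uint64_t kSafetyBufferBytes = 500 * kMiB;
constexpr std::uint64_t kMaxWindowLayers = 80;

// Hypothetical helper: how many equally sized layers can stay resident
// after reserving the OS safety buffer, capped at the window limit and
// at the model's layer count.
int window_layers(std::uint64_t available_bytes, std::uint64_t layer_bytes,
                  int num_layers) {
    assert(layer_bytes > 0);
    assert(num_layers >= 0);
    if (available_bytes <= kSafetyBufferBytes) return 0;
    const std::uint64_t budget = available_bytes - kSafetyBufferBytes;
    const std::uint64_t fit = budget / layer_bytes;
    const std::uint64_t cap =
        std::min(kMaxWindowLayers, static_cast<std::uint64_t>(num_layers));
    return static_cast<int>(std::min(fit, cap));
}
```

For example, with 8000 MiB available and 500 MiB per layer, 7500 MiB remain after the buffer, so 15 layers fit in the window.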
3. Precision Telemetry
- 4-Digit TPS Tracking: Real-time logging now reports Tokens-Per-Second with %.4f precision, allowing for granular performance profiling on large-leaf models.
- I/O Overlap Analysis: Enhanced logs to track "HIT" vs "WAIT" times for prefetched layers.
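A minimal sketch of what this telemetry could look like is shown below. Only the %.4f format specifier and the HIT/WAIT distinction come from the notes; the "TPS: " prefix, the format_tps function, and the OverlapStats struct are hypothetical and do not reflect the engine's real log format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats tokens-per-second with four decimal places (%.4f).
// The "TPS: " prefix is an assumed log format.
std::string format_tps(std::size_t tokens, double seconds) {
    assert(seconds > 0.0);
    char buf[64];
    std::snprintf(buf, sizeof buf, "TPS: %.4f",
                  static_cast<double>(tokens) / seconds);
    return std::string(buf);
}

// Hypothetical counters: a HIT means the layer was already resident when
// the compute thread asked for it; a WAIT means compute had to block.
struct OverlapStats {
    int hits = 0;
    int waits = 0;
    double total_wait_ms = 0.0;

    void record(bool was_ready, double waited_ms) {
        if (was_ready) {
            ++hits;
        } else {
            ++waits;
            total_wait_ms += waited_ms;
        }
    }

    // Fraction of layer requests served without blocking.
    double hit_rate() const {
        const int total = hits + waits;
        return total == 0 ? 0.0 : static_cast<double>(hits) / total;
    }
};
```

A low hit rate or a growing total wait time would suggest that the prefetcher is not staying far enough ahead of the compute thread.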
Technical Improvements
- Multi-Layer Logic: Implemented eviction protection so that the prefetcher never evicts layers that are still queued for upcoming use.
- Kernel-Level Tweaks: Added MADV_SEQUENTIAL and MADV_WILLNEED hints to optimize the Android kernel's read-ahead behavior.
- Stability: Fixed memory leaks and race conditions in the background I/O thread.
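For readers unfamiliar with the kernel hints mentioned above, the sketch below shows one way to apply them to a memory-mapped weights file using the standard POSIX/Linux madvise call. The MappedFile struct and the map_weights and unmap_weights helpers are hypothetical names for this example; how the engine actually maps its GGUF files is not described in these notes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct MappedFile {
    void* addr = nullptr;
    std::size_t len = 0;
};

// Maps `path` read-only and applies both hints to the whole mapping.
// Returns an empty MappedFile if the file cannot be opened or mapped.
MappedFile map_weights(const char* path) {
    MappedFile out;
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return out;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return out;
    }
    const std::size_t len = static_cast<std::size_t>(st.st_size);
    void* addr = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return out;

    // The hints are advisory, so a failure is reported but not treated as fatal.
    // MADV_SEQUENTIAL: pages will be read in order, read ahead aggressively.
    if (madvise(addr, len, MADV_SEQUENTIAL) != 0)
        std::perror("madvise(MADV_SEQUENTIAL)");
    // MADV_WILLNEED: pages will be needed soon, start reading them now.
    if (madvise(addr, len, MADV_WILLNEED) != 0)
        std::perror("madvise(MADV_WILLNEED)");

    out.addr = addr;
    out.len = len;
    return out;
}

void unmap_weights(MappedFile& m) {
    if (m.addr != nullptr) munmap(m.addr, m.len);
    m = MappedFile{};
}
```

In a layer-streaming engine the hints would more likely be applied per layer range rather than to the whole file; in that case the start address passed to madvise must be page-aligned.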
📦 Getting Started
- Model Support: Fully tested with Qwen 2.5 (32B), Llama 3.1 (70B), and Mistral-based models in GGUF format.
- Device Requirements:
  - 4GB RAM Minimum (LBL Mode)
  - 12GB+ RAM Recommended for Hybrid Mode
  - UFS 3.1/4.0 Storage highly recommended.
Full Changelog: f931c56...v1.0.0-beta