TrueLarge-RT v1.0-beta: The Pipelining Update

This release marks a significant milestone in mobile LLM inference. The engine has moved from sequential layer loading to a deep-pipelined architecture, so large models (32B to 70B parameters) run much more smoothly on Android devices.

Major Highlights

1. Deep-Pipelined Layer Execution (Gen 2)

The I/O bottleneck of sequential loading is gone. Weights are loaded in the background while earlier layers compute, so the engine no longer has to wait on storage before running each layer.

Eager Prefetch Queue: A thread-safe, std::deque-based queue that looks 3-5 layers ahead to keep the storage pipeline (UFS 3.1/4.0) fully saturated.
Asynchronous Memory Touching: Background "touch" loops force physical page-ins, ensuring data is in RAM before the computation thread reaches it.
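
The two pieces above can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's actual code: `PrefetchQueue`, `schedule_ahead`, `pop`, `close`, and `touch_pages` are hypothetical names, and the sketch covers a single forward pass only.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>

// Hypothetical sketch: all names here are illustrative assumptions.
class PrefetchQueue {
public:
    explicit PrefetchQueue(std::size_t lookahead) : lookahead_(lookahead) {
        assert(lookahead > 0);
    }

    // Compute thread: queue layers current+1 .. current+lookahead, skipping
    // layers that were already queued. Returns how many new layers were added.
    std::size_t schedule_ahead(int current, int num_layers) {
        std::size_t added = 0;
        {
            std::lock_guard<std::mutex> lock(mu_);
            for (std::size_t i = 1; i <= lookahead_; ++i) {
                int target = current + static_cast<int>(i);
                if (target >= num_layers) break;
                if (target <= last_scheduled_) continue;
                pending_.push_back(target);
                last_scheduled_ = target;
                ++added;
            }
        }
        if (added > 0) cv_.notify_one();
        return added;
    }

    // I/O thread: block until a layer is pending or the queue is closed.
    // Returns an empty optional once the queue is closed and drained.
    std::optional<int> pop() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return closed_ || !pending_.empty(); });
        if (pending_.empty()) return std::nullopt;
        int layer = pending_.front();
        pending_.pop_front();
        return layer;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::size_t lookahead_;
    int last_scheduled_ = -1;
    bool closed_ = false;
    std::deque<int> pending_;
    std::mutex mu_;
    std::condition_variable cv_;
};

// Read one byte per page so the kernel faults the mapping into RAM.
// The checksum is returned so the compiler cannot drop the reads.
inline unsigned touch_pages(const unsigned char* data, std::size_t len,
                            std::size_t page_size = 4096) {
    assert(page_size > 0);
    unsigned sum = 0;
    for (std::size_t off = 0; off < len; off += page_size) {
        sum += static_cast<const volatile unsigned char*>(data)[off];
    }
    return sum;
}
```

In this sketch, a background I/O thread would loop on `pop()`, call `touch_pages` on the mapped bytes of the returned layer, and exit when `pop()` returns an empty optional.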

2. "Greedy" RAM Window (Gen 3)

Get the most out of flagship hardware. The caching strategy has shifted from "minimum RAM usage" to "high-performance hybrid caching."

Expanded Sliding Window: Raised the window's layer cap from 10 to 80 layers.
Aggressive Budgeting: Reduced the OS safety buffer to 500 MB, allowing the model to occupy nearly all available RAM for maximum speed.
Inter-Token Pipelining: Prefetches layers 0-8 for the next token while the current token is still being sampled.
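
As a rough illustration of how the layer cap and the safety buffer might combine, the sketch below computes how many layers fit in RAM. The function name, the uniform per-layer size, and the reading of "500MB" as 500 MiB are assumptions for the example. The release notes do not describe the engine's actual budgeting formula.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ull * 1024ull;
// Values taken from the notes above; the unit interpretation is an assumption.
constexpr std::uint64_t kOsSafetyBufferBytes = 500 * kMiB;
constexpr int kMaxWindowLayers = 80;

// Hypothetical helper: number of layers that can stay resident, given the
// currently available RAM and an (assumed uniform) per-layer size.
inline int resident_layer_count(std::uint64_t available_bytes,
                                std::uint64_t layer_bytes,
                                int total_layers) {
    assert(layer_bytes > 0 && total_layers >= 0);
    if (available_bytes <= kOsSafetyBufferBytes) return 0;
    const std::uint64_t budget = available_bytes - kOsSafetyBufferBytes;
    const std::uint64_t fit = budget / layer_bytes;
    const std::uint64_t cap =
        static_cast<std::uint64_t>(std::min(kMaxWindowLayers, total_layers));
    return static_cast<int>(std::min(fit, cap));
}
```

For example, with 12 GiB available and 400 MiB layers, the budget is 11,788 MiB, so 29 layers fit. The remaining layers would still have to be streamed from storage.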

3. Precision Telemetry

4-Digit TPS Tracking: Real-time logging now reports tokens per second (TPS) to four decimal places (%.4f), allowing fine-grained performance profiling on large models.
I/O Overlap Analysis: Logs now track "HIT" vs. "WAIT" times for prefetched layers.
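
A minimal sketch of what this telemetry could look like is shown below. The log prefix, the `OverlapStats` struct, and the rule that a layer counts as a "HIT" when the compute thread did not block on it are all assumptions. Only the %.4f precision and the HIT/WAIT labels come from the notes.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format tokens per second with four decimal places, as "%.4f" does.
// The "TPS: " prefix is an illustrative assumption, not the real log format.
inline std::string format_tps(int tokens, double elapsed_seconds) {
    assert(elapsed_seconds > 0.0);
    char buf[64];
    std::snprintf(buf, sizeof(buf), "TPS: %.4f", tokens / elapsed_seconds);
    return std::string(buf);
}

// Hypothetical counters for prefetch overlap.
struct OverlapStats {
    int hits = 0;         // layer was already in RAM when compute reached it
    int waits = 0;        // compute thread had to block on I/O
    double wait_ms = 0.0; // total time spent blocked

    void record(double blocked_ms) {
        if (blocked_ms <= 0.0) {
            ++hits;
        } else {
            ++waits;
            wait_ms += blocked_ms;
        }
    }

    double hit_rate() const {
        const int total = hits + waits;
        return total == 0 ? 0.0 : static_cast<double>(hits) / total;
    }
};
```

A low hit rate in such a log would suggest that the lookahead depth is too small for the device's storage throughput.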

Technical Improvements

Multi-Layer Logic: Added eviction protection so the prefetcher cannot evict layers that are still queued as upcoming targets.
Kernel-Level Tweaks: Added MADV_SEQUENTIAL and MADV_WILLNEED hints to optimize the Android kernel's read-ahead behavior.
Stability: Fixed memory leaks and race conditions in the background I/O thread.
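
For readers unfamiliar with the kernel hints mentioned above, here is a minimal sketch of how MADV_SEQUENTIAL and MADV_WILLNEED are typically applied to a memory-mapped weights file on Linux/Android. `MappedFile`, `map_weights`, `hint_layer`, and `unmap_weights` are hypothetical names, and where the engine actually issues these hints is not stated in the notes.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical wrapper around a read-only mapping of a weights file.
struct MappedFile {
    void* addr = MAP_FAILED;
    std::size_t len = 0;
    bool ok() const { return addr != MAP_FAILED; }
};

// Map the file read-only and tell the kernel it will be read front to back.
inline MappedFile map_weights(const char* path) {
    MappedFile m;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return m;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        std::size_t len = static_cast<std::size_t>(st.st_size);
        void* p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            m.addr = p;
            m.len = len;
            madvise(m.addr, m.len, MADV_SEQUENTIAL); // advisory; failure is non-fatal
        }
    }
    close(fd); // the mapping stays valid after the descriptor is closed
    return m;
}

// Ask the kernel to start reading [offset, offset + len) now.
// madvise needs a page-aligned start, so the offset is rounded down.
inline bool hint_layer(const MappedFile& m, std::size_t offset, std::size_t len) {
    assert(m.ok() && offset + len <= m.len);
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t start = offset - (offset % page);
    char* base = static_cast<char*>(m.addr);
    return madvise(base + start, (offset - start) + len, MADV_WILLNEED) == 0;
}

inline void unmap_weights(MappedFile& m) {
    if (m.ok()) munmap(m.addr, m.len);
    m.addr = MAP_FAILED;
    m.len = 0;
}
```

Both hints are advisory: the kernel may ignore them under memory pressure, which is one reason an explicit page-touching pass can still be useful alongside them.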

📦 Getting Started

Model Support: Fully tested with Qwen 2.5 (32B), Llama 3.1 (70B), and Mistral-based models in GGUF format.
Device Requirements:
- 4 GB RAM minimum (LBL Mode)
- 12 GB+ RAM recommended for Hybrid Mode
- UFS 3.1/4.0 storage highly recommended

Full Changelog: f931c56...v1.0.0-beta

Releases: nareshis21/Truelarge-RT

v0.1.0-beta