Framework for AI on mobile devices and wearables, hardware-aware C/CPP backend, with wrappers for Kotlin, Java, Swift, React, Flutter.


Cactus is a lightweight, high-performance framework for running AI models on mobile phones. Cactus has unified and consistent APIs across:

  • React-Native
  • Android/Kotlin
  • Android/Java
  • iOS/Swift
  • iOS/Objective-C++
  • Flutter/Dart

Cactus currently leverages GGML backends to support any GGUF model already compatible with Llama.cpp, while we focus on broadly supporting every mobile app development platform, as well as upcoming features like:

  • MCP
  • phone tool use
  • thinking
  • prompt-enhancement
  • higher-level APIs

Contributors with experience in any of the above are welcome! Feel free to submit cool example apps you built with Cactus, as well as issues or tests!

Cactus Models coming soon.

Technical Architecture

┌─────────────────────────────────────────────────────────┐
│                     Applications                        │
└───────────────┬─────────────────┬───────────────────────┘
                │                 │                
┌───────────────┼─────────────────┼────────────────────────┐
│ ┌─────────────▼─────┐ ┌─────────▼───────┐ ┌─────────────┐│
│ │     React API     │ │   Flutter API   │ │  Native APIs││
│ └───────────────────┘ └─────────────────┘ └─────────────┘│
│                Platform Bindings                         │
└───────────────┬─────────────────┬────────────────────────┘
                │                 │                
┌───────────────▼─────────────────▼───────────────────────┐
│                   Cactus Core (C++)                     │
└───────────────┬─────────────────┬───────────────────────┘
                │                 │                
┌───────────────▼─────┐ ┌─────────▼───────────────────────┐
│   Llama.cpp Core    │ │    GGML/GGUF Model Format       │
└─────────────────────┘ └─────────────────────────────────┘
  • Features:
    • Model download from HuggingFace
    • Text completion and chat completion
    • Streaming token generation
    • Embedding generation
    • JSON mode with schema validation
    • Chat templates with Jinja2 support
    • Low memory footprint
    • Battery-efficient inference
    • Background processing
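
The "JSON mode with schema validation" feature means a completion can be requested as JSON and checked against an expected shape. A minimal sketch of the consuming side, assuming the model has already returned a JSON string (the example payload and the hand-rolled check are illustrative, not part of the Cactus API):

```typescript
// Illustrative only: validate a JSON-mode completion against an expected shape.
// The completion text below stands in for what the model would return.
interface WeatherReply { city: string; temperature_c: number; }

function parseWeatherReply(text: string): WeatherReply {
  const obj = JSON.parse(text);
  if (typeof obj.city !== 'string' || typeof obj.temperature_c !== 'number') {
    throw new Error('completion did not match the expected schema');
  }
  return obj as WeatherReply;
}

const reply = parseWeatherReply('{"city": "Lagos", "temperature_c": 31}');
console.log(reply.city); // prints "Lagos"
```

In practice you would pass the raw completion text from the context into a check like this before trusting its fields.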

Benchmarks

We built a small chat app as a demo; you can try other models and report your findings here. Download the app.

Gemma 1B INT8:

  • iPhone 16 Pro Max: ~45 toks/sec
  • iPhone 13 Pro: ~30 toks/sec
  • Galaxy A14: ~6 toks/sec
  • Galaxy S24 plus: ~20 toks/sec
  • Galaxy S21: ~14 toks/sec
  • Google Pixel 6a: ~14 toks/sec

SmolLM 135M INT8:

  • iPhone 13 Pro: ~180 toks/sec
  • Galaxy A14: ~30 toks/sec
  • Galaxy S21: ~42 toks/sec
  • Google Pixel 6a: ~38 toks/sec
  • Huawei P60 Lite (Gran's phone): ~8 toks/sec
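
The rates above translate directly into perceived latency. A quick back-of-envelope using figures from the lists (the 512-token reply length is illustrative):

```typescript
// Whole seconds to stream a reply of a given length at a given decode rate.
function secondsForReply(replyTokens: number, toksPerSec: number): number {
  return Math.round(replyTokens / toksPerSec);
}

console.log(secondsForReply(512, 45)); // iPhone 16 Pro Max: ~11 s
console.log(secondsForReply(512, 6));  // Galaxy A14: ~85 s
```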

Getting Started

✅ React Native (TypeScript/JavaScript)

npm install cactus-react-native
# or
yarn add cactus-react-native

# For iOS, install pods if not on Expo
npx pod-install

import { initLlama, LlamaContext } from 'cactus-react-native';

// Load model
const context = await initLlama({
  model: 'models/llama-2-7b-chat.gguf', // Path to your model
  n_ctx: 2048,
  n_batch: 512,
  n_threads: 4
});

// Generate completion
const result = await context.completion({
  prompt: 'Explain quantum computing in simple terms',
  temperature: 0.7,
  top_k: 40,
  top_p: 0.95,
  n_predict: 512
}, (token) => {
  // Process each token
  process.stdout.write(token.token);
});

// Clean up
await context.release();

For more detailed documentation and examples, see the React Native README.
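
Since the core supports embedding generation (see Features), a common follow-up is comparing two embedding vectors. That comparison is plain arithmetic, independent of how the vectors were produced; the helper below is a sketch, not part of the Cactus API:

```typescript
// Cosine similarity between two embedding vectors. How you obtain the
// vectors depends on the binding; consult the React Native README for the
// embedding call itself.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal)
```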

✅ Android (Kotlin/Java)

1. Add Repository to settings.gradle.kts:

// settings.gradle.kts
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS) // Optional but recommended
    repositories {
        google()
        mavenCentral()
        // Add GitHub Packages repository for Cactus
        maven {
            name = "GitHubPackagesCactusCompute"
            url = uri("https://maven.pkg.github.com/cactus-compute/cactus")
        }
    }
}
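
Note that GitHub Packages generally requires authentication even for public artifacts, so anonymous resolution of the repository above may fail with a 401. A sketch of supplying credentials (the `gpr.user`/`gpr.key` property names are a common convention, not mandated; the token needs the `read:packages` scope):

```kotlin
// settings.gradle.kts -- goes inside the maven { } block shown above.
credentials {
    username = providers.gradleProperty("gpr.user").orNull
        ?: System.getenv("GITHUB_ACTOR")
    password = providers.gradleProperty("gpr.key").orNull
        ?: System.getenv("GITHUB_TOKEN")
}
```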

2. Add Dependency to Module's build.gradle.kts:

// app/build.gradle.kts
dependencies {
    implementation("io.github.cactus-compute:cactus-android:0.0.1")
}

3. Basic Usage (Kotlin):

import com.cactus.android.LlamaContext
import com.cactus.android.LlamaInitParams
import com.cactus.android.LlamaCompletionParams
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// In an Activity, ViewModel, or coroutine scope

suspend fun runInference() {
    var llamaContext: LlamaContext? = null
    try {
        // Initialize (off main thread)
        llamaContext = withContext(Dispatchers.IO) {
            LlamaContext.create(
                params = LlamaInitParams(
                    modelPath = "path/to/your/model.gguf",
                    nCtx = 2048, nThreads = 4
                )
            )
        }

        // Complete (off main thread)
        val result = withContext(Dispatchers.IO) {
            llamaContext?.complete(
                prompt = "Explain quantum computing in simple terms",
                params = LlamaCompletionParams(temperature = 0.7f, nPredict = 512)
            ) { partialResultMap ->
                val token = partialResultMap["token"] as? String ?: ""
                print(token) // Process stream on background thread
                true // Continue generation
            }
        }
        println("\nFinal text: ${result?.text}")

    } catch (e: Exception) {
        // Handle errors
        println("Error: ${e.message}")
    } finally {
        // Clean up (off main thread)
        withContext(Dispatchers.IO) {
             llamaContext?.close()
        }
    }
}

For more detailed documentation and examples, see the Android README.

🚧 Swift (in development)

🚧 Flutter (in development)