Cactus

Overview

Cross-platform framework for deploying language, vision, and speech models locally on smartphones.

Cactus SDK

Cactus is the fastest cross-platform framework for deploying AI locally on smartphones.

Key Features

  • Cross-Platform: Available in Flutter and React Native for cross-platform developers
  • Any GGUF Model: Supports any GGUF model from Hugging Face (Qwen, Gemma, Llama, DeepSeek, etc.)
  • Multi-Modal AI: Run LLMs, VLMs, embedding models, TTS models and more
  • Optimized Performance: Quantizations from FP32 down to 2-bit for efficiency
  • Agentic: Mobile tool calling for more performant agentic workflows
  • Native Support: iOS xcframework and JNILibs for native setup
  • Tiny C++ Build: For custom hardware deployments
  • Advanced Features: Chat templates with Jinja2 support and token streaming
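To illustrate what a chat template does (this is a hand-rolled sketch, not the Cactus implementation — Cactus applies the Jinja2 template embedded in the model's GGUF metadata), here is the ChatML-style format used by models such as Qwen:

```typescript
// Illustrative sketch only: Cactus applies the Jinja2 chat template shipped
// in the GGUF metadata. This hand-rolls the ChatML format used by e.g. Qwen.
interface ChatMessage {
  role: string;    // 'system' | 'user' | 'assistant'
  content: string;
}

function renderChatML(messages: ChatMessage[]): string {
  const turns = messages
    .map((m) => `<|im_start|>${m.role}\n${m.content}<|im_end|>\n`)
    .join('');
  // The trailing open tag prompts the model to generate the assistant turn.
  return turns + '<|im_start|>assistant\n';
}
```

The template a model was trained with must be matched exactly at inference time, which is why the SDK reads it from the model file rather than hard-coding one.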

Quick Start

Choose your preferred platform:

Platform Examples

Note: because the two frameworks handle model files differently, the initialization patterns for Flutter and React Native differ slightly:

  • React Native CactusLM is initialized with a local model file (inside the app sandbox)
  • Flutter CactusLM is initialized with a HuggingFace download URL

Flutter

import 'package:cactus/cactus.dart';

final lm = await CactusLM.init(
  modelUrl: 'https://huggingface.co/Cactus-Compute/Qwen3-600m-Instruct-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf',
  contextSize: 2048,
);

final messages = [ChatMessage(role: 'user', content: 'Hello!')];
final response = await lm.completion(messages, maxTokens: 100, temperature: 0.7);

React Native

import { CactusLM } from 'cactus-react-native';
import RNFS from 'react-native-fs'; // install react-native-fs for file management

const filePath = `${RNFS.DocumentDirectoryPath}/${fileName}`;

const { lm, error } = await CactusLM.init({
  model: filePath,
  n_ctx: 2048,
});

const messages = [{ role: 'user', content: 'Hello!' }];
const params = { n_predict: 100, temperature: 0.7 };
const response = await lm.completion(messages, params);
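Since the React Native variant expects a local model file inside the app sandbox, the GGUF typically has to be downloaded first (with react-native-fs, `RNFS.downloadFile({ fromUrl, toFile }).promise` can fetch it before `CactusLM.init`). A minimal sketch for deriving the target file name from a Hugging Face download URL — the helper name is hypothetical, not part of the Cactus API:

```typescript
// Hypothetical helper (not part of the Cactus API): derive the on-device
// file name from a direct Hugging Face "resolve" download URL, so the model
// lands at a predictable sandbox path before CactusLM.init runs.
function modelFileName(modelUrl: string): string {
  const name = modelUrl.split('/').pop() ?? '';
  if (!name.endsWith('.gguf')) {
    throw new Error('expected a direct .gguf download URL');
  }
  return name;
}
```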

C++

common_params params;
params.model.path = "path/to/your/model.gguf";
context.loadModel(params);

context.params.prompt = "Hello, how are you?";
context.params.n_predict = 100;
context.initSampling();

context.beginCompletion();
context.loadPrompt();

while (context.has_next_token && !context.is_interrupted) {
  auto token_output = context.doCompletion();
  if (token_output.tok == -1) break;
}

Get started by watching a quickstart video and building one of our example apps.

Telemetry

Cactus offers powerful telemetry for all your React Native projects.

To take advantage of Cactus telemetry, see our React Native documentation.

Performance Benchmarks

Real-world performance on popular mobile devices:

| Device | Gemma3 1B Q4 (toks/sec) | Qwen3 4B Q4 (toks/sec) |
| --- | --- | --- |
| iPhone 16 Pro Max | 54 | 18 |
| iPhone 16 Pro | 54 | 18 |
| iPhone 16 | 49 | 16 |
| iPhone 15 Pro Max | 45 | 15 |
| iPhone 15 Pro | 45 | 15 |
| iPhone 14 Pro Max | 44 | 14 |
| OnePlus 13 5G | 43 | 14 |
| Samsung Galaxy S24 Ultra | 42 | 14 |
| iPhone 15 | 42 | 14 |
| OnePlus Open | 38 | 13 |
| Samsung Galaxy S23 5G | 37 | 12 |
| Samsung Galaxy S24 | 36 | 12 |
| iPhone 13 Pro | 35 | 11 |
| OnePlus 12 | 35 | 11 |
| Galaxy S25 Ultra | 29 | 9 |
| OnePlus 11 | 26 | 8 |
| iPhone 13 mini | 25 | 8 |
| Redmi K70 Ultra | 24 | 8 |
| Xiaomi 13 | 24 | 8 |
| Samsung Galaxy S24+ | 22 | 7 |
| Samsung Galaxy Z Fold 4 | 22 | 7 |
| Xiaomi Poco F6 5G | 22 | 6 |
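These throughput numbers translate directly into decode latency: generating N tokens takes roughly N divided by toks/sec (prompt processing is a separate phase not covered by these figures). A back-of-envelope helper, for illustration only:

```typescript
// Back-of-envelope decode time from the benchmark throughput above.
// Ignores prompt-processing (prefill) time, which these figures don't cover.
function decodeSeconds(tokens: number, toksPerSec: number): number {
  return tokens / toksPerSec;
}

// e.g. a 100-token reply from Gemma3 1B Q4 on an iPhone 16 Pro Max at
// 54 toks/sec takes decodeSeconds(100, 54), i.e. just under 2 seconds.
```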

Demo Apps

Try our demo applications to see the Cactus SDK in action.

Next Steps

Community