Cactus

Flutter SDK

Complete guide to using Cactus SDK in Flutter applications

Cactus Flutter

Official Flutter library for Cactus, a framework for deploying LLM and STT models locally in your app.

Video walkthrough

Build an example app in 5 minutes by following the video walkthrough.

Installation

Dependency List

Add to pubspec.yaml

dependencies:
  cactus:
    git:
      url: https://github.com/cactus-compute/cactus-flutter.git
      ref: main

Install Dependencies

flutter pub get

Platform Requirements

  • iOS: 12.0+
  • Android: API level 24+

Quickstart

Get started from scratch by cloning our example app.

Language Model (LLM)

The CactusLM class provides text completion capabilities with high-performance local inference.

Basic Usage

Download Model

Download a model by slug (e.g., "qwen3-0.6", "gemma3-270m"). If no model is specified, it defaults to "qwen3-0.6".

final lm = CactusLM();

await lm.downloadModel(
  model: "qwen3-0.6", // Optional: specify model slug
  downloadProcessCallback: (progress, status, isError) {
    if (isError) {
      print("Download error: $status");
    } else {
      print("$status ${progress != null ? '(${(progress * 100).toStringAsFixed(0)}%)' : ''}");
    }
  },
);

Initialize Model

Initialize the downloaded model for inference.

await lm.initializeModel();

Generate Completion

Generate a completion with default parameters.

final result = await lm.generateCompletion(
  messages: [
    ChatMessage(content: "Hello, how are you?", role: "user"),
  ],
);

if (result.success) {
  print("Response: ${result.response}");
  print("Tokens per second: ${result.tokensPerSecond}");
  print("Time to first token: ${result.timeToFirstTokenMs}ms");
}

Unload Model

Clean up and free the model from memory.

lm.unload();

See the full example in basic_completion.dart.
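To carry a multi-turn conversation, pass the accumulated message list back on each call. A sketch continuing from the example above; the "assistant" role string for model replies is an assumption, as the API reference only shows "user":

```dart
// Hypothetical multi-turn sketch: append the model's reply and the next
// user message, then send the whole history back in.
final history = [
  ChatMessage(content: "Hello, how are you?", role: "user"),
];

final first = await lm.generateCompletion(messages: history);
if (first.success) {
  history.add(ChatMessage(content: first.response, role: "assistant")); // role string assumed
  history.add(ChatMessage(content: "Summarize that in one sentence.", role: "user"));
  final second = await lm.generateCompletion(messages: history);
  print(second.response);
}
```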

Streaming Completions

Get a streaming response and process the output as it's generated.

final streamedResult = await lm.generateCompletionStream(
  messages: [ChatMessage(content: "Tell me a story", role: "user")],
);

await for (final chunk in streamedResult.stream) {
  print(chunk);
}

final finalResult = await streamedResult.result;
if (finalResult.success) {
  print("Final response: ${finalResult.response}");
  print("Tokens per second: ${finalResult.tokensPerSecond}");
}

See the full example in streaming_completion.dart.

Function Calling (Experimental)

Define tools and let the model generate function calls.

final tools = [
  CactusTool(
    name: "get_weather",
    description: "Get current weather for a location",
    parameters: ToolParametersSchema(
      properties: {
        'location': ToolParameter(type: 'string', description: 'City name', required: true),
      },
    ),
  ),
];

final result = await lm.generateCompletion(
  messages: [ChatMessage(content: "What's the weather in New York?", role: "user")],
  params: CactusCompletionParams(
    tools: tools
  )
);

if (result.success) {
  print("Response: ${result.response}");
  print("Tools: ${result.toolCalls}");
}

See the full example in function_calling.dart.
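The model only emits tool calls; executing them is up to your app. A minimal dispatch sketch continuing from the example above — the handler map and its weather stub are hypothetical, while the `ToolCall` fields (`name`, `arguments` as `Map<String, String>`) follow the data-class reference:

```dart
// Hypothetical handlers keyed by tool name. Each handler receives the
// String-valued arguments the model generated.
final handlers = <String, String Function(Map<String, String>)>{
  'get_weather': (args) => 'Sunny in ${args['location']}', // stub, not a real lookup
};

for (final call in result.toolCalls) {
  final handler = handlers[call.name];
  if (handler != null) {
    print('${call.name} -> ${handler(call.arguments)}');
  }
}
```

In a real app you would feed each handler's result back into the conversation as a new message so the model can produce a final answer.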

Tool Filtering (Experimental)

When working with many tools, you can use tool filtering to automatically select the most relevant tools for each query.

// Configure tool filtering via constructor (optional)
final lm = CactusLM(
  enableToolFiltering: true,  // default: true
  toolFilterConfig: ToolFilterConfig.simple(maxTools: 3),  // default config if not specified
);
await lm.downloadModel(model: "qwen3-0.6");
await lm.initializeModel();

// Define multiple tools
final tools = [
  CactusTool(
    name: "get_weather",
    description: "Get current weather for a location",
    parameters: ToolParametersSchema(
      properties: {
        'location': ToolParameter(type: 'string', description: 'City name', required: true),
      },
    ),
  ),
  CactusTool(
    name: "get_stock_price",
    description: "Get current stock price for a company",
    parameters: ToolParametersSchema(
      properties: {
        'symbol': ToolParameter(type: 'string', description: 'Stock symbol', required: true),
      },
    ),
  ),
  CactusTool(
    name: "send_email",
    description: "Send an email to someone",
    parameters: ToolParametersSchema(
      properties: {
        'to': ToolParameter(type: 'string', description: 'Email address', required: true),
        'subject': ToolParameter(type: 'string', description: 'Email subject', required: true),
        'body': ToolParameter(type: 'string', description: 'Email body', required: true),
      },
    ),
  ),
];

// Tool filtering happens automatically!
final result = await lm.generateCompletion(
  messages: [ChatMessage(content: "What's the weather in Paris?", role: "user")],
  params: CactusCompletionParams(
    tools: tools
  )
);

if (result.success) {
  print("Response: ${result.response}");
  print("Tool calls: ${result.toolCalls}");
}

lm.unload();

Note: When tool filtering is active, you'll see debug output like:

Tool filtering: 3 -> 1 tools
Filtered tools: get_weather

Hybrid Completion (Cloud Fallback)

The CactusLM supports a hybrid completion mode that falls back to a cloud-based LLM provider (OpenRouter) if local inference fails or is not available. This ensures reliability and provides a seamless experience.

To use hybrid mode:

  1. Set completionMode to CompletionMode.hybrid in CactusCompletionParams.
  2. Provide a cactusToken in CactusCompletionParams.

To get a cactusToken, join our Discord community and contact us.

final result = await lm.generateCompletion(
  messages: [ChatMessage(content: "What's the weather in New York?", role: "user")],
  params: CactusCompletionParams(
    completionMode: CompletionMode.hybrid,
    cactusToken: "YOUR_CACTUS_TOKEN"
  ),
);

if (result.success) {
  print("Response: ${result.response}");
}

See the full example in hybrid_completion.dart.

Fetching Available Models

// Get list of available models with caching
final models = await lm.getModels();

for (final model in models) {
  print("Model: ${model.name}");
  print("Slug: ${model.slug}");
  print("Size: ${model.sizeMb} MB");
  print("Downloaded: ${model.isDownloaded}");
  print("Supports Tool Calling: ${model.supportsToolCalling}");
  print("Supports Vision: ${model.supportsVision}");
  print("---");
}

See the full example in fetch_models.dart.

Default Parameters

The CactusLM class provides sensible defaults for completion parameters:

  • maxTokens: 200 - Maximum tokens to generate
  • stopSequences: ["<|im_end|>", "<end_of_turn>"] - Stop sequences for completion
  • completionMode: CompletionMode.local - Default to local-only inference.
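Any of these defaults can be overridden per call. A sketch overriding a few parameters (the values are illustrative, the parameter names come from CactusCompletionParams below):

```dart
final result = await lm.generateCompletion(
  messages: [ChatMessage(content: "Write a haiku about rain", role: "user")],
  params: CactusCompletionParams(
    maxTokens: 100,               // cap the response length
    temperature: 0.7,             // illustrative sampling temperature
    stopSequences: ["<|im_end|>"], // narrower stop set than the default
  ),
);
```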

LLM API Reference

CactusLM Class

  • CactusLM({bool enableToolFiltering = true, ToolFilterConfig? toolFilterConfig}) - Constructor. Set enableToolFiltering to false to disable automatic tool filtering. Provide toolFilterConfig to customize filtering behavior (defaults to ToolFilterConfig.simple() if not specified).
  • Future<void> downloadModel({String model = "qwen3-0.6", CactusProgressCallback? downloadProcessCallback}) - Download a model by slug (e.g., "qwen3-0.6", "gemma3-270m", etc.). Use getModels() to see available model slugs. Defaults to "qwen3-0.6" if not specified.
  • Future<void> initializeModel({CactusInitParams? params}) - Initialize model for inference
  • Future<CactusCompletionResult> generateCompletion({required List<ChatMessage> messages, CactusCompletionParams? params}) - Generate text completion (uses default params if none provided). Automatically filters tools if enableToolFiltering is true (default).
  • Future<CactusStreamedCompletionResult> generateCompletionStream({required List<ChatMessage> messages, CactusCompletionParams? params}) - Generate streaming text completion (uses default params if none provided). Automatically filters tools if enableToolFiltering is true (default).
  • Future<List<CactusModel>> getModels() - Fetch available models with caching
  • Future<CactusEmbeddingResult> generateEmbedding({required String text, String? modelName}) - Generate text embeddings
  • void unload() - Free model from memory
  • bool isLoaded() - Check if model is loaded

Data Classes

  • CactusInitParams({String model = "qwen3-0.6", int? contextSize = 2048}) - Model initialization parameters
  • CactusCompletionParams({String? model, double? temperature, int? topK, double? topP, int maxTokens = 200, List<String> stopSequences = ["<|im_end|>", "<end_of_turn>"], List<CactusTool>? tools, CompletionMode completionMode = CompletionMode.local, String? cactusToken}) - Completion parameters
  • ChatMessage({required String content, required String role, int? timestamp}) - Chat message format
  • CactusCompletionResult({required bool success, required String response, required double timeToFirstTokenMs, required double totalTimeMs, required double tokensPerSecond, required int prefillTokens, required int decodeTokens, required int totalTokens, List<ToolCall> toolCalls = []}) - Contains response, timing metrics, tool calls, and success status
  • CactusStreamedCompletionResult({required Stream<String> stream, required Future<CactusCompletionResult> result}) - Contains the stream and the final result of a streamed completion.
  • CactusModel({required DateTime createdAt, required String slug, required String downloadUrl, required int sizeMb, required bool supportsToolCalling, required bool supportsVision, required String name, bool isDownloaded = false, int quantization = 8}) - Model information
  • CactusEmbeddingResult({required bool success, required List<double> embeddings, required int dimension, String? errorMessage}) - Embedding generation result
  • CactusTool({required String name, required String description, required ToolParametersSchema parameters}) - Function calling tool definition
  • ToolParametersSchema({String type = 'object', required Map<String, ToolParameter> properties}) - Tool parameters schema with automatic required field extraction
  • ToolParameter({required String type, required String description, bool required = false}) - Tool parameter specification
  • ToolCall({required String name, required Map<String, String> arguments}) - Tool call result from model
  • ToolFilterConfig({ToolFilterStrategy strategy = ToolFilterStrategy.simple, int? maxTools, double similarityThreshold = 0.3}) - Configuration for tool filtering behavior
    • Factory: ToolFilterConfig.simple({int maxTools = 3}) - Creates a simple keyword-based filter config
  • ToolFilterStrategy - Enum for tool filtering strategy (simple for keyword matching, semantic for embedding-based matching)
  • ToolFilterService({ToolFilterConfig? config, required CactusLM lm}) - Service for filtering tools based on query relevance (used internally)
  • CactusProgressCallback = void Function(double? progress, String statusMessage, bool isError) - Progress callback for downloads
  • CompletionMode - Enum for completion mode (local or hybrid).

Embeddings

The CactusLM class also provides text embedding generation capabilities for semantic similarity, search, and other NLP tasks.

Basic Usage

Download and Initialize

Download and initialize a model, same as for completions.

await lm.downloadModel();
await lm.initializeModel();

Generate Embeddings

Generate embeddings for a piece of text.

final result = await lm.generateEmbedding(
  text: "This is a sample text for embedding generation",
);

if (result.success) {
  print("Embedding dimension: ${result.dimension}");
  print("Embedding vector length: ${result.embeddings.length}");
  print("First few values: ${result.embeddings.take(5)}");
} else {
  print("Embedding generation failed: ${result.errorMessage}");
}

See the full example in embedding.dart.
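The SDK returns raw vectors; comparing them is up to you. A small cosine-similarity helper in plain Dart (not part of the SDK):

```dart
import 'dart:math';

/// Cosine similarity between two equal-length vectors: 1.0 means the
/// vectors point the same way, 0.0 orthogonal, -1.0 opposite.
double cosineSimilarity(List<double> a, List<double> b) {
  assert(a.length == b.length);
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (sqrt(normA) * sqrt(normB));
}

void main() {
  print(cosineSimilarity([1.0, 0.0], [1.0, 0.0])); // 1.0
  print(cosineSimilarity([1.0, 0.0], [0.0, 1.0])); // 0.0
}
```

Pass `result.embeddings` from two `generateEmbedding` calls to rank texts by semantic similarity.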

Embedding API Reference

CactusLM Class (Embedding Methods)

  • Future<CactusEmbeddingResult> generateEmbedding({required String text, String? modelName}) - Generate text embeddings

Embedding Data Classes

  • CactusEmbeddingResult({required bool success, required List<double> embeddings, required int dimension, String? errorMessage}) - Contains the generated embedding vector and metadata

Speech-to-Text (STT)

The CactusSTT class provides speech recognition with support for multiple providers (Vosk and Whisper).

Basic Usage

final stt = CactusSTT();

try {
  // Download and initialize model (defaults to Vosk)
  await stt.download(
    downloadProcessCallback: (progress, status, isError) {
        print("$status ${progress != null ? '(${progress * 100}%)' : ''}");
    },
  );
  await stt.init(model: "vosk-en-us");

  // Transcribe from microphone or file
  final result = await stt.transcribe();

  if (result != null && result.success) {
    print("Transcribed: ${result.text}");
  }
} finally {
  stt.dispose();
}

Default Parameters

The CactusSTT class uses sensible defaults for speech recognition:

  • provider: TranscriptionProvider.vosk - Default transcription provider
  • Vosk provider defaults:
    • model: "vosk-en-us" - Default English (US) voice model
  • Whisper provider defaults:
    • model: "whisper-tiny" - Default Whisper model
  • sampleRate: 16000 - Standard sample rate for speech recognition
  • maxDuration: 30000 - Maximum 30 seconds recording time
  • maxSilenceDuration: 2000 - Stop after 2 seconds of silence
  • silenceThreshold: 500.0 - Sensitivity for silence detection
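These can be overridden per transcription via SpeechRecognitionParams; the values and file path below are illustrative:

```dart
// Override recording limits for a shorter capture window.
final result = await stt.transcribe(
  params: SpeechRecognitionParams(
    maxDuration: 10000,       // stop after 10 seconds
    maxSilenceDuration: 1500, // or after 1.5 s of silence
  ),
);

// Or transcribe an existing audio file instead of the microphone.
final fromFile = await stt.transcribe(filePath: "/path/to/recording.wav");
```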

STT API Reference

CactusSTT Class

  • CactusSTT({TranscriptionProvider provider = TranscriptionProvider.vosk}) - Constructor with optional provider selection
  • TranscriptionProvider get provider - Get the current transcription provider
  • Future<bool> download({String model = "", CactusProgressCallback? downloadProcessCallback}) - Download a voice model with optional progress callback (defaults: "vosk-en-us" for Vosk, "whisper-tiny" for Whisper)
  • Future<bool> init({required String model}) - Initialize speech recognition model (required model parameter)
  • Future<SpeechRecognitionResult?> transcribe({SpeechRecognitionParams? params, String? filePath}) - Transcribe speech from microphone or file
  • void stop() - Stop current recording session
  • bool get isRecording - Check if currently recording
  • bool isReady() - Check if model is initialized and ready
  • Future<List<VoiceModel>> getVoiceModels() - Fetch available voice models
  • Future<bool> isModelDownloaded({required String modelName}) - Check if a specific model is downloaded
  • void dispose() - Clean up resources and free memory

STT Data Classes

  • TranscriptionProvider - Enum for choosing transcription provider (vosk, whisper)
  • SpeechRecognitionParams({int sampleRate = 16000, int maxDuration = 30000, int maxSilenceDuration = 2000, double silenceThreshold = 500.0, String? model}) - Speech recognition configuration
  • SpeechRecognitionResult({required bool success, required String text, double? processingTime}) - Transcription result with timing information
  • VoiceModel({required DateTime createdAt, required String slug, required String language, required String url, required int sizeMb, required String fileName, bool isDownloaded = false}) - Voice model information
  • CactusProgressCallback = void Function(double? progress, String statusMessage, bool isError) - Progress callback for model downloads

Retrieval-Augmented Generation (RAG)

The CactusRAG class provides a local vector database for storing, managing, and searching documents with automatic text chunking. It uses ObjectBox for efficient on-device storage and retrieval, making it ideal for building RAG applications that run entirely locally.

Key Features:

  • Automatic Text Chunking: Documents are automatically split into configurable chunks with overlap for better context preservation
  • Embedding Generation: Integrates with CactusLM to automatically generate embeddings for each chunk
  • Vector Search: Performs efficient nearest neighbor search using HNSW (Hierarchical Navigable Small World) index with squared Euclidean distance
  • Document Management: Supports create, read, update, and delete operations with automatic chunk handling
  • Local-First: All data and embeddings are stored on-device using ObjectBox for privacy and offline functionality

Basic Usage

Note on Distance Scores: The search method returns squared Euclidean distances, where a lower distance means more similar vectors. Results are automatically sorted with the most similar chunks first. You don't need to convert to similarity scores; use the distance values directly for filtering or ranking.

final lm = CactusLM();
final rag = CactusRAG();

try {
  // Initialize
  await lm.downloadModel();
  await lm.initializeModel();
  await rag.initialize();

  // Set up embedding generator
  rag.setEmbeddingGenerator((text) async {
    final result = await lm.generateEmbedding(text: text);
    return result.embeddings;
  });

  // Configure chunking parameters (optional - defaults: chunkSize=512, chunkOverlap=64)
  rag.setChunking(chunkSize: 1024, chunkOverlap: 128);

  // Store a document (automatically chunks and embeds)
  final document = await rag.storeDocument(
    fileName: "document.txt",
    filePath: "/path/to/document.txt",
    content: "Your document content here...",
    fileSize: 1024,
  );

  // Search for similar content
  final searchResults = await rag.search(
    text: "What is the famous landmark in Paris?",
    limit: 5, // Get top 5 most similar chunks
  );

  for (final result in searchResults) {
    print("- Chunk from ${result.chunk.document.target?.fileName} (Distance: ${result.distance.toStringAsFixed(2)})");
    print("  Content: ${result.chunk.content.substring(0, 50)}...");
  }
} finally {
  lm.unload();
  await rag.close();
}

RAG API Reference

CactusRAG Class

  • Future<void> initialize() - Initialize the local ObjectBox database
  • Future<void> close() - Close the database connection
  • void setEmbeddingGenerator(EmbeddingGenerator generator) - Set the function used to generate embeddings for text chunks
  • void setChunking({required int chunkSize, required int chunkOverlap}) - Configure text chunking parameters (defaults: chunkSize=512, chunkOverlap=64)
  • int get chunkSize - Get current chunk size setting
  • int get chunkOverlap - Get current chunk overlap setting
  • List<String> chunkContent(String content, {int? chunkSize, int? chunkOverlap}) - Manually chunk text content (visible for testing)
  • Future<Document> storeDocument({required String fileName, required String filePath, required String content, int? fileSize, String? fileHash}) - Store a document with automatic chunking and embedding generation
  • Future<Document?> getDocumentByFileName(String fileName) - Retrieve a document by its file name
  • Future<List<Document>> getAllDocuments() - Get all stored documents
  • Future<void> updateDocument(Document document) - Update an existing document and its chunks
  • Future<void> deleteDocument(int id) - Delete a document and all its chunks by ID
  • Future<List<ChunkSearchResult>> search({String? text, int limit = 10}) - Search for the nearest document chunks by generating embeddings for the query text and performing vector similarity search. Results are sorted by distance (lower = more similar)
  • Future<DatabaseStats> getStats() - Get statistics about the database

RAG Data Classes

  • Document({int id = 0, required String fileName, required String filePath, DateTime? createdAt, DateTime? updatedAt, int? fileSize, String? fileHash}) - Represents a stored document with its metadata and associated chunks. Has a content getter that joins all chunk contents.
  • DocumentChunk({int id = 0, required String content, required List<double> embeddings}) - Represents a text chunk with its content and embeddings (1024-dimensional vectors by default)
  • ChunkSearchResult({required DocumentChunk chunk, required double distance}) - Contains a document chunk and its distance score from the query vector (lower distance = more similar). Distance is squared Euclidean distance from ObjectBox HNSW index
  • DatabaseStats({required int totalDocuments, required int documentsWithEmbeddings, required int totalContentLength}) - Contains statistics about the document store including total documents, chunks, and content length
  • EmbeddingGenerator = Future<List<double>> Function(String text) - Function type for generating embeddings from text

See the full example in rag.dart.
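For intuition about the chunking parameters: each chunk starts chunkSize - chunkOverlap characters after the previous one, so consecutive chunks share chunkOverlap characters. A naive plain-Dart sketch of that arithmetic (not the SDK's actual chunkContent implementation):

```dart
/// Naive character-based chunker: the window advances by
/// (chunkSize - chunkOverlap) each step, so neighbouring chunks
/// overlap by chunkOverlap characters.
List<String> naiveChunk(String content, {int chunkSize = 512, int chunkOverlap = 64}) {
  final chunks = <String>[];
  final step = chunkSize - chunkOverlap;
  for (var start = 0; start < content.length; start += step) {
    final end = start + chunkSize > content.length ? content.length : start + chunkSize;
    chunks.add(content.substring(start, end));
    if (end == content.length) break;
  }
  return chunks;
}

void main() {
  final chunks = naiveChunk('a' * 1000);
  print(chunks.length); // 3 chunks: [0,512), [448,960), [896,1000)
}
```

Larger overlaps preserve more context across chunk boundaries at the cost of storing and embedding more redundant text.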

Platform-Specific Setup

Android

Add the following permissions to your android/app/src/main/AndroidManifest.xml:

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<!-- Required for speech-to-text functionality -->
<uses-permission android:name="android.permission.RECORD_AUDIO" />

iOS

Add microphone usage description to your ios/Runner/Info.plist for speech-to-text functionality:

<key>NSMicrophoneUsageDescription</key>
<string>This app needs access to the microphone for speech-to-text transcription.</string>

macOS

Add the following to your macos/Runner/DebugProfile.entitlements and macos/Runner/Release.entitlements:

<!-- Network access for model downloads -->
<key>com.apple.security.network.client</key>
<true/>
<!-- Microphone access for speech-to-text -->
<key>com.apple.security.device.microphone</key>
<true/>

Performance Tips

  1. Model Selection: Choose smaller models for faster inference on mobile devices
  2. Context Size: Reduce context size for lower memory usage (e.g., 1024 instead of 2048)
  3. Memory Management: Always call unload() when done with models
  4. Batch Processing: Reuse initialized models for multiple completions
  5. Background Processing: Use Isolate for heavy operations to keep UI responsive
  6. Model Caching: Use getModels() for efficient model discovery - results are cached locally to reduce network requests
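For tip 5, Dart's Isolate.run moves CPU-heavy work off the UI isolate; a minimal sketch with a stand-in workload (note that only plain data should cross isolate boundaries, so keep model handles on one isolate):

```dart
import 'dart:isolate';

// Stand-in for CPU-heavy work, e.g. pre-processing a document
// before handing it to the model.
int heavySum(int n) {
  var total = 0;
  for (var i = 0; i < n; i++) {
    total += i;
  }
  return total;
}

Future<void> main() async {
  // Runs heavySum on a background isolate; the calling isolate
  // (your UI) stays responsive while it computes.
  final total = await Isolate.run(() => heavySum(10000000));
  print(total); // 49999995000000
}
```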

Telemetry Setup (Optional)

Cactus comes with powerful built-in telemetry that lets you monitor your projects. Create a token on the Cactus dashboard and get started with a one-line setup in your app:

import 'package:cactus/cactus.dart';

CactusTelemetry.setTelemetryToken("your-token-here");

Example App

Check out our example app for a complete Flutter implementation showing:

  • Model discovery and fetching available models
  • Model downloading with real-time progress indicators
  • Text completion with both regular and streaming modes
  • Speech-to-text transcription with multiple provider support (Vosk and Whisper)
  • Voice model management and provider switching
  • Embedding generation
  • RAG document storage and search
  • Error handling and status management
  • Material Design UI integration

To run the example:

cd example
flutter pub get
flutter run

Next Steps