Flutter SDK
Complete guide to using Cactus SDK in Flutter applications
Cactus Flutter
Official Flutter library for Cactus, a framework for deploying LLM and STT models locally in your app.
Installation
Dependency List
Add to pubspec.yaml
```yaml
dependencies:
  cactus:
    git:
      url: https://github.com/cactus-compute/cactus-flutter.git
      ref: main
```
Install Dependencies
```shell
flutter pub get
```
Platform Requirements
- iOS: iOS 12.0+
- Android: API level 24+
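If your project's generated `minSdkVersion` is below 24, raise it in `android/app/build.gradle`. A typical Flutter Gradle snippet; the surrounding configuration varies by project:

```groovy
android {
    defaultConfig {
        // Cactus requires Android API level 24 or higher
        minSdkVersion 24
    }
}
```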
Quickstart
Get started from scratch by cloning our example app.
Language Model (LLM)
The `CactusLM` class provides text completion capabilities with high-performance local inference.
Basic Usage
```dart
import 'package:cactus/cactus.dart';

Future<void> basicExample() async {
  final lm = CactusLM();
  try {
    // Download a model with progress callback (default: qwen3-0.6)
    await lm.downloadModel(
      downloadProcessCallback: (progress, status, isError) {
        if (isError) {
          print("Download error: $status");
        } else {
          print("$status ${progress != null ? '(${progress * 100}%)' : ''}");
        }
      },
    );

    // Initialize the model
    await lm.initializeModel();

    // Generate a completion with default parameters
    final result = await lm.generateCompletion(
      messages: [
        ChatMessage(content: "Hello, how are you?", role: "user"),
      ],
    );

    if (result.success) {
      print("Response: ${result.response}");
      print("Tokens per second: ${result.tokensPerSecond}");
      print("Time to first token: ${result.timeToFirstTokenMs}ms");
    }
  } finally {
    // Clean up
    lm.unload();
  }
}
```
Streaming Completions
```dart
Future<void> streamingExample() async {
  final lm = CactusLM();
  await lm.downloadModel();
  await lm.initializeModel();

  // Get the streaming response with default parameters
  final streamedResult = await lm.generateCompletionStream(
    messages: [ChatMessage(content: "Tell me a story", role: "user")],
  );

  // Process streaming output
  await for (final chunk in streamedResult.stream) {
    print(chunk);
  }

  // You can also get the full completion result after the stream is done
  final finalResult = await streamedResult.result;
  if (finalResult.success) {
    print("Final response: ${finalResult.response}");
    print("Tokens per second: ${finalResult.tokensPerSecond}");
  }

  lm.unload();
}
```
Function Calling (Experimental)
```dart
Future<void> functionCallingExample() async {
  final lm = CactusLM();
  await lm.downloadModel();
  await lm.initializeModel();

  final tools = [
    CactusTool(
      name: "get_weather",
      description: "Get current weather for a location",
      parameters: ToolParametersSchema(
        properties: {
          'location': ToolParameter(type: 'string', description: 'City name', required: true),
        },
      ),
    ),
  ];

  final result = await lm.generateCompletion(
    messages: [ChatMessage(content: "What's the weather in New York?", role: "user")],
    params: CactusCompletionParams(tools: tools),
  );

  if (result.success) {
    print("Response: ${result.response}");
    print("Tool calls: ${result.toolCalls}");
  }

  lm.unload();
}
```
Hybrid Completion (Cloud Fallback)
The `CactusLM` class supports a hybrid completion mode that falls back to a cloud-based LLM provider (OpenRouter) if local inference fails or is unavailable. This ensures reliability and provides a seamless experience.
To use hybrid mode:
- Set `completionMode` to `CompletionMode.hybrid` in `CactusCompletionParams`.
- Provide a `cactusToken` to `generateCompletion` or `generateCompletionStream`.
```dart
import 'package:cactus/cactus.dart';

Future<void> hybridCompletionExample() async {
  final lm = CactusLM();

  // No model download or initialization needed if you only want to use cloud
  final result = await lm.generateCompletion(
    messages: [ChatMessage(content: "What's the weather in New York?", role: "user")],
    params: CactusCompletionParams(completionMode: CompletionMode.hybrid),
    cactusToken: "YOUR_CACTUS_TOKEN",
  );

  if (result.success) {
    print("Response: ${result.response}");
  }

  lm.unload();
}
```
Fetching Available Models
```dart
Future<void> fetchModelsExample() async {
  final lm = CactusLM();

  // Get the list of available models (results are cached)
  final models = await lm.getModels();
  for (final model in models) {
    print("Model: ${model.name}");
    print("Slug: ${model.slug}");
    print("Size: ${model.sizeMb} MB");
    print("Downloaded: ${model.isDownloaded}");
    print("Supports Tool Calling: ${model.supportsToolCalling}");
    print("Supports Vision: ${model.supportsVision}");
    print("---");
  }
}
```
Default Parameters
The `CactusLM` class provides sensible defaults for completion parameters:
- `temperature: 0.1` - Controls randomness (0.0 = deterministic, 1.0 = very random)
- `topK: 40` - Number of top tokens to consider
- `topP: 0.95` - Nucleus sampling parameter
- `maxTokens: 200` - Maximum number of tokens to generate
- `bufferSize: 1024` - Internal buffer size for processing
- `completionMode: CompletionMode.local` - Local-only inference by default
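Any of these defaults can be overridden per call by passing a `CactusCompletionParams`. A minimal sketch, assuming that fields you do not set keep the defaults listed above:

```dart
import 'package:cactus/cactus.dart';

Future<void> customParamsExample() async {
  final lm = CactusLM();
  await lm.downloadModel();
  await lm.initializeModel();

  // Override the defaults for a longer, more creative reply
  final result = await lm.generateCompletion(
    messages: [ChatMessage(content: "Write a haiku about cacti", role: "user")],
    params: CactusCompletionParams(
      temperature: 0.7, // more randomness than the 0.1 default
      maxTokens: 400,   // allow a longer response than the 200 default
    ),
  );

  if (result.success) print(result.response);
  lm.unload();
}
```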
LLM API Reference
CactusLM Class
- `Future<void> downloadModel({String model = "qwen3-0.6", CactusProgressCallback? downloadProcessCallback})` - Download a model with an optional progress callback
- `Future<void> initializeModel(CactusInitParams params)` - Initialize a model for inference
- `Future<CactusCompletionResult> generateCompletion({required List<ChatMessage> messages, CactusCompletionParams? params, String? cactusToken})` - Generate a text completion (uses default params if none provided)
- `Future<CactusStreamedCompletionResult> generateCompletionStream({required List<ChatMessage> messages, CactusCompletionParams? params, List<CactusTool>? tools, String? cactusToken})` - Generate a streaming text completion (uses default params if none provided)
- `Future<List<CactusModel>> getModels()` - Fetch available models with caching
- `Future<CactusEmbeddingResult?> generateEmbedding({required String text, int bufferSize = 2048})` - Generate text embeddings
- `void unload()` - Free the model from memory
- `bool isLoaded()` - Check whether a model is loaded
Data Classes
- `CactusInitParams({String? model, int? contextSize})` - Model initialization parameters
- `CactusCompletionParams({double temperature, int topK, double topP, int maxTokens, List<String> stopSequences, int bufferSize, List<CactusTool>? tools, CompletionMode completionMode})` - Completion parameters
- `ChatMessage({required String content, required String role, int? timestamp})` - Chat message format
- `CactusCompletionResult` - Contains the response, timing metrics, and success status
- `CactusStreamedCompletionResult` - Contains the stream and the final result of a streamed completion
- `CactusModel({required String name, required String slug, required int sizeMb, required bool supportsToolCalling, required bool supportsVision, required bool isDownloaded})` - Model information
- `CactusEmbeddingResult({required bool success, required List<double> embeddings, required int dimension, String? errorMessage})` - Embedding generation result
- `CactusTool({required String name, required String description, required Map<String, CactusToolParameter> parameters})` - Function-calling tool definition
- `CactusToolParameter({required String type, required String description, required bool required})` - Tool parameter specification
- `CactusProgressCallback = void Function(double? progress, String statusMessage, bool isError)` - Progress callback for downloads
- `CompletionMode` - Enum for completion mode (`local` or `hybrid`)
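Because `ChatMessage` pairs a `role` with `content`, multi-turn context is passed as an ordered list of messages on each call. A sketch of a follow-up question that replays earlier turns (the `"system"` and `"assistant"` role strings are assumptions; only `"user"` appears elsewhere on this page):

```dart
import 'package:cactus/cactus.dart';

Future<void> multiTurnExample() async {
  final lm = CactusLM();
  await lm.downloadModel();
  await lm.initializeModel();

  // The model sees the whole list, so prior turns provide context
  final history = [
    ChatMessage(content: "You are a concise assistant.", role: "system"),
    ChatMessage(content: "What is the capital of France?", role: "user"),
    ChatMessage(content: "Paris.", role: "assistant"),
    ChatMessage(content: "And its population?", role: "user"),
  ];

  final result = await lm.generateCompletion(messages: history);
  if (result.success) print(result.response);
  lm.unload();
}
```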
Embeddings
The `CactusLM` class also provides text embedding generation for semantic similarity, search, and other NLP tasks.
Basic Usage
```dart
import 'package:cactus/cactus.dart';

Future<void> embeddingExample() async {
  final lm = CactusLM();
  try {
    // Download and initialize a model (same as for completions)
    await lm.downloadModel();
    await lm.initializeModel();

    // Generate an embedding for a text
    final result = await lm.generateEmbedding(
      text: "This is a sample text for embedding generation",
      bufferSize: 2048,
    );

    // generateEmbedding returns a nullable result, so check for null first
    if (result != null && result.success) {
      print("Embedding dimension: ${result.dimension}");
      print("Embedding vector length: ${result.embeddings.length}");
      print("First few values: ${result.embeddings.take(5)}");
    } else {
      print("Embedding generation failed: ${result?.errorMessage}");
    }
  } finally {
    lm.unload();
  }
}
```
Embedding API Reference
CactusLM Class (Embedding Methods)
- `Future<CactusEmbeddingResult?> generateEmbedding({required String text, int bufferSize = 2048})` - Generate text embeddings
Embedding Data Classes
- `CactusEmbeddingResult({required bool success, required List<double> embeddings, required int dimension, String? errorMessage})` - Contains the generated embedding vector and metadata
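Embedding vectors are typically compared with cosine similarity. A minimal plain-Dart helper (no SDK dependency) that scores two `embeddings` lists such as those returned by `generateEmbedding`:

```dart
import 'dart:math';

/// Cosine similarity between two equal-length vectors.
/// Returns 1.0 for identical directions, 0.0 for orthogonal ones.
double cosineSimilarity(List<double> a, List<double> b) {
  assert(a.length == b.length, 'Vectors must have the same dimension');
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (sqrt(normA) * sqrt(normB));
}
```

For semantic search, compute the similarity between a query embedding and each document embedding, then rank by score.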
Platform-Specific Setup
Android
Add the following permissions to your `android/app/src/main/AndroidManifest.xml`:

```xml
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
```
Performance Tips
- Model Selection: Choose smaller models for faster inference on mobile devices
- Context Size: Reduce the context size for lower memory usage (e.g., 1024 instead of 2048)
- Memory Management: Always call `unload()` when you are done with a model
- Batch Processing: Reuse an initialized model for multiple completions
- Background Processing: Use an `Isolate` for heavy operations to keep the UI responsive
- Model Caching: Use `getModels()` for efficient model discovery - results are cached locally to reduce network requests
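The background-processing tip can be sketched with `Isolate.run` (Dart 2.19+). This is a sketch only; it assumes the model can be downloaded, initialized, and used entirely inside the spawned isolate:

```dart
import 'dart:isolate';
import 'package:cactus/cactus.dart';

// Runs a one-shot completion off the main isolate so the UI stays responsive.
Future<String> completeInBackground(String prompt) {
  return Isolate.run(() async {
    final lm = CactusLM();
    await lm.downloadModel();
    await lm.initializeModel();
    try {
      final result = await lm.generateCompletion(
        messages: [ChatMessage(content: prompt, role: "user")],
      );
      return result.success ? result.response : "";
    } finally {
      lm.unload(); // free the model before the isolate exits
    }
  });
}
```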
Telemetry Setup (Optional)
Cactus comes with powerful built-in telemetry that lets you monitor your projects. Create a token on the Cactus dashboard and get started with a one-line setup in your app:
```dart
import 'package:cactus/cactus.dart';

CactusTelemetry.setTelemetryToken("your-token-here");
```
Example App
Check out our example app for a complete Flutter implementation showing:
- Model discovery and fetching available models
- Model downloading with real-time progress indicators
- Text completion with both regular and streaming modes
- Embedding generation
- Error handling and status management
- Material Design UI integration
To run the example:
```shell
cd example
flutter pub get
flutter run
```