Hybrid AI
Automatic cloud handoff and confidence-based routing between on-device and cloud models
How It Works
Cactus measures model confidence in real time during inference. When confidence drops below a threshold, or when the query exceeds device capabilities, Cactus automatically hands off to a cloud model.
- Simple queries (clear audio, standard completions) → on-device NPU/CPU
- Complex queries (noisy audio, long context, ambiguous prompts) → Cactus Cloud
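
The routing rule described above can be sketched as a small decision function. This is only an illustration: `RoutingPolicy`, `chooseRoute`, and the threshold values are hypothetical names and numbers, not part of the Cactus API, and Cactus applies its own internal thresholds.

```typescript
// Hypothetical sketch of confidence-based routing; not the Cactus implementation.
type Route = 'device' | 'cloud';

interface RoutingPolicy {
  minConfidence: number;    // assumed threshold below which we hand off
  maxContextTokens: number; // assumed on-device context limit
}

function chooseRoute(
  confidence: number,
  contextTokens: number,
  policy: RoutingPolicy,
): Route {
  // The query exceeds what the device can hold: hand off regardless of confidence.
  if (contextTokens > policy.maxContextTokens) return 'cloud';
  // The model is unsure of its own output: hand off.
  if (confidence < policy.minConfidence) return 'cloud';
  return 'device';
}

const examplePolicy: RoutingPolicy = { minConfidence: 0.7, maxContextTokens: 2048 };
console.log(chooseRoute(0.91, 300, examplePolicy));  // device
console.log(chooseRoute(0.42, 300, examplePolicy));  // cloud
console.log(chooseRoute(0.91, 5000, examplePolicy)); // cloud
```

Checking the context limit first means an oversized prompt is never attempted on-device, even when the model would otherwise report high confidence.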
Setup
Set the CACTUS_CLOUD_API_KEY environment variable and Cactus handles handoff automatically.
For Live Transcription, handoff is fully automatic out of the box. For Language Model and Batch Transcription, contact us to enable cloud handoff.
The CompletionResult includes needsCloudHandoff and confidence fields. When needsCloudHandoff is true, your application should route the request to a cloud API for better accuracy.
final result = model.complete('Explain quantum entanglement');
if (result.needsCloudHandoff) {
  // Route to cloud API
  print('Confidence: ${result.confidence}');
}

In Kotlin, CompletionResult exposes the same needsCloudHandoff and confidence fields; route the request to a cloud API when needsCloudHandoff is true.
val result = model.complete("Explain quantum entanglement")
if (result.needsCloudHandoff) {
    // Route to cloud API
    println("Confidence: ${result.confidence}")
}

Configure your Cactus API key:

cactus auth

This enables automatic cloud handoff when the local model confidence is low or context exceeds device limits. The CLI automatically routes simple queries to on-device models, falls back to cloud APIs for complex queries or low confidence, handles context window overflow gracefully, and maintains conversation history across cloud/device switches.
Cloud handoff is signaled in the response. Your application should check the cloud_handoff field and route to a cloud API when it is true.
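
An application that performs its own routing may also want to keep one conversation history across cloud/device switches, as the CLI does. The sketch below shows one way to do that. `HybridSession`, `Backend`, and both stub backends are hypothetical and synchronous for brevity; real model calls would be asynchronous, and none of these names come from the Cactus SDK.

```typescript
// Hypothetical sketch: one shared history regardless of which backend answers.
interface Message { role: 'user' | 'assistant'; content: string; }
interface Reply { response: string | null; cloud_handoff: boolean; }
// Stand-in for a local or cloud model call; real backends would be asynchronous.
type Backend = (history: Message[]) => Reply;

class HybridSession {
  private history: Message[] = [];
  private local: Backend;
  private cloud: Backend;

  constructor(local: Backend, cloud: Backend) {
    this.local = local;
    this.cloud = cloud;
  }

  send(text: string): { text: string; servedBy: 'device' | 'cloud' } {
    this.history.push({ role: 'user', content: text });
    const localReply = this.local(this.history);
    const useCloud = localReply.cloud_handoff || localReply.response === null;
    // The cloud backend receives the same full history, so context survives the switch.
    const reply = useCloud ? this.cloud(this.history) : localReply;
    const answer = reply.response ?? '';
    this.history.push({ role: 'assistant', content: answer });
    return { text: answer, servedBy: useCloud ? 'cloud' : 'device' };
  }

  get turns(): readonly Message[] { return this.history; }
}

// Stub local model: hands off once the conversation outgrows a pretend 40-character limit.
const localStub: Backend = (h) =>
  h.reduce((n, m) => n + m.content.length, 0) > 40
    ? { response: null, cloud_handoff: true }
    : { response: 'local answer', cloud_handoff: false };
// Stub cloud model: reports how much history it received.
const cloudStub: Backend = (h) =>
  ({ response: `cloud answer (saw ${h.length} messages)`, cloud_handoff: false });

const session = new HybridSession(localStub, cloudStub);
console.log(session.send('Hi').servedBy);
console.log(session.send('Explain quantum entanglement in detail').servedBy);
```

Because both backends read from the same `history` array, the cloud model sees earlier on-device turns and the next on-device turn sees the cloud answer.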
Hybrid Transcription
Live transcription with automatic cloud correction:
import { CactusSTT } from 'cactus-react-native';

const cactusSTT = new CactusSTT({ model: 'whisper-small' });
await cactusSTT.init();

// Automatic handoff to Cactus Cloud when CACTUS_CLOUD_API_KEY is set
await cactusSTT.streamTranscribeStart();
const result = await cactusSTT.streamTranscribeProcess({
  audio: audioChunk
});

// Cactus automatically uses cloud for low-confidence segments
console.log(result.confirmed); // Uses cloud result when needed
console.log(result.cloudResult); // Cloud transcription if available

# Transcribe with cloud fallback for noisy audio
cactus transcribe openai/whisper-small --file recording.mp3 --cloud-key YOUR_API_KEY

Live Transcription has automatic Cactus Cloud handoff out of the box. For Language Model and Batch Transcription, contact us to enable cloud handoff.
Hybrid Language Model
import { CactusLM } from 'cactus-react-native';

const cactusLM = new CactusLM();
await cactusLM.init();

const result = await cactusLM.complete({
  messages: [{ role: 'user', content: 'Explain quantum entanglement' }]
});

if (result.cloudHandoff) {
  // Use Cactus Cloud for better accuracy
  // Contact us to enable: hello@cactuscompute.com
} else {
  console.log(result.response);
}

The CompletionResult includes needsCloudHandoff and confidence fields. Check these to decide whether to route to a cloud API.
final result = model.complete('Explain quantum entanglement');
if (result.needsCloudHandoff) {
  // Route to cloud API for better accuracy
  print('Confidence too low: ${result.confidence}');
} else {
  print(result.text);
}

The Kotlin CompletionResult has the same needsCloudHandoff and confidence fields. Check them the same way:
val result = model.complete("Explain quantum entanglement")
if (result.needsCloudHandoff) {
    // Route to cloud API for better accuracy
    println("Confidence too low: ${result.confidence}")
} else {
    println(result.text)
}

When the model lacks confidence or encounters a complex task, the response signals a handoff:
{
  "success": true,
  "cloud_handoff": true,
  "response": null,
  "confidence": 0.42
}

Your application should route to a cloud API when cloud_handoff is true.
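
A minimal sketch of that check follows. It assumes the response arrives as a JSON string with exactly the four fields shown above. `resolveAnswer` and the `callCloud` callback are hypothetical stand-ins for your own code and cloud API, and treating `success: false` as an error is an assumption, not documented behavior.

```typescript
// Shape taken from the example response above; any other fields are ignored.
interface HandoffResponse {
  success: boolean;
  cloud_handoff: boolean;
  response: string | null;
  confidence: number;
}

// `callCloud` is a hypothetical callback standing in for whichever cloud API you use.
function resolveAnswer(
  rawJson: string,
  prompt: string,
  callCloud: (prompt: string) => string,
): string {
  const result = JSON.parse(rawJson) as HandoffResponse;
  // Assumption: an unsuccessful local completion is surfaced as an error.
  if (!result.success) throw new Error('local completion failed');
  // Hand off when the flag is set, or when there is no local text to return.
  if (result.cloud_handoff || result.response === null) {
    return callCloud(prompt);
  }
  return result.response;
}

const sampleHandoff =
  '{"success": true, "cloud_handoff": true, "response": null, "confidence": 0.42}';
console.log(
  resolveAnswer(sampleHandoff, 'Explain quantum entanglement', (p) => `cloud: ${p}`),
); // cloud: Explain quantum entanglement
```

Also checking `response === null` guards against returning an empty answer if the flag and the payload ever disagree.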