Hybrid AI

Automatic cloud handoff and confidence-based routing between on-device and cloud models

How It Works

Cactus measures model confidence in real time during inference. When confidence drops below a threshold, or when the query exceeds device capabilities, Cactus automatically hands off to a cloud model.

Simple queries (clear audio, standard completions) → on-device NPU/CPU
Complex queries (noisy audio, long context, ambiguous prompts) → Cactus Cloud
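The routing rule above can be sketched as a small decision function. This is a minimal sketch, not Cactus's implementation. The confidence proxy (the geometric mean of token probabilities), the 0.7 threshold, the 2048-token device limit, and the names `confidenceFromLogprobs` and `chooseRoute` are all assumptions made for illustration.

```typescript
// Hypothetical routing sketch; not part of any Cactus SDK.
type Route = "device" | "cloud";

// Assumed confidence proxy: geometric mean of per-token probabilities,
// i.e. exp(mean(log p)). Returns 1 for an empty sequence.
function confidenceFromLogprobs(logprobs: number[]): number {
  if (logprobs.length === 0) return 1;
  const meanLogprob = logprobs.reduce((sum, lp) => sum + lp, 0) / logprobs.length;
  return Math.exp(meanLogprob);
}

// Route to the cloud when the prompt exceeds what the device can hold,
// or when confidence falls below the threshold (both limits are assumed).
function chooseRoute(
  confidence: number,
  promptTokens: number,
  threshold = 0.7,
  maxLocalTokens = 2048,
): Route {
  if (promptTokens > maxLocalTokens) return "cloud";
  return confidence < threshold ? "cloud" : "device";
}

const conf = confidenceFromLogprobs([Math.log(0.9), Math.log(0.8)]);
console.log(conf.toFixed(3), chooseRoute(conf, 120)); // prints "0.849 device"
```

A geometric mean is used here because a single very unlikely token pulls the score down sharply, which is the behavior you typically want from a handoff trigger.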

Setup

Set the CACTUS_CLOUD_API_KEY environment variable to enable cloud handoff.

For Live Transcription, handoff is fully automatic out of the box. For Language Model and Batch Transcription, contact us to enable cloud handoff.

Flutter: the CompletionResult includes needsCloudHandoff and confidence fields. When needsCloudHandoff is true, your application should route the request to a cloud API for better accuracy.

final result = model.complete('Explain quantum entanglement');
if (result.needsCloudHandoff) {
  // Route to cloud API
  print('Confidence: ${result.confidence}');
}

Kotlin: the CompletionResult exposes the same needsCloudHandoff and confidence fields. When needsCloudHandoff is true, route the request to a cloud API.

val result = model.complete("Explain quantum entanglement")
if (result.needsCloudHandoff) {
    // Route to cloud API
    println("Confidence: ${result.confidence}")
}
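On the application side, "route to a cloud API" amounts to a fallback wrapper around the local call. Below is a minimal TypeScript sketch. It assumes a result shape that mirrors the needsCloudHandoff, confidence, and text fields above. `LocalResult`, `completeWithFallback`, and the injected `callCloud` function are hypothetical stand-ins for your own SDK call and cloud client.

```typescript
// Hypothetical shape mirroring the fields documented above.
interface LocalResult {
  text: string;
  confidence: number;
  needsCloudHandoff: boolean;
}

interface RoutedResult {
  text: string;
  source: "device" | "cloud";
  localConfidence: number;
}

// Run the local model first; only call the cloud when the local result
// asks for a handoff. Both model calls are injected so this sketch does
// not depend on any particular SDK or HTTP client.
function completeWithFallback(
  prompt: string,
  runLocal: (p: string) => LocalResult,
  callCloud: (p: string) => string,
): RoutedResult {
  const local = runLocal(prompt);
  if (local.needsCloudHandoff) {
    return { text: callCloud(prompt), source: "cloud", localConfidence: local.confidence };
  }
  return { text: local.text, source: "device", localConfidence: local.confidence };
}

// Usage with stub functions in place of real model calls:
const routed = completeWithFallback(
  "Explain quantum entanglement",
  () => ({ text: "", confidence: 0.42, needsCloudHandoff: true }),
  (p) => `cloud answer for: ${p}`,
);
console.log(routed.source, routed.localConfidence); // prints "cloud 0.42"
```

In a real app the cloud call would be asynchronous. The synchronous signatures here only keep the sketch minimal. Keeping the local confidence in the result is useful for logging how often handoff fires.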

CLI: configure your Cactus API key:

cactus auth

This enables automatic cloud handoff when local model confidence is low or the context exceeds device limits. With a key configured, the CLI routes simple queries to on-device models, falls back to cloud APIs for complex queries or low confidence, handles context-window overflow gracefully, and keeps conversation history intact across cloud/device switches.
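Keeping conversation history across device/cloud switches essentially means both backends read from one shared message list. The following is a minimal sketch under stated assumptions. `HybridChat`, the backend signatures, and the word-count context estimate are hypothetical, and the CLI's actual mechanism is not documented here.

```typescript
// Hypothetical sketch; not the Cactus CLI's implementation.
interface Message {
  role: "user" | "assistant";
  content: string;
}
type LocalBackend = (history: readonly Message[]) => { text: string; handoff: boolean };
type CloudBackend = (history: readonly Message[]) => string;

class HybridChat {
  private readonly history: Message[] = [];
  private readonly local: LocalBackend;
  private readonly cloud: CloudBackend;
  private readonly maxLocalWords: number;

  constructor(local: LocalBackend, cloud: CloudBackend, maxLocalWords = 1500) {
    this.local = local;
    this.cloud = cloud;
    this.maxLocalWords = maxLocalWords; // crude stand-in for a token limit
  }

  // Rough context size: whitespace-separated words across all messages.
  private historyWords(): number {
    return this.history.reduce(
      (n, m) => n + m.content.split(/\s+/).filter((w) => w.length > 0).length,
      0,
    );
  }

  send(content: string): { text: string; source: "device" | "cloud" } {
    this.history.push({ role: "user", content });
    // Skip the local model entirely if the history overflows its window.
    const fitsLocally = this.historyWords() <= this.maxLocalWords;
    const localReply = fitsLocally ? this.local([...this.history]) : undefined;

    let text: string;
    let source: "device" | "cloud";
    if (localReply !== undefined && !localReply.handoff) {
      text = localReply.text;
      source = "device";
    } else {
      // The cloud sees the same full history, so context survives the switch.
      text = this.cloud([...this.history]);
      source = "cloud";
    }
    this.history.push({ role: "assistant", content: text });
    return { text, source };
  }

  get messages(): readonly Message[] {
    return this.history;
  }
}
```

Word counts are only a rough proxy for tokens. A real implementation would measure context with the model's own tokenizer.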

Cloud handoff is signaled by the cloud_handoff field in the response. When it is true, your application should route the request to a cloud API.

Hybrid Transcription

Live transcription with automatic cloud correction:

import { CactusSTT } from 'cactus-react-native';

const cactusSTT = new CactusSTT({ model: 'whisper-small' });
await cactusSTT.init();

// Automatic handoff to Cactus Cloud when CACTUS_CLOUD_API_KEY is set
await cactusSTT.streamTranscribeStart();

// audioChunk: the latest chunk of captured microphone audio (placeholder)
const result = await cactusSTT.streamTranscribeProcess({
  audio: audioChunk
});

// Cactus automatically uses cloud for low-confidence segments
console.log(result.confirmed);  // Uses cloud result when needed
console.log(result.cloudResult); // Cloud transcription if available

# Transcribe with cloud fallback for noisy audio
cactus transcribe openai/whisper-small --file recording.mp3 --cloud-key YOUR_API_KEY

Live Transcription has automatic Cactus Cloud handoff out of the box. For Language Model and Batch Transcription, contact us to enable cloud handoff.
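The per-segment correction described above for live transcription can be pictured as a merge step: confident on-device segments are kept, and low-confidence ones are replaced by the cloud transcription when one exists. This is a minimal sketch. The `Segment` shape, `mergeTranscript`, and the 0.6 threshold are hypothetical, and the real `confirmed` and `cloudResult` fields in cactus-react-native may be structured differently.

```typescript
// Hypothetical segment shape; the real cactus-react-native result
// exposes `confirmed` and `cloudResult`, whose structure may differ.
interface Segment {
  localText: string;
  confidence: number; // local model confidence for this segment, 0..1
  cloudText?: string; // present only if the cloud re-transcribed it
}

// Keep on-device text for confident segments; substitute the cloud
// transcription for low-confidence segments when one is available.
function mergeTranscript(segments: Segment[], threshold = 0.6): string {
  return segments
    .map((s) => (s.confidence < threshold && s.cloudText !== undefined ? s.cloudText : s.localText))
    .join(" ");
}

const transcript = mergeTranscript([
  { localText: "turn left at", confidence: 0.93 },
  { localText: "the big grain", confidence: 0.41, cloudText: "the bakery" },
  { localText: "on main street", confidence: 0.88 },
]);
console.log(transcript); // prints "turn left at the bakery on main street"
```

If the cloud result has not arrived yet, the local text is kept, so the transcript never has gaps while waiting on the network.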

Hybrid Language Model

import { CactusLM } from 'cactus-react-native';

const cactusLM = new CactusLM();
await cactusLM.init();

const result = await cactusLM.complete({
  messages: [{ role: 'user', content: 'Explain quantum entanglement' }]
});

if (result.cloudHandoff) {
  // Use Cactus Cloud for better accuracy
  // Contact us to enable: hello@cactuscompute.com
} else {
  console.log(result.response);
}

Flutter: the CompletionResult includes needsCloudHandoff and confidence fields. Check these to decide whether to route to a cloud API.

final result = model.complete('Explain quantum entanglement');
if (result.needsCloudHandoff) {
  // Route to cloud API for better accuracy
  print('Confidence too low: ${result.confidence}');
} else {
  print(result.text);
}

Kotlin: the CompletionResult exposes the same needsCloudHandoff and confidence fields. Check these to decide whether to route to a cloud API.

val result = model.complete("Explain quantum entanglement")
if (result.needsCloudHandoff) {
    // Route to cloud API for better accuracy
    println("Confidence too low: ${result.confidence}")
} else {
    println(result.text)
}

When the model's confidence is low or the task is too complex for the device, the response looks like this:

{
    "success": true,
    "cloud_handoff": true,
    "response": null,
    "confidence": 0.42
}

Your application should route to a cloud API when cloud_handoff is true.
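A sketch of handling this raw JSON response in TypeScript follows. The `HandoffResponse` interface mirrors only the four fields shown above, and `handleResponse` plus the injected `routeToCloud` callback are hypothetical. Any other fields the response may carry are ignored.

```typescript
// Mirrors only the fields shown in the example response above.
interface HandoffResponse {
  success: boolean;
  cloud_handoff: boolean;
  response: string | null;
  confidence: number;
}

// Parse the raw JSON and decide what to return. `routeToCloud` is a
// hypothetical callback standing in for your own cloud API client.
function handleResponse(raw: string, routeToCloud: () => string): string {
  const parsed = JSON.parse(raw) as HandoffResponse;
  if (!parsed.success) {
    throw new Error("local inference failed");
  }
  // A null response is treated like a handoff: there is nothing local to show.
  if (parsed.cloud_handoff || parsed.response === null) {
    return routeToCloud();
  }
  return parsed.response;
}

const raw = '{"success": true, "cloud_handoff": true, "response": null, "confidence": 0.42}';
console.log(handleResponse(raw, () => "answer from cloud")); // prints "answer from cloud"
```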
