Cactus

Transcription

Audio transcription with streaming, VAD, and cloud handoff support

Basic Transcription

import { CactusSTT } from 'cactus-react-native';

const cactusSTT = new CactusSTT({ model: 'whisper-small' });

// From file
const result = await cactusSTT.transcribe({
  audio: 'path/to/audio.wav',
  onToken: (token) => console.log(token)
});

// From raw PCM samples
const pcmSamples: number[] = [/* ... */];
await cactusSTT.transcribe({
  audio: pcmSamples
});

Using the Hook:

const cactusSTT = useCactusSTT({ model: 'whisper-small' });

const handleTranscribe = async () => {
  await cactusSTT.transcribe({
    audio: 'path/to/audio.wav'
  });
};

return <Text>{cactusSTT.transcription}</Text>;

Flutter:

final result = model.transcribe('/path/to/audio.wav');
print(result.text);

// 16kHz mono PCM
final pcmData = Uint8List.fromList([...]);
final pcmResult = model.transcribePcm(pcmData);
print(pcmResult.text);

Kotlin:

val result = model.transcribe("/path/to/audio.wav")
println(result.text)

val pcmData: ByteArray = ... // 16kHz mono PCM
val pcmResult = model.transcribe(pcmData)
println(pcmResult.text)

CLI:

# Transcribe an audio file
cactus transcribe openai/whisper-small --file recording.mp3

# Live microphone transcription
cactus transcribe UsefulSensors/moonshine-base

# With cloud fallback for noisy audio
cactus transcribe openai/whisper-small --file recording.mp3 --cloud-key YOUR_API_KEY

Streaming Transcription

Real-time transcription with incremental results.

await cactusSTT.streamTranscribeStart({
  confirmationThreshold: 0.99,
  minChunkSize: 32000
});

// Feed audio chunks
const audioChunk: number[] = [/* PCM samples */];
const result = await cactusSTT.streamTranscribeProcess({ audio: audioChunk });

console.log('Confirmed:', result.confirmed);
console.log('Pending:', result.pending);

// Stop streaming
const final = await cactusSTT.streamTranscribeStop();

Flutter:

final stream = model.createStreamTranscriber();
stream.insert(audioChunk1);
stream.insert(audioChunk2);

final partial = stream.process();
print('Partial: ${partial.text}');

final finalResult = stream.finalize();
print('Final: ${finalResult.text}');

stream.dispose();

Kotlin:

model.createStreamTranscriber().use { stream ->
    stream.insert(audioChunk1)
    stream.insert(audioChunk2)

    val partial = stream.process()
    println("Partial: ${partial.text}")

    val finalResult = stream.finalize()
    println("Final: ${finalResult.text}")
}
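When a long recording is already in memory, it can be split into fixed-size chunks before being fed to `streamTranscribeProcess` (or `stream.insert`). A minimal sketch of the chunking step; `chunkPcm` is an illustrative helper, not an SDK API, and the 32000-sample default (2 seconds at 16 kHz) simply mirrors the `minChunkSize` used above:

```typescript
// Split a long 16 kHz mono PCM buffer into fixed-size chunks for streaming.
function chunkPcm(pcm: number[], chunkSize: number = 32000): number[][] {
  const chunks: number[][] = [];
  for (let i = 0; i < pcm.length; i += chunkSize) {
    chunks.push(pcm.slice(i, i + chunkSize)); // last chunk may be shorter
  }
  return chunks;
}
```

Each chunk can then be passed to `streamTranscribeProcess({ audio: chunk })` in order, with `streamTranscribeStop()` called once all chunks are consumed.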

Voice Activity Detection

Detect speech segments in audio before transcribing, using the Silero VAD model.

import { CactusVAD } from 'cactus-react-native';

const cactusVAD = new CactusVAD({ model: 'silero-vad' });

const result = await cactusVAD.vad({
  audio: 'path/to/audio.wav',
  options: {
    threshold: 0.5,
    minSpeechDurationMs: 250
  }
});

console.log('Speech segments:', result.segments);
// [{ start: 0, end: 16000 }, { start: 32000, end: 48000 }]

Using the Hook:

const cactusVAD = useCactusVAD({ model: 'silero-vad' });

const handleVAD = async () => {
  await cactusVAD.vad({ audio: 'path/to/audio.wav' });
};
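The `segments` array uses sample offsets into the 16 kHz audio, so a raw PCM buffer can be sliced into speech-only regions and each region transcribed on its own. A minimal sketch of the slicing step; `slicePcm` is an illustrative pure helper, not part of the SDK:

```typescript
// Segment shape matching the VAD result shown above (sample offsets).
type Segment = { start: number; end: number };

// Extract each detected speech region from a 16 kHz mono PCM buffer.
function slicePcm(pcm: number[], segments: Segment[]): number[][] {
  return segments.map(({ start, end }) => pcm.slice(start, end));
}
```

Each returned chunk can then be fed to `cactusSTT.transcribe({ audio: chunk })` as in the Basic Transcription example.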

VAD is currently available only in the React Native SDK; the CLI supports file and live microphone transcription.

Supported Models

  • Whisper Small/Medium - OpenAI Whisper with Apple NPU support
  • Moonshine-Base - Lightweight transcription model
  • Silero VAD - Voice activity detection

Performance Tips

  • Use VAD - Keep useVad: true (default) when transcribing to strip silence and improve accuracy
  • NPU Acceleration - Whisper models support Apple NPU for significantly faster transcription
  • Model Selection - Whisper Small is faster, Whisper Medium is more accurate
  • Memory - Always call destroy() / dispose() / close() when done to free resources