Transcription
Audio transcription with streaming, VAD, and cloud handoff support
Basic Transcription
React Native:

```typescript
import { CactusSTT } from 'cactus-react-native';

const cactusSTT = new CactusSTT({ model: 'whisper-small' });

// From file
const result = await cactusSTT.transcribe({
  audio: 'path/to/audio.wav',
  onToken: (token) => console.log(token)
});

// From raw PCM samples
const pcmSamples: number[] = [/* ... */];
await cactusSTT.transcribe({
  audio: pcmSamples
});
```

Using the Hook:
```tsx
const cactusSTT = useCactusSTT({ model: 'whisper-small' });

const handleTranscribe = async () => {
  await cactusSTT.transcribe({
    audio: 'path/to/audio.wav'
  });
};

return <Text>{cactusSTT.transcription}</Text>;
```

Flutter:

```dart
final result = model.transcribe('/path/to/audio.wav');
print(result.text);
```

```dart
// 16kHz mono PCM
final pcmData = Uint8List.fromList([...]);
final result = model.transcribePcm(pcmData);
print(result.text);
```

Kotlin:

```kotlin
val result = model.transcribe("/path/to/audio.wav")
println(result.text)
```

```kotlin
val pcmData: ByteArray = ... // 16kHz mono PCM
val result = model.transcribe(pcmData)
println(result.text)
```

CLI:

```shell
# Transcribe an audio file
cactus transcribe openai/whisper-small --file recording.mp3

# Live microphone transcription
cactus transcribe UsefulSensors/moonshine-base

# With cloud fallback for noisy audio
cactus transcribe openai/whisper-small --file recording.mp3 --cloud-key YOUR_API_KEY
```

Streaming Transcription
Real-time transcription with incremental results.
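The confirmed/pending split can be pictured with a toy sketch: tokens whose probability clears the confirmation threshold join a stable prefix, while the remaining tail may still be revised by later audio. This illustrates the idea only, not the SDK's actual algorithm; `ScoredToken` and `splitConfirmed` are hypothetical names.

```typescript
// Toy model of streaming confirmation (illustrative, not the SDK internals):
// scan from the start and promote tokens to the confirmed prefix while their
// probability stays at or above the threshold; the rest remain pending.
type ScoredToken = { text: string; prob: number };

function splitConfirmed(tokens: ScoredToken[], confirmationThreshold: number) {
  let cut = 0;
  while (cut < tokens.length && tokens[cut].prob >= confirmationThreshold) cut++;
  return {
    confirmed: tokens.slice(0, cut).map(t => t.text).join(' '),
    pending: tokens.slice(cut).map(t => t.text).join(' ')
  };
}

const hyp: ScoredToken[] = [
  { text: 'hello', prob: 0.999 },
  { text: 'world', prob: 0.995 },
  { text: 'again', prob: 0.62 }   // still uncertain: stays pending
];
console.log(splitConfirmed(hyp, 0.99));
// → { confirmed: 'hello world', pending: 'again' }
```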
React Native:

```typescript
await cactusSTT.streamTranscribeStart({
  confirmationThreshold: 0.99,
  minChunkSize: 32000
});

// Feed audio chunks
const audioChunk: number[] = [/* PCM samples */];
const result = await cactusSTT.streamTranscribeProcess({ audio: audioChunk });
console.log('Confirmed:', result.confirmed);
console.log('Pending:', result.pending);

// Stop streaming
const final = await cactusSTT.streamTranscribeStop();
```

Flutter:

```dart
final stream = model.createStreamTranscriber();
stream.insert(audioChunk1);
stream.insert(audioChunk2);

final partial = stream.process();
print('Partial: ${partial.text}');

final finalResult = stream.finalize();
print('Final: ${finalResult.text}');

stream.dispose();
```

Kotlin:

```kotlin
model.createStreamTranscriber().use { stream ->
  stream.insert(audioChunk1)
  stream.insert(audioChunk2)

  val partial = stream.process()
  println("Partial: ${partial.text}")

  val finalResult = stream.finalize()
  println("Final: ${finalResult.text}")
}
```

Voice Activity Detection
Detect speech segments in audio before transcribing, using the Silero VAD model.
```typescript
import { CactusVAD } from 'cactus-react-native';

const cactusVAD = new CactusVAD({ model: 'silero-vad' });

const result = await cactusVAD.vad({
  audio: 'path/to/audio.wav',
  options: {
    threshold: 0.5,
    minSpeechDurationMs: 250
  }
});

console.log('Speech segments:', result.segments);
// [{ start: 0, end: 16000 }, { start: 32000, end: 48000 }]
```

Using the Hook:

```typescript
const cactusVAD = useCactusVAD({ model: 'silero-vad' });

const handleVAD = async () => {
  await cactusVAD.vad({ audio: 'path/to/audio.wav' });
};
```

VAD is currently available in the React Native SDK. The CLI supports file and live microphone transcription.
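The segment boundaries in the example output are sample indices. Assuming the 16 kHz mono input rate used throughout these examples, a small helper converts them to milliseconds; `segmentsToMs` is an illustrative utility, not an SDK call.

```typescript
// Convert VAD segments from sample indices to milliseconds.
// Assumes 16 kHz mono input (16 samples per millisecond); illustrative only.
type Segment = { start: number; end: number };

function segmentsToMs(segments: Segment[], sampleRate = 16000): Segment[] {
  const perMs = sampleRate / 1000;
  return segments.map(s => ({ start: s.start / perMs, end: s.end / perMs }));
}

console.log(segmentsToMs([{ start: 0, end: 16000 }, { start: 32000, end: 48000 }]));
// → [{ start: 0, end: 1000 }, { start: 2000, end: 3000 }]
```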
Supported Models
- Whisper Small/Medium - OpenAI Whisper with Apple NPU support
- Moonshine-Base - Lightweight transcription model
- Silero VAD - Voice activity detection
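The raw-PCM inputs shown earlier expect 16 kHz mono 16-bit audio, while microphone capture APIs often deliver Float32 samples in [-1, 1]. A conversion sketch under that assumption; `floatTo16BitPcm` is hypothetical, not part of the SDK.

```typescript
// Clamp Float32 samples to [-1, 1] and scale to signed 16-bit PCM range.
// Illustrative only; check your capture layer's actual sample format.
function floatTo16BitPcm(samples: number[]): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range spikes
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return out;
}

console.log(Array.from(floatTo16BitPcm([0, 0.5, -1, 1])));
// → [0, 16384, -32768, 32767]
```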
Performance Tips
- Use VAD - Keep `useVad: true` (the default) when transcribing to strip silence and improve accuracy
- NPU Acceleration - Whisper models support the Apple NPU for significantly faster transcription
- Model Selection - Whisper Small is faster; Whisper Medium is more accurate
- Memory - Always call `destroy()` / `dispose()` / `close()` when done to free resources
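For the memory tip, a try/finally wrapper guarantees cleanup even when transcription throws. Only the destroy-when-done convention comes from the SDKs; `withDestroy` and the stub model below are hypothetical, shown as a synchronous sketch of the pattern.

```typescript
// Illustrative pattern: run work against a model, then always free it.
// withDestroy is hypothetical; only destroy()-when-done comes from the SDK.
function withDestroy<M extends { destroy(): void }, R>(model: M, work: (m: M) => R): R {
  try {
    return work(model);
  } finally {
    model.destroy(); // runs on both success and failure
  }
}

// Stub standing in for a real model instance, to show the call order:
const calls: string[] = [];
const stub = { destroy: () => { calls.push('destroy'); } };
withDestroy(stub, () => { calls.push('transcribe'); });
console.log(calls); // → ['transcribe', 'destroy']
```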