Cactus v1 is in beta!
After months of development and feedback from our community, we're launching the new Cactus SDK with significant architectural improvements and performance optimizations.
Cactus v0 served as our foundational release, proving that high-performance on-device AI inference was possible on mobile devices. However, as developers began building more complex applications, we identified several areas for improvement in scalability, developer experience, and platform consistency.
v1 represents a complete overhaul of our inference engine with optimized ARM-CPU kernels that deliver substantially better performance across all supported devices. We've rebuilt our SDKs from the ground up to provide consistent APIs across Flutter, Kotlin Multiplatform, and C++, while maintaining backward compatibility where possible.
The new architecture is more energy-efficient and more stable on lower-end devices. It also introduces hybrid completion modes that seamlessly fall back to cloud inference when needed, ensuring reliability in production applications. This addresses one of the most common requests from v0 users who needed guaranteed response times for critical user-facing features.
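The fallback pattern behind hybrid completion can be sketched generically. The snippet below is an illustrative sketch only, not the Cactus API: `completeHybrid`, `onDevice`, `cloud`, and `timeoutMs` are hypothetical names, and the real SDK's interface may differ. The idea is simply to race the on-device model against a deadline and fall back to a cloud endpoint if the device is too slow or errors out.

```typescript
// Hypothetical sketch of hybrid completion with cloud fallback.
// None of these names are real Cactus APIs.
type Completion = { text: string; source: "device" | "cloud" };

async function completeHybrid(
  prompt: string,
  onDevice: (p: string) => Promise<string>,
  cloud: (p: string) => Promise<string>,
  timeoutMs = 2000,
): Promise<Completion> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Deadline promise: rejects if the on-device model takes too long.
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("on-device timeout")), timeoutMs);
  });
  try {
    // Race local inference against the deadline.
    const text = await Promise.race([onDevice(prompt), deadline]);
    return { text, source: "device" };
  } catch {
    // Timeout or on-device failure: fall back to cloud inference.
    return { text: await cloud(prompt), source: "cloud" };
  } finally {
    // Cancel the pending timer so it cannot fire (or reject) later.
    clearTimeout(timer);
  }
}
```

Clearing the timer in `finally` matters: without it, the losing deadline promise would reject after the race is already settled, surfacing as an unhandled rejection.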
We've also completely redesigned our telemetry and monitoring systems to give developers granular insights into their AI model performance, usage patterns, and potential optimization opportunities. This data-driven approach enables teams to make informed decisions about model selection and deployment strategies. Get started with telemetry here.
For existing Cactus v0 users, we recommend migrating to v1 to take advantage of current and upcoming performance improvements. React Native developers can continue using v0 while we finalize the v1 bindings.
|  | v0 React Native | v0 Flutter | v1 React Native | v1 Flutter | v1 Kotlin |
| --- | --- | --- | --- | --- | --- |
| LLM inference | ✓ | ✓ | Soon | ✓ | ✓ |
| Tool calling | ✓ | ✓ | Soon | ✓ | ✓ |
| Embeddings | ✓ | ✓ | Soon | ✓ | ✓ |
| Voice transcription | ✓ | ✓ | Soon | ✓ | Soon |
| Voice synthesis | ✓ | ✓ | Soon | Soon | Soon |
| Image embedding | ✓ | ✓ | Soon | Soon | Soon |
| RAG | ✓ | ✓ | Soon | ✓ | Soon |
| Model format | GGUF | GGUF | Cactus | Cactus | Cactus |
* Production benchmarks using Qwen3 0.6B Q8 running CPU-only inference on an iPhone 16 Pro Max