Kotlin Multiplatform SDK
Complete guide to using Cactus SDK in Kotlin applications
Cactus Kotlin
Official Kotlin Multiplatform library for Cactus, a framework for deploying LLM and STT models locally in your app.
Installation
Dependency List
Add to settings.gradle.kts
dependencyResolutionManagement {
repositories {
maven {
name = "GitHubPackagesCactus"
url = uri("https://maven.pkg.github.com/cactus-compute/cactus-kotlin")
credentials {
username = properties.getProperty("github.username") ?: System.getenv("GITHUB_ACTOR")
password = properties.getProperty("github.token") ?: System.getenv("GITHUB_TOKEN")
}
}
}
}
Add credentials
Add your GitHub username and token to local.properties:
github.username=your-username
github.token=your-personal-access-token
You can generate a personal access token by following the instructions in GitHub's documentation. The token needs the read:packages scope.
Or set them as environment variables: GITHUB_ACTOR and GITHUB_TOKEN.
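Note that the settings snippet above reads credentials from a properties object that settings.gradle.kts does not define on its own. A minimal sketch that loads local.properties, placed before the dependencyResolutionManagement block (assuming the file sits in the project root):
import java.io.File
import java.util.Properties
// Load local.properties so the credentials block above can read
// github.username and github.token.
val properties = Properties().apply {
    val localProperties = File(rootDir, "local.properties")
    if (localProperties.exists()) {
        localProperties.inputStream().use { load(it) }
    }
}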
Gradle Build
Add to your KMP project's build.gradle.kts:
kotlin {
sourceSets {
commonMain {
dependencies {
implementation("com.cactus:library:1.0.0-beta")
}
}
}
}
Grant Permissions
Add the permissions to your manifest (Android):
<uses-permission android:name="android.permission.INTERNET" /> <!-- for model downloads -->
<uses-permission android:name="android.permission.RECORD_AUDIO" /> <!-- for transcription -->
Context Initialization (Required)
Before using any Cactus SDK functionality, you must initialize the context in your Activity's onCreate() method:
import android.os.Bundle
import androidx.activity.ComponentActivity
import com.cactus.CactusContextInitializer
class MainActivity : ComponentActivity() {
override fun onCreate(savedInstanceState: Bundle?) {
super.onCreate(savedInstanceState)
// Initialize Cactus context (required)
CactusContextInitializer.initialize(this)
// ... rest of your code
}
}
Language Model (LLM)
The CactusLM class provides text completion capabilities with support for function calling (experimental).
Basic Usage
import com.cactus.CactusLM
import com.cactus.CactusInitParams
import com.cactus.CactusCompletionParams
import com.cactus.ChatMessage
import kotlinx.coroutines.runBlocking
runBlocking {
val lm = CactusLM()
try {
// Download a model by slug (e.g., "qwen3-0.6", "gemma3-270m")
// If no model is specified, it defaults to "qwen3-0.6"
// Throws exception on failure
lm.downloadModel("qwen3-0.6")
// Initialize the model
// Throws exception on failure
lm.initializeModel(
CactusInitParams(
model = "qwen3-0.6",
contextSize = 2048
)
)
// Generate completion with default parameters
val result = lm.generateCompletion(
messages = listOf(
ChatMessage(content = "Hello, how are you?", role = "user")
)
)
result?.let { response ->
if (response.success) {
println("Response: ${response.response}")
println("Tokens per second: ${response.tokensPerSecond}")
println("Time to first token: ${response.timeToFirstTokenMs}ms")
}
}
} finally {
// Clean up
lm.unload()
}
}
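downloadModel() and initializeModel() throw on failure, so wrap the setup calls in a try/catch when you want to degrade gracefully rather than crash; a sketch building on the example above:
try {
    lm.downloadModel("qwen3-0.6")
    lm.initializeModel(CactusInitParams(model = "qwen3-0.6", contextSize = 2048))
} catch (e: Exception) {
    // Download or initialization failed (e.g., no network or unsupported device).
    println("Model setup failed: ${e.message}")
}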
Inference Modes
CactusLM supports hybrid inference modes that allow fallback between local and cloud-based processing:
val result = lm.generateCompletion(
messages = listOf(ChatMessage(content = "Hello!", role = "user")),
params = CactusCompletionParams(
mode = InferenceMode.LOCAL_FIRST, // Try local first, fallback to cloud
cactusToken = "your_cactus_token"
)
)
Available modes:
- InferenceMode.LOCAL - Local inference only (default)
- InferenceMode.REMOTE - Cloud-based inference only
- InferenceMode.LOCAL_FIRST - Try local first, fallback to cloud
- InferenceMode.REMOTE_FIRST - Try cloud first, fallback to local
To get a cactusToken, join our Discord community and contact us.
Streaming Completions
val result = lm.generateCompletion(
messages = listOf(ChatMessage("Tell me a story", "user")),
params = CactusCompletionParams(maxTokens = 200),
onToken = { token, tokenId ->
print(token) // Print each token as it's generated
}
)
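The callback receives each token as it is generated, so you can both display and accumulate the stream; a small sketch reusing the same onToken hook:
val streamedText = StringBuilder()
val streamed = lm.generateCompletion(
    messages = listOf(ChatMessage("Tell me a story", "user")),
    params = CactusCompletionParams(maxTokens = 200),
    onToken = { token, _ ->
        streamedText.append(token) // accumulate the full response
        print(token)               // and echo tokens as they arrive
    }
)
// streamedText now holds the complete response; streamed carries the timing metrics.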
Function Calling (Experimental)
import com.cactus.models.ToolParameter
import com.cactus.models.createTool
val tools = listOf(
createTool(
name = "get_weather",
description = "Get current weather for a location",
parameters = mapOf(
"location" to ToolParameter(
type = "string",
description = "City name",
required = true
)
)
)
)
val result = lm.generateCompletion(
messages = listOf(ChatMessage("What's the weather in New York?", "user")),
params = CactusCompletionParams(
maxTokens = 100,
tools = tools
)
)
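Tool invocations come back on the result's toolCalls list (see the data classes below). A minimal dispatch sketch, with the actual weather lookup left as a placeholder:
result?.toolCalls?.forEach { call ->
    when (call.name) {
        "get_weather" -> {
            // call.arguments is a Map<String, String> per ToolCall.
            val location = call.arguments["location"]
            println("Model requested weather for: $location")
            // Run your real weather lookup here and return the result
            // to the model in a follow-up message if needed.
        }
    }
}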
Available Models
You can get a list of available models:
lm.getModels()
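For example, to inspect the catalog and prefer a model that is already on disk (a sketch using the CactusModel fields documented below):
val models = lm.getModels()
models.forEach { model ->
    println("${model.slug}: ${model.size_mb} MB, downloaded=${model.isDownloaded}")
}
// Prefer an already-downloaded model; otherwise fall back to the default slug.
val slug = models.firstOrNull { it.isDownloaded }?.slug ?: "qwen3-0.6"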
LLM API Reference
CactusLM Class
- suspend fun downloadModel(model: String = "qwen3-0.6"): Boolean - Download a model
- suspend fun initializeModel(params: CactusInitParams): Boolean - Initialize a model for inference
- suspend fun generateCompletion(messages: List<ChatMessage>, params: CactusCompletionParams = CactusCompletionParams(), onToken: CactusStreamingCallback? = null): CactusCompletionResult? - Generate a text completion, with support for streaming and inference modes
- suspend fun generateEmbedding(text: String, modelName: String? = null): CactusEmbeddingResult? - Generate text embeddings
- fun unload() - Free the model from memory
- suspend fun getModels(): List<CactusModel> - Get available LLM models
- fun isLoaded(): Boolean - Check if a model is loaded
Data Classes
- CactusInitParams(model: String?, contextSize: Int?) - Model initialization parameters
- CactusCompletionParams(temperature: Double, topK: Int, topP: Double, maxTokens: Int, stopSequences: List<String>, tools: List<CactusTool>?, mode: InferenceMode, cactusToken: String?, model: String?) - Completion parameters
- ChatMessage(content: String, role: String, timestamp: Long?) - Chat message format
- CactusCompletionResult - Contains the response, timing metrics, and success status
- CactusEmbeddingResult(success: Boolean, embeddings: List<Double>, dimension: Int, errorMessage: String?) - Embedding generation result
- InferenceMode - Enum for inference modes (LOCAL, REMOTE, LOCAL_FIRST, REMOTE_FIRST)
Embeddings
The CactusLM class also provides text embedding generation capabilities for semantic similarity, search, and other NLP tasks.
Basic Usage
import com.cactus.CactusLM
import com.cactus.CactusInitParams
import kotlinx.coroutines.runBlocking
runBlocking {
val lm = CactusLM()
// Download and initialize a model (same as for completions)
lm.downloadModel("qwen3-0.6")
lm.initializeModel(CactusInitParams(model = "qwen3-0.6", contextSize = 2048))
// Generate embeddings for a text
val result = lm.generateEmbedding(
text = "This is a sample text for embedding generation"
)
result?.let { embedding ->
if (embedding.success) {
println("Embedding dimension: ${embedding.dimension}")
println("Embedding vector length: ${embedding.embeddings.size}")
} else {
println("Embedding generation failed: ${embedding.errorMessage}")
}
}
lm.unload()
}
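A common use of the vectors is semantic similarity. A small SDK-independent helper, assuming both texts were embedded with the same model so the dimensions match:
import kotlin.math.sqrt
// Cosine similarity between two equal-length embedding vectors.
fun cosineSimilarity(a: List<Double>, b: List<Double>): Double {
    require(a.size == b.size) { "Embedding dimensions must match" }
    val dot = a.zip(b).sumOf { (x, y) -> x * y }
    val normA = sqrt(a.sumOf { it * it })
    val normB = sqrt(b.sumOf { it * it })
    return dot / (normA * normB)
}
// Usage with two CactusEmbeddingResult values e1 and e2:
// val score = cosineSimilarity(e1.embeddings, e2.embeddings)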
Embedding API Reference
CactusLM Class (Embedding Methods)
- suspend fun generateEmbedding(text: String, modelName: String? = null): CactusEmbeddingResult? - Generate embeddings for the given text.
- suspend fun getModels(): List<CactusModel> - Get a list of available models. Results are cached locally to reduce network requests.
- fun unload() - Unload the current model and free resources.
- fun isLoaded(): Boolean - Check if a model is currently loaded.
Data Classes
- CactusInitParams(model: String? = null, contextSize: Int? = null) - Parameters for model initialization.
- CactusCompletionParams(model: String? = null, temperature: Double? = null, topK: Int? = null, topP: Double? = null, maxTokens: Int = 200, stopSequences: List<String> = listOf("<|im_end|>", "<end_of_turn>"), tools: List<CactusTool> = emptyList(), mode: InferenceMode = InferenceMode.LOCAL, cactusToken: String? = null) - Parameters for text completion.
- CactusCompletionResult(success: Boolean, response: String? = null, timeToFirstTokenMs: Double? = null, totalTimeMs: Double? = null, tokensPerSecond: Double? = null, prefillTokens: Int? = null, decodeTokens: Int? = null, totalTokens: Int? = null, toolCalls: List<ToolCall>? = emptyList()) - The result of a text completion.
- CactusEmbeddingResult(success: Boolean, embeddings: List<Double> = listOf(), dimension: Int? = null, errorMessage: String? = null) - The result of embedding generation.
- ChatMessage(content: String, role: String, timestamp: Long? = null) - A chat message with a role (e.g., "user", "assistant").
- CactusModel(created_at: String, slug: String, download_url: String, size_mb: Int, supports_tool_calling: Boolean, supports_vision: Boolean, name: String, isDownloaded: Boolean = false, quantization: Int = 8) - Information about an available model.
- InferenceMode - Enum for selecting the inference mode (LOCAL, REMOTE, LOCAL_FIRST, REMOTE_FIRST).
- ToolCall(name: String, arguments: Map<String, String>) - Represents a tool call returned by the model.
- CactusTool(type: String = "function", function: CactusFunction) - Defines a tool that can be called by the model.
- CactusFunction(name: String, description: String, parameters: ToolParametersSchema) - Function definition for a tool.
- ToolParametersSchema(type: String = "object", properties: Map<String, ToolParameter>, required: List<String>) - Schema for tool parameters.
- ToolParameter(type: String, description: String, required: Boolean = false) - A parameter definition for a tool.
Helper Functions
- createTool(name: String, description: String, parameters: Map<String, ToolParameter>): CactusTool - Helper function to create a tool with the correct schema.
Speech-to-Text (STT)
The CactusSTT class provides speech recognition capabilities using on-device models from providers like Vosk and Whisper.
Choosing a Transcription Provider
You can select a transcription provider when initializing CactusSTT. The available providers are:
- TranscriptionProvider.VOSK (default): Uses Vosk for transcription.
- TranscriptionProvider.WHISPER: Uses Whisper for transcription.
import com.cactus.CactusSTT
import com.cactus.TranscriptionProvider
// Initialize with the VOSK provider (default)
val sttVosk = CactusSTT()
// Or explicitly initialize with the WHISPER provider
val sttWhisper = CactusSTT(TranscriptionProvider.WHISPER)
Basic Usage
With Vosk
import com.cactus.CactusSTT
import com.cactus.SpeechRecognitionParams
import kotlinx.coroutines.runBlocking
runBlocking {
val stt = CactusSTT()
// Download STT model (default: vosk-en-us)
val downloadSuccess = stt.download("vosk-en-us")
// Initialize the model
val initSuccess = stt.init("vosk-en-us")
// Transcribe from microphone
val result = stt.transcribe(
SpeechRecognitionParams(
maxSilenceDuration = 1000L,
maxDuration = 30000L,
sampleRate = 16000
)
)
result?.let { transcription ->
if (transcription.success) {
println("Transcribed: ${transcription.text}")
println("Processing time: ${transcription.processingTime}ms")
}
}
// Transcribe from audio file
val fileResult = stt.transcribe(
SpeechRecognitionParams(),
filePath = "/path/to/audio.wav"
)
// Stop transcription
stt.stop()
}
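transcribe() also accepts a TranscriptionMode and an apiKey (see the STT API reference below), mirroring the LLM's hybrid inference modes. A sketch of remote-first transcription, run inside the same coroutine scope and assuming TranscriptionMode is imported from com.cactus like the other STT types:
// Try the remote service first, falling back to the local model.
val remoteResult = stt.transcribe(
    SpeechRecognitionParams(),
    mode = TranscriptionMode.REMOTE_FIRST,
    apiKey = "your_api_key"
)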
Tool Filtering
The ToolFilterService enables intelligent filtering of tools to optimize function calling by selecting only the most relevant tools for a given user query.
Configuration
Configure tool filtering when creating a CactusLM instance:
import com.cactus.CactusLM
import com.cactus.services.ToolFilterConfig
import com.cactus.services.ToolFilterStrategy
// Enable with default settings (SIMPLE strategy, max 3 tools)
val lm = CactusLM(
enableToolFiltering = true,
toolFilterConfig = ToolFilterConfig.simple(maxTools = 3)
)
// Custom configuration with SIMPLE strategy
val lm = CactusLM(
enableToolFiltering = true,
toolFilterConfig = ToolFilterConfig(
strategy = ToolFilterStrategy.SIMPLE,
maxTools = 5,
similarityThreshold = 0.3
)
)
// Use SEMANTIC strategy for more accurate filtering
val lm = CactusLM(
enableToolFiltering = true,
toolFilterConfig = ToolFilterConfig(
strategy = ToolFilterStrategy.SEMANTIC,
maxTools = 3,
similarityThreshold = 0.5
)
)
// Disable tool filtering
val lm = CactusLM(enableToolFiltering = false)
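With filtering enabled, you keep passing the full tool list to generateCompletion(); the configured strategy is meant to narrow it to the most relevant subset before the model sees it. A sketch reusing the tools list from the Function Calling section:
val filtered = lm.generateCompletion(
    messages = listOf(ChatMessage("What's the weather in New York?", "user")),
    params = CactusCompletionParams(tools = tools) // filter selects the relevant subset
)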
Configuration Parameters
- strategy - The filtering algorithm: SIMPLE (default, fast) or SEMANTIC (slower but more accurate)
- maxTools - Maximum number of tools to pass to the model (default: null, meaning no limit)
- similarityThreshold - Minimum score required for a tool to be included (default: 0.3)
Available Voice Models
// Get list of available voice models
stt.getVoiceModels()
// Check if model is downloaded
stt.isModelDownloaded("vosk-en-us")STT API Reference
STT API Reference
CactusSTT Class
- CactusSTT(provider: TranscriptionProvider = TranscriptionProvider.VOSK) - Constructor to specify the transcription provider.
- suspend fun download(model: String = "vosk-en-us"): Boolean - Download an STT model (e.g., "vosk-en-us" or "whisper-tiny-en"). Defaults to "vosk-en-us".
- suspend fun init(model: String = "vosk-en-us"): Boolean - Initialize an STT model for transcription. Defaults to "vosk-en-us".
- suspend fun transcribe(params: SpeechRecognitionParams = SpeechRecognitionParams(), filePath: String? = null, mode: TranscriptionMode = TranscriptionMode.LOCAL, apiKey: String? = null): SpeechRecognitionResult? - Transcribe speech from microphone or file. Supports different transcription modes.
- suspend fun warmUpWispr(apiKey: String) - Warms up the remote Wispr service for lower latency.
- fun stop() - Stop ongoing transcription.
- fun isReady(): Boolean - Check if the STT service is initialized and ready.
- suspend fun getVoiceModels(provider: TranscriptionProvider = this.provider): List<VoiceModel> - Get a list of available voice models for the specified provider. Defaults to the instance's provider.
- suspend fun isModelDownloaded(modelName: String = "vosk-en-us"): Boolean - Check if a specific model has been downloaded. Defaults to "vosk-en-us".
Data Classes
- TranscriptionProvider - Enum for selecting the provider (VOSK, WHISPER).
- SpeechRecognitionParams(maxSilenceDuration: Long = 1000L, maxDuration: Long = 30000L, sampleRate: Int = 16000) - Parameters for controlling speech recognition.
- SpeechRecognitionResult(success: Boolean, text: String? = null, eventSuccess: Boolean = true, processingTime: Double? = null) - The result of a transcription.
- VoiceModel(created_at: String, slug: String, language: String, url: String, size_mb: Int, file_name: String, provider: String = "vosk", isDownloaded: Boolean = false) - Contains information about an available voice model.
- TranscriptionMode - Enum for transcription mode (LOCAL, REMOTE, LOCAL_FIRST, REMOTE_FIRST).
Platform-Specific Setup
Android
- Works automatically - native libraries included
- Requires API 24+ (Android 7.0)
- ARM64 architecture supported
iOS
- Add the Cactus package dependency in Xcode
- Requires iOS 12.0+
- Supports ARM64 and Simulator ARM64
Building the Library
To build the library from source:
# Build the library and publish to localMaven
./build_library.sh
Telemetry Setup (Optional)
Cactus comes with powerful built-in telemetry that lets you monitor your projects. Create a token on the Cactus dashboard and get started with a one-line setup in your app:
import com.cactus.services.CactusTelemetry
// Initialize telemetry for usage analytics (optional)
CactusTelemetry.setTelemetryToken("your_token_here")
Example App
Navigate to the example app and run it:
cd kotlin/example
# For desktop
./gradlew :composeApp:run
# For Android/iOS - use Android Studio or Xcode
The example app demonstrates:
- Model downloading and initialization
- Text completion with streaming
- Function calling
- Speech-to-text transcription
- Error handling and status management
Performance Tips
- Model Selection: Choose smaller models for faster inference on mobile devices
- Context Size: Reduce context size for lower memory usage
- Memory Management: Always call unload() when done with models
- Batch Processing: Reuse initialized models for multiple completions
- Model Caching: Use getModels() for efficient model discovery; results are cached locally to reduce network requests