# Kotlin Multiplatform SDK

Complete guide to using the Cactus SDK in Kotlin applications.

## Cactus Kotlin

The official Kotlin Multiplatform library for Cactus, a framework for deploying LLM and STT models locally in your app.
## Installation

### Add the repository

Add the Cactus Maven repository to your `settings.gradle.kts`:

```kotlin
dependencyResolutionManagement {
    repositories {
        maven {
            name = "GitHubPackagesCactus"
            url = uri("https://maven.pkg.github.com/cactus-compute/cactus-kotlin")
            credentials {
                username = properties.getProperty("github.username") ?: System.getenv("GITHUB_ACTOR")
                password = properties.getProperty("github.token") ?: System.getenv("GITHUB_TOKEN")
            }
        }
    }
}
```
### Add credentials

Add your GitHub username and token to `local.properties`:

```properties
github.username=your-username
github.token=your-personal-access-token
```

You can generate a personal access token by following the instructions in GitHub's documentation. The token needs the `read:packages` scope.

Alternatively, set them as environment variables: `GITHUB_ACTOR` and `GITHUB_TOKEN`.
### Gradle Build

Add to your KMP project's `build.gradle.kts`:

```kotlin
kotlin {
    sourceSets {
        commonMain {
            dependencies {
                implementation("com.cactus:library:0.3-beta.1")
            }
        }
    }
}
```
### Grant Permissions

Add the required permissions to your Android manifest:

```xml
<!-- for model downloads -->
<uses-permission android:name="android.permission.INTERNET" />
<!-- for transcription -->
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```
## Language Model (LLM)

The `CactusLM` class provides text completion capabilities, with support for function calling (work in progress).

### Basic Usage
```kotlin
import com.cactus.CactusLM
import com.cactus.CactusInitParams
import com.cactus.CactusCompletionParams
import com.cactus.ChatMessage
import kotlinx.coroutines.runBlocking

runBlocking {
    val lm = CactusLM()

    // Download a model (default: qwen3-0.6)
    val downloadSuccess = lm.downloadModel("qwen3-0.6")

    // Initialize the model
    val initSuccess = lm.initializeModel(
        CactusInitParams(
            model = "qwen3-0.6",
            contextSize = 2048
        )
    )

    // Generate a completion
    val result = lm.generateCompletion(
        messages = listOf(
            ChatMessage(content = "Hello, how are you?", role = "user")
        ),
        params = CactusCompletionParams(
            maxTokens = 100,
            temperature = 0.7,
            topK = 40,
            topP = 0.95
        )
    )
    result?.let { response ->
        if (response.success) {
            println("Response: ${response.response}")
            println("Tokens per second: ${response.tokensPerSecond}")
            println("Time to first token: ${response.timeToFirstTokenMs}ms")
        }
    }

    // Clean up
    lm.unload()
}
```
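The `temperature`, `topK`, and `topP` parameters control how the next token is sampled. The sketch below illustrates, in plain Kotlin, the filtering these parameters conceptually perform over a token distribution; it is illustrative only, not the SDK's internal implementation.

```kotlin
import kotlin.math.exp

// Illustrative top-k / top-p (nucleus) filtering over raw logits.
// Not the SDK's internals -- just what the parameters conceptually do.
fun sampleCandidates(
    logits: Map<String, Double>,
    temperature: Double,
    topK: Int,
    topP: Double
): List<String> {
    // Temperature rescales logits: <1.0 sharpens, >1.0 flattens the distribution.
    val scaled = logits.mapValues { exp(it.value / temperature) }
    val total = scaled.values.sum()
    val probs = scaled.mapValues { it.value / total }

    // top-k: keep only the k most probable tokens.
    val topKTokens = probs.entries.sortedByDescending { it.value }.take(topK)

    // top-p: keep the smallest prefix whose cumulative probability reaches topP.
    val kept = mutableListOf<String>()
    var cumulative = 0.0
    for ((token, p) in topKTokens) {
        kept += token
        cumulative += p
        if (cumulative >= topP) break
    }
    return kept
}

fun main() {
    val logits = mapOf("the" to 3.0, "a" to 2.0, "cat" to 0.5, "zzz" to -2.0)
    println(sampleCandidates(logits, temperature = 0.7, topK = 3, topP = 0.95))
}
```

Lower `temperature` and `topP` values make output more deterministic; higher values increase variety.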
### Streaming Completions

```kotlin
val result = lm.generateCompletion(
    messages = listOf(ChatMessage("Tell me a story", "user")),
    params = CactusCompletionParams(maxTokens = 200),
    onToken = { token, tokenId ->
        print(token) // Print each token as it's generated
    }
)
```
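The `onToken` callback fires once per generated token; a common pattern is to update the UI incrementally while also accumulating the full text. A plain-Kotlin sketch of that pattern with a simulated token stream (no SDK calls):

```kotlin
// Accumulates streamed tokens while still reporting each one as it arrives.
class StreamCollector(private val onToken: (String) -> Unit = {}) {
    private val builder = StringBuilder()

    fun accept(token: String) {
        onToken(token)          // e.g. update the UI incrementally
        builder.append(token)   // keep the full response for later use
    }

    fun text(): String = builder.toString()
}

fun main() {
    val collector = StreamCollector { print(it) }
    // Simulated stream; in practice accept() would be called from onToken.
    listOf("Once", " upon", " a", " time").forEach(collector::accept)
    println()
    println("Full response: ${collector.text()}")
}
```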
### Function Calling (Experimental)

```kotlin
import com.cactus.models.ToolParameter
import com.cactus.models.createTool

val tools = listOf(
    createTool(
        name = "get_weather",
        description = "Get current weather for a location",
        parameters = mapOf(
            "location" to ToolParameter(
                type = "string",
                description = "City name",
                required = true
            )
        )
    )
)

val result = lm.generateCompletion(
    messages = listOf(ChatMessage("What's the weather in New York?", "user")),
    params = CactusCompletionParams(
        maxTokens = 100,
        tools = tools
    )
)
```
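When the model decides to call a tool, your app is responsible for executing it and typically feeding the result back as a new message. A minimal dispatch sketch; the `ToolCall` shape here (a name plus an argument map) is a simplified assumption for illustration, not the SDK's actual result type:

```kotlin
// Hypothetical, simplified tool-call shape for illustration only.
data class ToolCall(val name: String, val arguments: Map<String, String>)

// Routes a tool call to a registered handler; unknown tools yield an error string.
class ToolDispatcher {
    private val handlers = mutableMapOf<String, (Map<String, String>) -> String>()

    fun register(name: String, handler: (Map<String, String>) -> String) {
        handlers[name] = handler
    }

    fun dispatch(call: ToolCall): String =
        handlers[call.name]?.invoke(call.arguments)
            ?: "error: unknown tool '${call.name}'"
}

fun main() {
    val dispatcher = ToolDispatcher()
    dispatcher.register("get_weather") { args ->
        "Sunny in ${args["location"]}" // stand-in for a real weather lookup
    }
    println(dispatcher.dispatch(ToolCall("get_weather", mapOf("location" to "New York"))))
}
```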
### Available Models

You can list the available models:

```kotlin
lm.getModels()
```
### LLM API Reference

#### CactusLM Class

- `suspend fun downloadModel(model: String = "qwen3-0.6"): Boolean` - Download a model
- `suspend fun initializeModel(params: CactusInitParams): Boolean` - Initialize the model for inference
- `suspend fun generateCompletion(messages: List<ChatMessage>, params: CactusCompletionParams, onToken: CactusStreamingCallback? = null): CactusCompletionResult?` - Generate a text completion
- `fun unload()` - Free the model from memory
- `suspend fun getModels(): List<CactusModel>` - Get available LLM models
- `fun isLoaded(): Boolean` - Check whether a model is loaded
#### Data Classes

- `CactusInitParams(model: String?, contextSize: Int?)` - Model initialization parameters
- `CactusCompletionParams(temperature: Double, topK: Int, topP: Double, maxTokens: Int, stopSequences: List<String>, bufferSize: Int, tools: List<Tool>?)` - Completion parameters
- `ChatMessage(content: String, role: String, timestamp: Long?)` - Chat message format
- `CactusCompletionResult` - Contains the response, timing metrics, and success status
## Embeddings

The `CactusLM` class also provides text embedding generation for semantic similarity, search, and other NLP tasks.

### Basic Usage
```kotlin
import com.cactus.CactusLM
import com.cactus.CactusInitParams
import kotlinx.coroutines.runBlocking

runBlocking {
    val lm = CactusLM()

    // Download and initialize a model (same as for completions)
    lm.downloadModel("qwen3-0.6")
    lm.initializeModel(CactusInitParams(model = "qwen3-0.6", contextSize = 2048))

    // Generate an embedding for a text
    val result = lm.generateEmbedding(
        text = "This is a sample text for embedding generation",
        bufferSize = 2048
    )
    result?.let { embedding ->
        if (embedding.success) {
            println("Embedding dimension: ${embedding.dimension}")
            println("Embedding vector length: ${embedding.embeddings.size}")
        } else {
            println("Embedding generation failed: ${embedding.errorMessage}")
        }
    }

    lm.unload()
}
```
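Embedding vectors are most often compared with cosine similarity, where texts with similar meaning score closer to 1.0. A self-contained helper in plain Kotlin, independent of the SDK:

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors; 1.0 means identical direction.
fun cosineSimilarity(a: List<Double>, b: List<Double>): Double {
    require(a.size == b.size) { "Embeddings must have the same dimension" }
    val dot = a.zip(b).sumOf { (x, y) -> x * y }
    val normA = sqrt(a.sumOf { it * it })
    val normB = sqrt(b.sumOf { it * it })
    return dot / (normA * normB)
}

fun main() {
    // In practice, a and b would come from generateEmbedding(...).embeddings.
    val a = listOf(0.1, 0.3, 0.5)
    val b = listOf(0.1, 0.3, 0.5)
    println(cosineSimilarity(a, b)) // identical vectors score 1.0
}
```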
### Embedding API Reference

#### CactusLM Class (Embedding Methods)

- `suspend fun generateEmbedding(text: String, bufferSize: Int = 2048): CactusEmbeddingResult?` - Generate text embeddings

#### Embedding Data Classes

- `CactusEmbeddingResult(success: Boolean, embeddings: List<Double>, dimension: Int, errorMessage: String?)` - Contains the generated embedding vector and metadata
## Speech-to-Text (STT)

The `CactusSTT` class provides speech recognition capabilities using Vosk models.

### Basic Usage
```kotlin
import com.cactus.CactusSTT
import com.cactus.SpeechRecognitionParams
import kotlinx.coroutines.runBlocking

runBlocking {
    val stt = CactusSTT()

    // Download an STT model (default: vosk-en-us)
    val downloadSuccess = stt.download("vosk-en-us")

    // Initialize the model
    val initSuccess = stt.init("vosk-en-us")

    // Transcribe from the microphone
    val result = stt.transcribe(
        SpeechRecognitionParams(
            maxSilenceDuration = 1000L,
            maxDuration = 30000L,
            sampleRate = 16000
        )
    )
    result?.let { transcription ->
        if (transcription.success) {
            println("Transcribed: ${transcription.text}")
            println("Processing time: ${transcription.processingTime}ms")
        }
    }

    // Transcribe from an audio file
    val fileResult = stt.transcribe(
        SpeechRecognitionParams(),
        filePath = "/path/to/audio.wav"
    )

    // Stop transcription
    stt.stop()
}
```
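File transcription works on PCM WAV audio at the configured sample rate (16 kHz above). A small, SDK-independent helper that reads the sample rate from a WAV header before handing a file to the transcriber; it assumes a standard RIFF header with the `fmt ` chunk first:

```kotlin
// Reads the sample rate from a standard RIFF/WAVE header (little-endian
// uint32 at byte offset 24). Assumes the "fmt " chunk comes first.
fun wavSampleRate(header: ByteArray): Int {
    require(header.size >= 28) { "Header too short for a WAV file" }
    require(String(header, 0, 4) == "RIFF" && String(header, 8, 4) == "WAVE") {
        "Not a RIFF/WAVE file"
    }
    return (0..3).sumOf { (header[24 + it].toInt() and 0xFF) shl (8 * it) }
}
```

Checking the rate up front lets you resample (or reject) mismatched audio instead of getting poor transcriptions.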
### Available Voice Models

```kotlin
// Get the list of available voice models
stt.getVoiceModels()

// Check whether a model is downloaded
stt.isModelDownloaded("vosk-en-us")
```
### STT API Reference

#### CactusSTT Class

- `suspend fun download(model: String = "vosk-en-us"): Boolean` - Download an STT model
- `suspend fun init(model: String?): Boolean` - Initialize the STT model
- `suspend fun transcribe(params: SpeechRecognitionParams, filePath: String? = null): SpeechRecognitionResult?` - Transcribe speech
- `fun stop()` - Stop an ongoing transcription
- `fun isReady(): Boolean` - Check if STT is ready
- `suspend fun getVoiceModels(): List<VoiceModel>` - Get available voice models
- `suspend fun isModelDownloaded(modelName: String): Boolean` - Check whether a model is downloaded
#### Data Classes

- `SpeechRecognitionParams(maxSilenceDuration: Long, maxDuration: Long, sampleRate: Int)` - Speech recognition parameters
- `SpeechRecognitionResult(success: Boolean, text: String?, processingTime: Double?)` - Transcription result
- `VoiceModel` - Information about available voice models
## Platform-Specific Setup

### Android

- Works automatically; native libraries are included
- Requires API 24+ (Android 7.0)
- ARM64 architecture supported

### iOS

- Add the Cactus package dependency in Xcode
- Requires iOS 12.0+
- Supports ARM64 and Simulator ARM64
## Building the Library

To build the library from source:

```shell
# Build the library and publish it to the local Maven repository
./build_library.sh
```
## Telemetry Setup (Optional)

Cactus comes with built-in telemetry that lets you monitor your projects. Create a token on the Cactus dashboard and get started with a one-line setup in your app:

```kotlin
import com.cactus.services.CactusTelemetry

// Initialize telemetry for usage analytics (optional)
CactusTelemetry.setTelemetryToken("your_token_here")
```
## Example App

Navigate to the example app and run it:

```shell
cd kotlin/example

# For desktop
./gradlew :composeApp:run

# For Android/iOS, use Android Studio or Xcode
```
The example app demonstrates:
- Model downloading and initialization
- Text completion with streaming
- Function calling
- Speech-to-text transcription
- Error handling and status management
## Performance Tips

- Model Selection: Choose smaller models for faster inference on mobile devices
- Context Size: Reduce the context size for lower memory usage
- Memory Management: Always call `unload()` when done with a model
- Batch Processing: Reuse initialized models across multiple completions