CactusCactus

Kotlin Multiplatform SDK

Complete guide to using Cactus SDK in Kotlin applications

Cactus Kotlin

Official Kotlin Multiplatform library for Cactus, a framework for deploying LLM and STT models locally in your app.

Installation

Dependency List

Add to settings.gradle.kts

dependencyResolutionManagement {
    repositories {
        maven {
            name = "GitHubPackagesCactus"
            url = uri("https://maven.pkg.github.com/cactus-compute/cactus-kotlin")
            credentials {
                username = properties.getProperty("github.username") ?: System.getenv("GITHUB_ACTOR")
                password = properties.getProperty("github.token") ?: System.getenv("GITHUB_TOKEN")
            }
        }
    }
}

Add credentials

Add your GitHub username and token to local.properties:

github.username=your-username
github.token=your-personal-access-token

You can generate a personal access token by following the instructions GitHub's documentation. The token needs read:packages scope.

Or set them as environment variables: GITHUB_ACTOR and GITHUB_TOKEN.

Gradle Build

Add to your KMP project's build.gradle.kts:

kotlin {
    sourceSets {
        commonMain {
            dependencies {
                implementation("com.cactus:library:0.3-beta.1")
            }
        }
    }
}

Grant Permissions

Add the permissions to your manifest (Android):

<uses-permission android:name="android.permission.INTERNET" /> // for model downloads
<uses-permission android:name="android.permission.RECORD_AUDIO" /> // for transcription

Language Model (LLM)

The CactusLM class provides text completion capabilities with support for function calling (WIP).

Basic Usage

import com.cactus.CactusLM
import com.cactus.CactusInitParams
import com.cactus.CactusCompletionParams
import com.cactus.ChatMessage
import kotlinx.coroutines.runBlocking

runBlocking {
    val lm = CactusLM()

    // Download a model (default: qwen3-0.6)
    val downloadSuccess = lm.downloadModel("qwen3-0.6")
    
    // Initialize the model
    val initSuccess = lm.initializeModel(
        CactusInitParams(
            model = "qwen3-0.6",
            contextSize = 2048
        )
    )

    // Generate completion
    val result = lm.generateCompletion(
        messages = listOf(
            ChatMessage(content = "Hello, how are you?", role = "user")
        ),
        params = CactusCompletionParams(
            maxTokens = 100,
            temperature = 0.7,
            topK = 40,
            topP = 0.95
        )
    )

    result?.let { response ->
        if (response.success) {
            println("Response: ${response.response}")
            println("Tokens per second: ${response.tokensPerSecond}")
            println("Time to first token: ${response.timeToFirstTokenMs}ms")
        }
    }

    // Clean up
    lm.unload()
}

Streaming Completions

val result = lm.generateCompletion(
    messages = listOf(ChatMessage("Tell me a story", "user")),
    params = CactusCompletionParams(maxTokens = 200),
    onToken = { token, tokenId ->
        print(token) // Print each token as it's generated
    }
)

Function Calling (Experimental)

import com.cactus.models.ToolParameter
import com.cactus.models.createTool

val tools = listOf(
    createTool(
        name = "get_weather",
        description = "Get current weather for a location",
        parameters = mapOf(
            "location" to ToolParameter(
                type = "string", 
                description = "City name", 
                required = true
            )
        )
    )
)

val result = lm.generateCompletion(
    messages = listOf(ChatMessage("What's the weather in New York?", "user")),
    params = CactusCompletionParams(
        maxTokens = 100,
        tools = tools
    )
)

Available Models

You can get a list of available models:

lm.getModels()

LLM API Reference

CactusLM Class

  • suspend fun downloadModel(model: String = "qwen3-0.6"): Boolean - Download a model
  • suspend fun initializeModel(params: CactusInitParams): Boolean - Initialize model for inference
  • suspend fun generateCompletion(messages: List<ChatMessage>, params: CactusCompletionParams, onToken: CactusStreamingCallback? = null): CactusCompletionResult? - Generate text completion
  • fun unload() - Free model from memory
  • suspend fun getModels(): List<CactusModel> - Get available LLM models
  • fun isLoaded(): Boolean - Check if model is loaded

Data Classes

  • CactusInitParams(model: String?, contextSize: Int?) - Model initialization parameters
  • CactusCompletionParams(temperature: Double, topK: Int, topP: Double, maxTokens: Int, stopSequences: List<String>, bufferSize: Int, tools: List<Tool>?) - Completion parameters
  • ChatMessage(content: String, role: String, timestamp: Long?) - Chat message format
  • CactusCompletionResult - Contains response, timing metrics, and success status
  • CactusEmbeddingResult(success: Boolean, embeddings: List<Double>, dimension: Int, errorMessage: String?) - Embedding generation result

Embeddings

The CactusLM class also provides text embedding generation capabilities for semantic similarity, search, and other NLP tasks.

Basic Usage

import com.cactus.CactusLM
import com.cactus.CactusInitParams
import kotlinx.coroutines.runBlocking

runBlocking {
    val lm = CactusLM()

    // Download and initialize a model (same as for completions)
    lm.downloadModel("qwen3-0.6")
    lm.initializeModel(CactusInitParams(model = "qwen3-0.6", contextSize = 2048))

    // Generate embeddings for a text
    val result = lm.generateEmbedding(
        text = "This is a sample text for embedding generation",
        bufferSize = 2048
    )

    result?.let { embedding ->
        if (embedding.success) {
            println("Embedding dimension: ${embedding.dimension}")
            println("Embedding vector length: ${embedding.embeddings.size}")
        } else {
            println("Embedding generation failed: ${embedding.errorMessage}")
        }
    }

    lm.unload()
}

Embedding API Reference

CactusLM Class (Embedding Methods)

  • suspend fun generateEmbedding(text: String, bufferSize: Int = 2048): CactusEmbeddingResult? - Generate text embeddings

Embedding Data Classes

  • CactusEmbeddingResult(success: Boolean, embeddings: List<Double>, dimension: Int, errorMessage: String?) - Contains the generated embedding vector and metadata

Speech-to-Text (STT)

The CactusSTT class provides speech recognition capabilities using Vosk models.

Basic Usage

import com.cactus.CactusSTT
import com.cactus.SpeechRecognitionParams
import kotlinx.coroutines.runBlocking

runBlocking {
    val stt = CactusSTT()

    // Download STT model (default: vosk-en-us)
    val downloadSuccess = stt.download("vosk-en-us")
    
    // Initialize the model
    val initSuccess = stt.init("vosk-en-us")

    // Transcribe from microphone
    val result = stt.transcribe(
        SpeechRecognitionParams(
            maxSilenceDuration = 1000L,
            maxDuration = 30000L,
            sampleRate = 16000
        )
    )

    result?.let { transcription ->
        if (transcription.success) {
            println("Transcribed: ${transcription.text}")
            println("Processing time: ${transcription.processingTime}ms")
        }
    }

    // Transcribe from audio file
    val fileResult = stt.transcribe(
        SpeechRecognitionParams(),
        filePath = "/path/to/audio.wav"
    )

    // Stop transcription
    stt.stop()
}

Available Voice Models

// Get list of available voice models
stt.getVoiceModels()

// Check if model is downloaded
stt.isModelDownloaded("vosk-en-us")

STT API Reference

CactusSTT Class

  • suspend fun download(model: String = "vosk-en-us"): Boolean - Download STT model
  • suspend fun init(model: String?): Boolean - Initialize STT model
  • suspend fun transcribe(params: SpeechRecognitionParams, filePath: String? = null): SpeechRecognitionResult? - Transcribe speech
  • fun stop() - Stop ongoing transcription
  • fun isReady(): Boolean - Check if STT is ready
  • suspend fun getVoiceModels(): List<VoiceModel> - Get available voice models
  • suspend fun isModelDownloaded(modelName: String): Boolean - Check if model is downloaded

Data Classes

  • SpeechRecognitionParams(maxSilenceDuration: Long, maxDuration: Long, sampleRate: Int) - Speech recognition parameters
  • SpeechRecognitionResult(success: Boolean, text: String?, processingTime: Double?) - Transcription result
  • VoiceModel - Information about available voice models

Platform-Specific Setup

Android

  • Works automatically - native libraries included
  • Requires API 24+ (Android 7.0)
  • ARM64 architecture supported

iOS

  • Add the Cactus package dependency in Xcode
  • Requires iOS 12.0+
  • Supports ARM64 and Simulator ARM64

Building the Library

To build the library from source:

# Build the library and publish to localMaven
./build_library.sh

Telemetry Setup (Optional)

Cactus comes with powerful built-in telemetry that lets you monitor your projects. Create a token on the Cactus dashboard and get started with a one-line setup in your app:

import com.cactus.services.CactusTelemetry

// Initialize telemetry for usage analytics (optional)
CactusTelemetry.setTelemetryToken("your_token_here")

Example App

Navigate to the example app and run it:

cd kotlin/example

# For desktop
./gradlew :composeApp:run

# For Android/iOS - use Android Studio or Xcode

The example app demonstrates:

  • Model downloading and initialization
  • Text completion with streaming
  • Function calling
  • Speech-to-text transcription
  • Error handling and status management

Performance Tips

  1. Model Selection: Choose smaller models for faster inference on mobile devices
  2. Context Size: Reduce context size for lower memory usage
  3. Memory Management: Always call unload() when done with models
  4. Batch Processing: Reuse initialized models for multiple completions

Next Steps