Cactus

C++ Engine

Cactus Graph API, precision types, and native C++ engine internals

Cactus Graph API

Build custom computation graphs with the PyTorch-like Graph API:

#include <cactus.h>

CactusGraph graph;

// Define inputs
auto a = graph.input({2, 3}, Precision::FP16);
auto b = graph.input({3, 4}, Precision::INT8);

// Build computation graph
auto x1 = graph.matmul(a, b, false);
auto x2 = graph.transpose(x1);
auto result = graph.matmul(b, x2, true);

// Set input data
float a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f};
float b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};

graph.set_input(a, a_data, Precision::FP16);
graph.set_input(b, b_data, Precision::INT8);

// Execute
graph.execute();

// Get output
void* output_data = graph.get_output(result);

// Clean up
graph.hard_reset();

Precision Types

  • Precision::FP32 - Full precision floating point
  • Precision::FP16 - Half precision (recommended for mobile)
  • Precision::INT8 - 8-bit quantized (best performance/size ratio)
  • Precision::INT4 - 4-bit quantized (smallest size)

Error Handling

int result = cactus_complete(...);

if (result != 0) {
    // On failure, parse the response JSON; its "error" field
    // contains a specific error message.
}

Common error scenarios:

  • Model not found or corrupted
  • Insufficient memory
  • Invalid input format
  • Context length exceeded

Performance Tips

  1. Use INT8 quantization for best performance/quality balance
  2. Enable NPU on Apple devices for vision and transcription models
  3. Implement cloud handoff for complex queries
  4. Reuse model handles across requests (don't reinitialize)
  5. Pre-allocate buffers for streaming to avoid memory allocation overhead

Next Steps