C++ Engine
Cactus Graph API, precision types, and native C++ engine internals
Cactus Graph API
Build custom computation graphs with the PyTorch-like Graph API:
#include <cactus.h>
CactusGraph graph;
// Define inputs
auto a = graph.input({2, 3}, Precision::FP16);
auto b = graph.input({3, 4}, Precision::INT8);
// Build computation graph
auto x1 = graph.matmul(a, b, false);
auto x2 = graph.transpose(x1);
auto result = graph.matmul(b, x2, true);
// Set input data
float a_data[6] = {1.1f, 2.3f, 3.4f, 4.2f, 5.7f, 6.8f};
float b_data[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
graph.set_input(a, a_data, Precision::FP16);
graph.set_input(b, b_data, Precision::INT8);
// Execute
graph.execute();
// Get output
void* output_data = graph.get_output(result);
// Clean up
graph.hard_reset();
Precision Types
- Precision::FP32 - Full precision floating point
- Precision::FP16 - Half precision (recommended for mobile)
- Precision::INT8 - 8-bit quantized (best performance/size ratio)
- Precision::INT4 - 4-bit quantized (smallest size)
Error Handling
int result = cactus_complete(...);
if (result != 0) {
    // Parse the error from the response JSON;
    // the error field will contain a specific error message
}
Common error scenarios:
- Model not found or corrupted
- Insufficient memory
- Invalid input format
- Context length exceeded
Performance Tips
- Use INT8 quantization for best performance/quality balance
- Enable NPU on Apple devices for vision and transcription models
- Implement cloud handoff for complex queries
- Reuse model handles across requests (don't reinitialize)
- Pre-allocate buffers for streaming to avoid memory allocation overhead