Plug local, free, private AI into your mobile apps 🌵
Cactus now supports @FlutterDev, @reactnative, and @kotlin bindings.
+ with function calling, you can also deploy complex agentic pipelines directly in-app.
Fully open-source 🌵 link in bio
All cloud fallback is FREE this February. Seriously. Make us regret this
We just launched Hybrid Cloud inference and we can't wait for you to try it.
1. Go to https://t.co/341YSbH19V
2. Sign up and create a key
3. Run unlimited on-device transcription and LLM inference with cloud fallback
Cactus Hybrid Cloud runs inference on-device by default, as always. If the on-device model struggles, it automatically hands off inference to the cloud.
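The hand-off described above can be pictured with a minimal sketch. This is not the Cactus API: `hybrid_infer`, `Result`, the stand-in model functions, and the confidence threshold are all hypothetical, and the post does not say how Cactus decides that the on-device model "struggles". The sketch assumes a confidence score or a local failure triggers the fallback.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    text: str
    confidence: float  # hypothetical 0..1 score; an assumption, not a Cactus field
    source: str        # "device" or "cloud"


def hybrid_infer(
    prompt: str,
    run_on_device: Callable[[str], Result],
    run_in_cloud: Callable[[str], Result],
    min_confidence: float = 0.6,  # assumed threshold for "struggling"
) -> Result:
    """Try the local model first; hand off to the cloud only if it struggles."""
    try:
        local = run_on_device(prompt)
    except (MemoryError, TimeoutError):
        # The local model could not finish at all, so hand off.
        return run_in_cloud(prompt)
    if local.confidence >= min_confidence:
        return local
    return run_in_cloud(prompt)


# Stand-in models for illustration only.
def fake_device(prompt: str) -> Result:
    # Pretend short prompts are easy and long ones are hard.
    confidence = 0.9 if len(prompt) < 40 else 0.3
    return Result(f"device answer to: {prompt}", confidence, "device")


def fake_cloud(prompt: str) -> Result:
    return Result(f"cloud answer to: {prompt}", 0.95, "cloud")


easy = hybrid_infer("What is 2 + 2?", fake_device, fake_cloud)
hard = hybrid_infer(
    "Summarise this long contract clause by clause, please.",
    fake_device,
    fake_cloud,
)
print(easy.source, hard.source)
```

Passing the two backends in as plain callables keeps the routing rule separate from whichever runtime actually serves the request.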
Try the demo yourself:
brew install cactus-compute/cactus/cactus
cactus transcribe
@btconometrics @Raspberry_Pi You can run @cactuscompute on any Raspberry Pi.
Cactus also uses zero-copy memory mapping, so you're not constrained by the 8GB of RAM.
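For readers unfamiliar with the idea, here is a rough illustration of zero-copy memory mapping using only Python's standard library. It is not how Cactus implements it (the engine's internals are not shown in the post). The file name and the float32 layout are assumptions made for the demo. The point is that the mapped file is viewed in place and the OS pages data in on demand, instead of the whole file being copied into process memory up front.

```python
import mmap
import os
import struct
import tempfile

# Write a small stand-in "weights" file: 1024 float32 values in native byte order.
values = [i * 0.5 for i in range(1024)]
path = os.path.join(tempfile.mkdtemp(), "weights.bin")  # hypothetical file name
with open(path, "wb") as f:
    f.write(struct.pack(f"{len(values)}f", *values))

with open(path, "rb") as f:
    # Map the whole file read-only. The OS loads pages lazily on first access.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    raw = memoryview(mm)      # view over the mapping, no bytes copied
    weights = raw.cast("f")   # reinterpret the same bytes as float32
    count = len(weights)
    tenth = weights[10]       # touches only the page holding this element
    # Views must be released before the mapping can be closed.
    weights.release()
    raw.release()
    mm.close()

print(count, tenth)
```

Because pages are fetched on access and can be evicted by the OS, a mapped model file can be larger than the memory the process keeps resident at any moment.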
At Cactus🌵, we want on-device AI inference to be as fast as possible - that's why we decided to use Nitro for our React Native SDK. The performance is insane ⚡️ and the DX is even better. Making heavy use of the object-oriented Hybrid Objects, and mixing C++ with Swift and Kotlin feels like a breeze.
Thanks @mrousavy - looking forward to shipping more features with Nitro
Hackathon alert! London, SF, Boston. This Friday!
@nothing is teaming up with @cactuscompute and @huggingface to hack on redefining on-device AI experiences!
Come build something memorable, meet the teams, and ship in 24 hours!
Signups are wild so far
Cactus React Native v1 is live!
Deploy AI on-device with text inference, tool calling, embeddings and more, powered by the fastest edge inference engine 🌵
Our React Native bindings run on @margelo_com's Nitro Modules, yielding the fastest mobile inference we've seen so far.
More benchmarks for LFM2 models by @liquidai on Cactus (YC S25).
- When we get to INT4, file sizes should shrink 2x, speed should increase 2x, and battery drain should drop 2x. For lossless quantisation, the model should always be post-trained with quantisation-aware training (QAT) in a specific way, a trick we mastered in my last role.
- NPUs are 5-11x more energy-efficient and up to 10x faster for long-context workloads. Once we merge NPU support, you should be able to design complex multi-agent workflows with large contexts safely on phones.
Many other tricks up our sleeves. Sit back, relax and watch!
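The file-size part of the INT4 claim comes down to simple arithmetic, sketched below. The post does not state the baseline, so an INT8 baseline is assumed here. The helper names are hypothetical and this is plain symmetric rounding, not Cactus's quantiser or a QAT recipe. Two signed 4-bit values fit in one byte, which halves storage relative to one byte per weight.

```python
import random


def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) for w in weights], scale


def pack_int4(q):
    """Pack two signed 4-bit values into each byte, low nibble first."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    out = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        out.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(out)


def unpack_int4(data, n):
    """Recover the first n signed 4-bit values from packed bytes."""
    def signed(nibble):
        return nibble - 16 if nibble >= 8 else nibble

    q = []
    for byte in data:
        q.append(signed(byte & 0x0F))
        q.append(signed(byte >> 4))
    return q[:n]


random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(4096)]

q8, _ = quantize_symmetric(weights, 8)
q4, scale4 = quantize_symmetric(weights, 4)

int8_bytes = len(bytes(v & 0xFF for v in q8))  # one byte per weight
packed = pack_int4(q4)                         # half a byte per weight

# Packing itself is reversible; the rounding step is where precision is lost.
roundtrip_ok = unpack_int4(packed, len(q4)) == q4
print(int8_bytes, len(packed), roundtrip_ok)
```

Naive rounding like this loses precision with only 15 levels per weight, which is presumably why the post pairs INT4 with QAT. The speed and battery figures depend on kernels and hardware and are not something this sketch demonstrates.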