Man i wish that @OpenAI gpt-5.5 pro and @AnthropicAI opus-4.8 would not gaslight me into thinking that i am a genius with revolutionary ideas (which i don't think i am) every time i try to explore stuff 🥲
@ThePrimeagen i guess being sponsored by them has its perks… and is actually better than investing in them (@theo got gotten) or maybe it’s just the fastest fingers in the west 🤷‍♂️
What do you think about "extending" this approach to non-unified-memory machines?
In that case you have an even faster cache layer for the weights, the RAM.
If the model fits entirely in RAM, you just load the experts you need from there; if it doesn't fit, you could have a 2-layer cache, with SSD streaming behind the RAM.
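Rough sketch of what i mean, not a real implementation: a small LRU of "hot" experts (standing in for VRAM), a bigger LRU behind it (standing in for RAM), and a loader callback for the SSD. All names here (`LRUTier`, `ExpertCache`, `load_from_ssd`) are made up for illustration, and the "weights" are just placeholder objects.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUTier:
    """Fixed-capacity LRU store standing in for one memory tier."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self.items: OrderedDict = OrderedDict()

    def get(self, key: Hashable) -> Any:
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key: Hashable, value: Any) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


class ExpertCache:
    """Hypothetical two-level cache: VRAM tier, then RAM tier, then SSD."""

    def __init__(self, vram_slots: int, ram_slots: int,
                 load_from_ssd: Callable[[Hashable], Any]):
        self.vram = LRUTier(vram_slots)
        self.ram = LRUTier(ram_slots)
        self.load_from_ssd = load_from_ssd  # assumed slow path (disk read)
        self.stats = {"vram": 0, "ram": 0, "ssd": 0}

    def fetch(self, expert_id: Hashable) -> Any:
        weights = self.vram.get(expert_id)
        if weights is not None:
            self.stats["vram"] += 1
            return weights
        weights = self.ram.get(expert_id)
        if weights is not None:
            self.stats["ram"] += 1
        else:
            # Miss in both tiers: stream from SSD and keep a copy in RAM.
            weights = self.load_from_ssd(expert_id)
            self.stats["ssd"] += 1
            self.ram.put(expert_id, weights)
        self.vram.put(expert_id, weights)  # promote to the fast tier
        return weights


if __name__ == "__main__":
    cache = ExpertCache(vram_slots=1, ram_slots=2,
                        load_from_ssd=lambda e: f"weights-{e}")
    for expert in [0, 0, 1, 0, 2, 1]:
        cache.fetch(expert)
    print(cache.stats)
```

If the whole model fits in RAM you'd just set `ram_slots` to the number of experts, and the SSD path only gets hit once per expert on first load.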
@naval Depends on the software, but i generally agree that we need an agent native interface in most apps.
Currently doing some research in the space
https://t.co/810nZLs9hn
https://t.co/XBvRtFtvxZ
Basic integration of Reachy Mini by @huggingface and @pollenrobotics in my agent runtime.
For now just pose and pre-built actions are available, will look into dynamic behavior graph generation and execution later.
Next step is an e2e voice pipeline, using the work from @andimarafioti as a baseline.
Yeah i ordered the lite version a few weeks back, I'm waiting for the delivery. I already had a pi 5 lying around doing nothing.
I originally wanted to build it from scratch, but sourcing parts was looking more expensive than the lite, so i just got that instead.
Will probably print some mods and run my custom agent runtime tho, can't wait!
@andimarafioti @MistralAI yes, i think it makes sense.
When I get the Reachy Mini, what i will probably do is add a Raspberry Pi running the agent runtime on device and delegate inference to the Spark. Maybe juice it up later with a Jetson Orin Nano 👀
Modularity is always welcome in projects like these
@andimarafioti @MistralAI Quick guide made by gpt-5.5
https://t.co/egZsY2zNG1
Note that i have made some changes in the speech-to-speech repo to have it connect to an external provider
@andimarafioti @MistralAI vLLM-Omni 0.20.0 running in a GB10-optimized CUDA container based on https://t.co/bhCXl4Lm01
I tuned the Voxtral TTS stage config so both stages fit cleanly on the spark, with async chunking and shared-memory transfer enabled.
After the initial warmup/compile pass, it’s fast.
@puffybsd @pollenrobotics @huggingface i wanted to do this myself but sourcing the parts was more expensive than the product itself, so i just bought it ahah. now waiting for it to show up...
did you go with custom firmware?