nvidia going all in on local ai.
here's our take: it shouldn't depend on which chip you bought.
sparks, macs, the 5090 already on your desk: we cluster across all of it and split your favorite model pipeline-parallel so it runs fully private and local.
Thrilled to see @tryParallax live in production on @Theta_Network.
This is exactly why @Gradient_HQ built Parallax: turning the world’s GPU mesh into a sovereign, distributed token factory.
Congrats on the milestone! 🫡
glad we could help!
with agentic adoption soaring, privacy and token cost are already the top concerns for both agent and human users.
that's what parallax is built for.
To make this work, we adapted Parallax, @Gradient_HQ's distributed inference framework, to run across EdgeCloud's global node network. One API endpoint, model split across many machines, no centralized cluster required.
@VitalikButerin buy a GPU, get together a group of friends. don’t carry the world on your own shoulders.
we’ve been building this for a while. try parallax for local ai.
@RoundtableSpace 35b model on a macbook with compressed cache is a solid result.
local inference keeps getting more accessible and it's fun to watch people push the limits of what consumer hardware can do!
@ollama local llm + mlx is a great combo! apple silicon keeps getting better for local inference and it's nice to see more players in the ecosystem lean into it properly.
TurboQuant tackles one bottleneck: KV cache memory. there's another one that matters just as much in distributed setups: communication latency between nodes.
we built Decentralized Speculative Decoding (DSD) to turn that idle network wait time into useful computation: 2.56x speedup on HumanEval, no retraining needed.
combine cache compression with latency compression and local inference starts looking very different. https://t.co/AF0wpqXIhd
hf-mount solves the storage side: any model, mounted locally like a drive. the next piece is actually running those models across whatever hardware you have.
that's what parallax does: schedule inference across a pool of heterogeneous GPUs so the model doesn't just live on your machine, it runs there too. mount + serve, fully local.
@oprydai you don't need to go into debt though. a couple of mac minis or an nvidia card can already run serious models locally. parallax lets you connect whatever hardware you have into one cluster. start small, add devices as you go. the whole point is using what's already on your desk.
@openclaw solid release. deepseek provider plugin + qwen pay-as-you-go opens up a lot of new local setups.
parallax users running openclaw stacks should have a smoother time with this one.
@wolfejosh the ceiling for on-device keeps moving.
a year ago people argued you couldn't run anything useful locally. now it's 400B on a phone.
parallax already supports mixed hardware clusters — apple silicon, nvidia, whatever you've got. the trend is clear.
the $3,469 single-night burn is a good reminder of what you're actually signing up for with cloud inference.
when the meter's always running, one stuck agent is a bill.
parallax runs models on your own machines. no token meter, no overnight surprises.
local ai has picked up fast since openclaw dropped.
with the latest wave of small capable models, more people are running serious workloads on their own hardware.
if you missed this local ai tutorial from @yacinelearning, or want a refresher on how distributed scheduling actually works under the hood, it's worth a watch over the weekend!
I am continuing my adventure into distributed AI systems with the parallax scheduling strat from @Gradient_HQ.
in this 37min tutorial I go through:
- heuristic used to make scheduling tractable
- dynamic programming formulation
- filling GPU with water
- shoving them into shelves
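
for a feel of the problem the scheduling topics above deal with, here's a toy sketch of splitting a model's layers across mixed hardware for pipeline parallelism. this is not parallax's actual scheduler: the `split_layers` function, the device names and the memory numbers are all made up for illustration. it only hands each device a contiguous range of layers in proportion to its memory, and rounds with the largest-remainder rule so every layer is assigned exactly once.

```python
def split_layers(num_layers, device_mem_gb):
    """Assign contiguous layer ranges to devices in proportion to memory.

    Hypothetical helper for illustration only; a real scheduler would also
    weigh compute speed, network latency and KV cache headroom.
    """
    total_mem = sum(device_mem_gb.values())

    # Ideal (fractional) number of layers per device.
    shares = {d: num_layers * mem / total_mem for d, mem in device_mem_gb.items()}

    # Round down first, then give the leftover layers to the devices
    # with the largest fractional remainders.
    counts = {d: int(share) for d, share in shares.items()}
    leftover = num_layers - sum(counts.values())
    by_remainder = sorted(shares, key=lambda d: shares[d] - counts[d], reverse=True)
    for d in by_remainder[:leftover]:
        counts[d] += 1

    # Turn counts into half-open [start, end) layer ranges, in pipeline order.
    plan, start = {}, 0
    for d in device_mem_gb:
        plan[d] = (start, start + counts[d])
        start += counts[d]
    return plan


# Illustrative cluster: memory sizes in GB are assumptions, not measurements.
cluster = {"rtx5090": 32, "macbook": 16, "mac_mini": 16}
for device, (first, last) in split_layers(32, cluster).items():
    print(f"{device}: layers {first}..{last - 1}")
```

the card with twice the memory ends up holding twice as many layers. memory-proportional splitting is only the simplest baseline; the heuristics and dynamic programming covered in the tutorial go well beyond it.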