Tsera @Tserawho - Twitter Profile

Pinned Tweet

Tsera

@Tserawho

2 days ago

https://t.co/f7TsZRrTHP

4

16

2

44

76K

Tsera

@Tserawho

about 2 hours ago

STOP WASTING MONEY ON CLOUD CO-PILOTS FOR LOCAL DEVELOPMENT Standard workflow: $20/mo per developer, high latency, and constant code telemetry leaks to central servers. The alternative is already running locally on consumer hardware. ↓ One: The Stack Engine: exo (decentralized local AI cluster coordination) Model: qwen-2.5-coder-7b IDE: Zed (native performance, zero electronic overhead) Two: The Unit Economics Hardware: Standard MacBook Pro M-series + local network clustering. API Costs: $0.00. Latency: Sub-100ms for local code generation and semantic search. Three: The Architecture The exo orchestrator automatically splits the model weights across available local nodes (Macs, iPhones, iPads) using peer-to-peer networking. You aren't buying a massive GPU rig; you are utilizing the idle silicon already sitting on your desk. This effectively cuts your team's development dependency on external APIs from $2,400/year to a one-time local network configuration. The local AI cluster era is officially here. Save this to audit your infrastructure costs next week.

0

37

Tsera

@Tserawho

about 5 hours ago

@Flandermaxx Local hosting killed cloud

0

19

Tsera

@Tserawho

about 5 hours ago

@undefinedKi Ditch vectors, use Markdown

0

5

Tsera

@Tserawho

about 5 hours ago

@NikiStallo75181 This is the distribution of load distribution / tensor splitting between the nodes of the local cluster in the Exo information engine.

0

26

Tsera

@Tserawho

about 9 hours ago

BUILDING A HOME SUPERCOMPUTER PROTOCOL: NO CLOUD, NO SUBSCRIPTIONS Four Mac Studio units linked via 10Gbps Ethernet running an open-source inference engine. The stack: - Hardware: 4x Apple Mac Studio M2 Ultra (stacked locally) - Framework: Exo (distributed local inference engine) - Interconnect: Standard 10Gbps LAN / Wi-Fi fallback - Model: Local LLaMA-3 / Mistral orchestration Unit Economics: - Cloud API Cost: $0.00/mo forever - Token Throughput: 106.21 TFLOPS aggregate performance - Network Latency: Sub-millisecond peer-to-peer discovery - Setup Time: Under 2 hours from unboxing to local API endpoint Stop renting intelligence from OpenAI when you can own the physical layer. The era of centralized AI monopolies is ending on consumer desks. ↓

2

0

104

Tsera

@Tserawho

about 6 hours ago

@NikiStallo75181 yep💪🏻

0

4

Tsera

@Tserawho

1 day ago

RUNNING MASSIVE LLMS ON MAC HARDWARE JUST FLIPPED THE ECONOMICS. The old playbook said you needed a cluster of Nvidia H100s to serve heavy open-source weights. Apple silicon was just for local prototyping. This demo breaks that assumption completely. The architecture: 2x to 4x Mac Studio nodes running in tandem. Unified memory pooled natively over Thunderbolt RDMA. Apple's MLX framework executing distributed inference. The unit economics for Kimi K-2.5 (a massive 670GB model): RAM Required: ~670 GB loaded directly into unified memory. Two-node setup: 23.4 tokens per second. Four-node setup: Scales up to 29.0 tokens per second. Time to first token drops immediately as memory pressure shifts down. Hardware clustering via MLX and RDMA turns consumer-grade desktop enclosures into a decentralized AI data center. The infrastructure cost barrier for local, giant-scale inference just vanished. Watch the full scaling breakdown below ↓