I did some more tests with Kimi K2.6 locally on two Mac Studios via @exolabs, measuring both parallel throughput and the largest context size that stays usable.
Parallel requests scaled up to 8 concurrent calls reaching ~62 tok/s aggregate.
For context length, ~85k prompt tokens was still fast, ~87k slowed down sharply, and 128k technically ran but took ~7.5 minutes.
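For anyone wanting to reproduce the parallel test, here is a minimal sketch of how aggregate throughput could be measured. It is not the script I used: `aggregate_tok_per_s` and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in you would replace with a real call to your local endpoint that returns the number of tokens generated.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def aggregate_tok_per_s(
    generate: Callable[[str], int],
    prompts: List[str],
    concurrency: int,
) -> float:
    """Send all prompts using `concurrency` worker threads and return
    total generated tokens divided by wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        token_counts = list(pool.map(generate, prompts))
    elapsed = time.perf_counter() - start
    return sum(token_counts) / elapsed


def fake_generate(prompt: str) -> int:
    # Hypothetical stand-in for a real request to a local model server.
    # It pretends each request takes 50 ms and produces 10 tokens.
    time.sleep(0.05)
    return 10


if __name__ == "__main__":
    prompts = ["hello"] * 8
    for n in (1, 2, 4, 8):
        rate = aggregate_tok_per_s(fake_generate, prompts, n)
        print(f"concurrency={n}: {rate:.0f} tok/s aggregate")
```

Aggregate throughput is shared across requests, so ~62 tok/s over 8 concurrent calls works out to roughly 62 / 8 ≈ 7.75 tok/s per individual request.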
Managed to run Kimi K2.6 on two Mac Studios via @exolabs. Got 21 tokens/s. Crazy to think it outperforms gpt-5.4 and opus-4.6 on some benchmarks and you can run it locally.
I think the key to success is to blend the ancient with the bleeding edge of technology: a classical education (Latin, Greek, Aristotle) alongside AI papers, hardware, and robotics.
Here's my small AI lab at the moment. On one Mac Studio I do the dev work remotely from my phone or MacBook Pro. The other one is for extra compute (Kimi, MiniMax). DGX Spark for experiments. All connected with 10 Gigabit Ethernet.