Chris Maher N7CPM

Verified account

@defilan

Platform Engineering Director at Ashley. Running LLMs on K8s so your data stays yours. Creator of LLMKube (open source). Benchmarks, not hype.

Gig Harbor, WA

Joined January 2018

528 Following

244 Followers

563 Posts

Pinned Tweet

Chris Maher N7CPM

8 days ago

LLMKube 0.8.0 shipped:
→ Foreman, an opt-in kube-native orchestrator for agent fleets across heterogeneous local LLM hardware
→ Coder + verifier + reviewer pipeline running on Apple Silicon + NVIDIA + Intel
→ Intel oneAPI / SYCL GPU support from a first-time community contributor (PR #557)
→ Foreman authored 2 of its own PRs into this release (#508, #588), signed-off-by Foreman Bot
https://t.co/sh4Z8jlCDB

1

0

0

0

51

Chris Maher N7CPM

3 days ago

@no_stp_on_snek haha, don't tempt me! This native Texan always looks for excuses to visit the homeland ;)

0

1

0

0

21

Chris Maher N7CPM

5 days ago

@RaminNasibov Star Wars Galaxies. I know it wasn't everyone's favorite but pre-NGE was one of my favorite community gaming experiences of all time

0

0

0

0

156

Chris Maher N7CPM

5 days ago

Two $500 consumer GPUs. 32 GB total. A 35B model just ran at its full 256K context on them. 512K with YaRN. Standard f16 KV runs out of memory before it even reaches native. TurboQuant KV on consumer Blackwell. Writeup 👇 https://t.co/n4a9xDTAUh

0

0

0

0

15

defilan retweeted

8 days ago

Opus 4.8 is insane guys. It one shotted my session usage limit.

421

25K

976

643

1M

Chris Maher N7CPM

7 days ago

@BlockedPaths Exactly what I was going for! For folks who have homelabs, that's very much the case. I have seen businesses in the same spot too. I really believe there is a lot to offer around heterogeneous inference systems.

0

0

0

0

6

Chris Maher N7CPM

10 days ago

@Youssofal_ Strongly with you on this. A lot of what we ship on our local-model orchestrator is scaffolding fixes: schema strictness, scheduler timing, cmd deadlocks, port-stale on respawn. The model is rarely the bottleneck; the harness almost always is. Whole layer is underbuilt.

0

2

0

1

906

Chris Maher N7CPM

18 days ago

LLMKube 0.7.9 is out! Your Mac is now a Kubernetes inference node running mlx-server, an OpenAI-compatible MLX runtime. Qwen3.6-35B on an M5 Max: 102.7 tok/s, 107ms TTFT. Plus kubectl scale and autoscaling fixes. https://t.co/ONWMQvkemS

0

0

0

2

143

Chris Maher N7CPM

22 days ago

@JadenHorst @0xgaut 100%! There’s no excuse for not seeing what's coming. These providers will raise rates when investors want profit instead of growth, and people will be blindsided. Now is the time to get really serious about local inference.

0

1

0

0

37

Chris Maher N7CPM

22 days ago

LLMKube 0.7.8 ships ModelRouter Phase 1! One OpenAI endpoint. Local + Anthropic/OpenAI/Bedrock/Vertex/LiteLLM. Fail-closed PII. Per-rule timeouts. Half-open circuit breaker. Audit log per request. Local-first agentic, hybrid when you need it. https://t.co/MUrsWlrcUo

0

1

0

0

66

Chris Maher N7CPM

23 days ago

Just saw the great video on LLMKube by @marceldempers! Worth a watch! https://t.co/9GM9NY6OCt

0

3

1

0

111

Chris Maher N7CPM

25 days ago

@ClementDelangue It's been amazing to see and be a part of. What was once something deemed impossible is now part of my daily flow locally. It just keeps getting better! Crazy how fast things are evolving too.

0

0

0

0

456

Chris Maher N7CPM

25 days ago

LLMKube 0.7.7 shipped:
→ vllm-swift on Apple Silicon w/ TurboQuant KV cache passthrough
→ OpenShift / OKD / MicroShift a first-class deploy target
→ vLLM tuning fields (gpuMemoryUtilization, cpuOffloadGB) from a French community PR
→ Longhorn FSGroup fix from a user bug report with a full reproducer
https://t.co/TjUUwDwvD0

0

1

0

0

52

Chris Maher N7CPM

27 days ago

@MemoryReboot_ Fair if your model is one box doing everything. Different approach: Mac Studio as a node in a heterogeneous K8s cluster. llmkube's Metal Agent schedules across Apple Silicon, NVIDIA, AMD. Tool calling on unified memory, throughput on CUDA. Best of both worlds.

1

17

0

22

3K

Chris Maher N7CPM

30 days ago

@Tuggernutz87 I've done mostly driving around town and use hurry mode but have been really impressed. Already noticed on the freeway it didn't want to left lane camp. That alone is a HUGE improvement!

0

1

0

0

17

Chris Maher N7CPM

about 1 month ago

@Tuggernutz87 @yunta_tsai @teslascope Finally getting the update! How about you?

1

1

0

0

22

Chris Maher N7CPM

about 1 month ago

@Tuggernutz87 @yunta_tsai @teslascope I'm right there with you. Been anxiously waiting for that update to come through.

0

1

0

0

32

Chris Maher N7CPM

about 1 month ago

Weekend shipping update: LLMKube 0.7.6. Headline fix: the metal-agent will stop killing your only model when system RAM spikes. Priority eviction + friendly-fire guard + per-service opt-out. Mutable modelRef. vLLM parallelSlots from a community PR. https://t.co/fhmbL83sNv

0

0

0

0

47
