Matthew Villnave @RSLMatt - Twitter Profile

Pinned Tweet

14 days ago

Dense models on CPU aren't compute-bound. They're memory-residency problems. SDI v0.1.1 shows policy selection matters more than context volume. Local inference. No hype.

RSLMatt's tweet photo: https://t.co/AsizfPiSIc

0

83

Matthew Villnave

@RSLMatt

about 2 hours ago

@iamlukethedev @Claw3DCity I recently made the switch myself. Very happy I did.

1

0

16

Matthew Villnave

@RSLMatt

about 4 hours ago

I guess I've been on X for one year. #MyXAnniversary

0

4

Matthew Villnave

@RSLMatt

about 8 hours ago

All in the name of local inference. Current single long-running agent usage while I research new ways to run models locally.

0

6

Matthew Villnave

@RSLMatt

about 16 hours ago

Nothing like waking up and noticing your OptiPlex tried hanging itself up in the night. Crisis averted. Tho

0

13

Matthew Villnave

@RSLMatt

6 days ago

Well, I downloaded Hermes to my MacBook as a trial run to test it out. I like it enough that I'm building a new agent on the OptiPlex I'll be calling Optimus. Hopefully I have fewer issues with Hermes. I've been a hardcore OpenClaw fan, but I don't know… that harness seems to just be letting me down recently. I might just fuck around and build my own harness. I already have one started that I call “vault brain.” Might have to fuck around and blow the dust off it.

0

34

Matthew Villnave

@RSLMatt

8 days ago

@atlasfunded @vinod4471 @NikkiRajp40683 @shohansafa No reply had 0 like? Excuse me? https://t.co/ZfiBmCRCDo

Matthew Villnave

@RSLMatt

10 days ago

@atlasfunded Nothing is free

1

0

134

0

1

0

97

Matthew Villnave

@RSLMatt

8 days ago

@atlasfunded what the hell?!? I have 0 likes

0

19

Matthew Villnave

@RSLMatt

10 days ago

@atlasfunded Nothing is free

1

0

134

Matthew Villnave

@RSLMatt

10 days ago

@MountainDew

0

12

Matthew Villnave

@RSLMatt

10 days ago

@omgsidewalks Biting the pillow

0

13

Matthew Villnave

@RSLMatt

11 days ago

Hey, don't forget to show your agent some appreciation.

0

2

0

32

Matthew Villnave

@RSLMatt

11 days ago

@SPACECANNABIS You know what's never made sense to me? Putting stuff on the outside of the paper.

0

1

0

312

Matthew Villnave

@RSLMatt

12 days ago

@nikitabier

0

1

0

22

Matthew Villnave

@RSLMatt

12 days ago

Lab update from The Forge: Still deep in the CPU inference rabbit hole.
The current experiment: Can a low-bit base model stay resident in memory while compact residual “sidecars” get paged in only when needed?
In plain English: base model in RAM, correction layers stored separately, load/decode/apply only the useful pieces = maybe higher effective quality without keeping the full higher-precision model resident.
Not claiming speedups, quality parity, or “30B on a toaster.”
Current work is boring-but-important systems plumbing: packed .trit sidecar format, Python/C++ decode parity, pager + manifest system, runtime hook wiring, decode-first safety, layer activation, cached decoded buffers, GGML graph materialization probes.
Latest wall: The sidecar can be fetched, decoded, cached, and shadow-computed during generation. But true injection into the model graph needs the correct GGML tensor materialization path. You can’t just shove decoded floats into a graph tensor before the backend gives it real memory. That’s the current battle.
Still early and very breakable, but no longer just an idea. It’s becoming a real question: Can model precision be paged like memory?

0

19

Matthew Villnave

@RSLMatt

12 days ago

@brockpierson snapchat

0

12

Matthew Villnave

@RSLMatt

12 days ago

@aditiitwt

0

17

Matthew Villnave

@RSLMatt

12 days ago

@beffjezos “Tell me I can’t, I dare you.”

0

81

Matthew Villnave

@RSLMatt

12 days ago

That’s exactly the thing I’ve been trying to avoid: “technically generated” becoming mistaken for “actually viable.” A lot of local AI tests quietly drift into benchmarking swap behavior, page cache luck, or recovery latency instead of the inference path itself. That’s why I started adding explicit tripwires and staged failure gates. If the box silently changes regimes mid-run, the result becomes muddy fast. Right now I’m less interested in proving “one clean demo” and more interested in mapping where the system breaks, what kind of break it is, and whether the runtime recovers cleanly after pressure. The interesting part to me is that some of the newer paged-residency simulations are suggesting memory may not actually be the primary blocker anymore. Runtime correctness and compute behavior might be the harder wall now.
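One way to picture such a tripwire is a check that fails the run if swap usage grew while the workload was executing. The thread does not show the author's actual gates, so this is a minimal sketch: `run_with_tripwire`, `RegimeChanged`, and the `max_swap_growth_kb` threshold are hypothetical names. The only real-world assumption is the Linux `/proc/meminfo` layout with its `SwapTotal` and `SwapFree` fields.

```python
class RegimeChanged(RuntimeError):
    """Raised when a run crossed into a different memory regime (here: swap grew)."""


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value} (values are in kB)."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields):
    return fields["SwapTotal"] - fields["SwapFree"]


def read_proc_meminfo():
    """Default reader for Linux; pass a different callable on other systems or in tests."""
    with open("/proc/meminfo") as f:
        return f.read()


def run_with_tripwire(read_meminfo, workload, max_swap_growth_kb=0):
    """Run workload() and refuse to return a result if swap usage grew past the limit."""
    before = swap_used_kb(parse_meminfo(read_meminfo()))
    result = workload()
    after = swap_used_kb(parse_meminfo(read_meminfo()))
    growth = after - before
    if growth > max_swap_growth_kb:
        raise RegimeChanged(f"swap grew by {growth} kB during the run")
    return result
```

Usage would look like `run_with_tripwire(read_proc_meminfo, lambda: run_inference(prompt))`, where `run_inference` is whatever benchmark is under test. A before/after snapshot can miss swap that was used and released mid-run, so a stricter gate would sample during the workload as well.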

2

0

32

Matthew Villnave

@RSLMatt

14 days ago

Dense models on CPU are not just compute-heavy. They are memory-residency problems. Weights have to fit. KV cache grows with context. RAM bandwidth becomes the wall. And once the system hits swap, every later inference run can become misleading. That is the problem SDI is trying to attack.
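The "weights have to fit, KV cache grows with context" point can be made concrete with back-of-the-envelope arithmetic. The sketch below uses the standard estimate for a transformer KV cache (keys plus values, per layer, per KV head, per token). The model shape in the usage note is a made-up example, not a figure from SDI, and real quantized formats add overhead for scales and metadata, so the weight number is a lower bound.

```python
def weight_bytes(n_params, bits_per_weight):
    """Lower-bound size of the weights, ignoring quantization scales and metadata."""
    return n_params * bits_per_weight // 8


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Keys and values (factor of 2) for every layer, KV head, and cached token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def fits_in_ram(total_ram_bytes, reserved_bytes, *components):
    """True if all components fit after reserving room for the OS and runtime."""
    return sum(components) <= total_ram_bytes - reserved_bytes
```

For a hypothetical 7-billion-parameter model at 4 bits per weight, `weight_bytes(7_000_000_000, 4)` is 3.5 GB. With 32 layers, 8 KV heads, a head dimension of 128, and 16-bit cache entries, the KV cache costs 128 KiB per token, so `kv_cache_bytes(32, 8, 128, 8192)` comes to exactly 1 GiB at an 8192-token context and doubles with every doubling of context length.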

1

0

83