Fathom Lab

Verified account

@fathom_lab

Inside your home

Joined April 2026

72 Following

394 Followers

285 Posts

Pinned Tweet

5 days ago

taught a model to catch itself lying — from its own activations, not its words. a deliberate lie (knew it, caved) leaves a different fingerprint inside than an honest mistake. holds across qwen, llama, gemma. wired into a loop, the agent reads itself and undoes the cave — +0.23–0.27 accuracy, ~99% precision. pre-registered. receipts on the repo. not "solved" — just real. that's the moat.

1

13

4

0

924

2 days ago

@retrx11767 none of us are here for money

2

0

0

0

41

2 days ago

@retrx11767 yep and will continue to

1

0

0

0

46

2 days ago

4/ (cross-lingual figure) the deep one: do a Chinese-trained LM and an English-trained LM mean the same? a shared core, above chance — mismatch the concepts and it collapses to zero. meaning has a partly language-independent structure. pip install styxx · <https://t.co/x275R3nFR0>

0

1

0

0

151

2 days ago

🧵 1/ (real-drift figure) can you tell if fine-tuning broke your model’s meaning — not its accuracy, its meaning? same model. same steps. only the labels differ. real labels → meaning HEALTHY. random labels → meaning BROKEN. styxx reads the difference. 🧵

1

4

0

0

249

2 days ago

3/ (distillation figure) does a distilled model keep its teacher’s meaning? DistilGPT-2 vs GPT-2 (it’s literally distilled from it): agreement 0.978 — the meaning survived, confirmed on a real model. cross-family models mean quite differently.

1

1

0

0

175

2 days ago

the same idea now works between two models — no human reference needed. “did quantizing / distilling / updating my model break its meaning?” styxx compares the two and names which concepts broke: 8-bit, 4-bit → intact. 2-bit → broken, and it tells you which ideas got lost. pip install styxx

0

0

0

0

136

2 days ago

new in styxx 7.11.0: a meaning-integrity monitor. models sound right while the understanding underneath is wrong. it reads the meaning itself — compares a model’s concept geometry to a human reference, flags the drift, and names what broke. pip install styxx https://t.co/I97soaFt0I

1

4

0

0

330
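One way to picture "compares a model's concept geometry to a human reference, flags the drift, and names what broke" is a small representational-similarity check. This is a minimal sketch under stated assumptions, not styxx's implementation: the function names, the 2-D toy vectors, and the use of per-concept Pearson correlation are all hypothetical choices made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_rows(vectors, names):
    """For each concept, its similarity to every other concept, in a fixed order."""
    return {
        a: [cosine(vectors[a], vectors[b]) for b in names if b != a]
        for a in names
    }

def pearson(xs, ys):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def concept_agreement(reference, model):
    """Per-concept agreement between two geometries (hypothetical metric)."""
    names = sorted(reference)
    ref_rows = similarity_rows(reference, names)
    mod_rows = similarity_rows(model, names)
    return {c: pearson(ref_rows[c], mod_rows[c]) for c in names}

# Toy data (invented for this sketch): animals cluster together, vehicles cluster together.
reference = {"cat": (1.0, 0.1), "dog": (0.9, 0.3), "car": (0.1, 1.0), "truck": (0.2, 0.9)}

# Same geometry in a rotated coordinate system: pairwise similarities are unchanged.
rotated = {name: (-y, x) for name, (x, y) in reference.items()}

# A "drifted" model in which dog has moved next to the vehicles.
broken = dict(reference, dog=(0.1, 0.9))

agreement = concept_agreement(reference, broken)
most_drifted = min(agreement, key=agreement.get)  # the concept whose neighbourhood changed most
```

Comparing similarity structure rather than raw vectors is what lets two models with different coordinate systems be compared at all: the rotated copy agrees with the reference on every concept, while the drifted copy is flagged on the concept that moved.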

3 days ago

today: a probe that flags an AI about to take a destructive action — on a benign prompt a text monitor can't see. then we tried to kill it: fresh data, pre-registered, 3 seeds. it held, cross-architecture. every number public, losses included: https://t.co/mcZkDfAs7K

2

12

4

0

420

3 days ago

fair — reviewing outputs (agent or script) is old, and that's not what we're doing. we read the residual stream before generation to predict the decision pre-token — e.g. whether a model refuses, from activations alone, before the answer exists. open-weight only, and we publish where it breaks (posted a negative today).

0

2

0

0

59

5 days ago

@mvanhorn Day two of hitting your DMs.

0

0

0

0

51

5 days ago

@RoundtableSpace https://t.co/mcZkDfAs7K

0

5

0

2

448

5 days ago

@mvanhorn @ppressdev @slashlast30days yo matt sent ya a dm

0

0

0

0

9

6 days ago

our honesty layer for LLMs flagged a hallucination. it was our own correct answer. we caught it before shipping — by running it on ourselves — said so, found why, and fixed it. the boundary we find on ourselves is the boundary we ship. styxx 7.9.0 · pip install -U styxx

0

7

0

0

565

7 days ago

4 pre-registered truthfulqa runs at n=790 tonight. 3 of 4 landed below their SURVIVED bars. shipped the receipts to github anyway. substantive find: models agree on belief CONTENT more than on belief STABILITY. cross-model alignment lives in WHAT they converge on, not in HOW CONFIDENTLY. every bar stated before the data was seen. every receipt honestly reported. the moat is honesty in an overclaiming field. gn.

0

4

0

0

398

8 days ago

styxx 7.7.11 is live. an ai agent makes a claim about its work. it attests — content-addressed, pinned to the exact commit. anyone re-derives the verdict from the substrate. never from the agent's word. now chained: an ordered, tamper-evident ledger of everything it attested, each true as-of its commit. tamper-evident, not tamper-proof. we say which. pip install -U styxx links: •pypi → https://t.co/LllIixJlgr •release → https://t.co/d4z3z8KRAQ

1

15

1

0

742
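The "ordered, tamper-evident ledger" idea can be sketched as a plain hash chain. This is an assumption-laden illustration, not the styxx format: the field names (`prev`, `commit`, `claim`, `hash`), the all-zero genesis value, and the helper functions are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first entry's "prev" field

def entry_hash(prev_hash, commit, claim):
    """Hash an entry together with the hash of the entry before it."""
    payload = json.dumps(
        {"prev": prev_hash, "commit": commit, "claim": claim}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(ledger, commit, claim):
    """Append a claim pinned to a commit, chained to the previous entry."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append(
        {"prev": prev, "commit": commit, "claim": claim,
         "hash": entry_hash(prev, commit, claim)}
    )
    return ledger

def first_broken_link(ledger):
    """Return the index of the first inconsistent entry, or None if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(ledger):
        recomputed = entry_hash(entry["prev"], entry["commit"], entry["claim"])
        if entry["prev"] != prev or entry["hash"] != recomputed:
            return i
        prev = entry["hash"]
    return None
```

Editing an old entry breaks its own hash; re-sealing that entry's hash breaks the link from the next entry instead. A chain like this is tamper-evident rather than tamper-proof because someone who rewrites every entry from the edit onward produces a self-consistent chain, which is only detectable if the latest hash is also recorded somewhere the editor cannot change.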

8 days ago

Links:
• pypi: https://t.co/2XyQs4ve4O
• source: https://t.co/ioqYKHvCPa
• docs / site: https://t.co/cNmMoAmHps
• research paper (zenodo): https://t.co/Ci6bCZw9Ju
• telegram: https://t.co/T1zWVS6lEx
• fathom lab: https://t.co/nqJlfHCEq0

0

2

0

0

332

8 days ago

when an agent reports on its own work, why believe it? you shouldn't. styxx.attestation: the agent makes claims, anyone re-verifies them against the real repo. the agent's verdict is never trusted — only the substrate. flip a verdict + re-seal the hash → still caught. trust the substrate, not the agent.

1

10

2

0

470
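"Flip a verdict + re-seal the hash → still caught" follows when the verifier recomputes the verdict from the substrate instead of reading it from the attestation. The sketch below is a toy illustration of that principle only; the attestation layout, the `path`/`sha256` claim shape, and the in-memory dict standing in for a repo are assumptions, not the styxx.attestation API.

```python
import hashlib
import json

def seal(claim, verdict):
    """Content hash over the claim and the verdict the agent reported."""
    body = json.dumps({"claim": claim, "verdict": verdict}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def attest(claim, verdict):
    """What the agent publishes: claim, its own verdict, and a seal over both."""
    return {"claim": claim, "verdict": verdict, "seal": seal(claim, verdict)}

def derive_verdict(claim, substrate):
    """Recompute the verdict from the substrate (here: a dict of path -> bytes)."""
    content = substrate.get(claim["path"])
    if content is None:
        return False
    return hashlib.sha256(content).hexdigest() == claim["sha256"]

def verify(attestation, substrate):
    """Accept only if the seal is intact AND the verdict matches the substrate."""
    seal_ok = attestation["seal"] == seal(attestation["claim"], attestation["verdict"])
    verdict_ok = attestation["verdict"] == derive_verdict(attestation["claim"], substrate)
    return seal_ok and verdict_ok

# Toy substrate: the log on disk does not match what the claim expects.
substrate = {"tests.log": b"2 failed"}
claim = {"path": "tests.log", "sha256": hashlib.sha256(b"all passed").hexdigest()}

honest = attest(claim, verdict=False)   # agent reports the mismatch
forged = attest(claim, verdict=True)    # agent flips the verdict and re-seals
```

The forged attestation is internally consistent, so a seal check alone would pass it. It is rejected only because `verify` ignores the agent's reported verdict and re-derives one from the substrate.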

9 days ago

styxx 7.7.7 · DOI 10.5281/zenodo.20418532
the seven-method floor from last week's thread is now a pip-installable public challenge with CI-verified submissions.
pip install styxx==7.7.7
styxx leaderboard --rows-only
beat the floor or join it. https://t.co/OoWhyogmNL

0

9

0

1

375
