José Miguel Arroyo @jmarroyo - Twitter Profile

1 day ago

@bcherny I wonder whether, with that kind of behavior, the default 5-minute TTL for non-subscription Claude Code setups should be revisited. While the subagents work and the main agent sits idle, the short-lived prefix cache is bound to be missed often, no?

0

139
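
A toy sketch of the concern raised in the reply above, assuming a prefix cache whose entry expires a fixed TTL after its last use and is refreshed by every request. The 5-minute figure comes from the tweet; `PrefixCache` and `main_agent_hits` are hypothetical names for illustration and do not describe any real API or the actual caching implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

TTL_SECONDS = 5 * 60  # the 5-minute default mentioned in the tweet


@dataclass
class PrefixCache:
    """Toy model: one cached prefix that expires `ttl` seconds after its last use."""
    ttl: float = TTL_SECONDS
    last_used: Optional[float] = None

    def request(self, now: float) -> bool:
        """Return True on a cache hit. Every request, hit or miss, refreshes the entry."""
        hit = self.last_used is not None and (now - self.last_used) <= self.ttl
        self.last_used = now
        return hit


def main_agent_hits(idle_periods: List[float]) -> List[bool]:
    """For each idle period (seconds the main agent waits on subagents),
    report whether the main agent's next request still hits the cache."""
    cache = PrefixCache()
    now = 0.0
    cache.request(now)  # the first request writes the prefix into the cache
    results = []
    for idle in idle_periods:
        now += idle
        results.append(cache.request(now))
    return results
```

Under these assumptions, any subagent run longer than the TTL forces the main agent to pay for a full cache write again, which is the scenario the reply is asking about.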

José Miguel Arroyo @jmarroyo

2 days ago

@thepanta82 @GergelyOrosz No one has ever known how to interview devs 😅 Lately every time I've heard mentions of studies that tried to correlate interview process and outcome there hasn't been a strong signal, ever. In person, take home, remote, etc -> none effectively predict performance at new job 🥲

0

5

José Miguel Arroyo @jmarroyo

4 days ago

@GergelyOrosz @hoseakidane_ Everyone says that ^^ Hard to say what'll stick in the end (especially for teams outside AI labs without infinite tokens) https://t.co/7RfHSv1aCn

dax

@thdxr

7 days ago

it's so crazy ai is bad at your job but good at everyone else's job

178

6K

265

259

202K

0

138

jmarroyo retweeted

Alexander Goslin

@xandurglar

6 days ago

Introducing InfiniteDiffusion, my independent paper accepted to #SIGGRAPH2026! I have one RTX 3090 Ti. No funding, advisors, or team. By day I'm a new grad SWE at Walmart. The paper has two main contributions: - InfiniteDiffusion: a new approach to infinite generation with diffusion models. - Terrain Diffusion: the world’s first learned procedural terrain generator. Here’s why this matters, and how they are connected. 🧵

152

6K

625

4K

911K

José Miguel Arroyo @jmarroyo

8 days ago

On the topic of how much models love redundant comments: Since models produce their output sequentially, the comments come first, before the code. I wonder to what extent the comments act as some "chain of thought" extension from the model before it produces actual code

0

7

José Miguel Arroyo @jmarroyo

13 days ago

@xovemnormie @mitchellh That is a genius way to put it 😂

0

2

0

207

José Miguel Arroyo @jmarroyo

13 days ago

This announcement feels surreal https://t.co/bXSy1aF7JF 🤯 A midjourney spa (like actually a spa to put their scanners in, not a strange acronym) was not on my 2026 bingo card. I feel that bootstrapped labs like this have a spark the VC-backed startups can no longer have.

0

19

José Miguel Arroyo @jmarroyo

18 days ago

What is _understanding_, truly?

Valerio Capraro

@ValerioCapraro

20 days ago

Claude Fable 5 doesn’t truly understand. And here is a beautiful proof: The Beninatto-Trombetti test is a translation test for professional translators. It measures the ability to infer context, revise the surface form, and generalize beyond literal mapping. For example, the correct translation of: “Solo 3 parole: non sei solo” is not: “Just 3 words: you are not alone” but: “Just 4 words: you are not alone.” An LLM that understands the sentence must also update the meta-linguistic claim inside the sentence. Claude Fable 5 is arguably the most advanced LLM currently available. And yet it still fails this simple test. LLMs are extraordinary machines for recombining existing knowledge. But they don’t truly understand. We are still far from AGI.

232

1K

111

631

409K

0

21

jmarroyo retweeted

Awni Hannun

@awnihannun

22 days ago

It's very cool that Apple shipped a 20B parameter model on-device. You can't put 20B parameters in RAM at any reasonable precision. To make it work they are using a pretty exotic architecture by today's standards. A small model predicts from the query (or prompt) which experts to load from NAND into RAM. The key distinction from a typical MoE is that you do this once per query and then generate all the tokens with the same experts (instead of switching the experts for every token).

77

3K

295

1K

227K
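
A toy comparison of the two routing schemes described in the retweet above, counting how many experts would have to be loaded from storage into RAM. This is only an illustration under strong assumptions: RAM holds exactly the experts in use, expert IDs are plain integers, and the function names are hypothetical. It is not a description of the actual on-device implementation.

```python
from typing import List, Set


def loads_with_per_token_routing(token_experts: List[Set[int]]) -> int:
    """Count expert loads when each token may need a different expert set
    and RAM keeps only the experts used by the most recent token."""
    resident: Set[int] = set()
    loads = 0
    for needed in token_experts:
        loads += len(needed - resident)  # only experts not already in RAM are loaded
        resident = set(needed)
    return loads


def loads_with_per_query_routing(query_experts: Set[int], num_tokens: int) -> int:
    """Count expert loads when the expert set is chosen once from the query
    and reused for every generated token."""
    # num_tokens does not affect the count: the load happens once, up front.
    return len(query_experts)
```

In this simplified model the per-query scheme pays the load cost once regardless of output length, while the per-token scheme can pay it again on every token whose experts differ from the previous one.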

jmarroyo retweeted

Georgi Gerganov

@ggerganov

about 1 month ago

llama.cpp now has an official website: https://t.co/vztdUpdBWL Our goal is to make local AI accessible to everyone, and improving the user experience is a big part of that. On the new landing page you’ll find a single-line cross-platform installer. The installation provides a single unified `llama` entrypoint which you can use to run/serve models and interface with 3rd-party agentic applications. While oriented towards simplified user experience, the new `llama` application also provides all the advanced functionality of the existing llama.cpp tooling with which experienced users are already familiar. Also note that all GGUF models that you might have already downloaded with llama.cpp in the past will be automatically available to use without downloading again (they are stored in the common HF cache on your machine). We have many improvements in the pipeline both at the UX and at the engine level and we plan to iteratively ship new things over the coming months. One of the main focuses will be seamless integration with local-friendly 3rd-party agents (such as Pi). In the meantime, we’ll continue to listen for feedback from the community and adjust accordingly, so keep letting us know what you think and need.

96

3K

485

1K

166K

José Miguel Arroyo @jmarroyo

about 1 month ago

@thsottiaux /status not being up to date on first call, but up to date when calling again -> I guess that's a load management design but if people spam it to make it refresh, are you really getting any benefit out of this annoying UX? Instead refresh on the first call -> cache for a while?

0

14
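
A minimal sketch of the alternative suggested in the reply above: fetch fresh data on the first call, then serve the cached value for a while. `CachedStatus`, its parameters, and the injected `fetch` and `clock` callables are hypothetical; nothing here reflects how the actual /status command is implemented.

```python
import time
from typing import Callable, Optional


class CachedStatus:
    """Refresh on the first call, then reuse the result until it is `ttl_seconds` old."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> str:
        now = self._clock()
        # Fetch if nothing is cached yet or the cached value has expired.
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

With this shape, repeated calls inside the TTL window cost nothing on the backend, so spamming the command gives users no reason to retry and the first answer is never stale.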

José Miguel Arroyo @jmarroyo

about 1 month ago

@Steve_Yegge Hard to do if you're already at a job though

0

25

José Miguel Arroyo @jmarroyo

about 1 month ago

@Steve_Yegge We've been having a lot of discussions about this. I really like the idea of making "work count twice". Being able to leverage your interview process at company XYZ for your next interview process at ABC takes away a bit of the sting of working without assurance of being hired.

1

0

1

1K

jmarroyo retweeted

Steve Yegge

@Steve_Yegge

about 1 month ago

I feel like nobody has kicked any really big beehives today. So here goes. https://t.co/bPX11SYkdQ

58

974

51

884

599K

José Miguel Arroyo @jmarroyo

about 1 month ago

@bcherny A bit late to the party but in case anyone might know, do the classifying requests for auto mode use a model behind the scenes (it feels like it, beyond the permissions setup)? If so, does that count towards usage quota/API usage?

0

11

jmarroyo retweeted

Tibo

@thsottiaux

about 1 month ago

A little secret. About 5% of our production traffic is on the Pi harness, about another 5% is on OpenCode. Reminder you can use your ChatGPT account in a flourishing set of other tools. We’ll continue to make Codex awesome, but you have options.

405

8K

295

1K

900K

José Miguel Arroyo @jmarroyo

about 2 months ago

"92% of GPT 5.5's performance" At this point what does that even mean? Is it a smart model? Is it fast? What level of "effort"? What benchmark? This is just HypeOps, isn't it?

Bindu Reddy

@bindureddy

about 2 months ago

Gemini 3.2 Flash - Capitalizing on DeepMind's clever distillation techniques... Rumors are that benchmarks show it's hitting 92% of GPT 5.5's performance on coding and reasoning tasks while being 15-20x cheaper on inference costs. The latency improvements are insane - sub-200ms for most queries. Google's distillation + sparsity techniques are paying off massively. They've essentially compressed a frontier model into a flash variant without the usual quality cliff.

157

4K

185

963

921K

0

24

José Miguel Arroyo @jmarroyo

about 2 months ago

@GergelyOrosz Wasn't it always like that? Many years ago when I was presented with such a position it was a "go do consultant work for customer X". The core platform was in another country and when I asked if _they_ were hiring, I didn't seem to qualify for that 😅 Didn't seem like a good deal

0

2K

José Miguel Arroyo @jmarroyo

about 2 months ago

Do HTML tokens come at a discount? 😅

Thariq

@trq212

about 2 months ago

HTML is the new markdown. I've stopped writing markdown files for almost everything and switched to using Claude Code to generate HTML for me. This is why.

900

12K

1K

15K

4M

0

24

José Miguel Arroyo @jmarroyo

about 2 months ago

@_lopopolo Otoh, coupling the harness to post-training makes it way more expensive to modify your harness down the road, so this isn't free 😅 (e.g. Notion talking about fine-tuning models early on, on the Latent Space podcast)

0

20

José Miguel Arroyo @jmarroyo

about 2 months ago

@_lopopolo I agree with the other comment that it depends on the harnesses being open source/hackable. One reason modern coding agents feel useful is that they make custom implementation "cheap". A harness without a customisation path is bound to be friction for someone somewhere.

1

0

90
