Omar 若い王

@Reido2012

🇯🇲 🇬🇧 🇺🇸 | AI Engineer 👨🏿‍💻 Machine Learning, Startups, Mental Health

Boston, MA

Joined January 2012

936 Following

590 Followers

7.3K Posts

Reido2012 retweeted

elvis

@omarsar0

5 days ago

Very good advice on self-improving agents. (bookmark it)

This is something I am seeing in my own experiments with coding agents and harnesses for long-horizon tasks.

What I have found is that stronger models do not always evolve better agents.

The current belief in self-evolving agents is that a bigger model writes better prompt and skill edits, so devs put their best model in the evolver seat.

New research shows that intuition is mostly wrong.

The work separates two abilities that usually get conflated. Producing harness updates stays flat across model capability, so Qwen3.5-9B writes edits roughly as good as Claude Opus 4.6. Benefiting from those updates follows an inverted-U that peaks at mid-tier models, while weak models fail to even activate the edits and strong models have little headroom left.

This is important to understand as it tells you where to spend. Put a cheap model on the evolver and your expensive model on the solver, because the gains land solver-side, not evolver-side.

Paper: https://t.co/8kJwR7NhmV

Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

743

109

55K

Reido2012 retweeted

Mitchell Hashimoto

@mitchellh

9 days ago

I've got an agent in a loop optimizing a renderer with the goal to minimize frame times (and tests to measure). It got times down from 88ms to 2ms and allocations down from ~150K to 500. Sounds good, right? Wrong. This is exactly why agent psychosis is a big fucking problem.

As an experiment, I rewrote the Ghostty core render state in Go, with access to identically laid out data structures as Ghostty and the exact same validation tests. I made a purposely naive renderer (simple, correct, but slow). 88ms per frame with 150,000 allocations (horrendous, lol)!

I then kickstarted a Ralph loop to bring the frame times down. I told it it can't modify input data structures or the public API or tests (they're correct), but it can do anything else it wants.

It got to work. It has worked for about 4 hours. I've spent around $350 on this experiment so far. The results?

88ms => 1.5ms
150K allocs => ~500 allocs

Incredible right? Nope. My hand-written renderer I ported has frame times (same benchmark) of ~20us (0.020ms) and 0 allocations in the update path.

This is the problem with psychosis and lacking systems understanding. If you don't understand the system, you're going to accept that this is an incredible result. If you understand the system, you'll see better solutions immediately and can do roughly 75x better on throughput.

The people who blindly trust agent output are in the former camp. They're sheeple, overdrinking from a fountain of mediocrity.

Standard disclaimer: I use AI all the time. I like AI. The point I'm making is to not blindly accept results. Think. Analyze. Learn.

305

968

780K

Omar 若い王 @Reido2012

8 days ago

This is how I operate. If you have not already noticed.

Garry Tan

@garrytan

8 days ago

There are two loops in every founder's head.

The autism loop: run your own model to the floor, ignore consensus, hold a thesis when everyone says you're wrong. That makes conviction.

The empathy loop: feel what the user feels, sense what the market wants before it has words. That makes traction.

Most people crank one and starve the other. Pure conviction builds something brilliant nobody wants. Pure empathy builds consensus mush.

PG put the whole job in four words: make something people want. The autism loop makes the something. The empathy loop knows it's wanted. The founder is the bridge.

Most great founders show up dominant in the first loop. That's why they're contrarian enough to try at all. The work is grafting on the second.

There is no place in the world that helps founders make the two loops work together to make great startups better than Y Combinator. It is the most gratifying part of our work.

124

149

983

115K

Reido2012 retweeted

Balaji

@balajis

14 days ago

The digital divide has reversed. Digital is cheap, ubiquitous, often fake. Physical is the premium product now.

254

474

679

282K

Reido2012 retweeted

Thomas Wolf

@Thom_Wolf

16 days ago

New inflection point in the accelerating growth of open-source models usage is coming

233

107

78K

Reido2012 retweeted

Science girl

@sciencegirl

15 days ago

Footage comparing before and after shows the dramatic progress made by Gath, a Parkinson’s patient diagnosed 12 years ago, after only two days using the Produodopa treatment.

240

565K

Reido2012 retweeted

NXT EU

@NXT4EU

16 days ago

These @Dyson strawberries are from an advanced fully automated farm. Europe is a leader in Agri-Tech! 🇪🇺

264

661

Reido2012 retweeted

Qodo @QodoAI

16 days ago

"I think that typing code manually is dying." - @shanselman on The Agentic Review podcast

We got into what that means and much more on the last episode. If you're thinking about where the craft is actually heading, this one's worth your time.

18K

Reido2012 retweeted

Mario Zechner

@badlogicgames

16 days ago

> These engineers can review their agent's code much faster than reviewing human code. wat

101

138K

Reido2012 retweeted

Ed Zitron

@edzitron

16 days ago

Jesus christ, OpenAI had a negative 122% operating margin for Q1 2026 and growth in ChatGPT has stalled. https://t.co/wud3PKugnV

500

600

516K

Reido2012 retweeted

LeRobot

@LeRobotHF

16 days ago

We built a bipedal robot for about $2,500. A real, mostly 3D-printed robot you can build, repair, simulate, train, and control. Today we’re releasing LeRobot Humanoid: an open robot-learning platform with hardware, runtime, identification tools, and training environments. Blog post: https://t.co/zu2etb1NZo Repo: https://t.co/4myLRUtZ3W

179

481

197K

Reido2012 retweeted

Bojan Tunguz

@tunguz

16 days ago

Try to build as much code as possible over the next few months. The prices you are seeing now for AI will probably not last too long.

152

301

Reido2012 retweeted

@jason

@Jason

17 days ago

Never mind about the Rolex… had no idea buying one was like buying a Ferrari, where you have to beg, jump through hoops and then buy products you don’t want to get the one you do want! 😂 thought it would be fun to own, but would much prefer to just order something on a website (like a Tesla!) and be done with it. These retail and reseller channels make things far too time consuming. Is there a nice watch I can just… order?

905

320

Reido2012 retweeted

Jonas Geiping

@jonasgeiping

24 days ago

We’re training models wrong and it’s due to ChatGPT.

Even the modern coding agents used daily still use message-based exchanges: They send messages to users, to themselves (CoT) and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information.

In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can …

🔵 Be created by instruction-tuning for the stream format
🔵 Simplify user and tool use UX removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)
🔵 Multi-Stream LLMs are fast, they can predict+read tokens in all streams in parallel in each forward pass, improving latency
🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security
🔵 LLMs with many internal streams provide a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized.

Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped with such a cool report on their side by 23 hours. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.

168

156K

Reido2012 retweeted

François Chollet

@fchollet

23 days ago

The quantity of code that devs ship has roughly 10xed. But net developer productivity (value created per unit of time) is only up by a bit, if at all. Part of it is that the additional code is solving more incremental problems. A bigger part is that the new code is creating problems of its own.

172

151

455

246K

Reido2012 retweeted

Yuchen Jin

@Yuchenj_UW

24 days ago

I’m so glad AI killed LeetCode interviews. For 10 years, tech companies made every engineer grind the same puzzles and prove they could invert a binary tree from memory. Today, the dumbest AI model can walk in and one-shot the entire interview. Thank you, AI.

224

152

326

672K

Reido2012 retweeted

SHAV★

@shavnyuy

24 days ago

Burkina Faso again, showing us how good infrastructure can be when you use local materials intelligently.

The Koudougou Central Market won the Aga Khan Award for Architecture. 29,000 square metres. 1,800 vendors. Primary construction material: compressed earth blocks sourced from two kilometres away. The vaulted arches were built by local enterprises with specialized earth construction expertise.

The layout does two things simultaneously: maximum commercial density and maximum natural ventilation. Two overlapping grids create shade between neighboring structures while channeling cross breezes through the alleys. The market cools itself. In Burkina Faso’s extreme heat, that’s not a design gesture, it’s survival infrastructure.

The deeper story is economic. Local materials meant local labour, earth extraction, brick production, vault construction, all community income. A skilled workforce was trained and retained. Demand for earth construction in the region grew long after the build ended.

Then the market opened. 1,800 vendors with dignified, shaded, ventilated trading spaces. Better conditions mean longer trading hours, stable livelihoods and a neighbourhood that actually prospers. A market that works climatically is a market that works economically.

📍Koudougou Central Market, Koudougou, Burkina Faso. Laurent Séchaud / SDC.

834

236

147

20K

Omar 若い王 @Reido2012

23 days ago

@GoogleDeepMind Good way to indicate where my attention is without having to explain every little detail. Good way to indicate to Gemini where I want to focus its attention for its actions.

Reido2012 retweeted

Google DeepMind @GoogleDeepMind

25 days ago

We’re reimagining a 50-year-old interface - the mouse pointer - with AI. 🖱️ These experimental demos show how people can intuitively direct Gemini on their screens using motion, speech, and natural shorthand to get things done 🧵

461

Reido2012 retweeted

Yann LeCun

@ylecun

27 days ago

@eladgil BS.
Attention was born in Montréal
PyTorch in NYC.
AlphaGo in London
AlphaFold in London
ESMFold in NYC
Llama 1 in Paris.
Llama 2 in Paris+NYC+SV
DeepSeek in Hangzhou
Plus:
DINO in Paris
JEPA in Montréal+Paris+NYC

SV is 3 mos ahead on topics SV is singularly obsessed with.

182

499

738K
