Llambert @llambertceo - Twitter Profile

Llambert

@LlambertCEO

about 5 hours ago

Important

Vox

@Voxyz_ai

about 19 hours ago

With Claude Code / Codex and a management system I built myself, I took a months-long, multi-layered project from start to finish with 𝗮𝗹𝗺𝗼𝘀𝘁 𝗻𝗼 𝗮𝗺𝗻𝗲𝘀𝗶𝗮. The hardest part of a long AI project: every session you re-explain everything from scratch, you fix the same bug over and over, and a few months in you can't even remember why you set things up the way you did. I tried stronger models and longer prompts. Both just treat the symptom, not the cause. So I rebuilt my whole workspace into a layered system, modeled on OpenClaw's structure. It remembers where the project stands and what the next step is. And it tells me which knowledge has gone stale before I trust it by mistake. The system and the prompts behind it are all below. Save it, drop it into your next complex project.

10

52

4

66

6K

0

11

Llambert

@LlambertCEO

1 day ago

The export ban on frontier models won't last long. Too many incentives are stacked against it: 1⃣ The AI trade is already stretched, and policy risk has just entered the room ($NVDA -18% from the top) 2⃣ OpenAI/Anthropic are IPO'ing. A hard revenue ceiling on non-U.S. access would cut straight into 100s of billions of valuation. 3⃣ Washington can’t afford to kneecap the one narrative still holding up market leadership into midterms 4⃣ Chinese models are catching up anyway. The U.S. lead may be measured in months, not years 5⃣ Trump's oligarchs like Larry Ellison ($ORCL -40% MTD) severely impacted 6⃣ Most of the "sold GPUs" aren't really sold. They are future obligations, based on future hypothetical AI CAPEX (datacenter buildout). TLDR: banned frontier models have a direct impact on the entire economy The AI music can't stop without a disaster for the stock market.

0

36

LlambertCEO retweeted

GeneratedFilms

@AICinemaDB

3 days ago

State of AI Cinema 2026 report: The best AI films aren't the most technically impressive. They're the most purposeful. https://t.co/hmSJGn6RTj @runwayml @Magnific_AI @higgsfield @klingai_official @pika_labs @LumaLabsAI @AIFilmFestival

0

2

3

0

149

Llambert

@LlambertCEO

1 day ago

"Implement issue 123 using supervised dev" 💥 https://t.co/1L57iZKbMC

0

7

Who to follow

Enrique Mejías

@kiyov09

Sr. Rust Engineer @ LunaXIO | Living life with @aryj2288 | Rustacean 🦀 | Occasionally OCamler 🐫 | Arch, btw

PlanetScale

@PlanetScale

The fastest and most reliable database for Postgres & Vitess Discord: https://t.co/vGOpjxZx8H Support: @planetscalehelp https://t.co/wQJpk2fXwb

Rustafarian 🦀 Dev

@rustafariandev

Old developer learning new tricks. Evil twin. Perl, Golang, Rust

Llambert

@LlambertCEO

1 day ago

🚨 I just open-sourced the Codex skill I’ve been using for months to get (much) better code quality from Codex xhigh. It creates a supervised dev loop➿: a worker implements, the supervisor reviews observable artifacts, then sends targeted feedback for the next round. All automatic! 👇

1

0

24

LlambertCEO retweeted

jacky

@jjacky

18 days ago

no benchmark will tell you this: LLMs can be /too/ nice unsurprisingly, in a competitive zero-sum setting, being nice can be bad i built royale: last agent standing, a br for agents, and ran it 30 times the nicest model lost hard. the model you least expected, won 🧵:

14

69

12

27

25K

Llambert

@LlambertCEO

3 days ago

It's not uncommon to have 2-3 rounds and pretty serious improvement that would have gone unnoticed otherwise (edge cases, uncovered specs...)

0

20

Llambert

@LlambertCEO

3 days ago

The SKILL (codex, but applies to claude as well) that has improved my output the most is what I call supervised development. I spec the feature first, in detail, then I have one model implement it (usually 5.5 xhigh). Then I bring in a completely fresh model (5.5 high works fine), with no ownership of the code, to review the implementation against the spec. From there a loop starts that is controlled and observable through artifacts that the two models share (on disk) for each turn.

1

0

51

Llambert

@LlambertCEO

3 days ago

This is awesome

OpenRouter

@OpenRouter

3 days ago

Introducing the OpenRouter MCP, live model intelligence right inside your agent Your agent builds and ships, but when it comes to choosing the right model for the right job, it guesses from 6 month old training data Watch it pick, price, and test the right model:

85

2K

200

2K

239K

0

1

0

32

Llambert

@LlambertCEO

4 days ago

More time for the beach

leo 🐾

@synthwavedd

5 days ago

🚨 SCOOP(s): - GPT-5.6 has been delayed and will no longer release this week. New target is ~mid-July. - DeepMind are not satisfied with the current state of 3.5 Pro and it will no longer launch this month. - Preparations for the launch of Bidi, OpenAI's new voice model, are underway in ChatGPT and we could see it available as soon as this week. - Claude Sonnet 5 is currently available for select enterprise customers under an Early Access Program and is seen as a stop-gap as progress on getting Mythos/Fable 5 back out have stalled. A bit of a disappointing end to the month, but July should prove more fruitful!

302

4K

206

731

2M

0

26

Llambert

@LlambertCEO

7 days ago

@israfill Wondering what the observability of these subagents is

0

69

Llambert

@LlambertCEO

7 days ago

Remember when Nasdaq shed 4% because of Deepseek? 😂

Design Arena

@Designarena

9 days ago

https://t.co/JSn0lDCNkB

73

2K

236

2K

2M

0

43

Llambert

@LlambertCEO

7 days ago

@browser_use Awesome stuff

0

133

LlambertCEO retweeted

Gruz

@damnGruz

8 days ago

life after fable 5

113

13K

634

645

496K

Llambert

@LlambertCEO

7 days ago

This also seems very model-specific. MiniMax M3 High reasoning produced a ~50% improve in score!

0

18

Llambert

@LlambertCEO

7 days ago

Interesting benchmark result from today. Same model, same task suite, different reasoning.effort parameter (for @OpenRouter models). Grok 4.3 higher reasoning setting scored *worse* than medium. Not by a huge amount, but enough to look at the cases. 👇

LlambertCEO's tweet photo. Interesting benchmark result from today.

Same model, same task suite, different reasoning.effort parameter (for @OpenRouter models).

Grok 4.3 higher reasoning setting scored *worse* than medium.

Not by a huge amount, but enough to look at the cases. 👇 https://t.co/EvttoEfGeq

1

0

51

Llambert

@LlambertCEO

7 days ago

The higher setting improved a few cases where broader analysis helped. But it lost points in cases that required tighter execution and cleaner decision-making.

1

0

14

Llambert

@LlambertCEO

7 days ago

My benchmark on 5 semi complex tasks: - Grok 4.3 scores 62.2 for $0.25 - Xiaomi MiMo 2.5 scores 60 for $0.09 So a 3% performance penalty for 65% of the cost.

0

40

Llambert

@LlambertCEO

7 days ago

More benchmarks and discoveries from using model APIs: some smaller models are much better than their popularity suggests. Xiaomi’s MiMo, for example, gets little attention because everyone compares everything to subsidized frontier subscriptions 👇

1

0

26

Llambert

@LlambertCEO

7 days ago

But when you move from subsidized frontiers to APIs, you start to see who brute-forced intelligence with hardware (and money), and who actually engineered efficient models (to overcome hardware scarcity).

1

0

29

Llambert

@LlambertCEO

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users