https://t.co/3PaUHJSyNO
Worked hard on this piece on synthetic data generation, covering everything from its pioneering methodologies to a spectrum of applications, while addressing critical concerns over truthfulness and privacy. #syntheticdata #ai #ethicalai #privacy
Three things shaped my debut Substack article on "Understanding Inference Economics with DeepSeek V4"
1. The DeepSeek V4 paper 🐋
2. Headlines that kept stacking up in my mornings:
→ "GitHub Copilot is moving to usage-based billing" — GitHub Blog
→ "AI's compute wars" — Axios
→ "You're about to feel the AI money squeeze" — The Verge
→ "China’s AI is quietly making big inroads in Silicon Valley" — Al Jazeera
3. Podcasts like these:
→ "The Supply and Demand of AI Tokens" — Dylan Patel
→ "How GPT-5, Claude, and Gemini are actually trained and served" — Reiner Pope
All pointing at the same thing: the real AI story isn't benchmarks. It's economics.
Took some time to put together — hope you enjoy it. Would love to know what you think 🙂
👇
https://t.co/1b9IavxdmJ
// Agentic World Modeling //
Massive 40-author survey just dropped. Cleanest taxonomy of world models in agent research I've seen.
(bookmark it)
The paper proposes a "levels × laws" framework.
Three capability levels:
> L1 Predictors do one-step transitions
> L2 Simulators do multi-step action-conditioned rollouts
> L3 Evolvers self-revise as the world changes
It discusses four law regimes: physical, digital, social, and scientific.
They synthesize 400+ works and 100+ representative systems spanning model-based RL, video generation, web/GUI agents, multi-agent simulation, and scientific discovery.
The framework also identifies failure modes and proposes evaluation principles for each level.
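To make the three levels concrete, here is a minimal toy sketch in Python. It is my own reading of the summary above, not code or definitions from the paper: the integer "world", the names `predict`, `rollout`, and `Evolver`, and the offset-based revision are all illustrative assumptions.

```python
from typing import Callable, List

State = int
Action = int


def predict(state: State, action: Action) -> State:
    """L1 Predictor: a single one-step transition (toy dynamics: addition)."""
    return state + action


def rollout(
    state: State,
    actions: List[Action],
    step: Callable[[State, Action], State] = predict,
) -> List[State]:
    """L2 Simulator: chain one-step predictions into a multi-step,
    action-conditioned trajectory."""
    trajectory = [state]
    for action in actions:
        state = step(state, action)
        trajectory.append(state)
    return trajectory


class Evolver:
    """L3 Evolver (toy): revises its own transition model when an
    observed transition disagrees with its prediction."""

    def __init__(self) -> None:
        self.offset = 0  # learned correction, hypothetical

    def step(self, state: State, action: Action) -> State:
        return state + action + self.offset

    def revise(self, state: State, action: Action, observed_next: State) -> None:
        # Update the correction so the model reproduces what was observed.
        self.offset = observed_next - (state + action)


print(rollout(0, [1, 2, 3]))  # [0, 1, 3, 6]
```

The point of the sketch is only the shape of each level: one step, many steps, and a model that changes itself after seeing the world disagree.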
Why it matters: as agents shift from chatbots to goal-accomplishers, the bottleneck moves from language to environment. This is the first paper that gives builders a shared vocabulary for designing and evaluating world models across communities that have been working in isolation.
Paper: https://t.co/aOpwncIFoi
Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
@Zac_Pundi At AI Wednesdays, a Singapore public-sector AI community meetup. It has now grown into Lorong AI — a bigger community space for AI practitioners across government, industry, and research. https://t.co/JkLx1XkYtB
A tiny UI feature, but a useful prompt hygiene habit: Mac’s Apple Intelligence “Concise” writing tool (a right click away) helps compress my long prompts while retaining the intent (saves $$ eventually). Personally, I find it useful when voice-to-text makes me verbose.
@thorstenball I am currently learning tennis and reading the book. Trust me, the connection is beautiful and very helpful for overall growth and progress. :)
Wow, this tweet went very viral!
I wanted to share a possibly slightly improved version of the tweet in an "idea file". The idea of the idea file is that in this era of LLM agents, there is less of a point/need of sharing the specific code/app, you just share the idea, then the other person's agent customizes & builds it for your specific needs.
So here's the idea in a gist format: https://t.co/NlAfEJjtJV
You can give this to your agent and it can build you your own LLM wiki and guide you on how to use it etc. It's intentionally kept a little bit abstract/vague because there are so many directions to take this in. And ofc, people can adjust the idea or contribute their own in the Discussion which is cool.
This is what great panels feel like — thoughtful discussion, builders, and a real free flow of dialogue. Also amazed to learn that OpenEvidence is valued at $12 billion, and that its founder is also a poet.
https://t.co/XQlr87fpQg
- Drafted a blog post
- Used an LLM to meticulously improve the argument over 4 hours.
- Wow, feeling great, it’s so convincing!
- Fun idea: let’s ask it to argue the opposite.
- LLM demolishes the entire argument and convinces me that the opposite is in fact true.
- lol
LLMs may offer an opinion when asked, but they are extremely competent at arguing in almost any direction. This is actually super useful as a tool for forming your own opinions; just make sure to ask for different directions and be careful with the sycophancy.
🚨BREAKING: Someone turned Naval Ravikant's mental models into AI prompts and the results are insane.
It's the closest thing to having the AngelList founder rebuild your career from scratch.
Here are the 10 prompts that completely changed my life:
New podcast on AI (full episode). Links below.
A Motorcycle for the Mind
0:00 If you want to learn, do
2:13 Vibe coding is the new product management
6:49 Training models is the new coding
10:13 Is traditional software engineering dead?
13:07 There is no demand for average
14:12 The hottest new programming language is English
18:36 AI is adapting to us faster than we are adapting to it
22:56 No entrepreneur is worried about AI taking their job
26:46 The goal is not to have a job
29:49 AIs are not alive
32:55 AI fails the only true test of intelligence
36:49 Early adopters of AI have an enormous edge
39:37 AI meets you exactly where you are
43:02 Always leverage the best intelligence
44:37 If you can't define it, you can't program it
49:37 The solution to AI anxiety is action
I didn’t truly understand how to build strong AI agents… until one paper snapped everything into place.
Not a tutorial.
Not a YouTube demo.
A single arXiv paper: “Fundamentals of Building Autonomous LLM Agents.”
It finally made sense why most “agents” feel like chatbots with extra steps… and why real autonomous systems need an actual architecture.
Here’s the backbone the pros use, the part nobody explains clearly 👇
1. Perception: what the agent actually sees
It isn’t just text.
Real agents mix:
- screenshots
- DOM trees
- accessibility APIs
- Set-of-Mark style visual encodings
That’s how an agent stops guessing at a UI and starts understanding it.
2. Reasoning: the engine behind autonomy
The paper breaks down why “single-pass reasoning” collapses almost immediately.
Real agents rely on:
- decomposition (CoT, ToT, ReAct)
- parallel planning (DPPM)
- reflection loops that critique + revise plans
This is the part that turns a model from reactive to intentional.
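A reflection loop is easier to see in code. The sketch below is a minimal illustration, not the paper's implementation: `reflect_and_revise`, `toy_critique`, and `toy_revise` are hypothetical names, and the two toy functions stand in for what would be LLM calls in a real agent.

```python
from typing import Callable, List, Optional

Plan = List[str]


def reflect_and_revise(
    plan: Plan,
    critique: Callable[[Plan], Optional[str]],
    revise: Callable[[Plan, str], Plan],
    max_rounds: int = 3,
) -> Plan:
    """Critique the plan and revise it until the critic has no objection
    or the round budget runs out."""
    for _ in range(max_rounds):
        issue = critique(plan)
        if issue is None:
            break
        plan = revise(plan, issue)
    return plan


def toy_critique(plan: Plan) -> Optional[str]:
    # Stand-in critic: objects if the plan never verifies its result.
    return None if "verify result" in plan else "missing verification step"


def toy_revise(plan: Plan, issue: str) -> Plan:
    # Stand-in reviser: appends the step the critic asked for.
    return plan + ["verify result"]


final_plan = reflect_and_revise(["open page", "submit form"], toy_critique, toy_revise)
print(final_plan)  # ['open page', 'submit form', 'verify result']
```

The `max_rounds` cap matters: without it, a critic that is never satisfied turns reflection into its own infinite loop.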
3. Memory: the part everyone misbuilds
Short-term memory lives in the context window.
Long-term memory lives in RAG, SQL, trajectory logs, and past failures.
Yes, failures are stored intentionally, because they teach the agent what not to try again.
Without structured memory, the agent resets every step and looks “dumb.”
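Here is a rough sketch of that split, assuming a bounded window for short-term memory and plain in-process lists and sets for long-term memory. A real agent would back the long-term side with RAG or SQL; the `AgentMemory` class and its method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Set, Tuple


@dataclass
class AgentMemory:
    # Short-term: a bounded window, standing in for the context window.
    short_term: Deque[str] = field(default_factory=lambda: deque(maxlen=3))
    # Long-term: the full trajectory log plus an explicit record of failures.
    trajectory_log: List[Tuple[str, str]] = field(default_factory=list)
    failed_actions: Set[str] = field(default_factory=set)

    def record(self, action: str, outcome: str, ok: bool) -> None:
        """Store a step in both memories; keep failures on purpose."""
        self.short_term.append(f"{action} -> {outcome}")
        self.trajectory_log.append((action, outcome))
        if not ok:
            self.failed_actions.add(action)

    def should_skip(self, action: str) -> bool:
        """True if this exact action already failed earlier."""
        return action in self.failed_actions


memory = AgentMemory()
memory.record("click #submit", "element not found", ok=False)
memory.record("click #send", "form submitted", ok=True)
print(memory.should_skip("click #submit"))  # True
```

Checking `should_skip` before acting is the simplest way to stop an agent from retrying the same dead end.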
4. Action System: where the work actually happens
This is the hardest part and the most ignored:
- Tool calls
- API execution
- Python environments
- GUI control at coordinate level
Most demos cut right before this stage because execution is where agents usually break.
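For the tool-call part, a minimal dispatcher sketch shows where execution tends to break and how a simple check catches it. The registry `TOOLS`, the call format (`{"tool": ..., "args": ...}`), and the `execute` function are assumptions for illustration, not any particular framework's API.

```python
from typing import Any, Callable, Dict

# Hypothetical tool registry: only these names are executable.
TOOLS: Dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
    "upper": lambda text: text.upper(),
}


def execute(call: Dict[str, Any]) -> Dict[str, Any]:
    """Run a model-proposed tool call, rejecting unknown tools and
    malformed arguments instead of crashing the agent loop."""
    name = call.get("tool")
    args = call.get("args", {})
    if name not in TOOLS:
        # Guard against hallucinated tool names.
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": TOOLS[name](**args)}
    except TypeError as exc:
        # Raised for missing or unexpected arguments (or bad argument types).
        return {"ok": False, "error": f"bad arguments: {exc}"}


print(execute({"tool": "add", "args": {"a": 1, "b": 2}}))
print(execute({"tool": "delete_database", "args": {}}))
```

Returning a structured error rather than raising lets the agent feed the failure back into its next reasoning step.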
Where agents collapse (and why):
The paper maps out the real failure modes:
- grounding errors on GUIs
- infinite loops
- hallucinated tool actions
- bad memory retrieval
- fragile long-horizon planning
And then it gives the fixes:
reflection, anticipatory reflection, guardrails, SoM grounding, specialized sub-agents, and tighter subsystem integration.
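As one example of a guardrail, here is a small sketch of an infinite-loop guard. It assumes the agent can summarize its state as a hashable value; the `LoopGuard` class and its thresholds are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Hashable


class LoopGuard:
    """Stops an agent that keeps repeating the same action in the same
    state, or that exceeds an overall step budget."""

    def __init__(self, max_repeats: int = 2, max_steps: int = 20) -> None:
        self.max_repeats = max_repeats
        self.max_steps = max_steps
        self.steps = 0
        self.seen: Counter = Counter()

    def allow(self, state: Hashable, action: Hashable) -> bool:
        """Count this attempt and report whether it is still within limits."""
        self.steps += 1
        self.seen[(state, action)] += 1
        within_budget = self.steps <= self.max_steps
        not_looping = self.seen[(state, action)] <= self.max_repeats
        return within_budget and not_looping


guard = LoopGuard(max_repeats=2)
for _ in range(3):
    print(guard.allow("login page", "click #submit"))
```

With `max_repeats=2`, the third identical attempt is refused, which is the cue to reflect or hand off instead of trying again.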
If you’ve ever wondered why your agent falls apart by step 3…
or why it “forgets” what it just decided…
or why it panics the moment the UI changes…
This paper is the missing manual.
It turns agent-building into engineering, not trial and error.