Karpathy said something you'll regret ignoring:
"Remove yourself as the bottleneck. Maximize your leverage. Put in very few tokens, and a huge amount of stuff happens on your behalf."
Loop engineering is the exact thing that does that.
In a hand-run session, the operator handles two things:
- deciding what the agent runs next
- and checking its output before the next step
Both are manual, and together they decide how far the agent gets without the operator.
Loop engineering moves both steps into the system.
A core operating structure surrounds the loop, and the diagram below depicts it.
- A schedule decides what to run
- The loop is the maker that produces the work
- A separate checker agent grades the output
- A file on disk holds the state they both read
The loop runs until the work is done, it hits the maximum number of iterations, or the budget runs out.
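A minimal Python sketch of this structure, assuming the maker and checker are plain callables that also report how many tokens they spent (in a real system each would wrap a model call). `run_loop`, `LoopResult`, and the stub agents are hypothetical names for illustration. Both limits are arguments fixed before the first pass, and the default of two iterations mirrors the "stop after two loops" exit suggested in the considerations.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical signatures: each agent returns its result plus the tokens it spent.
Maker = Callable[[str], tuple[str, int]]           # instruction -> (output, tokens)
Checker = Callable[[str], tuple[list[str], int]]   # output -> (findings, tokens)


@dataclass
class LoopResult:
    status: str          # "done", "max_iterations", or "budget_exhausted"
    iterations: int
    tokens_used: int
    findings_per_pass: list[list[str]] = field(default_factory=list)


def run_loop(maker: Maker, checker: Checker, task: str,
             max_iterations: int = 2, token_budget: int = 50_000) -> LoopResult:
    """Maker/checker loop whose exit conditions are fixed before the first pass."""
    instruction = task
    tokens = 0
    findings_log: list[list[str]] = []
    for i in range(1, max_iterations + 1):
        output, spent = maker(instruction)
        tokens += spent
        findings, spent = checker(output)  # a separate agent grades the output
        tokens += spent
        findings_log.append(findings)
        if not findings:
            return LoopResult("done", i, tokens, findings_log)
        if tokens >= token_budget:
            return LoopResult("budget_exhausted", i, tokens, findings_log)
        # The checker's findings become the maker's next instruction.
        instruction = "Fix these issues:\n" + "\n".join(findings)
    return LoopResult("max_iterations", max_iterations, tokens, findings_log)


# Usage with stub agents standing in for real model calls (hypothetical).
def stub_maker(instruction: str) -> tuple[str, int]:
    return ("patched" if instruction.startswith("Fix") else "draft"), 1_000


def stub_checker(output: str) -> tuple[list[str], int]:
    return ([] if output == "patched" else ["lint error in utils.py"]), 500


result = run_loop(stub_maker, stub_checker, "Bump the stale version string")
print(result.status, result.iterations, result.tokens_used)  # done 2 3000
```

Because the budget is checked after each full maker-and-checker pass, a run can overshoot it by up to one pass; a stricter variant would estimate cost before calling the maker.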
Here are some practical engineering considerations:
1) A model grading its own output justifies what it already did instead of catching where it failed.
That's why a separate checker does the grading, and its findings return to the maker as the next instruction. The cycle repeats until the checker finds nothing left to fix.
2) A loop with no stop condition burns tokens, and the cost climbs fast once sub-agents and long runs add up.
That's why the exit must be set before the loop runs, not while it is running.
A simple exit could be:
↳ fix only the major issues, run one final pass, and stop after two loops, with "all tests pass and lint clean" as the rule that ends it.
3) State has to live on disk, not in context.
The model forgets everything between runs, so a Markdown file or a knowledge graph holds what is done and what is still open.
Each run reads it and writes back to it, which lets a loop pick up again after days.
4) The easier the result is to verify, the safer the loop.
Boring, repetitive tasks like fixing a stale version string or adding a missing test are trivial to verify, so a loop can run them with little risk while the operator is away.
Judgment-heavy work is loopable too, but only as far as the checker can confirm the result.
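To make point 3 concrete, here is a small sketch that keeps loop state in a Markdown checklist on disk, so a later run picks up where the previous one stopped. The file name, the `- [x]` / `- [ ]` line format, and the helper names are assumptions for illustration, not a standard.

```python
import tempfile
from pathlib import Path
from typing import Optional


def load_state(path: Path) -> dict[str, bool]:
    """Read '- [x] task' / '- [ ] task' lines into {task: done}; a missing file means a fresh start."""
    state: dict[str, bool] = {}
    if not path.exists():
        return state
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith(("- [x] ", "- [ ] ")):
            state[line[6:].strip()] = line[3] == "x"
    return state


def save_state(path: Path, state: dict[str, bool]) -> None:
    """Write the whole checklist back so the next run sees what is done and what is open."""
    lines = ["# Loop state", ""]
    lines += [f"- [{'x' if done else ' '}] {task}" for task, done in state.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def next_open_task(state: dict[str, bool]) -> Optional[str]:
    """First unfinished task in file order, or None when everything is done."""
    return next((task for task, done in state.items() if not done), None)


# Usage: two separate "runs" share nothing but the file on disk.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "loop_state.md"   # hypothetical file name
    save_state(state_file, {"fix stale version string": False,
                            "add missing test": False})

    # Run 1: reads state from disk, finishes one task, writes back.
    state = load_state(state_file)
    task = next_open_task(state)
    state[task] = True
    save_state(state_file, state)

    # Run 2 (could be days later): resumes from the file, not from context.
    remaining = next_open_task(load_state(state_file))
    print(remaining)  # add missing test
```

A plain checklist keeps the state human-readable, so the operator can inspect or edit it between runs; a knowledge graph would replace these helpers with queries but follow the same read-then-write cycle.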
An unattended loop can fail in two ways.
1) It reports done when nothing is actually verified.
The separate checker exists to prevent this. But the loop merges code faster than anyone reads it, so over weeks the team stops understanding its own codebase while every check stays green.
Green tests say the code passed the tests, not that anyone knows what shipped. Someone still has to read what the loop merges.
2) The checker keeps a running loop honest, but it only catches failures inside a run.
The harness around the loop, like the prompts, tools, and checks wrapped around the model, still drifts and breaks in production as models change.
Repairing that harness is usually a manual loop, driven by observability traces.
My co-founder wrote a detailed walkthrough (with code) on making that harness repair itself, where a failing trace gets diagnosed, the fix is verified against the exact input that failed, and the failure is locked as a regression test so it cannot recur.
Read it below.
Kimi 2.7 ranked 2nd, behind Fable 5 and ahead of GPT-5 xhigh
We re-ran our 14-problem ErdosBench smoke test with Kimi 2.7, Qwen 3.7 Max, and Grok 4.3, and compared the results with the top performers from previous runs.
Kimi 2.7 is amazingly good. More below.
Hermes Agent v0.16.0 (2026.6.5) just dropped
This is arguably the biggest Hermes release yet.
Hermes is no longer just an agent framework. It’s becoming a complete platform.
Here’s what actually matters ↓
1. Native Desktop Apps Are Here
Hermes now has real desktop apps for:
• macOS
• Windows
• Linux
With:
• One-click install
• Auto updates
• Drag-and-drop files
• Clipboard image paste
• Cmd+K command palette
• Session search & archive
• Streaming chat
• Inline model picker
The “it’s just a CLI” era is officially over.
2. Run Hermes Anywhere
The desktop app can connect to:
• Your homelab
• A cloud server
• A teammate’s instance
Heavy compute stays remote.
The desktop becomes a lightweight control center.
3. Full Web Admin Dashboard
No more living in config files.
Manage:
• Channels
• MCP servers
• Credentials
• Memory
• Webhooks
• Gateway controls
All from the browser.
4. Faster Onboarding
Hermes is getting much easier to hand to non-technical users.
New Quick Setup via Nous Portal gets users from install to first conversation in minutes.
5. Lots of Quality-of-Life Improvements
• Fuzzy model picker everywhere
• New /undo [N] command
• Leaner default skills
• NVIDIA Skills Hub integration
• Full Simplified Chinese support
Other wins
• Security updates and CVE fixes
• Stability improvements across the stack
• Better multi-profile support
• More polished desktop and web experiences
No major breaking changes.
This release feels like a turning point.
Hermes is starting to look less like a tool for AI enthusiasts and more like something you could hand to an entire company.
Brace yourselves, lots of people are about to be jobless 🥶
Peter Steinberger, the creator of OpenClaw, came to Microsoft Build to explain how OpenClaw will be integrated as a native Windows app, complete with a new security feature called Microsoft Execution Containers
He said, "Now you can run OpenClaw directly in your company's environment, more securely"
The demo was also shown live on stage on a Surface Laptop Ultra
#microsoft #ai #tech
I'm 100% Codex pilled now
Been using Codex and Claude Code side by side hours a day for 2 months straight
No longer using them side by side. Codex has become incredible
What did it for me is the self-testing. It self-tests every change it makes in its own browser
I went from about 40% of my changes being buggy on the first try to maybe 3% at most. So much more reliable, and it lets me get into an awesome flow state
Listen, Claude can literally drop an update tomorrow that changes all of this, but for now I'm really blown away by Codex
Do yourself a favor and don't have loyalty to any company. Use every tool. Use whatever is the best at the moment. Switch whenever they're no longer the best. No point in tribalism
But at the moment I'm REALLY enjoying my time with Codex
example:
people predicting who's going to win the election: the system gets so in sync that people can literally CHANGE the fate of the elected candidate just by relying on it, because it is driven by LOTS of capital.
it gets way darker, but this was a light example.
For those who are TOO LAZY TO READ but are NOT PARJO PARCOK. I've translated The Economist's article so you don't JOIN THE IDIOTS yelling "foreign lackeys" and "it will all go away once the IHSG turns bullish":
"Indonesia's president, Prabowo Subianto, has watched his country fall apart before. That was in 1998, during the Asian financial crisis. Back then, the economy's collapse set off mass protests and brought down Mr Prabowo's father-in-law, Suharto, a notoriously corrupt dictator. It also sent Mr Prabowo, who had hoped to succeed Suharto, into political exile. It took him a quarter of a century to claw his way back, finally winning the top job in 2024.
So you might think he would be extremely wary of another fiscal crisis. You would be wrong.
The leader of the world's largest Muslim-majority country has centralized power and surrounded himself with a coterie of sycophants. He ousted a respected finance minister and replaced him with Purbaya Yudhi Sadewa, who once called the IMF "stupid" and told The Economist in April that the president need not worry about "global economic developments [or] world oil prices". Business people in Indonesia are afraid to speak out, perhaps because Mr Prabowo is a thin-skinned former general with a questionable human-rights record, or perhaps because he has lately taken to intimidating big businesses.
Mr Prabowo seems to be insulating himself from reality, so he may not heed sensible advice. Still, here is some. His pet projects are unaffordable. Even before the Iran war, spending a projected 10% of the budget on just two projects (free school lunches and a network of 80,000 village cooperatives) was wasteful. Now the energy crisis has wiped out any margin for error. Mr Prabowo must change course or risk a crisis.
He must cut spending on his pet projects, or slash Indonesia's huge fossil-fuel subsidies, or break the law that caps the budget deficit at 3% of GDP. Each option carries risks. Cutting his wasteful projects would make him look weak. Letting energy prices rise would invite unrest. So Mr Prabowo may take the third path: letting the deficit breach its legal limit.
That would be a mistake. True, the 3% cap is an arbitrary number copy-pasted from Europe's Maastricht Treaty. But since the 1998 crisis it has become a signal that Indonesia's government is serious about fiscal discipline. Now investors are getting nervous. Interest payments as a share of government revenue have soared. Credit-rating agencies are preparing to downgrade. Under Mr Prabowo, $6 billion of foreign capital has left and the rupiah has fallen 11% against the dollar to a record low. Breaching the budget cap would push borrowing costs higher still.
Even as he makes the economy more precarious, Mr Prabowo is also eroding Indonesia's democracy. The legislative opposition has been almost entirely neutered, and a proposal to end direct elections for provincial governors is not a good sign. Civil society is being intimidated. There is very little room for dissent and, where there is any, little creative clash between competing ideas. Too much depends on the instincts of a single, badly advised former soldier.
He needs to hear some hard truths. Yes, cheap fuel is popular. But it encourages consumption amid scarcity. Yes, people like free school lunches. But giving them to everyone is wasteful. It would be wiser to focus on pregnant women and young children from poor families, who need better nutrition to prevent stunting. Yes, Indonesian farmers are often squeezed by middlemen when buying fertilizer. But there are cheaper ways to fix this than building 80,000 village cooperatives, which are likely to be prone to corruption. And yes, the 3% deficit cap could one day be raised. But first Mr Prabowo must convince markets that Indonesia's finances are in safe hands.
A new crossroads
Indonesia has made great progress over the past quarter of a century. Under a series of fairly pragmatic governments, income per person has more than doubled and democracy has taken root. Mr Prabowo is not a kleptocrat like his late father-in-law, but he is eroding the progress his country has made since those dark days.
The president should stop trying to muzzle the opposition in the legislature, the media and civil society. Dissent that finds no outlet in politics will spill onto the streets, as it did in last year's riots. Insisting that opposition be "polite" is a recipe for one day turning it violent.
There is still hope. Mr Prabowo cares about his legacy. So he needs to realize that a vast, sprawling, multi-ethnic archipelago like Indonesia cannot simply be ordered about like an army unit. Indonesia needs a commander-in-chief who listens to many voices, not one who surrounds himself with yes-men."
My Hermes is finally on the road to becoming the ultimate analyst
What it can do really well now:
- Knows my preferences, my portfolio, my theses, my approach
- Generates visuals (like Claude artifacts)
- Delivers daily briefings on macro, tech, X bookmarks, PMs opps, top insights
- Tracks equities and expands investment strategies
- Does everything with Jeff's preferences in mind
Result = Hermes does the job of a Data/Research/PM/Investment analyst at a fraction of the cost
($60/month on DeepSeek inference + $10-15/month on X API + $20/month on Claude, everything else is free)
6 LLM Knowledge Base terms you need to know in 2026:
(Most teams are missing at least 3, and their AI agents pay the price)
𝟭. 𝗟𝗟𝗠 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗕𝗮𝘀𝗲 A system where an LLM ingests your raw content, compiles a structured wiki, and answers questions by navigating its own index. Karpathy built one for himself. The hard part? Building one that works for your entire team.
𝟮. 𝗖𝗼𝗻𝘁𝗶𝗻𝘂𝗼𝘂𝘀 𝗜𝗻𝗴𝗲𝘀𝘁𝗶𝗼𝗻 Auto-pulls knowledge from every tool where real work happens (Slack, CRM, meetings, docs) without anyone babysitting the pipeline. A personal KB pulls from the web. A team KB has to pull from the inside.
𝟯. 𝗦𝗼𝘂𝗿𝗰𝗲 𝗧𝗿𝘂𝘀𝘁 Not all content is equal. Source Trust tells agents (and humans) what's a verified company decision vs. someone's opinion in a Slack thread. Without it, every doc carries the same weight, which means none of them really do.
𝟰. 𝗙𝗿𝗲𝘀𝗵𝗻𝗲𝘀𝘀 𝗠𝗼𝗻𝗶𝘁𝗼𝗿𝗶𝗻𝗴 Actively re-checks what the KB thinks it knows. When two sources contradict each other, it flags the conflict and demotes the staler one. It doesn't wait for someone to notice, because that's exactly the maintenance work humans defer indefinitely.
𝟱. 𝗦𝗲𝗹𝗳-𝗠𝗮𝗶𝗻𝘁𝗮𝗶𝗻𝗶𝗻𝗴 Docs update themselves as work happens. A decision on a call lands in the right doc automatically. A roadmap change propagates everywhere it needs to go. No copy-pasting. No "someone should update this."
𝟲. 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗗𝗿𝗶𝗳𝘁 The slow, invisible gap that opens between what your docs say and what's actually true. A decision gets reversed. A process changes. A feature ships. The doc stays the same. Nobody notices, until your AI agent confidently gives someone the wrong answer. Knowledge Drift is the disease. Everything else on this list is the cure.
Which am I missing?
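Source Trust and Freshness Monitoring (terms 3 and 4) can be sketched in a few lines of Python. This assumes each piece of knowledge is stored as a `Claim` with a key, a value, a source, a hypothetical integer trust score, and a last-updated date; the "newest claim wins, trust breaks ties" rule is one possible policy, not a prescribed one, and the example data is made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Claim:
    key: str        # what the claim is about, e.g. "launch.date"
    value: str
    source: str
    trust: int      # hypothetical scale: 0 = chat opinion, 1 = doc, 2 = recorded decision
    updated: date
    demoted: bool = False


def monitor_freshness(claims: list[Claim]) -> list[tuple[str, str, str]]:
    """Flag contradicting claims per key and demote the staler ones.

    Returns (key, kept_source, demoted_source) for every conflict found.
    """
    by_key: dict[str, list[Claim]] = {}
    for claim in claims:
        by_key.setdefault(claim.key, []).append(claim)

    conflicts = []
    for key, group in by_key.items():
        if len({c.value for c in group}) < 2:
            continue  # every source agrees, nothing to flag
        # Newest claim wins; trust breaks ties between claims from the same day.
        ranked = sorted(group, key=lambda c: (c.updated, c.trust), reverse=True)
        winner = ranked[0]
        for claim in ranked[1:]:
            if claim.value != winner.value:
                claim.demoted = True
                conflicts.append((key, winner.source, claim.source))
    return conflicts


def answer(claims: list[Claim], key: str) -> Optional[str]:
    """What an agent should say: the most trusted claim that was not demoted."""
    live = [c for c in claims if c.key == key and not c.demoted]
    if not live:
        return None
    return max(live, key=lambda c: (c.trust, c.updated)).value


# Hypothetical example data.
claims = [
    Claim("launch.date", "March 1", "roadmap doc", 1, date(2026, 1, 10)),
    Claim("launch.date", "April 15", "decision log", 2, date(2026, 2, 20)),
    Claim("support.email", "help@example.com", "handbook", 1, date(2025, 11, 1)),
]
print(monitor_freshness(claims))      # [('launch.date', 'decision log', 'roadmap doc')]
print(answer(claims, "launch.date"))  # April 15
```

Ranking by recency first follows the "demote the staler one" rule; a team that never wants a fresh Slack opinion to override a recorded decision could sort by `(trust, updated)` instead.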