Finally finished my second project today. Long story short, for this one I wanted to build a simple text RPG game, but with the emphasis on English tests. You can visit here: https://t.co/qTunKdOCl9
#pamerajadulu
When Should You Use Machine Learning (ML) AND When Should You Use Physics Simulation (Formulas)?
Many programmers pick the wrong one, and that's costly!
This piece is written in simple terms as an introduction.
[Thread]
@txtdrprogrammer @HIMAFI_ITB @physicsitb
Someone documented the engineering principles behind AI agents that actually work in production.
It's called 12-Factor Agents.
Here's what each factor actually means and why it matters:
Factor 1 - Natural Language to Tool Calls
The LLM's only job is to decide what to do next, outputting structured JSON. Your deterministic code does the actual execution. This separation is what makes agents debuggable.
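A minimal sketch of that separation, assuming a made-up `get_weather` tool and a hard-coded JSON string standing in for the model's output (neither comes from the 12-Factor Agents repo):

```python
import json

# Hypothetical tool, implemented in plain deterministic code.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Registry mapping tool names to the functions that execute them.
TOOLS = {"get_weather": get_weather}

def execute(llm_output: str) -> str:
    """Parse the model's JSON decision and run the matching tool."""
    call = json.loads(llm_output)
    tool = TOOLS[call["tool"]]
    return tool(**call["args"])

# A hard-coded string stands in for a real model response here.
result = execute('{"tool": "get_weather", "args": {"city": "Jakarta"}}')
```

The model only picks the tool and arguments; everything that actually runs lives in `execute`, where a debugger can reach it.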
Factor 2 - Own your prompts
If a framework hides your prompts from you, you can't debug output quality. Visibility is non-negotiable.
Factor 3 - Own your context window
The context window is the agent's entire working memory. What you put in, in what order, with what compression, determines output quality more than model choice. This is context engineering, the most underrated skill in agent development.
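One way to picture "what you put in, in what order, with what compression" is a small builder like the one below. The event shape, the tag-style formatting, and the `max_events` cutoff are all assumptions for illustration, not a prescribed format:

```python
def build_context(system: str, events: list[dict], max_events: int = 3) -> str:
    """Keep the system prompt first, collapse older events into one line,
    and keep only the most recent events verbatim."""
    older, recent = events[:-max_events], events[-max_events:]
    lines = [system]
    if older:
        # Crude compression: replace old events with a one-line note.
        lines.append(f"[{len(older)} earlier events omitted]")
    lines += [f"<{e['type']}>{e['data']}</{e['type']}>" for e in recent]
    return "\n".join(lines)

events = [{"type": "step", "data": str(i)} for i in range(5)]
context = build_context("You are a helpful agent.", events)
```

A real agent would likely summarize instead of dropping old events, but the point is the same: the code, not a framework, decides what the model sees.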
Factor 4 - Tools are just structured outputs
Tool calling is not magic. It's JSON schema. The LLM outputs a structured object. Your code pattern-matches on it and executes. Demystify this and everything else gets simpler.
Factor 5 - Unify execution state and business state
Don't maintain two separate state systems. The agent's execution state and your application's business state should live in one place or you'll spend your life keeping them in sync.
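A rough sketch of the idea, assuming a hypothetical `Thread` that stores only a list of events and derives execution state (current step, whether it is waiting on a human) from that same list:

```python
from dataclasses import dataclass, field

@dataclass
class Thread:
    # The single source of truth: everything that has happened so far.
    events: list[dict] = field(default_factory=list)

    def is_waiting_for_human(self) -> bool:
        """Execution state derived from the events, not stored separately."""
        return bool(self.events) and self.events[-1]["type"] == "request_approval"

    def next_step_number(self) -> int:
        """Count completed tool results to work out which step comes next."""
        return sum(1 for e in self.events if e["type"] == "tool_result") + 1

thread = Thread()
thread.events.append({"type": "tool_result", "data": "order created"})
thread.events.append({"type": "request_approval", "data": "refund $20?"})
```

There is nothing to keep in sync because there is only one thing.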
Factor 6 - Launch/Pause/Resume with simple APIs
Production agents get interrupted. Users change their minds. Systems go down. Design for pause and resume from the start, not as an afterthought.
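A minimal sketch of pause and resume, assuming the agent's state is a JSON-serializable list of events and using an in-memory dict as a stand-in for a real database (the function names are hypothetical):

```python
import json

# Stand-in for a database or key-value store.
STORE: dict[str, str] = {}

def pause(thread_id: str, events: list[dict]) -> None:
    """Persist the full event history so the process can exit safely."""
    STORE[thread_id] = json.dumps(events)

def resume(thread_id: str, new_event: dict) -> list[dict]:
    """Load the saved history and append whatever woke the agent up."""
    events = json.loads(STORE[thread_id])
    events.append(new_event)
    return events

pause("t1", [{"type": "tool_call", "data": "deploy"}])
events = resume("t1", {"type": "human_response", "data": "approved"})
```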
Factor 7 - Contact humans with tool calls
Human approval isn't a special interrupt mechanism. It's just another tool the agent can call. This reframe makes human-in-the-loop trivial to add and trivial to remove.
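Sketched in code, asking a human sits in the same dispatch as every other tool. The tool names and the `outbox` list (standing in for Slack, email, or similar) are assumptions for illustration:

```python
def run_step(call: dict, outbox: list[str]) -> str:
    """Dispatch one tool call; asking a human is just another branch."""
    if call["tool"] == "request_human_approval":
        # Queue the question for a person and stop until they answer.
        outbox.append(call["args"]["question"])
        return "paused: waiting for human"
    if call["tool"] == "delete_records":
        return f"deleted {call['args']['count']} records"
    return f"unknown tool: {call['tool']}"

outbox: list[str] = []
status = run_step(
    {"tool": "request_human_approval", "args": {"question": "Delete 10 records?"}},
    outbox,
)
```

Removing human-in-the-loop then means removing one branch (and one tool from the prompt), not rewiring the agent.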
Factor 8 - Own your control flow
Let the LLM decide what action to take. Keep the if/else and switch statements in your code. The moment a framework owns your control flow, debugging becomes reverse-engineering.
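A small sketch of a loop you own yourself. `fake_llm` is a deterministic stand-in for a real model call, and the intent names are invented for this example:

```python
def fake_llm(context: list[dict]) -> dict:
    """Stand-in for a model call: fetch once, then finish."""
    if any(e["type"] == "fetch_result" for e in context):
        return {"intent": "done", "answer": context[-1]["data"]}
    return {"intent": "fetch", "key": "answer"}

def run(context: list[dict], max_steps: int = 5):
    """The model picks the intent; the branching lives in our code."""
    for _ in range(max_steps):
        step = fake_llm(context)
        if step["intent"] == "done":
            return step["answer"]
        elif step["intent"] == "fetch":
            # Deterministic execution; a real tool would go here.
            context.append({"type": "fetch_result", "data": "42"})
        else:
            raise ValueError(f"unexpected intent: {step['intent']}")
    return None  # step budget exhausted

answer = run([])
```

Because the `if/elif` is ordinary code, adding a step limit, a break for approvals, or logging is a one-line change.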
Factor 9 - Compact errors into context window
A failed tool call is information, not an exception to throw. Put the error back into context so the agent can reason about what went wrong and try differently.
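A minimal sketch, assuming a toy `divide` tool and a list-of-dicts context: the exception is caught, shortened, and appended as an event instead of crashing the run.

```python
def divide(a: float, b: float) -> float:
    return a / b

def call_tool(context: list[dict], a: float, b: float) -> None:
    """Run the tool; on failure, record a compact error for the model to read."""
    try:
        context.append({"type": "tool_result", "data": divide(a, b)})
    except Exception as exc:
        # Keep it short: the error type plus a truncated message.
        message = f"{type(exc).__name__}: {exc}"[:200]
        context.append({"type": "error", "data": message})

context: list[dict] = []
call_tool(context, 1, 0)  # fails, but the loop keeps going
call_tool(context, 6, 3)  # the retry with different arguments succeeds
```

In practice you would also cap consecutive errors so the agent cannot retry forever.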
Factor 10 - Small, focused agents
One agent. One job. Reliability degrades with scope. The agents that work in production do one thing well and hand off cleanly to the next.
Factor 11 - Trigger from anywhere
Email, Slack, webhook, cron, mobile app. The same agent should be triggerable from any surface without rewriting the core logic.
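One possible shape for this, with the core logic in a single function and thin adapters per surface. The payload shapes below are made up for illustration and do not match any real Slack or email API:

```python
def run_agent(task: str) -> str:
    """Core logic: knows nothing about where the task came from."""
    return f"handled: {task.strip()}"

# Thin adapters: each one only translates its input into a task string.
def from_slack(payload: dict) -> str:
    return run_agent(payload["text"])

def from_email(subject: str, body: str) -> str:
    return run_agent(f"{subject}: {body}")

def from_cron() -> str:
    return run_agent("daily report")

slack_result = from_slack({"text": " restart the build "})
```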
Factor 12 - Make your agent a stateless reducer
Given the same context window, the agent always produces the same next action. Test it like a function. Debug it like a function. This is the architectural principle that makes everything else tractable.
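A sketch of the reducer idea, assuming an immutable tuple of events as the context. `next_action` is a deterministic stub standing in for the model call, so the names and intents here are hypothetical:

```python
def next_action(events: tuple[dict, ...]) -> dict:
    """Pure function of the event history; no hidden state."""
    if not events:
        return {"intent": "ask_user", "question": "What do you need?"}
    last = events[-1]
    if last["type"] == "user_message":
        return {"intent": "search", "query": last["data"]}
    return {"intent": "done"}

def reduce(events: tuple[dict, ...], event: dict) -> tuple[dict, ...]:
    """Return a new history; the old one is never mutated."""
    return events + (event,)

history = reduce((), {"type": "user_message", "data": "flash attention"})
action = next_action(history)
```

Testing then looks like testing any function: build a history, call `next_action`, assert on the result.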
The fastest path to production AI is understanding these principles well enough to apply them inside what you're already building.
22k+ stars.
GitHub Repo: https://t.co/nQjPc8w3V1
awesome-harness-engineering - Awesome list for AI agent harness engineering: tools, patterns, evals, memory, MCP, permissions, observability, and orchestration. https://t.co/SJvF5GUdjj
We just launched a new project that teaches you how to build Flash Attention with CUDA, step by step.
By the end, you’ll have a working Flash Attention kernel built from the ground up.
The project covers:
-CUDA primitives warm-up
-Matrix operations
-Naive attention baseline
-Online softmax math
-Tiled attention building blocks
-Fused Flash Attention kernel
-Causal Flash Attention
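For a feel of the "online softmax math" item, here is a plain-Python sketch of the usual one-pass formulation (a running maximum plus a rescaled running sum). It is an illustration only, not code from the project, which targets CUDA:

```python
import math

def online_softmax(xs: list[float]) -> list[float]:
    """Softmax with the max and normalizer computed in a single pass."""
    running_max = float("-inf")
    running_sum = 0.0
    for x in xs:
        new_max = max(running_max, x)
        # Rescale the old sum to the new max, then add the new term.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(x - new_max)
        running_max = new_max
    return [math.exp(x - running_max) / running_sum for x in xs]

def naive_softmax(xs: list[float]) -> list[float]:
    """Reference version: one pass for the max, another for the sum."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = online_softmax([1.0, 2.0, 3.0])
```

The rescaling step is what lets a tiled kernel process scores block by block without seeing the whole row first.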
It will be open to everyone for the first 2 weeks, then it will become part of our premium projects.
Self Improving AI (SIA) beats Karpathy's autoresearcher agent by improving itself!
SIA is a Self Improving AI framework to autonomously improve the performance of any AI system (Model / Agent) on a benchmark task.
Most agent frameworks are static. Fixed harness, fixed model weights, fixed memory layer. They plan, act, and use tools. SIA operates on a different layer entirely.
SIA focuses on one problem: how do you design structured feedback loops that allow an agent to evaluate its own performance, adapt its strategy, and get better over time?
After every run, SIA evaluates itself and improves three things. It updates its own harness. Updates the weights of its underlying model. Updates its own memory layer to handle new complexities. The agent rewrites itself based on what it learned.
On MLE-Bench, OpenAI's benchmark for evaluating an agent's ability to train ML models, SIA climbed to the top of the leaderboard. Beat every specialized ML research agent including MLEvolve and AIRA-dojo. Then kept improving and displaced its own previous versions on the leaderboard.
I've shared the link to the paper and the repo in the replies!
I'm quote-retweeting this so people actually read it.
The list goes like this:
- Rent a reasonably affordable studio (in Fatmawati)
- Claude Project for drafting the script
- ChatGPT or Gemini for Deep Research on interesting content
- Editing: rough cut + fine cut in DaVinci, effects and captions in CapCut
Continued >>
Took some inspiration from @vboykis and converted my first ever talk into a blog post.
I talk about the role of agentic search in context engineering.
Together we build an intuition on the strengths and weaknesses of a selection of search tools.
🔗 https://t.co/nuGJ5Zm9Du
i can't believe people don't know you can just make your skills better using iterative AutoResearch
we did it for our browser skills and created /autobrowse, read about how we make our skills up to 90% faster and cheaper to run.
For curious developers 🧠
I built "The Anatomy of an LLM", an interactive explainer showing how text becomes tokens, vectors, attention, transformer blocks, and finally generated text.
https://t.co/fgCeZuQwJf
Folks: when you write skills, ask your agent to be token efficient, relax grammar. I see too many skills that write books in the skill description, and all that crap is loaded into every context.
I wrote a skill that finds the worst offenders. https://t.co/kfaaJpxMXE
Every engineer should read this.
The principles for building reliable software systems have been around for a long time. Max outlines them beautifully.
Here's to getting that 99.99% on your status page.
https://t.co/HFDcriLodl