This is the direction we’re building with AgentOpt.
Not just agents that run tasks, but agents you can measure, test, and optimize end to end.
MEGA Optimus makes that loop executable.
MEGA Optimus is here.
Point it at a project folder. Optimus writes the spec, builds the eval harness, and runs the optimization loop end to end.
Reduce your token spend and latency while improving accuracy.
https://t.co/Dx7mxZeVwZ
Autonomous AgentOpt
@iKunalmathur Love the local-first angle!
Curious how you’re optimizing Notio for production, especially around messy voice inputs, category edge cases, and insight quality.
Happy to connect.
Building AI agents is getting easier.
Improving them is still hard.
Looking to connect with founders and builders working on reliability, evals, optimization, token efficiency, or AgentOps.
Curious what you’re building and what you’re struggling with.
Let’s connect, learn from each other, and build better agents together!
Hey AI builders 👋
Looking to connect with people working on:
🚀 Agent Systems
📊 AI Evals
🔄 Agent Optimization
⚡ AgentOps
🛠️ Developer Tools
🤖 AI Infrastructure
🔍 AI Security
What are you building right now?
And what’s been the hardest part of measuring, evaluating, or improving it?
Drop it below 👇
@mytwillot Love this use case.
Messy tweet data is exactly where evals get hard: categories drift, edge cases pile up, and “looks right” isn’t enough.
Happy to compare notes.
@JustJerry121 Glad to connect as well.
We treat failures as optimization assets.
Once a failure is captured, we turn it into a repeatable eval case, add it to the evaluation set, and reuse it to validate future improvements.
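A rough sketch of that failure-to-eval loop, to make the pattern concrete. This is not AgentOpt's actual implementation; `EvalCase`, `EvalSet`, `capture_failure`, `validate`, and the toy agents are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class EvalCase:
    """One captured failure, frozen into a repeatable check (hypothetical type)."""
    name: str
    task_input: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


@dataclass
class EvalSet:
    """Growing collection of eval cases built from past failures (hypothetical type)."""
    cases: List[EvalCase] = field(default_factory=list)

    def capture_failure(self, name: str, task_input: str,
                        check: Callable[[str], bool]) -> EvalCase:
        # Turn an observed failure into a stored, reusable case.
        case = EvalCase(name, task_input, check)
        self.cases.append(case)
        return case

    def validate(self, agent: Callable[[str], str]) -> List[str]:
        # Run every stored case and return the names of the ones that still fail.
        return [c.name for c in self.cases if not c.check(agent(c.task_input))]


# Toy agents, assumed for the example: the old one mishandles empty input.
def old_agent(text: str) -> str:
    return text.upper()


def new_agent(text: str) -> str:
    return text.upper() if text else "unknown"


eval_set = EvalSet()
eval_set.capture_failure("empty_input", "", lambda out: out == "unknown")

old_failures = eval_set.validate(old_agent)  # the captured case still fails
new_failures = eval_set.validate(new_agent)  # the fix clears it
```

Every later change gets run through `validate`, so a failure that was fixed once cannot quietly come back.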
Love this pattern.
We tested a persona-style SOUL.md used with OpenClaw and found that the operating contract itself can become part of the security surface.
After hardening it, the agent became safer without losing the behavior that made it useful.
Hermes’ SOUL.md could be even stronger with that layer too.
Related test:
https://t.co/4Kvh0vPbE1
@garrytan The 10x less code direction makes sense for many workflows. Curious how you think this applies to enterprise use cases where deterministic outcomes, auditability, and compliance are critical.
Building AI agents is getting easier.
But making the entire agent system reliable, cost-efficient, and better over time is still hard.
We are building optimization infrastructure for agent systems to solve this.
It measures system performance, identifies what needs to improve across prompts, workflows, tools, and code, tests changes against real tasks, and keeps only what actually works.
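A minimal sketch of the "measure, test changes, keep only what works" idea, under heavy simplifying assumptions. It is not how MEGA Code works internally; `score`, `optimize`, the exact-match metric, and the toy agents are hypothetical and chosen only to keep the example small.

```python
from typing import Callable, Iterable, List, Tuple

Agent = Callable[[str], str]
Task = Tuple[str, str]  # (input, expected output)


def score(agent: Agent, tasks: List[Task]) -> float:
    """Fraction of tasks answered exactly right (an assumed, very simple metric)."""
    if not tasks:
        return 0.0
    return sum(agent(x) == y for x, y in tasks) / len(tasks)


def optimize(baseline: Agent, candidates: Iterable[Agent],
             tasks: List[Task]) -> Tuple[Agent, float]:
    """Measure the baseline, test each candidate change, keep only strict improvements."""
    best, best_score = baseline, score(baseline, tasks)
    for candidate in candidates:
        candidate_score = score(candidate, tasks)
        if candidate_score > best_score:  # discard anything that does not beat the current best
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy task set and agents, assumed for the example.
tasks = [("2+2", "4"), ("3+3", "6")]


def baseline(question: str) -> str:
    return "4"  # right on one task out of two


def candidate_a(question: str) -> str:
    return "0"  # a regression, should be rejected


def candidate_b(question: str) -> str:
    return {"2+2": "4", "3+3": "6"}.get(question, "")  # right on both tasks


best, best_score = optimize(baseline, [candidate_a, candidate_b], tasks)
```

In a real system the score would also account for cost and latency, and the candidates would be changes to prompts, workflows, tools, or code rather than whole replacement functions.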
Explore MEGA Code: https://t.co/a7KUh2UKJU
@swyx @bentannyhill @Zach_Kamran Maybe not full self-driving yet.
Observability and evaluation have come a long way. Automated improvement is the next frontier.
Everyone is building agents.
But how do you know they’re actually behaving as intended?
Do you rely on logs, evals, user feedback, manual review, or something else?
And when they fail, how do you turn that failure into measurable improvement?
@ellen_in_sf Great breakdown.
Reducing token usage during a session clearly matters.
Beyond that, helping agents use fewer tokens while maintaining or improving task performance over time is just as important.