AGI surpasses human-level performance at computer use.
We’re excited to announce that AGI, Inc. is now the global leader on OSWorld-Verified, the industry benchmark for AI computer-control.
agi-0 is the first agent to reach a superhuman score on OSWorld: 76.2%. 🔥
Learn more about it in our company blog post from @_gundawar:
👇
https://t.co/2V36lNC3M1
🏆 Grand Prize Winners: Daydreamer
@diegocaples @_gundawar
They're tackling the "GPT Moment for Robotics."
Their agent uses a video diffusion model to imagine a successful outcome, executes it in the real world, and then uses VLM feedback to self-improve, training only on its successes.
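A rough sketch of that loop (imagine, execute, judge, keep only the successes) might look like the code below. This is not Daydreamer's code: `imagine`, `execute`, and `judge` are hypothetical stand-ins for the video diffusion model, the robot, and the VLM, and the stubs just return strings so the filtering logic can run on its own.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Episode:
    task: str
    plan: str      # stand-in for the imagined video of a successful outcome
    outcome: str   # stand-in for what actually happened in the real world
    success: bool  # verdict from the (stubbed) VLM judge


def collect_successes(
    tasks: List[str],
    imagine: Callable[[str], str],
    execute: Callable[[str], str],
    judge: Callable[[str, str], bool],
) -> List[Episode]:
    """Run one imagine -> execute -> judge pass and keep only successful episodes."""
    kept = []
    for task in tasks:
        plan = imagine(task)         # hypothetical: video model imagines success
        outcome = execute(plan)      # hypothetical: robot attempts the plan
        ok = judge(task, outcome)    # hypothetical: VLM scores the attempt
        if ok:
            kept.append(Episode(task, plan, outcome, ok))
    return kept


# Stubs so the sketch runs without any model or robot (all assumptions).
rng = random.Random(0)


def imagine_stub(task: str) -> str:
    return f"imagined rollout for: {task}"


def execute_stub(plan: str) -> str:
    return "done" if rng.random() < 0.5 else "dropped object"


def judge_stub(task: str, outcome: str) -> bool:
    return outcome == "done"


tasks = [f"task-{i}" for i in range(10)]
training_set = collect_successes(tasks, imagine_stub, execute_stub, judge_stub)
print(len(training_set), "of", len(tasks), "episodes kept for fine-tuning")
```

In a real system the kept episodes would feed a fine-tuning step; here they are only collected.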
𝗗𝗟𝗥 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵𝗲𝗿𝘀 𝗴𝗮𝘃𝗲 𝗮 𝗿𝗼𝗯𝗼𝘁𝗶𝗰 𝗮𝗿𝗺 𝗳𝘂𝗹𝗹-𝗯𝗼𝗱𝘆 𝘁𝗼𝘂𝗰𝗵 𝘀𝗲𝗻𝘀𝗶𝘁𝗶𝘃𝗶𝘁𝘆 𝘄𝗶𝘁𝗵 𝗻𝗼 𝗮𝗿𝘁𝗶𝗳𝗶𝗰𝗶𝗮𝗹 𝘀𝗸𝗶𝗻 𝗻𝗲𝗲𝗱𝗲𝗱.
They used internal force-torque sensors at 8 kHz + deep learning. The robot can feel where you touch it, recognize letters drawn on its surface, and respond to virtual buttons placed anywhere on its body.
What's interesting is the infrastructure behind it. To train these models, you need high-frequency sensor streams, manifold learning to unfold trajectories, and the ability to iterate fast.
They collected 2,300 samples from 20 people and hit 95.5% accuracy on digit recognition.
This is what's possible when you have the right data infrastructure.
📄 https://t.co/yadvb1iKnW
Video credit: @DLR_en
AGI, Inc. is now the global leader on the AndroidWorld benchmark, with state-of-the-art verified performance of 97.4%
This is a huge milestone for Android use, and just a sneak preview of what's coming - bringing trustworthy, reliable agents to every screen 🚀
Did you know that when they say stuff like "The A18 uses TSMC's 3nm process" or "announced the 2nm node"
The 3nm and 2nm don't actually mean anything?! It's just like a version number. They make it up. Literally nothing measures 2nm or 3nm.
I certainly didn't know.
Waymo is so safe that if every car was driven like a Waymo, about 9% of America's life expectancy gap would disappear.
9 percent
Americans die in car accidents *that often*.
@jxmnop Because the models produced by this method are very different than the models learned by gradient descent. While this does give us a “ground truth” to benchmark interp methods on, the results don’t generalize to actual learned models.
SOTA AI agent that reliably works...
where Claude, Gemini, and o3 fail...
to do the boring chores in life...
@FeatherlessAI is making this possible, as part of our work into AI reliability
Surpassing existing frontier models & agents by 50%+
🚀 INTRODUCING REAL Bench: Our New Standard for Web AI Agent Evaluation
We're thrilled to announce the release of REAL Bench - our groundbreaking benchmark to transform how web AI agents are evaluated!
Why we created REAL Bench:
✅ We built functional replicas of popular websites to test what agents can REALLY do
✅ We wanted to measure ACTUAL performance, not academic abstractions
✅ We compared leading frameworks including BrowserUse (31%) and StageHand (19%)
What web tasks would YOU like to see AI agents tackle? Join our community to be part of the agentic revolution reshaping AI! ⚡
👉 Explore REAL Bench → [https://t.co/wdDqtPhk2a]
🛠️ Try REAL Bench and get your REAL score today → [https://t.co/baXfnhs2pC]
Learn to build AGI agents you actually want to work with 🔥
Sign up and follow 👉: https://t.co/xzjlVPtBFl
In collaboration with @AndrewYNg and @DeepLearningAI!
AI agents that can browse the web, fill out forms, and even place online orders are no longer just research demos—they’re being built today.
But real-world websites are complex. Layouts change. Popups appear. And one wrong click can cascade into booking the wrong flight or buying the wrong product.
In our new course, Building AI Browser Agents, made in collaboration with @agi_inc, you’ll learn how to build web agents and how to make them more reliable using AgentQ, a framework that helps agents self-correct.
Guided by instructors @divgarg and @namangarg0, you’ll build agents step-by-step: from scraping and summarizing, to signing up for newsletters, to navigating the open web and choosing optimal actions.
👉 Learn for free: https://t.co/Poa7kJ4WM7
Good AGI agents complete tasks. Great ones check their own work. Discover how to build them in our new course with @DeepLearningAI
Enroll Now! https://t.co/P1R495nEXQ
We won 1st Place! 🏆 Our hackathon project 'AutoRL: Reinforcement Learning is all you Need' trains open-source LLMs via RL to master tools (MCPs) rivaling closed-source models. Proud of the team: @diegocaples, @thomastjoshi, @xdotli! Thank you @JvNixon! #RL #LLM #ML #AI #AGIHouse
Anthropic brought Model Context Protocol to life.
We gathered 200+ elite hackers for 12 hours to build the open source future of AI agent connections.
Here's what we saw at the Finally Connected MCP Hackathon, where LLMs met the real world, with @AnthropicAI, @SmitheryDotAI, @kodjima33, @ExaAILabs, by @JvNixon:
1/ AutoMCP 🥇 1st Place
ToolMaster RL - Training open-source LLMs to excel with MCPs through reinforcement learning.
This project creates an environment where models learn tool usage through trial and error rather than prompt engineering.
"Reinforcement Learning is All You Need" for transforming mediocre open-source models into tool-using experts that rival closed-source alternatives.
Diego Caples, @diegocaples
Thomas Joshi, @thomastjoshi
Meghna Natraj, @NatrajMeghna
Xiangyi Li, @xdotli
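To make "learning tool usage through trial and error" concrete, here is a toy sketch. It is not the team's method: it swaps the LLM for a tiny epsilon-greedy value table, and the tool names, task names, and reward rule are all made up for illustration. The only idea it shares with the project description is that tool choice improves from reward rather than from prompt engineering.

```python
import random

# Hypothetical tools and tasks; none of these names come from the project.
TOOLS = ["search", "calculator", "calendar"]
CORRECT = {"math": "calculator", "lookup": "search", "schedule": "calendar"}


def train(episodes=2000, epsilon=0.1, lr=0.1, seed=0):
    """Epsilon-greedy trial and error: try a tool, observe reward, update its value."""
    rng = random.Random(seed)
    q = {(task, tool): 0.0 for task in CORRECT for tool in TOOLS}
    for _ in range(episodes):
        task = rng.choice(list(CORRECT))
        if rng.random() < epsilon:
            tool = rng.choice(TOOLS)                        # explore
        else:
            tool = max(TOOLS, key=lambda a: q[(task, a)])   # exploit
        reward = 1.0 if tool == CORRECT[task] else 0.0      # assumed reward rule
        q[(task, tool)] += lr * (reward - q[(task, tool)])
    return q


def best_tool(q, task):
    """Return the tool with the highest learned value for this task."""
    return max(TOOLS, key=lambda a: q[(task, a)])


q = train()
for task in CORRECT:
    print(task, "->", best_tool(q, task))
```

An actual RL setup for LLMs would update model weights from rewards on full tool-call transcripts; the table here only shows the feedback loop in miniature.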