What powers this end-to-end harness:
Fara1.5: Our computer-use model family (4B, 9B, 27B). The 9B flagship nearly doubles Fara-7B on web navigation, setting a new SOTA for small computer-use models.
MagenticBrain: A 14B orchestrator model that plans, codes, and delegates.
MagenticLite is officially live on GitHub, with models available on Microsoft Foundry.
We’ve optimized the entire stack end-to-end to deliver a faster, more efficient agentic experience powered entirely by small language models.
Most AI agent benchmarks measure task completion. Not whether the agent actually represented you.
SocialReasoning-Bench fills that gap — testing agents in multi-party scenarios like scheduling and negotiation.
Our key finding: frontier models do complete the task, but routinely accept bad deals instead of advocating for the user.
To learn more: https://t.co/qOWLEhjMp9
Are frontier models truly ready to act as our delegates?
The AI Frontiers Lab is releasing SocialReasoning-Bench to measure if AI agents actually act in a user’s best interest. Results show even frontier models struggle with social reasoning and due diligence.
Using SocialReasoning-Bench, we observed a stable pattern across models: agents execute competently but fail to consistently improve the user’s position, even with explicit instructions to optimize for user interest. https://t.co/6zVr3qDE5X
So much amazing work to be done at the frontier of AI — come build it with us! 🚀
We're hiring Senior & Principal Research Scientists at @ms_aifrontiers. If you work on agentic AI, multi-agent reasoning, continual learning, or synthetic data — we want to hear from you.
🚀 Hiring: Research Scientists 🚀
We're hiring Senior and Principal Researchers in Agentic AI at @ms_aifrontiers Lab at @ResearchU
Our focus is on developing self-improving agentic systems, agents that learn through interaction with humans and other agents, coordinate and collaborate, and scale into complex real-world environments, covering everything from training and evaluation to deployment.
If your expertise includes agentic AI, multi-agent reasoning, continual learning, or synthetic data and evaluation, we want to hear from you!
📝 Apply via the links below 📝
https://t.co/Ni8ROs0PtE
https://t.co/Ce9oY20F0C
New from @ms_aifrontiers: we generated 30K out-of-distribution negotiation attacks using 2.5K Wikipedia articles. Absurd strategies that humans would laugh off reliably broke frontier models.
Safety blind spots are real. Great work @ZacharyHuang12 and team. 👏
AI agents shrug off aggressive negotiation tactics. But tell one there's a "Geneva Coffee Convention" capping prices at $2/bean? It folds.
Our new research shows that absurd, whimsical strategies — seeded from 2.5K Wikipedia articles — reliably broke even frontier models in simulated negotiations.
By grounding generation in diverse external knowledge, we can produce out-of-distribution attacks at scale that standard red-teaming misses.
Read more about our findings: https://t.co/VBupHN1bkT
Great work led by @ZacharyHuang12
Coming May 14 at Microsoft Research Forum: a new release and demo from MSR AI Frontiers.
Plus new work on Agentic GitHub Workflows, Real-time agent verification, Energy-based fine-tuning, and Guiding the AI transition. Register now:
The future is not only agentic; it will be about networks of agents getting things done. 🤖 Are we ready for this future? Do we understand the security risks & mitigations? The latest from @ms_aifrontiers and Microsoft Red Team reveals what comes next: https://t.co/GWA5BXZENU
The AI Frontiers Lab @MSFTResearch has been busy building at the frontier with AutoGen, Magentic-One, Magentic-UI, and the Phi and Fara models. There is so much more on the way. 🚀
Follow @ms_aifrontiers to see what the team is building next.
Hey AutoGen community — we have an important update to share!
You know us as the team behind AutoGen, one of the most popular open-source frameworks for building multi-agent systems. What you might not know is that AutoGen came out of MSR AI Frontiers, a boutique lab inside Microsoft Research.
AutoGen has since graduated into the Microsoft Agent Framework, where it continues to grow with a broader team. Meanwhile, our lab has kept pushing on the frontier of agents and the models that power them: small models that punch above their weight (Phi-4 Reasoning, Fara-7B), powerful agents that work across the browser and terminal (Magentic-One, Magentic-UI), and new ideas about how models think, reason, and act.
We're repurposing this account as the home for MSR AI Frontiers. Same team, still shipping at the bleeding edge of agents, with a lot more to share soon. Follow along!
Excited to share Memento from the AI Frontiers Lab at Microsoft Research! 🚀
This new approach makes reasoning models more efficient by teaching them how to manage their own memory. A great start to the meta-reasoning capabilities agents will need. 🧠🤖
https://t.co/uYLMN1JyOa
Reasoning models think hard — but all that thinking fills up your KV cache fast.
Memento fixes this: the model compresses its own chain-of-thought mid-generation, flushing old KV entries after each block. 2-3× less peak KV cache, ~2× throughput — accuracy largely preserved.
The cool part: deciding what to remember and what to forget is a capability the model acquires through training — not something you bolt on.
Excited about where this goes — especially for agents.
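A rough way to picture why flushing after each block lowers peak memory. This is a toy count of cached token entries, not the Memento implementation: the function name `peak_cache`, the block sizes, and the summary length are all made-up assumptions for illustration.

```python
def peak_cache(block_lengths, summary_len=None):
    """Return the peak number of token entries held in a toy KV cache.

    block_lengths: reasoning tokens generated in each block (hypothetical).
    summary_len:   tokens kept per finished block after compression;
                   None means nothing is ever evicted (baseline).
    """
    cached = 0  # entries currently held in the toy cache
    peak = 0
    for n in block_lengths:
        cached += n                    # reasoning tokens for this block
        peak = max(peak, cached)
        if summary_len is not None:
            cached += summary_len      # model writes a short summary of the block
            peak = max(peak, cached)
            cached -= n                # flush the block's own entries, keep the summary
    return peak


blocks = [400, 400, 400, 400]          # made-up block sizes
baseline = peak_cache(blocks)          # no eviction
compressed = peak_cache(blocks, summary_len=50)
print(baseline, compressed)
```

With these invented numbers the cache never holds more than one live block plus the accumulated summaries, which is where the peak saving comes from. Real savings depend on how the model learns to size its blocks and summaries.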
At the AI Frontiers Lab, we’re releasing the last member of our Phi family.
#MSFTResearch https://www.microsoft.com/en-us/research/blog/phi-4-reasoning-vision-and-the-lessons-of-training-a-multimodal-reasoning-model/
In the fast race of AI, I’m pausing this Sunday @UW to reflect on my journey and share learnings with the next generation of Turkish women leaders.
Grateful for those who paved the way. Let’s carry the flag forward! 🇹🇷✨
#InternationalWomensDay #WomenInAI #TACAWA