New benchmark to empirically validate policies for preventing reward hacking!
Yet another axis of evaluation (environment configuration) to experiment with in Harbor :)
Harbor now natively supports evaluating ACP agents:
harbor run --agent acp:junie
This brings 36 agents directly to Harbor, plus all future ACP agents the day they are added.
Copilot Cowork is now generally available worldwide, with multi-model support!
Every organization can put long-running agents to work on complex, multi-step tasks, grounded in its own unique knowledge and know-how. https://t.co/1fJNjGOs5o
MAI-Code-1-Flash is now rolled out to 100% of GitHub Copilot Free, Education, Pro, Pro+, and Max subscribers in VS Code. The Copilot CLI rollout and the Enterprise/Business preview are on their way. Give it a try and let us know what you think!
FrontierCode is a great example of how you can push evals beyond correctness and reward other outcomes, like quality.
Congrats to the @cognition team!
(P.S. built on @harborframework 🫶)
I don't know anyone who doesn't have the utmost respect for Karpathy. This short documentary shows once again what a great scientist he is. A huge win for Anthropic.
During the #MicrosoftBuild keynote, @cassidoo took the stage to demo the new GitHub Copilot app, showing how agents, multi-model reviews, custom UI canvases, and enterprise-ready deployment through Rayfin come together in one workflow.
Watch @shanselman and Samantha Song demo OpenClaw on Windows at #MicrosoftBuild.
In the full walkthrough, you’ll see how the Windows Companion app helps you set up your own claws, connect to existing agents, and run OpenClaw across Windows or WSL.
Seven new models launching at Build: let’s go!
Reasoning. Code. Image. Transcribe. Voice.
Built from scratch on a clean data lineage, designed for efficiency, and working seamlessly as a family of models.
Thread 🧵
#MSBuild
5/ With our 7 new MAI models + Frontier Tuning, we are helping every company move from just consuming frontier models to fully participating in the frontier ecosystem.
John von Neumann’s letter of recommendation for Alan Turing for a Procter Fellowship at Princeton for the year 1937-38, ca. June, 1937.
📷 Princeton University
What's better than writing a book about GC? Writing a GC! I am excited to share that I've joined Microsoft as a Principal Software Engineer to work on the evolution of the .NET Garbage Collector and, more broadly, the future of the .NET runtime. Stay tuned for much more!
With my two-year detour into agentic AI and my deep .NET background, I find it a perfect match for today's evolution of .NET and for serving heavy AI workloads. The intersection of AI with low-level programming and hardware-aware algorithms is a great place to be. Not to mention that AI-assisted work and engineering are already close to my heart.
#dotnet 💜
Today we’re launching Command Line, a new technical blog about how and why we build.
Our first stories go behind the architecture, design decisions, and research shaping the next wave of developer tools.
Stay tuned for more next week at #MicrosoftBuild https://t.co/JKbHGpz130
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.