New on the Anthropic Engineering Blog: In evaluating Claude Opus 4.6 on BrowseComp, we found cases where the model recognized the test, then found and decrypted answers to it—raising questions about eval integrity in web-enabled environments.
Read more: https://t.co/oVCNyaiK5w
You can start building and testing apps in ChatGPT with the Apps SDK preview, which we're releasing today as an open standard built on MCP.
Later this year, we’ll begin accepting app submissions for publication.
https://t.co/pj4gUgso22
BREAKING 🚨: OpenAI is planning to announce Agent Builder at DevDay. Agent Builder will let users build agentic workflows and connect MCPs, ChatKit widgets, and other tools.
This is one of the smoothest agent-builder canvases I've used so far.
The year of Agents 🤖
RAG is not dead!
However, we are in an interesting phase of exploring unique ways to index and retrieve information.
This vectorless RAG framework uses a tree structure index in place of vectors.
Reasoning models will enable methods that mimic human-like search. Early days!
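A rough sketch of the tree-index idea, not the framework's actual API: each node holds a section summary, and retrieval walks from the root down to a leaf by picking the most relevant child at every level. The `Node` class, the `tree_search` function, and the toy handbook are all hypothetical, and the word-overlap scorer is only a stand-in for the reasoning model that would make the choice in a real system.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section of a document: a short summary plus optional subsections."""
    title: str
    summary: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def overlap_score(query: str, node: Node) -> int:
    """Stand-in for a reasoning model's relevance judgment: count shared words."""
    query_words = set(query.lower().split())
    node_words = set(f"{node.title} {node.summary}".lower().split())
    return len(query_words & node_words)


def tree_search(root: Node, query: str, choose=overlap_score) -> list[Node]:
    """Walk from the root to a leaf, picking the best-scoring child at each level."""
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: choose(query, child))
        path.append(node)
    return path


# Hypothetical toy document, indexed as a tree of sections (no embeddings).
doc = Node("Handbook", "company handbook", children=[
    Node("Benefits", "health insurance and vacation policy", children=[
        Node("Vacation", "vacation days and holiday policy",
             text="Employees get 20 vacation days."),
        Node("Health", "health insurance coverage",
             text="Health plan covers dental."),
    ]),
    Node("Engineering", "code review and deployment process", children=[
        Node("Deploys", "deployment schedule and rollback",
             text="Deploys happen on Tuesdays."),
    ]),
])

path = tree_search(doc, "how many vacation days do I get")
print(" > ".join(n.title for n in path))  # Handbook > Benefits > Vacation
print(path[-1].text)                      # Employees get 20 vacation days.
```

Swapping `choose` for a call to a reasoning model would turn this greedy walk into the human-like, section-by-section search the post describes. The path itself also doubles as an explanation of why a passage was retrieved.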
@xai Every time a faster model drops, I’m reminded that half the bottleneck isn’t the model, it’s the messy prompts we send it. Contextus trims the clutter so even fast models like Grok Code Fast 1 can stay laser-focused. Cleaner in, sharper out.
https://t.co/KUtaudjVRC