This is really big news. Google introduced the Open Knowledge Format (OKF), a standardized way to store information in a directory of Markdown files. It makes it really easy to build a digital brain that agents can use.
These files can serve as a living wiki. You can let agents query or edit them, and the files can link to one another.
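Because an OKF library is just a folder of Markdown files, the interlinking part can be sketched with the standard library alone. This is a hypothetical sketch, not taken from the OKF spec. It assumes notes link to each other with ordinary relative Markdown links such as `[Jalapeño](varieties/jalapeno.md)`, and it builds the outgoing-link and backlink index an agent would need to navigate the wiki. `load_notes` and `build_link_graph` are illustrative names.

```python
import posixpath
import re
from pathlib import Path

# ASSUMPTION: notes link with standard relative Markdown links ending in .md,
# optionally followed by a #section anchor. The real OKF spec may define other conventions.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+\.md)(?:#[^)\s]*)?\)")


def load_notes(root: Path) -> dict[str, str]:
    """Read every .md file under root, keyed by its POSIX path relative to root."""
    return {
        p.relative_to(root).as_posix(): p.read_text(encoding="utf-8")
        for p in sorted(root.rglob("*.md"))
    }


def build_link_graph(notes: dict[str, str]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Return (outgoing, backlinks) between notes in the same folder tree."""
    outgoing: dict[str, set[str]] = {name: set() for name in notes}
    backlinks: dict[str, set[str]] = {name: set() for name in notes}
    for src, text in notes.items():
        for target in LINK_RE.findall(text):
            # Resolve the link relative to the folder of the note that contains it.
            dst = posixpath.normpath(posixpath.join(posixpath.dirname(src), target))
            if dst in notes:  # skip links to missing notes or paths outside the tree
                outgoing[src].add(dst)
                backlinks[dst].add(src)
    return outgoing, backlinks
```

Backlinks are what turn a pile of files into a wiki: an agent editing one note can see every other note that depends on it.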
Seems to me this could replace Notion or Obsidian. I can think of so many uses for this.
Google's blog post: https://t.co/DqSjg4UpvH
An easier-to-understand explanation is in the SPEC.md file:
https://t.co/A3qSz3Tfas
I gave those two links to Antigravity and asked how we could use OKF in the projects we're working on. It came up with so many ideas. I imagine Claude Fable 5 would whip up some pretty amazing things based on this system.
Currently creating an OKF library of our pepper garden. It's going to be a fun weekend.
Introducing Vocs v2: a minimal docs framework designed for agents and humans.
Flexible docs that stay simple at the source, rich in the browser, and easy for agents to consume.
$ npm init vocs
first impression of claude 4.8: it's extremely convincing but still a slopus. i asked it to critique a new project, and it claimed the project had fallen into a local minimum by inventing a new parser when we could've used the ast.
almost convinced me. glad i checked for myself: the ast isn't emitted in older versions of the compiler we're targeting. codex chose a gnarly but ultimately justified approach. claude didn't bother to verify any of its claims and used absolutist language like "delete https://t.co/zuys0EhoHP", which is basically 80% of the codebase.
when presented with evidence:
> That contradicts my earlier byte-count check, and it matters enormously
> My earlier "v0.2.9" was a double false-positive (a git log -S hit on an internal symbol, plus a verification grep that mis-read a VersionException as success). Corrected in the review with a note owning the error
the biggest bullshitter model in the world! if you rely on claude for anything, god help you.
Open Sourcing Centaur: Multiplayer, self-hosted, secure agents for Slack.
Centaur has been transforming how @paradigm and @tempo invest, build and research.
Now you can run it yourself on infrastructure you control. Instructions below.
Introducing Files SDK
A unified storage SDK for object and blob backends. One small, honest API. Web-standards I/O. An escape hatch when you need the native client.
→ 18 providers - S3, R2, Vercel Blob, Google Drive, etc.
→ upload, download, head, delete, copy, list, url
→ Works everywhere - Node, Bun, Deno, edge runtimes, browsers
→ Tools for OpenAI, Vercel AI and Claude Agents SDKs
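The post lists the operations but not their signatures. As a rough illustration of the "one small API, many providers" idea, here is a Python sketch of an adapter interface covering those seven operations, with an in-memory backend standing in for a real provider. Every class and method signature here is hypothetical; this is not the Files SDK's actual API, which targets JavaScript runtimes.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectInfo:
    key: str
    size: int


class StorageBackend(ABC):
    """Hypothetical adapter interface that every provider would implement."""

    @abstractmethod
    def upload(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def download(self, key: str) -> bytes: ...

    @abstractmethod
    def head(self, key: str) -> ObjectInfo | None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...

    @abstractmethod
    def list(self, prefix: str = "") -> list[ObjectInfo]: ...

    @abstractmethod
    def url(self, key: str) -> str: ...

    def copy(self, src: str, dst: str) -> None:
        # Generic fallback built from the primitives; a real adapter could
        # override it with a provider-side copy.
        self.upload(dst, self.download(src))


class MemoryBackend(StorageBackend):
    """In-memory stand-in for a provider such as S3 or R2, handy for tests."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def upload(self, key, data):
        self._objects[key] = bytes(data)

    def download(self, key):
        return self._objects[key]  # raises KeyError if the key is missing

    def head(self, key):
        data = self._objects.get(key)
        return None if data is None else ObjectInfo(key, len(data))

    def delete(self, key):
        self._objects.pop(key, None)  # deleting a missing key is a no-op

    def list(self, prefix=""):
        return [ObjectInfo(k, len(v)) for k, v in sorted(self._objects.items()) if k.startswith(prefix)]

    def url(self, key):
        return f"memory://{key}"
```

Code written against `StorageBackend` does not care which provider sits behind it, which is the point of a unified storage layer.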
Introducing Mirage, a unified virtual filesystem for AI agents!
6 weeks. 1.1M+ lines of code. We rewrote bash from the ground up so cat, grep, head, and pipes work across heterogeneous services. S3, Google Drive, Slack, Gmail, GitHub, Linear, Notion, Postgres, MongoDB, SSH, and more, all mounted side-by-side as one filesystem.
Bash that AI agents already know works on every format! cat, grep, head, and wc parse .parquet, .csv, .json, .h5, even .wav! One pipe can stitch S3, Drive, GitHub, Slack, and Linear together, same Unix semantics throughout.
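The "one filesystem over many services" idea boils down to routing each path to the backend mounted at its prefix. Below is a toy Python sketch of that routing, plus a `cat | grep`-style pipeline across two mounts. `MountTable`, `Reader`, and the dict-backed "services" are hypothetical stand-ins, not Mirage's API, and real format parsing (.parquet, .wav, etc.) is out of scope.

```python
from collections.abc import Callable, Iterable, Iterator

# ASSUMPTION: a "backend" is modeled as a function from a path (relative to its
# mount point) to raw bytes. Illustrative only, not Mirage's actual interface.
Reader = Callable[[str], bytes]


class MountTable:
    def __init__(self) -> None:
        self._mounts: dict[str, Reader] = {}

    def mount(self, prefix: str, reader: Reader) -> None:
        self._mounts["/" + prefix.strip("/")] = reader

    def resolve(self, path: str) -> tuple[Reader, str]:
        # Longest-prefix match, so /s3/logs could shadow /s3 if both were mounted.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return self._mounts[prefix], path[len(prefix):].lstrip("/")
        raise FileNotFoundError(path)

    def cat(self, *paths: str) -> Iterator[str]:
        # Like `cat a b`: stream lines from each file in order, whatever service it lives on.
        for path in paths:
            reader, rel = self.resolve(path)
            yield from reader(rel).decode("utf-8").splitlines()


def grep(pattern: str, lines: Iterable[str]) -> list[str]:
    # Plain substring match, enough to mimic `... | grep pattern`.
    return [line for line in lines if pattern in line]
```

For example, with `fs.mount("/s3", lambda p: s3[p])` and `fs.mount("/github", lambda p: gh[p])`, a single `grep("ERROR", fs.cat("/s3/logs/app.log", "/github/issues/7.md"))` searches both services at once.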
Workspaces are versioned too. Snapshot, clone, and roll back the whole thing with one API call. A two-layer cache turns repeated reads into local lookups, so agent loops stay fast and cheap.
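One common way to get the behavior described above is a read-through cache with an in-memory LRU in front of an on-disk layer. The sketch below assumes that design. The post does not say what Mirage's two layers actually are, and `TwoLayerCache` and `fetch` are hypothetical names.

```python
import hashlib
from collections import OrderedDict
from collections.abc import Callable
from pathlib import Path


class TwoLayerCache:
    """L1: in-memory LRU of recent reads. L2: files on local disk. Miss on both: fetch remotely.

    ASSUMPTION: a generic read-through design for illustration; Mirage's actual
    cache layers, keys, and invalidation rules are not described in the post.
    """

    def __init__(self, fetch: Callable[[str], bytes], disk_dir: Path, max_items: int = 128) -> None:
        self._fetch = fetch
        self._disk = disk_dir
        self._disk.mkdir(parents=True, exist_ok=True)
        self._mem: OrderedDict[str, bytes] = OrderedDict()
        self._max = max_items
        self.remote_fetches = 0

    def _disk_path(self, key: str) -> Path:
        # Hash the key so any remote path maps to a safe local filename.
        return self._disk / hashlib.sha256(key.encode()).hexdigest()

    def _remember(self, key: str, data: bytes) -> None:
        self._mem[key] = data
        self._mem.move_to_end(key)
        if len(self._mem) > self._max:
            self._mem.popitem(last=False)  # evict the least recently used entry

    def read(self, key: str) -> bytes:
        if key in self._mem:  # L1 hit
            self._mem.move_to_end(key)
            return self._mem[key]
        path = self._disk_path(key)
        if path.exists():  # L2 hit
            data = path.read_bytes()
        else:  # miss on both layers: go to the remote service
            data = self._fetch(key)
            self.remote_fetches += 1
            path.write_bytes(data)
        self._remember(key, data)
        return data
```

On an L1 miss, the disk copy is promoted back into memory. The hot working set therefore stays in RAM while the disk layer absorbs evictions, and an agent loop that rereads the same files only pays for the remote call once.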
Drop a Workspace into FastAPI, Express, or a browser app. Wire it into OpenAI Agents SDK, Vercel AI SDK, LangChain, Mastra, or Pi. Run it alongside Claude Code and Codex.
Site: https://t.co/zo1orc2wA9
GitHub: https://t.co/zeRAKri7I9
#AIAgents #OpenSource #AgenticAI #Strukto #Filesystem #VFS
Today we're releasing ZAYA1-8B, a reasoning MoE trained on @AMD and optimized for intelligence density.
With <1B active params, it outperforms open-weight models many times its size on math and reasoning, closing in on DeepSeek-V3.2 and GPT-5-High with test-time compute. 🧵
Introducing Hermes Curator!
This new system, built into Hermes Agent, helps keep the skills created by the self-improvement loop in check by automatically consolidating and pruning them.
The curator does multiple things:
- Tracks how often you use each skill, when it was created or last updated, etc.
- Runs automatically once a week (configurable)
- Combines those analytics with its own scan of your skills to consolidate or prune them when necessary
- Skips externally installed skills, built-in skills, and skills you "pin" so they aren't touched. It only curates skills the agent created or updated, plus user-written skills.
- Decides whether skills can be consolidated, pruned, or otherwise made more manageable. Skills that are too specific may be converted into references, templates, or scripts for broader skills, or folded directly into a consolidation of an existing skill.
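The eligibility rules above are concrete enough to sketch. This hypothetical Python version models each skill's origin, usage count, and pin flag, then selects eligible skills that look stale. The field names and thresholds are invented for illustration and are not Hermes Agent's implementation, which also inspects skill contents and can consolidate rather than delete.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Origin(Enum):
    AGENT = "agent"        # created or updated by the self-improvement loop
    USER = "user"          # written by the user
    BUILTIN = "builtin"    # ships with the agent
    EXTERNAL = "external"  # installed from elsewhere


@dataclass
class Skill:
    name: str
    origin: Origin
    uses: int
    last_used: datetime
    pinned: bool = False


def eligible(skill: Skill) -> bool:
    # Mirrors the rules in the post: never touch external, built-in, or pinned skills.
    return not skill.pinned and skill.origin in (Origin.AGENT, Origin.USER)


def prune_candidates(
    skills: list[Skill],
    now: datetime,
    idle: timedelta = timedelta(days=30),
    min_uses: int = 3,
) -> list[str]:
    """Names of eligible skills that are rarely used and have been idle for a while.

    ASSUMPTION: the 30-day window and 3-use threshold are made up for illustration.
    """
    return [
        s.name
        for s in skills
        if eligible(s) and s.uses < min_uses and now - s.last_used > idle
    ]
```

A real curator would treat this list as input to a review step (merge into a broader skill, turn into a template, or delete) rather than deleting blindly.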
You can also disable it entirely in config.yaml and/or run it manually with `hermes curator run`
Learn more in the docs:
https://t.co/6woLLRtDLP
A new feature snuck into the Codex app’s latest update. You can now use /side (or the ... menu) to spawn a side chat! Useful when you're deep in a thread and want to ask a side question using the current context!
openai built a model that HIDES personal data in text so nothing leaks
i flipped it INSIDE OUT
same 1.5B weights, same label taxonomy, but instead of masks you get structured spans: name, email, phone, bank account, address, secrets, char offsets and all
point it at logs, dumps, stolen inboxes and it just... returns every private thing in the pile
We're open-sourcing Cua Driver - our new macOS driver that lets any agent (Claude Code, Codex, your own loop) drive any app in the background, with true multi-player and multi-cursor built-in.
GPT-5.5 puts OpenAI back at clear number one in AI. OpenAI’s new model tops the Artificial Analysis Intelligence Index by 3 points, breaking a three-way tie with Anthropic and Google.
OpenAI gave us pre-release access to test all five reasoning effort levels: xhigh, high, medium, low and non-reasoning.
➤ OpenAI tops five headline evaluations: GPT-5.5 (xhigh) leads Terminal-Bench Hard, GDPval-AA and our newly hosted APEX-Agents-AA, and trails only other OpenAI models in CritPt and AA-LCR. It comes second to Gemini 3.1 Pro Preview on three additional evaluations. The largest gains are on AA-Omniscience (+14 pts), our knowledge and hallucination benchmark, and τ²-Bench Telecom (+7 pts), a customer service agent benchmark.
➤ 20% more expensive to run our Intelligence Index: Per-token pricing has doubled relative to GPT-5.4, to $5/$30 per 1M input/output tokens. However, a ~40% reduction in token usage absorbs much of the increase, for a net cost ~20% higher to run our Intelligence Index.
➤ Effort levels form a clear ladder for balancing intelligence and cost: GPT-5.5 (medium) scores the same as Claude Opus 4.7 (max) on our Intelligence Index at one quarter of the cost (~$1,200 vs ~$4,800), although Gemini 3.1 Pro Preview matches that score at ~$900. GPT-5.5 (low) approximates Claude Opus 4.7 (Non-reasoning, high) on our Intelligence Index at half the cost to run (~$500 vs ~$1,000).
➤ Number one in GDPval-AA with an Elo of 1785: GPT-5.5 (xhigh) leads Claude Opus 4.7 (max) by ~30 pts and Gemini 3.1 Pro Preview by ~470 pts. GDPval-AA is Artificial Analysis’ benchmark that leverages OpenAI’s GDPval dataset to evaluate models on real-world economically valuable tasks.
➤ Top AA-Omniscience accuracy, but trailing the frontier on hallucination: Our private AA-Omniscience benchmark rewards factual knowledge across diverse topics, but punishes hallucination. GPT-5.5 (xhigh) has the highest accuracy at 57% - meaning the model can recall facts in the Omniscience corpus more effectively than any other model. However, it has a hallucination rate of 86% - vs Opus 4.7 (max) at 36%, and Gemini 3.1 Pro Preview at 50%. This makes it more likely to answer a question when it does not ‘know’ the answer. The 14 pt gain in AA-Omniscience from GPT-5.4 (xhigh) was largely driven by knowledge, with a modest improvement in hallucination.
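The cost claims in these bullets are simple multiplications, because cost is price per token times tokens used. Here is a minimal Python check of that arithmetic using the post's own figures. The Elo helper adds an assumption the post does not state: that GDPval-AA uses the conventional 400-point logistic Elo scale. Under that assumption, it translates the ~30 and ~470 point leads into expected head-to-head win rates.

```python
def net_cost_multiplier(price_multiplier: float, token_reduction: float) -> float:
    """Cost = price per token x tokens used, so the two changes multiply."""
    return price_multiplier * (1.0 - token_reduction)


def elo_expected_score(rating_gap: float) -> float:
    """Expected score for the higher-rated side on the standard Elo curve.

    ASSUMPTION: GDPval-AA uses the conventional 400-point logistic Elo scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


if __name__ == "__main__":
    # Price doubled, ~40% fewer tokens: 2 x 0.6 = 1.2, i.e. a net ~+20% cost.
    print(f"net cost multiplier: {net_cost_multiplier(2.0, 0.40):.2f}")
    # "One quarter of the cost" and "half the cost", from the post's approximate dollar figures.
    print(f"{1_200 / 4_800:.2f} {500 / 1_000:.2f}")
    # Expected head-to-head scores for ~30 pt and ~470 pt leads, under the Elo assumption.
    print(f"{elo_expected_score(30):.2f} {elo_expected_score(470):.2f}")
```

Under that assumption, a ~30 point lead is close to a coin flip per task (roughly 54%), while a ~470 point lead means winning the large majority of comparisons.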
Congratulations to the team at @OpenAI and @sama on the launch
Introducing GPT-5.5
A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done.
Now available in ChatGPT and Codex.
We just released code for Meta-Harness! https://t.co/OdU7zocdPl
Beyond replicating the paper's experiments, the repo is designed to help users implement good Meta-Harnesses in completely new domains! Just point your agent at ONBOARDING.md and have a conversation.
Our existing $200 Pro tier remains our highest-usage option. And as a thank you to our existing Pro users on the $200 tier, we’re extending our 2x Codex usage promo (until May 31st) and we’ve reset your Codex rate limits (yes, again).
Say hello to agentOS (beta)
A portable open-source OS built just for agents. Powered by WASM & V8 isolates.
🔗 Embedded in your backend
⚡ ~6ms cold starts, 32x cheaper than sandboxes
📁 Mount anything as a file system (S3, SQLite, …)
🥧 Use Pi, Claude Code/Codex/Amp/OpenCode soon