We’ve been researching new ways for ChatGPT memory to carry context across conversations and keep it useful over time.
Today, that work is rolling out as a more capable memory system in ChatGPT. https://t.co/0MyFKCe2Mu
a few months ago in an interview, someone asked me where we wanted to take the codex app. i answered with something along the lines of: we intend to make it the best app ever built for desktop, full stop.
it probably felt like a little much, but i meant it. and it feels less potentially hyperbolic now than it did then.
Codex is the best way to build software. it is now the best way to do many other things too. we will lean into both, and much of it will mean we blend a lot with ChatGPT. we will do this only when we can deliver something incredible and better than what the two separate current things can deliver alone.
we will also combine the best parts of cloud and local environments (and yes, windows+linux). and the best of instant responses and long-running objectives, like /goal. we will do so with Taste™
outside of the incredible gpt models, 3 things have made the Codex app what it is:
- an opinionated view of how agents should work with a high quality bar
- a tight and honest dogfooding loop
- you
those 3 things will continue to be P0.
LFG.
The next evolution of Hermes Agent is here!
Introducing Hermes Desktop: everything you love about Hermes, now native on your machine.
First demoed in Jensen's GTC keynote, it's now in public preview.
@pvncher @RepoPrompt Resend just works and the team there is awesome to interact with.
Congrats on the new role btw and really nice to see what you’re doing for everyone who supported RepoPrompt! More than most would I think
grok-build-0.1 is now available via the xAI API in public beta.
This is the same model that powers the Grok Build CLI and excels at agentic coding.
Priced at $1/m input and $2/m output, it’s extremely cost effective, intelligent, and fast.
You don't need 10 apps to run a software development process. You can do this all from one task board in Notion...
Triage → Agents capture all feedback and tasks from any source, then enrich, organize and coordinate it all.
Plan → Agents draft PRDs, do research, and pull it together with collaborative docs and AI meeting notes.
Build → Assign tasks (that have all the context already attached) to coding agents and track all in-progress work.
Reviews → Give feedback (to your agents or team) and approve work, or assign another agent for a second opinion.
Ship → Agents write status reports, update dashboards, prep release notes. And cross-functional teams like sales and marketing work their magic from the same place.
Full demo, when? SOON!
BREAKING:
Anthropic just dropped Opus 4.8—and it is a MONSTER
We've been testing for about a week @every and our verdict is they could've just called it Opus 5, it's that good.
Here's our vibe check:
- Beats GPT-5.5 on Senior Engineer bench. On our toughest benchmark Opus 4.8 scores a 63—a hair higher than GPT-5.5's score of 62, and a full 30 points higher than Opus 4.7. It tackled a ground-up rewrite of a production codebase, and actually built something that works.
HOWEVER: Coding performance varied a lot at different reasoning levels. We recommend using it on xhigh for best results.
- Incredibly good writer. Opus 4.8 scored a 79.6 on our writing benchmark—measuring models on real-world writing tasks we do all of the time like essay writing, promo email writing, and more. It beats GPT-5.5 by 6 points. It produces well-written prose with fewer "AI-isms". It's also very good at writing in your voice given the right context.
HOWEVER: Writing performance also varied with reasoning levels. Medium reasoning had a higher incidence of AI-isms; we found best results with high.
- Beast at knowledge work. Opus 4.8 is very good at general knowledge work tasks like report creation, research and more. It produced the best PowerPoint one-shot we've ever seen on our deck generation benchmark.
- Emotionally intelligent, willing to question the frame. I've also found it to be quite good at talking through psychological or interpersonal issues. It has a high EQ, and it's also good at not glazing and helping to expand your perspective. Its thought process feels extremely rich and dynamic.
THE BAD:
These days a model is only as good as its harness, and Codex is still a far superior harness to the Claude Desktop app. This has kept me using Codex + GPT-5.5 as my daily driver, but I am flipping back and forth a lot more between Codex and Claude.
Anthropic is back baby!
Read the rest on @every:
https://t.co/vuORiDXkxX
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.
Huge credit to the OAI team for solving the unit distance problem with 5.5 - it is now my go-to example that models can in fact pull together disparate ideas into new discoveries.
As with all 4-minute miles, we had to try and cross it too! Turns out mythos solves it with a cute, simple proof. This implies some serious overhang in discoveries!
American Airlines has just officially announced that they are adopting SpaceX's @Starlink! American is the largest airline in the world by passenger volume (225 million), and 2nd in fleet size.
Installations begin Q1 2027. Over 500 of its narrowbody aircraft will get Starlink.
“As a premium global airline, we are continuously seeking out world-class partners like Starlink to deliver what our customers need and want,” said American Airlines Chief Customer Officer Heather Garboden.