It’s time to fly!
Excited to share the first short brand film for Codex. Catch it airing during Game 1 of the NBA Finals tonight.
https://t.co/1J4Epczj8T
@aiedge_ These benchmarks are less and less useful as the models get more advanced and as the benchmarks become polluted. Best to build your own evaluation process, starting with something pretty lightweight.
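A lightweight starting point could look something like the Python sketch below. Everything in it is an assumption for illustration: `EvalCase`, `run_evals`, `pass_rate`, and the stub `fake_model` are hypothetical names, and a real setup would swap `fake_model` for a call to whichever model you are testing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the model output is acceptable


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case through the model and record pass/fail per case."""
    return {case.name: case.check(model(case.prompt)) for case in cases}


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of cases that passed (0.0 when there are no cases)."""
    return sum(results.values()) / len(results) if results else 0.0


# Hypothetical stand-in for a real model call; replace with your own client.
def fake_model(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "I don't know"


CASES = [
    EvalCase("arithmetic", "What is 2 + 2?", lambda out: "4" in out),
    EvalCase("capital", "Capital of France?", lambda out: "Paris" in out),
]

results = run_evals(fake_model, CASES)
```

Starting with a handful of cases drawn from your own work keeps the suite private, so it is less likely to end up in training data the way public benchmarks can.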
This is wild.
OpenAI just dropped Codex Sites.
Now anyone can give it a plan, dashboard, launch doc or idea, and turn it into an interactive app with a URL.
5 wild examples:
It’s super interesting to watch and not unique to this. You can do this via prompting with any agentic tool like this as long as you give the agents a shared communication layer and permission.
A small amount of this coordination is absolutely essential when you have multiple agents working on long-running and adjacent work.
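One way to picture a shared communication layer is a plain append-only file that every agent can read and write. The Python sketch below is only an illustration under that assumption: `SharedBoard`, the JSON Lines layout, and the agent names are hypothetical and not how any particular tool implements it.

```python
import json
import tempfile
from pathlib import Path


class SharedBoard:
    """Append-only JSON Lines file that several agents can read and write."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.path.touch(exist_ok=True)

    def post(self, agent: str, message: str) -> None:
        # One JSON object per line, appended to the end of the file.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"agent": agent, "message": message}) + "\n")

    def read_others(self, agent: str) -> list[dict]:
        """Return every note that was posted by a different agent."""
        with self.path.open(encoding="utf-8") as f:
            notes = [json.loads(line) for line in f if line.strip()]
        return [n for n in notes if n["agent"] != agent]


# Hypothetical agents leaving notes about what they are touching.
board = SharedBoard(Path(tempfile.mkdtemp()) / "board.jsonl")
board.post("agent-a", "Refactoring auth module; please avoid auth/*.py")
board.post("agent-b", "Updating API docs only")
notes_for_b = board.read_others("agent-b")
```

In practice the prompt for each agent would tell it where the board lives and when to check it, which is the "permission" half of the setup.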
@simonw The per-tool part is the interesting bit for me. This looks like they are actively encouraging people to experiment with a range of tools. The engineer who uses that full budget on Codex, CC, and Cursor is going to get 3x the productivity boost.
@AnthonyBerlin @thsottiaux For anyone who ran into this and has stuck threads or sessions now:
1/ click the broken session and copy session id
2/ start a new session and ask it to recover context for that session id and complete whatever that session was working on. It’ll even resume goals
@karatzas_thomas Local TS workflows feel like the right direction. The debugging loop is tighter when the agent is running on the same machine as your editor.
@luchian_mvp @fastifyjs @trpcio @honojs @middleapi @DrizzleORM @PostgreSQL @Docker @Minio That makes sense because it gives the model a clear goal to work/iterate towards. With this approach you can even use /goal loops to get all the way to a finished product autonomously.
Just be careful if you want the API endpoints to work well for other uses as well.
@Layton_Gott Better would be to set yourself up so you can use all of those at the same time, with things like skills and agents set up to work across whichever you are using. No need to migrate each time you want to try a different model.
This is a big one. We use this internally and it’s been amazing to see the things people have created and what being able to build and deploy so easily really unlocks.
And this is only the beginning for this feature.
Building apps has never been easier.
With Sites, Codex can turn your work, ideas, and plans into an interactive website or app your team can explore, use, and share with a URL.
Rolling out to Business and Enterprise plans, before expanding more broadly.
@StepFun_ai @kilocode curious what the multi-step part actually changes on a real bug fix. does it mostly help with the planning loop or the tool calling reliability?
@KaiXCreator you get more done, but without the boring parts. so you're spending all of your time on the mentally taxing parts, the parts AI can't do for you.
you get much more done, and the net mental tax is lower but more densely distributed.
and it is also so much more satisfying
@Taniyatweets_ writing code by hand won't be useful for long. being able to understand *how* code is written and how it works will be, though. and engineering as a skill is still wildly important to steer things in the right direction, prompt properly, etc
taste, though, is the real thing
@DanKornas In general, runbooks/playbooks with a slim INDEX.md as the router is the pattern I'm having loads of luck with right now. And Codex is pretty solid at keeping them up to date for me.
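As a rough picture of what "INDEX.md as the router" can mean, here is a small Python sketch. The `- topic: path` layout, the file paths, and the helper names `parse_index` and `route` are all assumptions made up for illustration; the original setup may be laid out differently, and an agent would normally do this lookup by reading the index itself.

```python
import re

# Hypothetical INDEX.md content; the "- topic: path" layout is an assumption.
INDEX_MD = """\
# Runbooks
- deploy: runbooks/deploy.md
- rollback: runbooks/rollback.md
- db-migration: runbooks/db-migration.md
"""

ENTRY = re.compile(r"^-\s*(?P<topic>[\w-]+):\s*(?P<path>\S+)\s*$")


def parse_index(text: str) -> dict[str, str]:
    """Map each topic in the index to its runbook path."""
    routes = {}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            routes[match["topic"]] = match["path"]
    return routes


def route(task: str, routes: dict[str, str]) -> list[str]:
    """Return runbook paths whose topic appears in the task description."""
    lowered = task.lower()
    return [path for topic, path in routes.items() if topic in lowered]


routes = parse_index(INDEX_MD)
matches = route("Rollback the last deploy", routes)
```

Keeping the index slim means only the matching runbooks need to be loaded into context for a given task.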
It really depends on skills installed and how you prompt. I never see three paths when I'm using Codex. I finish my prompt, move on to other tasks, and come back to something worth trying out and giving feedback on.
Superpowers & GSD push for options by adding that to your prompt.