AI pricing is kind of insane if you think about it.
$200/month = $8,000/month employee
If you spend a few days setting up and maintaining dedicated loops, you have the power to scale like never before.
The $200/month plans give you more than enough limits to have 1-2 models running 24/7. Plus, the models now are genuinely good and exceed the capability of an average worker.
What am I getting wrong?
This is still pretty good if you think about it.
The point is planning, not execution. You only need the smart model to think; a cheaper one can carry out the plan.
I have Fable 5 read the code and draft a tight plan (input tokens are cheap, and it catches issues other models miss), then hand the detailed tasks to GPT-5.5 or Opus 4.8 and loop. Works well. Fable 5 is better at planning than at coding anyway.
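A rough sketch of that plan-then-loop setup, in Python. Everything here is an assumption for illustration: a "model" is just a function from prompt text to reply text, `plan_then_execute`, the prompt wording, and the "DONE" convention are made up, and the fake planner and executor stand in for real API calls.

```python
from typing import Callable, List

# Hypothetical stand-in: a "model" is any function from prompt text to reply text.
Model = Callable[[str], str]


def plan_then_execute(planner: Model, executor: Model, code: str,
                      max_rounds: int = 3) -> List[str]:
    """Ask the planner once for a task list, then loop the executor over each task."""
    plan = planner("Review this code and list one task per line:\n" + code)
    # One task per non-empty line; drop leading "- " bullets.
    tasks = [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]
    results = []
    for task in tasks:
        reply = ""
        # Retry the cheaper model until it reports DONE (assumed convention) or rounds run out.
        for _ in range(max_rounds):
            reply = executor("Carry out this task:\n" + task)
            if reply.startswith("DONE"):
                break
        results.append(reply)
    return results


# Fake models so the sketch runs without any API access.
def fake_planner(prompt: str) -> str:
    return "- fix null check\n- add test"


def fake_executor(prompt: str) -> str:
    return "DONE: " + prompt.splitlines()[-1]


if __name__ == "__main__":
    print(plan_then_execute(fake_planner, fake_executor, "def f(x): return x.y"))
```

The expensive model is called once per review, the cheaper one once or more per task, which is where the savings would come from.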
These restrictions will be lifted at some point too.
I ask Fable 5 to review the code extensively for issues and draft proper, concise plans, since input tokens are cheaper and Fable 5 catches things other models never see. Then I give the detailed tasks to GPT-5.5 or Opus 4.8 and loop. Works well.
Fable 5 is also really good at planning and other things besides coding.
@fabian_rol59505 @thsottiaux I was talking about issues with Codex and rate limits, but you have a point. I hope you find a solution to your problems.
Have you tried tools to manage context? I heard this tool is great.
https://t.co/XLGliQFkSC
@marcgmbh @elonmusk Imagine if the app had memory persistence and some kind of simulation to test how much other people would make or lose if they had that much money.
If GPT-5.6 lands in the same tier as Fable/Mythos, OpenAI might hit the same wall Anthropic did: a model strong enough that releasing it to the public becomes a problem, not just a launch.
The models are improving faster than the systems around them can keep up. The frontier keeps growing, but what we can actually use lags behind. Crazy times.
Imagine a linter, but for AI coding agents.
codegraph-mcp MVP3 adds a linter-style validation loop for agent edits: verified repo context, proof labels, blockers, warnings, unknowns, and recovery steps when the graph raises an error.
Designed to cut hallucinated edits by forcing agent claims through repo-grounded validation.
https://t.co/Law9edShh8
The worst thing an AI agent breaks is the thing nothing warns you about.
codegraph-mcp gives your AI a checked, accurate map of your whole codebase, so it works from what's actually there instead of guessing. A few tools do something close, but most just hand the AI text that looks related. codegraph-mcp proves what it hands over. Written in Rust, so it's fast.
It also has a linter for invented code. The moment the AI writes a function or import that was never there, codegraph flags it, like a spellcheck, but for made-up code instead of typos.
Now something finally warns you.
https://t.co/Law9edShh8
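To make the "spellcheck for made-up code" idea concrete, here is a minimal Python sketch. It is not how codegraph-mcp works (that tool is written in Rust and its internals are not shown here). The function name `find_invented_names` and the idea of passing in sets of known modules and functions are assumptions for illustration only.

```python
import ast
import builtins
from typing import List, Set


def find_invented_names(source: str, known_modules: Set[str],
                        known_functions: Set[str]) -> List[str]:
    """Flag imports and bare function calls that are not in the known sets."""
    tree = ast.parse(source)
    nodes = list(ast.walk(tree))
    # Functions defined in the snippet itself are not "invented".
    local_defs = {n.name for n in nodes
                  if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))}
    # Names pulled in via "from x import y" are callable; the module is checked separately.
    imported = {a.asname or a.name for n in nodes
                if isinstance(n, ast.ImportFrom) for a in n.names}
    allowed = known_functions | local_defs | imported | set(dir(builtins))

    problems = []
    for node in nodes:
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in known_modules:
                    problems.append(f"unknown import: {alias.name}")
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # Relative imports (level > 0) are skipped in this sketch.
            module = node.module or ""
            if module.split(".")[0] not in known_modules:
                problems.append(f"unknown import: {module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in allowed:
                problems.append(f"unknown function: {node.func.id}")
    return sorted(problems)


if __name__ == "__main__":
    snippet = "import os\nimport magic_utils\nfrobnicate(os.getcwd())\n"
    for problem in find_invented_names(snippet, {"os"}, set()):
        print(problem)
```

This only checks top-level module names and bare calls like `foo()`. Method calls such as `obj.foo()` pass through unchecked, which a real repo-grounded tool would need to resolve.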
Anthropic might be the most chaotic frontier lab alive.
In the span of 3 months:
- Accidentally leaked all of Claude Code via an npm slip
- Pentagon-blacklisted for refusing to make AI weapons
- Shipped their most powerful public model ever, only for a government export-control order to pull it offline worldwide three days later
The 'safety lab' keeps making the wildest headlines.
I usually task a new model with something the previous model couldn't do (multi-step, difficult tasks), then I grade it on how close it got to the goal and on the quality of the completed work.
5.5 is the first model that can literally do anything. It is only limited by the user's capability and by how well prompts are phrased, so that the model can infer your actual intent from just your words.
For the most part, I've noticed these models getting better and better, with no regressions so far. But the models still lack 'intelligence'. Opus 4.8 seems like it has this intelligence, but I need to use it more. Definitely not complaining though.
I was ready to pivot to cheaper models and a new agent-driven development workflow, but they’ve kept me hooked. I hope smarter, cheaper models are coming. Paying high costs for 'somewhat intelligent' isn't sustainable if I'm continuously wasting time & compute on avoidable errors.
Running SWE-bench on codegraph. Going to see real results and feedback to further improve the tool itself.
I love that I can have Codex set up and do all this end to end while I watch and manage. Automated research is crazy.