Small Teaser of our IO (Industrial Operator) 2.0
An AI system for Industrial Control Rooms. Version 2 going to production in early September.
Version 3.0 (2025 Q4), now in the making, will use a new "Omniscient feature" we are developing based on hierarchical Realtime agents, where the industrial process intelligence is distributed across an intelligent tree. Testing now with the new @OpenAI gpt-realtime... Teaser coming soon too.
@Teknium @jargoti20 @NousResearch some people quickly blame the harness. it's harness + model that we experience. many llms are not good enough for long horizon tasks regardless of how good the harness might be.
As expected, for most of the coding tasks we are asking, we don't see an incredible difference from gpt5.5-medium, which already solves most of what we typically ask for. However, in our IO long horizon operations, the differences appear in minutes. benching soon
so when they release Mythos, GPT5.6 etc. and some people start to say that they don't see the difference ... here you see the answer. trivial tasks and routine tasks are saturated, only people working on hard open-ended challenges will notice
Not liking this token-saving strategy from OpenAI when you paste content in ChatGPT. Basically they are creating a temp file with your content and then letting the model use tools to search, possibly get a summary etc. I want to have control over that optimization.
@_overment @plainionist so this would happen with any harness supporting MCP. Could be a bad MCP toolset implementation, or it could be that the MCP server does wonders and is worth adding when needed.
@_overment @plainionist is it? I would assume first the OP was surprised because even without using the tools the number of tokens was double what it is now. But this could be just because he was using MCP servers adding a lot of tools with extensive instructions to the context...
So, if we believe in exponentials, we should expect to have a Mythos-level model at <10usd/mtok before xmas... right? 👀 people are not building for that, or are they?
maybe. but our programmers use much more than 200usd per month of compute for coding. i guess regardless of coding harness. even myself (and i had pretty much stopped programming before 2024), i now need tokens like oxygen :D I think we would hit a similar cost, without the flexibility of switching to the best available model if needed. migrate rules.
you might be totally right on a properly comparable blind test setup. but i guess we have other priorities than optimizing that cost. if cursor keeps improving, the value we generate is massive.
>>I gave a 10min voice note to Cursor, left to go eat dinner
we record 15-20 meetings including all relevant stakeholders (feature owner, architect, cyber, ux), then ask for a plan, review on the fly (mostly headlines), correct, run and test (95% success rate if discussion is rich enough). We call it EVC (Extreme Vibe Coding) session. Never going back to any other thing. Eliminated information loss from legacy processes.
@Cool_Goose in fact, maybe this is just explained by some regression in Opus4.7 compared to Opus4.5/4.6. So when 4.7 was internally released, probably many swapped models from 4.6 to 4.7 for trivial tasks. I assume this doesn't mean that Mythos was used for those and performed worse.
@_overment 60 inference any programmer can do in a day, no? I think jensen went too far when he mentioned the 250k usd/year per programmer, but 25k sounds maybe quite reasonable?