@ThierryBorgeat it'd be better to avg across these. this may be a result of b200 availability. set of all usecases you want h200s for and don't really want h100s for highly intersects with the set of all usecases you want b200s for, and where b200s are massively better suited.
@boneGPT nothing fundamentally wrong with trying to automate compliance. the fraud part is the wrong part lol.
also, it’s not like they’re turning into a compliance startup
@sama today you can proxy this behavior with prompting, temp, and thinking, but it leads to superficial-feeling results, and is a pretty high dimensional search space for a feature that should be somewhat natively controllable.
would love to see better control of model fill-in-the-blank tendencies.
i think a lot of the pref split btwn opus and gpt 5.5 comes down to how much the model is inclined to fill in the blanks of a request.
in coding, this looks like opus is better at planning, and 5.5 is better at implementation. the reality is, 5.5 is a higher-bandwidth channel from what the user said into execution, vs opus tends to separate the user and execution with its own sort of projection over it, which fills underspecification with its own "taste".
i feel like this gets conflated with hallucination, and maybe more negatively selected for in OAI RL training.
in a perfect world, this is a dial, similar to thinking. with adaptive-defaults. giving users/systems a better way to segment agent behavior by task.