@zhengyaojiang Where are the prompts you used?
I assume you didn’t make the animation with no plan to share the prompts?
Or did you just waste our time?
If you don’t publish the prompts, it was a complete waste of time, because folks at labs will just add this capability in the next model release.
@swyx The main point here is that Gemini Ultra 3.1 from 6 months ago used the same number of flops. This shows that raw flops make no difference. It is like saying that the Sun has lots of energy: yes, but it has less intelligence than a mosquito :)
@ajvazan @davideciffa All big businesses use cloud services, devices manufactured abroad, and tap water.
For LLMs, both OpenAI and Anthropic have ZDR (zero data retention) on by default. Furthermore, OpenAI (unlike Anthropic) does not forward data to Google Search; they have their own web index.
@ajvazan @davideciffa Local AI makes no sense anyway, so ride the wave by using OpenAI models. They are currently the cheapest and most powerful; everything else is slowing you down.
@kinglycrow @dexhorthy Just be careful with “you’re absolutely right” and “brilliant idea”, versus Codex, which never compliments you but just has way more raw power :)
@LLMJunky @nummanali A harness is just a single prompt to Codex to implement.
Kids these days cobble together hardware demos in clubs and voice prompt “also check latest OpenAI/codex architecture and implement this here, in Rust”.
@thomasrice_au What a weird statement.
If a model can’t do something useful, the right path is to ask your coding agent to add datagen and verifier configs (and a Slack human-evals project manager), and to ask your bot to babysit the next mini, small, mid, and big runs and tell you early what’s wrong.
@reach_vb @Dimillian Energy drinks are just coffee and sugar; very dark chocolate is much better for brain function. Now, with agents, you actually don’t want to be awake; you want maximum creativity within 24h, even if it is just a single voiced-in prompt.