For nano-banana, steering towards structured thinking works better than an open-ended request to think:
"You must think at extremely great length, for several hundred words, focusing on having a structured and thorough thinking process, before returning the image. Think in the 1st person, and in terms of what the user has asked, then return the generated image."
The length adds some latency and cost here, but can make a big difference for spatial understanding ⬇️
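A minimal sketch of how that instruction could be attached to an image request. The names `THINKING_PREFIX` and `build_image_prompt` are hypothetical, and no model API call is shown; this only builds the prompt string you would then send to the model.

```python
# Hypothetical helper: prepends the structured-thinking instruction
# quoted above to whatever image request the user typed.
THINKING_PREFIX = (
    "You must think at extremely great length, for several hundred words, "
    "focusing on having a structured and thorough thinking process, before "
    "returning the image. Think in the 1st person, and in terms of what the "
    "user has asked, then return the generated image."
)


def build_image_prompt(user_request: str) -> str:
    """Return the thinking instruction followed by the user's request."""
    request = user_request.strip()
    if not request:
        raise ValueError("user_request must not be empty")
    # Blank line keeps the instruction visually separate from the request.
    return f"{THINKING_PREFIX}\n\n{request}"


if __name__ == "__main__":
    print(build_image_prompt("A cat sitting to the left of a red chair"))
```

Keeping the instruction in one constant makes it easy to A/B the same request with and without the thinking step and compare latency, cost, and spatial accuracy.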
Hybrid models are the new default, so recent non-reasoning models often share some post-training with their reasoning counterparts.
So they come up with higher-quality reasoning traces than past models:
@MarcelFromMimic @ycombinator @garrytan programmer trying to learn more about design: is there a typography reason the text lines up like this?
(and how did you decide on indenting the subtitle)
@michellechen Is there any reason why Llama 4 was a priority over Gemma 3? Gemma 3 seems like a much better fit to the Workers AI architecture and value prop
(updated Gemma w/ LoRA support would also add a lot more value to the feature than the current base models supported)
(more seriously, I believe the big winner in AI Wattpad won't be built on agentic flows. If you focus on one-shot performance you get content that isn't as shareable, but feels much more engaging than possible otherwise)
Spellbound's average sessions for AI-generated fanfic are exceeding 1 hour because we've focused on post-training for one-shot performance that gets the reader to turn the next page