Cast a spell so only cool people see this profile. Did it work?
I post about AI, evals, hackathon projects, vibe coding and things that make me laugh.
Introducing Ideogram 4.0: the best open image model in the world.
Think it. Make it. Own it.
Download the weights, fine-tune on your own data, and run it on your hardware. Live on every Ideogram plan and the API today.
@contraben @ideogram_ai @GeminiApp @imagine @bfl_ai Contra Labs has assembled quite the team to test and benchmark the models. This clearly shows that.
This update tempts me to go back to Ideogram now ;)
This is worth noticing.
Diffusion models routinely fail at mutually exclusive prompts. "Red or blue balloons" yields purple blends or both objects at once. "Cat or dog" produces hybrids. Object counts collapse entirely. The FineGRAIN benchmark already shows near-zero success on hard constraints.
I am calling it the Ice Cream Effect: the model outputs a single statistical average of multiple legal interpretations, satisfying none of the explicit constraints while preserving high aesthetic quality.
Formalize it with a Constraint Preservation Score (CPS), the fraction of prompt constraints met, and an Aesthetic Preservation Score (APS) for realism. Run either-or tests versus single-alternative controls across Stable Diffusion variants, GANs, and autoregressive models. A significant CPS drop in paired t-tests, with APS holding stable, would confirm the averaging.
If you are building evals or prompting pipelines, this is the artifact worth stress-testing right now.
Thoughts?
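A rough sketch of how the CPS comparison could be scored, in case it helps anyone building the eval. Everything here is an assumption for illustration: the function names `cps` and `paired_t`, the per-image pass/fail lists, and the sample judgments are all made up, not results. APS is not modeled; it would presumably be compared the same way.

```python
from math import sqrt
from statistics import mean, stdev


def cps(constraints_met):
    """Constraint Preservation Score: fraction of prompt constraints satisfied."""
    if not constraints_met:
        raise ValueError("need at least one constraint")
    return sum(1 for met in constraints_met if met) / len(constraints_met)


def paired_t(control, treatment):
    """Paired t statistic on (control - treatment) score differences."""
    if len(control) != len(treatment) or len(control) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [c - t for c, t in zip(control, treatment)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (spread / sqrt(len(diffs)))


# Hypothetical pass/fail judgments per constraint, one inner list per image.
# Each pair shares a subject: single-alternative control vs. either-or prompt.
control_checks = [[True, True], [True, True], [True, False], [True, True]]
either_or_checks = [[False, True], [False, False], [True, False], [False, False]]

control_cps = [cps(checks) for checks in control_checks]
either_or_cps = [cps(checks) for checks in either_or_checks]

# A large positive t means CPS dropped when the prompt became either-or.
t_stat = paired_t(control_cps, either_or_cps)
print(f"mean CPS control={mean(control_cps):.2f}, "
      f"either-or={mean(either_or_cps):.2f}, t={t_stat:.2f}")
```

This only computes the t statistic; turning it into a p-value needs a t distribution with n - 1 degrees of freedom, which a stats library would supply.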
Funny how the providers themselves are not the OGs in the industry.
The last paragraph speaks volumes. Every designer who stayed relevant while the world shouted about AI killing design. Every developer who felt dispensable after AI hit. Guess what? The best ones flourished anyway.
Bolt too will flourish. It is about progressing in the right direction.
This is worth noticing.
The key takeaway is the level of precision you can get just by focusing on task-driven prompts. Over time, models have shown that simple prompts can still deliver really effective results.
Gemini NB for me has always been the tweaking tool, and this study justifies its place in my workflow.
> be Bowie Knife99
> just a guy with an Xbox
> buy Forza Horizon 6 on launch day
> drive like you always do - like a lunatic
> ram, swerve, and pit-maneuver strangers into walls
> the Drivatar system silently records every felony
> uploads an AI clone of your driving to the cloud
> deploys hundreds of you into other people's races, 24/7
> within days, thousands cry about you on Reddit, X, and Steam
> the community calls you "the Herobrine of Forza"
> official Xbox UK tweets "Happy Bank Holiday Monday to everyone except bowie knife99"
> a fan opens a fake X account in your name to taunt your victims
thousands of players are at war with hundreds of clones of you. you don't know any of this is happening. happy bank holiday.
AI companies are raising millions.
Burning millions.
But too shy to spend a few thousand on proper product testing & refinement?
The irony is wild.