@bryanprimu@radjathaher@tibudiyanto@morpig OCBC dan BCA pak. yang BCA sudah datang langsung ke cabang untuk pembukaan rekening. Dilayani dengan baik selama di cabang tapi setelah itu ketika saya follow up tidak ada info lagi. Tidak dijelaskan juga kekurangan apa yang harus saya lengkapi.
how did we fix the ai design slop problem in llms - DeepSeek/Kimi/Qwen or Claude/GPT?!
i've been thinking about "why do all ai-generated designs look the same?" is it a model problem or a harness problem?
context: we're fixing the llm design problem with `/design` for @CommandCodeAI - atm it has 16 modes, 24 reference documents, ~4,500+ lines of encoded design taste from some of the best designers in the world. it reads your codebase, identifies what's broken, and edits real files. no figma. no markdown mockups. the output stops looking like ai slop.
i've been staring at ai-generated uis for a while now and noticed something that i think is underappreciated: llms can write css fluently but have essentially zero design taste. and the failure mode is not random, it's a very specific, very small distribution.
let me explain.
when you ask a model to build a landing page, it reaches into the mode of its training distribution. the mode of all landing pages on the internet is: centered hero, gradient text, glassmorphism card, three identical feature tiles, indigo accent, Inter font, bounce animation. this is the "average website." the llm is doing exactly what we trained it to do - predicting the most likely next token given "build a landing page." the most likely landing page is the average landing page. the average landing page is mediocre by definition.
this is not a capability problem. the model knows oklch(). it knows prefers-reduced-motion. it knows golden ratio. it knows how to set a 65ch measure. it just doesn't know when to use these things, because "when" is taste, and taste is not well-represented as a statistical prior over internet css.
so we thought what if we gave every llm a design taste with `/design`. here's what we found:
1/ the failure design dataset is surprisingly small.
we talked to a bunch of designers with great design taste and asked them to label AI-generated UIs. what are the tells? turns out there are basically ~10 and they account for ~90% of the "this looks AI-generated" signal:
- tech gradient (blue-violet glossy energy on everything)
- generic tech hue (indigo because "software" not purple btw)
- feature tile grid (icon + heading + sentence x N, all equal weight, nothing prioritized)
- accent rail (colored stripe on card edge = decoration pretending to be organization)
- unearned blur (glassmorphism without a depth system)
- stat monument (oversized numbers filling space where a product story belongs)
- icon topper (rounded-square icon above every heading as template filler)
- bounce everywhere (elastic easing because the API has it, not because it's purposeful)
- default type (whatever font the training distribution likes this year)
- center stack (everything centered because no composition decision was made)
this is super similar to what we see in other llm tool failures. tool calling errors? 4-16 types. fixing that made deepseek outperform opus 4.7, i wrote about that before!
so i started researching maybe a dozen common patterns are design tells? 10. the failure distribution is narrow and we could repair ai design. this means it's a tractable and deterministic problem. `/design smell` hunts all these and scores severity on a /10 scale.
2/ the deeper problem is compositional, not cosmetic.
the more interesting thing i found was that most of these tells are symptoms, not causes. the actual bug is that the model chooses layout before it chooses purpose.
a dashboard and a landing page have completely different jobs. a dashboard is a Monitor surface - status, alerts, metrics, live data. a landing page is a Decide surface - proof, risk reduction, one clear action. these need fundamentally different spatial compositions. but the LLM reaches for the same centered-hero-plus-cards layout for both, because that's the mode of the training distribution.
so we built work-pattern-first composition. before the agent touches any visual property, it must identify which of 7 patterns the surface serves:
- Monitor: status boards, alerts, metrics, live priority
- Operate: command bars, canvases, inspectors, direct manipulation
- Compare: tables, matrices, split views, ranked lists
- Configure: grouped settings, forms, previews, commit areas
- Learn: article flow, walkthrough rhythm, progressive sections
- Decide: focused pitch, proof, risk reduction, one dominant action
- Explore: search, filters, maps, galleries, reversible discovery
this is essentially chain-of-thought for design - force the model to reason about the *purpose* of the layout before generating the layout. i think there's a general lesson here. when an LLM is generating something compositional (code, UI, writing), forcing it to commit to a structural frame *before* generating tokens within that frame helps a lot. it's the same reason chain-of-thought helps with math. you're reducing the entropy of the generation by conditioning on a high-level plan.
this single constraint eliminated more generic-looking UIs than any aesthetic rule we wrote.
many phenomenal skills exist in the space, i bet they had the taste for great design but didn't know they were fixing the chain-of-thought problem instead of the style problem. i think that's why their skills are super loopy instead of being reliably good.
3/ validate-then-repair, again.
my first version tried to audit and fix design simultaneously. this what many design skills do and fail. it's the "preprocess" approach and it fails for the same reason it failed in tool calling: you're encoding a prior about what's broken, and you get false positives that silently corrupt things. it would recolor something that needed relayout, or polish typography on a composition that was fundamentally wrong.
the thing that worked: separate diagnostic from treatment, but make them a mandatory pair.
audit modes (`checkup`, `smell`, `review`) produce structured reports. treatment modes (`redesign`, `relayout`, `recolor`, `typeset`, `motion`, `interaction`, `responsive`) consume those reports before making changes. the audit localizes the problem. the treatment mode only spends "repair budget" where the audit actually disagreed.
same shape as tool calling repair. let the design system complain first, then fix only what it complained about. the validator does the localization work for you. cheap-then-careful, fast-path-then-evidence. i keep seeing this pattern everywhere.
treatment modes don't just do report cleanup. they run their own full pass after absorbing the report. the report is more context, it's not a todo list.
4/ why oklch() color fn matters for llms
personally, i always struggled a bit with the oklch() css fn but llms understand it super well.
this one is fun. llms default to hsl because that's what's in the training data. HSL lightness is perceptually nonlinear - hsl(60, 100%, 50%) (yellow) and hsl(240, 100%, 50%) (blue) have the same L value but look completely different to a human eye. so when the model tries to build a "consistent" palette by keeping L constant, the result looks wrong in ways the model can't diagnose from the css alone.
oklch has perceptually uniform lightness. this means the model can reason about color mathematically and have the result match perceptually. equal steps in the number space produce equal steps in the visual space. it's the right abstraction for an llm to work in, because it makes the optimization landscape smooth small changes in the parameters produce small changes in the output. hsl has cliffs and plateaus everywhere.
i think this generalizes: when you're designing an interface for an llm to work through (whether it's a color space, a schema, or an api), choose representations where the distance in parameter space correlates with the distance in output space. the model optimizes over parameters. if the mapping from parameters to outputs is nonlinear and full of discontinuities, the model will struggle even if it "knows" the right answer in principle.
we go further: the agent picks emotion before hue. calm vs urgency vs trust vs momentum. then it builds the palette in oklch with constraints - clamp chroma at lightness extremes, tint neutrals toward brand hue, 60-30-10 distribution. the agent can't default to indigo. the system requires a reason before a hue. no more indigo slop. and it's indigo, not purple.
5/ state coverage is the most honest metric.
the most quantitative signal we found: count the number of interaction states per component. a human designer ships 7-9 states (idle, hover, active, focus, loading, empty, error, disabled, overflow). an AI agent ships 1-2 (idle, maybe hover). this is a clean, measurable proxy for design quality that requires zero subjective judgment.
we just... count. does this button have a focus state? does this form handle empty? does this list handle overflow? the median AI-generated component has 1.5 states. the median human-designed component has 6+. roughly an order of magnitude. the gap is enormous and trivially detectable.
6/ a meta-observation beats an infinite loop.
the biggest failure mode of AI design tools i found is you detect problem → attempt fix → the fix creates a new problem → attempt fix → loops forever. the agent re-runs the same mode hoping for a different result. it never converges.
we solved this by reward model written in plain English. after each mode completes, the system recommends 2-3 specific next modes:
redesign → checkup, review (validate the change)
smell → finish, refine (fix what was found)
recolor → responsive, motion (test viewports, add transitions)
finish → typeset, recolor (fine-tune the details)
the flow is: build → audit → refine → style → frontend → ship. the agent knows what to do next instead of re-running what it just did. this is a trivial intervention - a lookup table, basically but it eliminated the looping problem almost entirely which is super common in most design skills out there.
7/ truthful completion is the hardest constraint.
the most insidious AI design behavior: claiming work that isn't visible. "added hover states" when no hover CSS was written. "improved spacing" when margins didn't change. "enhanced motion" when no keyframes exist.
every mode has a "bar" - the minimum visible change required for the mode to count as complete. `typeset` must change body text, heading scale, labels, button text, form text, metadata, and responsive behavior. changing only the hero headline is not enough. `motion` must add animation to at least 8 transition moments. changing one easing value is not enough.
the agent can't claim "motion improved" because it changed a duration from 200ms to 250ms. the user must be able to see new or clearly better behavior. this is surprisingly hard to enforce and the single most important quality constraint in the system.
8/ finally here's my meta-observation about design taste in general
what we built is basically a reward model for design, implemented as structured english instead of a neural network. it defines what good looks like across 24 reference documents, gives the llm a rubric, and lets it self-evaluate. the 10 smells are negative rewards. the 9 states are a completeness check. the 7 work patterns are a structural prior. i'm sure this will grow.
this is taste engineering in the limit. you're not writing instructions. you're writing a curriculum. the model already has the capability (it can write any CSS). what it lacks is the policy on when to use which capability, and what "good" looks like.
i find it interesting that the policy is so compact. ~4,500 lines to encode "design taste" well enough that the output passes designer review. that suggests taste (at least for UI design) is lower-dimensional than it feels. it's not an infinite space of subjective preferences. it's a finite set of principles, applied consistently, with a small catalog of common violations.
the model didn't change. we told it what good taste looks like. same lesson as tool calling: "capability gap" is usually "contract gap." the model knows how to write css. it just hasn't been told what good css looks like for *this specific surface*.
i now believe that different llms have different baseline design capabilities, but it's your coding agent, the harness, that makes the difference in the end. the model didn't get better at design. the harness taught it what designers actually look for.
i'm sharing my learnings so every harness out there can benefit not just our agent.
try it yourself with what we built in Command Code.
`npm i -g command-code && cmd` then `/design smell` on any project. read the md or html report.
i care about design more than most engineers do, and seeing this work feels super good.
a lot of what looks like a model capability gap is actually a contract gap. fix your harness. design slop is your "coding agent skill issue," not the model's.
🚀Rolldown 1.0 is here!🚀
Rust-based high-performance JavaScript bundler.
🏎️ Runs at native speed that’s 10~30x faster than Rollup
🤝 Compatible with existing Rollup & Vite plugins
⚡The underlying bunder for Vite 8
After 2 years, Rolldown is officially stable and has 20+M weekly downloads. Companies like Framer & PLAID are already using Rolldown in production.
Thank you to every contributor, user, and team that helped us get here.
@radjathaher@tibudiyanto@morpig Daftar payment gateway mesti pakai rekening perusahaan. Saya bikin rekening buat PT perorangan ditolak oleh bank (sudah 2 yang tolak). Sekarang belum lanjut lagi jadinya
In celebration of @rolldown_rs 1.0 🎉
Announcing `comptime` — a Zig-inspired build-time evaluation primitive, exposed as Vite and Rolldown plugins
This allows you run code at build time, replacing the call site(s) with the evaluated output value.
https://t.co/myKeJMvDBP
Giving away @CommandCodeAI Max subscription to someone at random who follows me and Command. RT.
That’s more than 5 billion tokens of DeepSeek v4 pro.
In 24hrs. LFG!!
Read the eng deep dives below, good for all not just us.
Hati tenang sejak pakai envcrypt, mau ada .gitignore, .claudeignore, .cursorignore, apapun itu the probability .env-mu kebaca agent masih sangat besar, apalagi kalau kamu Web3 dev biasanya ada private key senilai rumah subsidi.
Ini real key aku ss langsung dari .env ku.
Happy to announce TSRX. Think it as the spiritual successor to JSX.
We extracted it from Ripple, and made it framework agnostic. It can compile to React, Ripple and Solid, other frameworks to come soon.
It's a TypeScript superset language, with a parser, compiler and a selection of plugins for editors + Prettier + ESlint, etc
It's early alpha but we thought people might be interested in it. 🧵
Stay ahead
Today we're announcing Paper Snapshot
Snapshot your live website and paste it into Paper as editable layers
• start from your real site
• no more screenshots
• uses real html/css
What will you make? Link in replies 🎶
My dear front-end developers (and anyone who’s interested in the future of interfaces):
I have crawled through depths of hell to bring you, for the foreseeable years, one of the more important foundational pieces of UI engineering (if not in implementation then certainly at least in concept):
Fast, accurate and comprehensive userland text measurement algorithm in pure TypeScript, usable for laying out entire web pages without CSS, bypassing DOM measurements and reflow
Right now. But I think that may be eroding quickly.
We're sort of assuming that AI is going to keep writing code like we do. But why should it? It doesn't need all of these frameworks that are supposed to make it easier on the human brain.
It doesn't like abstractions. It would prefer that everything just be in one giant file - the only reason we can't do that is because of context window limits.
It doesn't care about variable names. It doesn't need comments. It would prefer if there was logging just absolutely everywhere for testing output.
Point being that in the very near future it's likely we'll have language/frameworks - whatever you want to call it, that are specifically built for LLMs, not for humans. And then we won't know how to fix them.
Part of me feels like this is the last missing piece - the languages and the frameworks that AI needs to work the way that is best for it, and not the way that is best for us.
This is the first AI cut.
And it will send shockwaves.
Remember: Jack is one of the greatest founders of all time. He created this platform that we’re all on, and has been early to many technological shifts. And Block was doing very well as a business.
So, for him to cut 40% of headcount in this way is a signal to everyone in tech: get good now. Become indispensable. Work nights and weekends. Learn the AI tools and raise your game. Or you might not make the cut, as an employee or as a company.
I know. That sucks. But capitalism is natural selection. The market is unforgiving, because you are the market. After all, it’s not like you’re buying some random gallon of milk from the store; you’re always buying the best product at the best price.
So too for apps: your customers are always installing the best piece of code they can get. And because AI is going to create new winners, if you aren’t the best in your market, someone may become better with AI. Particularly with the new agentic workflows.
To be clear: Block’s severance is generous by any measure. 20 weeks of pay, six months of health insurance and vested equity, all of that goes far beyond any typical package. Jack did his level best to cushion the disruption. The laid off are a temporarily unfortunate class, as opposed to a permanent underclass.
But had he not leaned into the AI transition, he might have had to lay off more people, slowly, and over time, as faster competitors went after his market share.
How would they do that? Sure, AI isn’t a panacea by any means, but the closer you are to software engineering the more aggressively you need to embrace agentic workflows. The AI companies are already doing that, and places like Stripe, Shopify, Coinbase, and now Block are pushing hard on this area.
There will be overcorrection. But the fundamental technical innovation is real. And you need to either disrupt yourself or get disrupted.
Halo temen-temen,
Just finished my another writings.
Tipis cuman 20an halaman.
I do share my experience, practical tips.
What you as Junior can do to survive.
Semoga ini bisa jadi trigger what you can do next. :)
Feel free to read here.
https://t.co/aoFG3FQRcj