@heykahn We just released the 2026 English–Chinese Simplified Localization Benchmark: 774 outputs, blind-evaluated by independent linguists. Some surprising findings. Would like to get your take.
@heykahn Three findings that flip conventional wisdom:
ByteDance's Doubao (67.4) beats ChatGPT (61.4) and Gemini (50.8) on Chinese content
Adding human post-editing to Western LLM drafts lowers marketing quality (54.6→53.7)
@andrewztan For e-commerce teams relying on automated translation for product reviews: raw MT scores 33.3/100 for UGC. That's a direct, measurable liability for conversion-critical content.
@andrewztan On Xiaohongshu and Weibo, GPT and Gemini score 84.7 for cultural authenticity. Chinese LLMs score 80.6. The training-data hypothesis explains it, and it inverts the 'local models for local content' assumption.
@andrewztan The human editing paradox: editors spend their effort fighting the draft instead of refining it. The wrong model doesn't just underperform — it creates active drag on the workflow.
We just released the 2026 English–Chinese Simplified Localization Benchmark: 774 outputs, blind-evaluated by independent linguists. Would like to hear your take, @andrewztan
@andrewztan Three findings that flip conventional wisdom:
ByteDance's Doubao (67.4) beats ChatGPT (61.4) and Gemini (50.8) on Chinese content
Adding human post-editing to Western LLM drafts lowers marketing quality (54.6→53.7)
Hiring a human to improve ChatGPT's Chinese marketing copy made it worse.
Not slightly worse.
Worse enough to score the same as using no AI at all.
We ran 774 tests.
And that wasn't even the biggest surprise.
Full study tomorrow.
Which LLM wins for Chinese translation? 🇨🇳
The answer: It depends.
- Qwen: Marketing copy
- DeepSeek: Technical docs
- Gemini: Creative UGC
- ChatGPT: Misses the local market
Your content should dictate your model. Full study drops this Friday with @EC_L10n & @JademondDigital!
@JulianGoldieSEO Qwen is better than you might have thought. This Friday I will be dropping free access to an LLM vs Human localization study. Would love for you to review it!
@iwhaleocean How about a Douyin or Rednote channel to reach more Chinese users? WeChat content usually only gets pushed to existing followers, but Rednote and Douyin content can land in anyone's feed.
In marketing, UGC, and creative product content, this hybrid workflow consistently wins.
AI isn't replacing the translator's taste—it's scaling it.
What's a "fact" in your field you've had to unlearn recently? (3/3)
I was wrong about AI translation for 2 years.
I thought: "Raw LLM output will never match humans for high-stakes creative content."
Still true. But I missed the workflow flip:
LLM Draft + Human Refinement > Pure Human Translation.
Why? (🧵 1/3)
When translating from scratch, a human fights the blank page—wasting energy on raw generation.
With an LLM, the human becomes a Creative Director. The AI provides the raw linguistic clay; the human's curation injects the soul, nuance, and local humor. (2/3)