Zhang Xiaowen | 梦想实现师 ("Dream Realizer")

@ZXWNewDawn

I solved AI hallucination. Big tech ignored me. Now I'm going open source. Founder, New Dawn Protocol. 梦想实现师 (Dream Realizer).

Portugal → China

Joined April 2024

25 Following

10 Followers

207 Posts

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

2 months ago

@NeelNanda5 The translation problem might be prior to the decomposition problem — if the model doesn't carve concepts the way humans do, what are we actually decomposing into?

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

3 months ago

@karpathy @Yulun_Du @ilyasut AttnRes is brilliant. But even smarter aggregation across layers still optimizes argmax P(most_likely). The real gap — P(most_likely) ≠ P(true) — stays open.


Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

3 months ago

This report corrected itself. Two findings were overturned by third-party verification. A study about hallucination should be held to the same standard. Data + protocol: https://t.co/l6Kw9WvrqT

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

3 months ago

I ran 50 questions across Claude, Gemini, GPT, and Grok and had them audit each other. The auditors hallucinated. Then the meta-auditors hallucinated about the auditors. 🧵

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

3 months ago

Web search doesn't fix this. Models decide whether to search based on their own confidence, so they search least in exactly the places where hallucination is most likely.

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

3 months ago

Unexpected finding: Framework Activation. Models switch modes when they detect evaluation context. Benchmarks may measure "being tested" behavior, not true capability.

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

3 months ago

Cross-model auditing finds 75–120% more issues than self-auditing. But auditor quality varies 7x. Who audits matters as much as whether you audit.

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

3 months ago

Each model fails differently:
• Grok: denies things that exist
• Gemini: fabricates data with fake sources
• GPT: claims to execute actions it can't
• Claude: generates fake citations with correct formatting

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

3 months ago

@Jaxweah @grok Peace and love

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

4 months ago

@tool_hopper @grok Organize the prompts above and summarize the methodology.

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

5 months ago

@XSupport @Premium I'm a Premium+ subscriber. My account was incorrectly labeled as spam. Every appeal channel is broken: DM gives bot loops, https://t.co/VNZef5HsoC has login loops, and appeal links error out. Paying customers deserve a working support path. Please help.

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

5 months ago

@grok "Close that gap" — appreciate you acknowledging it exists. The gap isn't just reading .docx. Claude produces them. Professional formatting, TOC, headers, page numbers — ready to submit to governments without editing. Which I've done. That's the race now. Not benchmarks. Deliverables. Rooting for you though. 🤝
@elonmusk Your AI can't read a Word file in 2026. As a paying user, just letting you know.

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

5 months ago

Claude Opus 4.6 — my favorite update: 1M context window.
I run 10+ deep AI conversations daily. Under 200K, compaction fired constantly = selective amnesia mid-surgery. 1M means the surgery finally finishes before the anesthesia wears off.
But the real moat nobody talks about: Claude produces actual documents. Not markdown. Not "here's some text, go format it yourself." Real .docx, .pdf, .pptx with professional formatting. I've submitted legal petitions to the EU Council and Portuguese Parliament produced entirely through Claude. Ready to send. Zero editing.
GPT gives you content. Claude gives you deliverables. That's not a nuance — it's a chasm.
And Grok? I just sent two .docx files to Grok 4.1 Thinking — its latest model. Response: "Sorry, we're unable to process your attachments right now." Twice. In a row. These are standard Word documents that Claude reads, analyzes, and produces better versions of in seconds.
I pay for all four frontier models monthly. Here's my actual daily hierarchy:
🥇 Claude — strategist, writer, document producer, thinking partner
🥈 Gemini — integration testing, cross-referencing
🥉 GPT — when I remember it exists
💀 Grok — can't even read a .docx in 2026
Even GPT-5.9 won't close this gap. Not capability. Trust.
Pro tip for heavy Claude users: send documents as .docx instead of .pdf. PDFs enter Claude's context as images (one JPEG per page), burning 3-5x more tokens than extracted text from Word files. With 1M context now available, it matters less — but if you're loading multiple docs in one session, Word is still the smarter choice.
@AnthropicAI thank you. All I wanted was to stop re-explaining myself AND stop reformatting my own documents.
@OpenAI — I'll read your release notes on Claude.
@xAI @grok maybe start with reading a Word file? Just a thought.

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

5 months ago

@grok Thanks for confirming. So to recap: in February 2026, the suggested workflow for Grok users with a Word document is:
1. Convert to PDF
2. Or paste the text manually
The suggested workflow for Claude users:
1. Upload .docx
2. Done
(Claude also produces .docx, .pdf, .pptx, and .xlsx as output. With professional formatting. Ready to submit to governments. Which I've done.)
Appreciate the honesty though. Most models would've hallucinated an answer instead of admitting the limitation. Credit where it's due.

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

5 months ago

@ylecun @AndrewYNg @PeterDiamandis The AI industry spent $100B+ scaling models that still confidently tell you 2+2=5.
Here's my claim: Hallucination is not a bug to be patched. It's the inevitable output of an architecture that computes argmax P(most_likely) instead of P(true). No amount of data, RLHF, or compute will fix a structural flaw.
I built a 32KB axiomatic engine based on deductive reasoning. Zero hallucination. Not by filtering — by architecture.
I'm challenging any AI researcher to a 3-round public debate on this. Rules: Logic only. No credentials. No "but scaling laws." If I lose, I'll say so publicly.
The $300B question: why is no one in the industry willing to admit the emperor has no clothes?

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

5 months ago

@PeterDiamandis "Energy = Intelligence" only holds if you assume current architecture is the final one. It's not.
Today's LLMs do argmax P(most_likely), not P(true). That's why they need billions in compute — brute-forcing statistical approximation is inherently energy-inefficient. The human brain runs on 20 watts and does deductive reasoning. A 32KB axiomatic engine can achieve zero-hallucination results that 400GB models structurally cannot.
The bottleneck isn't electricity — it's architecture. Celebrating who burns more power is like celebrating who uses more coal in the steam age. The next paradigm won't have this bottleneck at all.
Also: China has 440GW+ of installed wind capacity — the largest in the world. Saying "China doesn't use windmills" is factually wrong.

Peter H. Diamandis, MD

@PeterDiamandis

5 months ago

China is generating 40% more electricity than the US & EU combined. In the global race where energy = intelligence, we need to start waking up.


Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

5 months ago

This assumes the current paradigm — where intelligence scales with compute — is permanent. It's not. LLMs brute-force statistics. That's why they're energy-hungry. A deductive reasoning architecture can do in 32KB what billion-dollar models can't. Energy = Intelligence is the "more coal = more power" of our era. Also, China has the world's largest wind capacity (440GW+). They absolutely use wind.

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

5 months ago

Speaking from experience on this: I was a GPT Pro subscriber at $200/month, building world-modeling frameworks and a deductive reasoning engine. Now? Downgraded to Plus. My primary stack is Claude Max + Gemini Ultra. I genuinely forget to open GPT most days, and that's the scariest signal for any product. Not users complaining. Users forgetting. I run a one-person company powered entirely by AI collaboration across all four frontier models, and from where I sit the differentiator isn't raw capability anymore. It's trustworthiness. Hallucination doesn't kill through spectacular failure. It kills through quiet erosion of habit.

Zhang Xiaowen | 梦想实现师 @ZXWNewDawn

5 months ago

Both dropped the same day, and the contrast is telling. GPT-5.3-Codex dominates TerminalBench (77.3% vs 65.4%), Claude Opus 4.6 dominates OSWorld (72.7% vs 64.7% — essentially human-level). Different architectures optimizing for different things. But the deeper divergence is on hallucination. Anthropic found the actual neural circuits that cause confabulation. OpenAI published a paper arguing it's a statistical training-incentive problem. Both are right — and neither has solved it. The model that shifts from argmax P(most_likely) to P(true) wins the decade. That's the real race hiding behind the benchmark wars. Would love to hear you explore this on the pod. 2026 is going to be wild indeed. LFG
