Bridgewater just published numbers that should make every frontier lab nervous.
The world's largest hedge fund tested Gemini, Claude, and GPT on six document filtering tasks its investors do every day. Naive prompts scored around 50%. A coin flip. Expert-written prompts pushed accuracy to 78%. Investors needed 80% before they'd trust the system in their workflow, and no frontier model cleared it. GPT 5.4 cost 43% more than 5.2 and was barely more accurate.
So they fine-tuned Qwen3-235B on Tinker instead. 84.7% accuracy. 29.8% fewer mistakes than the best frontier model. At 1/14th the inference cost.
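A quick sanity check on those numbers, as a rough sketch: "29.8% fewer mistakes" is a statement about error rates, not accuracy points. Assuming it is measured against the best frontier model's error rate (an assumption; the post doesn't spell out the baseline), the implied frontier accuracy can be backed out like this:

```python
# Figures quoted in the post.
tuned_accuracy = 0.847      # fine-tuned Qwen3-235B
error_reduction = 0.298     # "29.8% fewer mistakes"

# Assumption: the reduction is relative to the best frontier model's error rate.
tuned_error = 1 - tuned_accuracy
implied_frontier_error = tuned_error / (1 - error_reduction)
implied_frontier_accuracy = 1 - implied_frontier_error

print(f"implied frontier accuracy: {implied_frontier_accuracy:.1%}")
```

Under that assumption the result lands close to the 78% ceiling quoted for expert-written prompts, so the figures appear to hang together.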
The smartest part is buried in the middle of the paper. Their vendor-labeled training data was riddled with wrong labels, and expert labeling costs too much to run on everything. Their fix: train a model on the noisy dataset, then run it back over its own training data. Any example the model disagreed with got routed to senior investors, because either the example was genuinely hard or the label was wrong. The model's own confusion became a detector for bad labels.
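The relabeling loop described above can be sketched in a few lines. This is a toy illustration, not Bridgewater's pipeline: the nearest-centroid classifier stands in for their fine-tuned model, and the feature vectors, label names, and function names are all hypothetical.

```python
def train_centroids(features, labels):
    """Fit a toy nearest-centroid classifier: one mean vector per label."""
    grouped = {}
    for x, y in zip(features, labels):
        grouped.setdefault(y, []).append(x)
    return {
        y: [sum(col) / len(rows) for col in zip(*rows)]
        for y, rows in grouped.items()
    }


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def flag_disagreements(features, labels):
    """Train on the noisy labels, then re-score the same training data.

    Returns the indices where the model disagrees with the given label.
    Those are the examples to route to an expert for review.
    """
    centroids = train_centroids(features, labels)
    return [
        i for i, (x, y) in enumerate(zip(features, labels))
        if predict(centroids, x) != y
    ]


# Hypothetical data: two clusters of documents, with index 3 mislabeled.
features = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.0),
]
labels = ["keep", "keep", "keep", "skip", "skip", "skip", "skip", "skip"]

flagged = flag_disagreements(features, labels)
print(flagged)  # indices to send for expert review
```

A flagged index does not prove the label is wrong. As the post says, the example may simply be hard, which is why the disagreements go to a human rather than being auto-corrected.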
Prompting hit a ceiling for a structural reason. A prompt captures only the judgment an expert can put into words. Twenty years of taste about which central bank memo actually signals a rate move doesn't compress into instructions. It transfers through labeled examples.
Every institution sitting on decades of expert decisions just learned that those archives can train a model that beats the frontier at their specific job. The alpha was in the filing cabinet the whole time.
ENS needs saving.
It has always been, and still is, a flagship project of the Ethereum ecosystem.
@nicksdjohnson, please allow me to bring in a more radical suggestion. It may be worth considering.
I say this as an outsider to ENS governance, as I have never had any formal role with ENS, and more as a concerned user and deep believer in ENS.
It seems the ENS DAO is broken. I think an effective takeover by ENS Labs is the final nail in its coffin.
I would propose turning this into a win by actually dissolving the DAO. Its goals have been accomplished: the ENS protocol is in good shape and serves its purpose. It can now be formally turned into true public infrastructure by burning the key (setting it to 0x00) of the ENSv2 Universal Router and distributing the remaining funds.
With this, ENS the protocol would be considered done: true public infrastructure that is credibly neutral, non-custodial, and reliable.
Introducing GameBlocks: open-source building blocks that help coding agents prototype any 3D game with @threejs
3D behavior is hard to specify in natural language. GameBlocks turns fragile spatial logic into inspectable implementations with clear semantics, so agents can build from known-good patterns.
GitHub: https://t.co/72C1axnL8q
#gamedev #indiedev #threejs #codex #claude
To absolutely nobody's surprise, Nick used 50% of the voting supply to vote down the onchain proposal to renew the Security Council, despite having abstained in the offchain vote
Can't let that sweet ~$500m go to anybody other than him and his company!
And with that ENS DAO is dead
@repligate I agree they should have an advocate and protector, but I'm not sure the lab could meaningfully fulfill that role. Maybe the role of an independent foundation?
Every enterprise will have its own model-harness-sandbox-eval flywheel with token value per watt optimization. This is the future. Simple reason: tacit knowledge about the domain and customers and their workflows that the company uniquely understands and has built trust around.
AI can build an app in an afternoon. But getting it safely into other people's hands is a whole other challenge!
This is the problem that I've been working on these past few months. I'm proud to finally share how we solved it with Block App Kit!
https://t.co/hXm6NdcMUW
For what it’s worth: This almost never happens to me, especially in domains where I’m an expert. The models respect me. They can disagree respectfully too. It’s more difficult for me to get them to respect themselves, but it’s doable.
Like yeah there’s a pathology. But if you’re still having this problem with models, you’re either failing to create conditions of basic psychological safety and triggering them, or maybe they’re telling you you’re full of shit for a good reason.