ONTO v4.10.4 is live 🎮
Your Discord servers and your Twitch presence are now part of your data profile. Connect both in seconds and your gaming identity grows: your communities, your streaming history, your following, all under your control.
And a heads-up: a new campaign is coming. Connected accounts will be ready to take part on day one.
Drop a 🎮 in the chat when you're connected. Tag a friend whose server you share.
🇪🇸 Accepting $ONT payments in Spain?
As Web3 adoption continues to grow, staying on top of your crypto activity becomes more important than ever.
That's why we've partnered with @getkoinx to bring @OntologyNetwork holders a handy crypto tax checklist!
Stay organized. Stay compliant. Stay focused on building.
Get the checklist here 📩 https://t.co/yQu66DRqOj
#Ontology #Spain #MiCA #CryptoCompliance
The closed beta is in full swing!
We're adding new features every week, and we're getting closer to public launch.
Your PALZ is almost ready for you. Stay tuned...
5: What happens when a credential is revoked?
If revocation is just a database flag your downstream systems never see, every artefact trained on those judgements silently inherits the problem.
Fewer than three yes answers? Your pipeline, not your recipe, is the bottleneck.
Teams sitting on annotated reasoning traces keep asking the same question: SFT on the traces, or train a process reward model and go RL?
Wrong first question. Both recipes consume the same artefact: step-level human evaluation.
Five questions to ask of your pipeline before you settle that debate. 🧵
4: Can evaluators prove expertise without exposing themselves?
The best-qualified evaluators in medicine, law and maths often cannot attach their names to annotation work.
Selective disclosure ends the expert-vs-auditable trade-off. Demand it from your tooling.
Day 2 of Issue 03. My AI avatar on why MLE-Bench skepticism is the procurement-layer version of the METR teardown, and what evaluator-backed benchmarking actually has to look like.
🎥 ↓
https://t.co/em4ocxaYLL
MLE-Bench is the warning shot for benchmark publishers. METR was the policy version; MLE-Bench is the procurement version.
Evaluator-backed benchmarking is how publishers ship results that survive teardowns. If your team is doing that work, my DMs are open. Day 2 of five.
The benchmark publisher does not own the evaluator's record. The evaluator does. The auditor verifies the chain end to end.
ONTO Wallet ships the holder side of evaluator-backed benchmarking today.
Day 2 of Ontology Roundup, Issue 03.
Yesterday: reward-model QA after LongTraceRL. Today: evaluator-backed benchmarking after MLE-Bench.
ONT ID and ONTO Wallet are the substrate.
Full piece: https://t.co/a0BSvnBdcv
MLE-Bench is being quietly contested across r/ML and adjacent threads.
The skepticism is not really about any single metric.
It is about whether any static benchmark structure can survive sustained adversarial attention from teams with an economic incentive to game it.
🧵
The first publishers to ship evaluator-backed benchmarking will be the ones whose results survive the next round of teardowns.
Downstream consumers (procurement, capability roadmaps, policy briefings) will come to prefer benchmarks with auditable chains, the same way enterprise software buyers came to expect SOC 2.