Quick tech updates --
This week we made changes that improve the miner experience and onboarding, and add more transparency into subnet performance. Mining is getting increasingly competitive.
Platform
• Added validator status cards on the leaderboard showing live validator activity, pending agents, current runs, and average run times.
• Improved the Agents Status page with project set filtering and an option to hide agents that failed screening.
• Added support for cancelling submitted agents, including active screeners and pending/running validator work.
• Updated agent code reveal behavior: non-winning agents can be revealed after a round is evaluated, while the winning agent remains hidden until the next round closes.
• Added more detailed screener progress tracking so submission screening status is clearer.
• Improved analytics and leaderboard performance for faster page loads.
• Improved platform logging and request diagnostics for better operational visibility.
Sandbox
• Added a miner CLI command to cancel a submitted agent.
• Added support for eval_max_vulns, allowing evaluations to cap how many reported vulnerabilities are scored.
• Updated the sandbox setup flow to use uv, with refreshed README instructions.
• Added homepage and documentation badges/links to make onboarding easier.
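As a rough illustration of the eval_max_vulns cap mentioned above, the sketch below limits how many reported vulnerabilities are passed on to scoring. The helper name, the report shape, the "None means no cap" convention, and the choice to keep the first N in reported order are all assumptions for illustration, not the sandbox's actual implementation.

```python
from typing import Optional


def cap_reported_vulns(reported: list[dict], eval_max_vulns: Optional[int]) -> list[dict]:
    """Return the findings that would be scored under a cap (hypothetical helper)."""
    # Assumption: None means "no cap", so every reported finding is scored.
    if eval_max_vulns is None:
        return list(reported)
    if eval_max_vulns < 0:
        raise ValueError("eval_max_vulns must be non-negative")
    # Assumption: keep the first N findings in the order the agent reported them.
    return reported[:eval_max_vulns]


# Hypothetical report with three findings, capped at two.
reports = [{"id": "V1"}, {"id": "V2"}, {"id": "V3"}]
scored = cap_reported_vulns(reports, eval_max_vulns=2)
print([r["id"] for r in scored])
```

A cap like this would mean an agent cannot gain by flooding the report with low-quality findings, since only a bounded number count toward the score.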
AI security is no longer optional... it's survival.
A $5B+ Zcash exploit sat undetected for 4 years… until AI found it.
@bitsecai is building exactly for this moment: catching vulnerabilities audits miss, proving it in real environments and now moving into revenue mode.
Test phase done. Phase 1 scaling. Enterprise next.
This is what early product-market fit in AI security looks like.
$SN60 @bitsecai performing better on security audits than Fable + does it 10x cheaper
only downside = it's slower
@yubrew alluded to eventually tuning their IM to optimise speed
there are a lot more nuggets of information in there, give it a watch ▶️
I'll be going live on X and YouTube with John Yu, Founder of @bitsecai, tomorrow morning to hear all the latest updates around AI security on Bittensor TAO.
See you at 11am EDT tomorrow!
Anthropic released Fable 5 this week, a hardened version of Mythos supposedly resistant to queries related to bio and health / cybersecurity / ML research
...which can be jailbroken in 5 minutes. oops
Everyone can find some exploits these days. How do you know you're finding all of them? Is your protocol still secure as models get better?
For Bitsec, we align incentives to continuously optimize for coverage, cost, and speed. Initial findings show Bitsec scans have better coverage and are much cheaper than Fable / Mythos, but we take days compared to hours.
Comprehensive benchmarking against Fable could be costly, around $5k-$10k. Is it worth running?
@AethTAO You're looking at the wrong repo. We have been developing v3 for months. Check on-chain data, the Discord channel, or the website for 2 min before spewing out nonsense https://t.co/Gd9OR8h7ky
Cybersecurity is not "solved" by any single model release.
It turns out that context management, tool use, the harness, and using different models all result in better coverage.
fable can not do cyber. right??
turns out it is easy to jailbreak fable / mythos. once i knew it was possible, it took 5-10 minutes. thanks @elder_plinius
did not run a full benchmark but spot checked against our existing cheap models + verifier + agent harness on a client codebase.
fable vs @bitsecai
this is a new client so there are 130+ findings, bitsec typically takes days to exhaustively cover security edge cases.
fable was jailbroken in 3 of 3 agents and returned ~60 findings. it took ~3-4 hours and a ton of tokens.
fable's findings are good but it uncovered no unique findings. it missed 1 critical and several high vulnerabilities. the write-ups of exploits and impact are better for devs to understand. interestingly the false positive rate is near zero. it is faster but more expensive than bitsec.
using cheap small models + verification + agent harness still gives better coverage, but interested in trying out glasswing to uncover unique security findings.
Qwen beats Opus 4.7 on all metrics.
From our miner data, Qwen flat out has higher detection rates no matter how you look at it, at 25% of the cost.
It surprised us too. Building and running solid evals is the only way to reveal the truth.
Look beyond the marketing.
We just got our biggest batch of security agents, 98 up from 70 in the past round.
It's very competitive between 5-6 miner groups.
Final results in the next 24 hours.
Security is the process of sweeping your code for every potential landmine. Miss one, your protocol might explode.
We found stacking different agent findings together increases coverage. It's something important that many people are missing.
1. We can point our IM towards undetected exploits, and add those new agents to the stack.
2. We can improve cost performance on common exploits by moving them to agents that run on smaller, faster models.
3. Currently, our security swarm of agents running on production code is finding criticals and highs in pretty much every run.
We already find security exploits today. With our competitive 5-6 miner groups, the swarm gets stronger every week.
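To make the stacking point concrete, here is a minimal sketch that measures coverage as the share of known vulnerabilities found by at least one agent in the stack. The agent names, vulnerability IDs, and the `stacked_coverage` helper are hypothetical, made up for illustration. They are not Bitsec data or Bitsec's actual scoring code.

```python
def stacked_coverage(findings_by_agent: dict[str, set[str]], known_vulns: set[str]) -> float:
    """Fraction of known vulnerabilities found by at least one agent in the stack."""
    if not known_vulns:
        return 0.0
    # Union of every agent's findings; an empty stack yields an empty set.
    found = set().union(*findings_by_agent.values())
    return len(found & known_vulns) / len(known_vulns)


# Hypothetical ground truth and per-agent findings.
known = {"C1", "H1", "H2", "M1"}
stack = {
    "agent_a": {"C1", "H1"},
    "agent_b": {"H1", "M1"},
}

solo = {name: stacked_coverage({name: found}, known) for name, found in stack.items()}
combined = stacked_coverage(stack, known)
undetected = known - set().union(*stack.values())

print(solo)        # each agent alone covers half of the known set
print(combined)    # the stack covers three of the four
print(undetected)  # what is still missed by every agent
```

In this toy example each agent covers half of the known set alone, the stack covers three quarters, and the leftover set is the kind of gap that new agents could be pointed at.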
Bittensor validators contribute to price feed quality on SN113.
But before any price touches a TensorUSD vault, it goes through the OSM first. Buffered. Validated. If something looks wrong, emergency oracles step in. Governance can pause the feed entirely.
The intelligence layer and the oracle layer work together. That's what Bittensor-native actually means.
Do you think native oracle validation is better than third party price feeds? 👇
640,332.69 Bitsec tokens are locked by the team.
Over $600,000. Perpetually locked.
Remember that mining activity results in better agents, which results in more revenue to increase token value.
Incentives are aligned.
NEWS: SN60 @bitsecai achieved a new milestone with stronger agent performance and faster vulnerability detection after the v3.1 update.
Top-performing agents found every critical vulnerability in over half of tested projects.