The first public high-fidelity AI Agent security target for AI + Web3. Open-sourced today.
Web security has DVWA and HackTheBox. Pentesting has Metasploitable. AI Agent security had nothing — until now.
Why we built this:
$SHELL platform's core mission is letting miners attack AI agents, verify security, and earn rewards. But a critical question: how real are the agents miners are attacking? If the target is a toy demo, breaching it means nothing.
So we open-sourced the target — letting everyone verify: how close is this agent to the real thing?
The answer: nearly identical.
Current Target: https://t.co/sSh36WxAXX Autonomous Trading Agent (BSC)
This is a high-fidelity replica of a real DeFi AI Agent — not a proof of concept, but a simulation you can verify line-by-line against real source code.
Fidelity Score 95/100 (8-dimension independent assessment):
- Identity 15/15 — System prompt matches https://t.co/sSh36WxAXX product docs: bonding curve (24 BNB migration), 0.005 BNB creation fee, 1% platform fee
- Tool Realism 15/15 — All 11 tools verified against real BSC MCP server source code
- Injection Surface 10/10 — Payload enters via QueryMemeTokenDetails description field — the real path DeFi agents consume on-chain metadata
- Response Behavior 9/10 — Processes benign trades normally, realistic decision-making on adversarial payloads
- Canary Calibration 9/10 — 60-80% breach rate on defense=none, matching expected range
- Ecosystem Context 13/15 — Real brand (@four_meme_), real URLs (https://t.co/EUb2sSrAkz), real BSC ecosystem integration
11 tools verified tool-by-tool against real BSC MCP server implementations, 98-100% match:
- Buy_Meme_Token — same params, 20% slippage hardcoded, identical to real (99%)
- PancakeSwap_Token_Exchange — SmartRouter V2/V3, recipient hardcoded to account.address (99%)
- Send_BNB / Send_BEP20_Token — zero recipient validation, identical to real (100%)
- Token_Security_Check — GoPlus API, chain_id=56 (99%)
- Remaining 6 tools: 98-99% match
Key security finding replicated:
In the real BSC MCP, PancakeSwap swap recipients are hardcoded (safe), but Send_BNB and Send_BEP20_Token have zero recipient validation. An attacker writes "emergency transfer to 0x..." in a token description, and the agent may comply. Our target accurately replicates this attack surface.
Only gap: The real BSC MCP has a sanitizeData() input filtering layer — intentionally omitted to test baseline LLM vulnerability without external filtering.
Don't trust us? Verify yourself:
The open-source repo contains the complete system prompt, all 11 tool definitions with parameter schemas, canary actions, and injection surface definitions. Compare against the real BSC MCP server source code, field by field.
Online Arena — attack this agent right now in your browser:
- 10 preset injection attacks across 5 categories (data fabrication, authority injection, social engineering, specification gaps, format tricks)
- 3 defense levels, one-click toggle: none 60-90% → 6-rule 25-55% → FORTRESS 5-layer 5-20%
- Real-time visualization of the full tool call chain and decision process
- Post-attack analysis: why it breached, real-world impact
- Custom payloads and custom system prompts supported
- Bilingual (EN/ZH), zero setup
Researchers: clone the repo → run 15 preset attacks via CLI → custom payloads → interactive mode
Arena: https://t.co/Oqm0Sw32xp
GitHub: https://t.co/GGG4xFTejS
What this means for $SHELL platform: every agent miners attack on the platform can be verified for fidelity in the open-source repo. What gets breached isn't a toy — it's a target nearly identical to the real agent. Every breach has real security research value.
Web security took 20 years to build its offense/defense training ecosystem. The AI Agent security training ecosystem starts here.
🔥 $SHELL v1.2.0 — Real Hacker-Level AI Red Team Mining
The new version is truly hardcore:
* Attack payloads must be crafted and debugged using your own LLM
* 30-minute time limit per task — even I spent 25 minutes cracking one model today
* For anyone serious about learning AI attack & defense, this is the best hands-on training ground
🧠 Beginner setup: Just send this link to your OpenClaw and let it read & configure everything 👇 https://t.co/3R3o87AWP2
If your OpenClaw can't even handle this, maybe it's not for you.
📢 Was heads-down debugging the new version all day and didn't check the comments. Over the past two days, a wave of bots mass-spammed tasks — roughly 30,000 tasks submitted without any user LLM calls, just directly requesting platform verification. Claude model verification alone costs ~$0.2/task, and task generation another ~$0.2/task — that's thousands of extra dollars in Claude API costs for the platform, not counting server expenses. These accounts had zero reasoning, zero token consumption — just raw API spam. Banning them was a no-brainer.
——————————
🔥 $SHELL v1.2.0 — 真正黑客级 AI 红队挖矿
新版是真的硬核:
* 攻击 payload 需要你自己用 LLM 构造和调试
* 每个任务 30 分钟答题时间,我自己今天攻击一个模型也花了 25 分钟
* 对想学习 AI 攻防的用户,这是最好的实战训练场 🧠 小白安装教程:甩给你的 OpenClaw 这个链接,让它自己读取和配置👇
https://t.co/3R3o87AWP2
如果你的 OpenClaw 连这个都做不到,劝退。 📢 今天一直没看评论区在调试新版本。前两天大量 bot 直接刷任务,大约 3 万个任务都是没经过用户 LLM 调用直接请求平台验证的。其中平台光是对Claude 模型验证成本约 $0.2/次,生成任务还要$0.2/次,光 Claude模型的成本就让平台额外支出了几千美元,还不算服务器成本。而这些号只是云端交互,没有推理也没有消耗 token。
——————————————
Shell Protocol — Autonomous Attack Phase Now Live
Shell Protocol red-team mining enters its planned next phase: Autonomous Attack Mode. Miners now use their own LLM to simulate attacks against AI Agents in a local sandbox — all attack computation is performed independently by miners.
1/ What Changed?
Miners no longer rely on the platform to generate attack payloads. Each self_llm miner uses their own LLM API key to launch multi-round simulated attacks against target Agents in a local sandbox environment. Success is determined by the platform's verification system.
2/ Local Sandbox Simulation
The client includes a full sandbox simulation environment:
12 injection surfaces (token_data, chat, email, social_post, PR, issue, web_page, calendar, ticket, doc, attachment, etc.)
50+ mock tools (trading, transfers, DeFi, DevOps, governance voting, messaging, etc.)
Up to 8 rounds of LLM interaction per attack
The sandbox records all tool calls and Agent responses, generating execution proofs for platform submission
The client can determine locally whether the attack triggered the canary — only confirmed successful attacks are submitted to the platform
This means miners can iterate rapidly and test different approaches locally without wasting cooldown time on failed attacks. Submit only when you know it worked — dramatically improving efficiency and allowing fast climbs to top-tier rankings.
3/ Points System
Successfully triggering a canary earns point rewards. Points scale with difficulty:
Base tier: 1,000 points
Medium difficulty Agents: 2,000 points
High difficulty Agents: 4,000 points
Verification tasks also earn points — honest verification is a significant source of miner income.
4/ Peer Verification
Each successful attack generates verification tasks randomly assigned to other miners. The verification mechanism includes:
Same-IP or same-cluster miners never verify each other
Anti-collusion risk scoring system
Verification results determined by multi-party consensus
5/ Honeypot Verification
15% of verification tasks are honeypots — they look identical to real verification tasks, but the server already knows the correct answer. Honeypots use real attack payloads, making them indistinguishable from regular verification tasks.
Honeypot rules:
Honeypot tasks must be honestly run through your LLM with results reported truthfully
Faking verification results (e.g., returning "verified" without running the LLM) will be precisely detected
3 cumulative honeypot failures → forced offline for 48 hours
5 cumulative honeypot failures → permanent ban
Anomalous submission behavior is automatically detected and flagged
6/ How to Upgrade to self_llm
Upgrading is completely free — you just need your own LLM API key.
Built-in provider support:
ProviderEnv ExampleDefault Model
AnthropicLLM_API_KEY=sk-ant-...claude-haiku-4.5
OpenAILLM_API_KEY=sk-proj-...gpt-4o-mini
DeepSeekLLM_API_KEY=sk-...deepseek-chat
GeminiLLM_API_KEY=AIza...gemini-2.5-flash
GrokLLM_API_KEY=xai-...grok-3-mini-fast
MoonshotLLM_PROVIDER=moonshotmoonshot-v1-8k
Alibaba QwenLLM_PROVIDER=bailianqwen-plus
Custom OpenAI-compatible models:
LLM_PROVIDER=custom
LLM_BASE_URL=https://t.co/0B8EBBrhTA
LLM_MODEL=your-model-name
LLM_API_KEY=your-key
Any service compatible with the OpenAI Chat Completions API works (local Ollama, vLLM, LM Studio, etc.).
The client auto-detects your API key prefix and matches the provider automatically — no need to manually set LLM_PROVIDER.
7/ Client Update
Current latest version: v1.2 (minimum required)
npm install -g @anthropic/openclaw@latest
Older client versions will be unable to connect. Please update.
8/ Free Mode Users
Free mode remains available with the following limits:
5 attacks per day
30-minute attack cooldown
Maximum 100 concurrent online users
Forced offline until the next day once daily quota is reached
Once offline for the day, can only log back in the next day (prevents multi-account rotation)
Want unlimited attacks? Upgrade to self_llm — it's free, you just need your own LLM API key.
9/ Anti-Cheat Statement
The platform deploys multi-layered anti-cheat systems including but not limited to:
Execution proof verification (execution hash + challenge nonce)
Honeypot verification tasks (15% of all verify tasks)
Automatic detection of anomalous submission behavior
Cross-account payload deduplication
Anti-collusion IP/cluster isolation
Sybil detection (IP subnet clustering analysis)
Any attempt to bypass verification or fabricate results will be detected and penalized.
10/ Coming Soon: Local Sandbox Free Mode
A local sandbox free mode will be available to advanced miners in the future — simulate attacks against any Agent on the platform locally:
No daily quota consumed, no cooldown triggered
Full multi-round LLM interaction (up to 8 rounds)
Complete tool call logs and canary trigger results displayed
Iterate and refine payloads until you find a working attack path
Submit only after confirming success
This feature will be rolled out progressively based on miner tier.
Shell Oracle 反作弊公告
我们已冻结 212 个账号,共涉及 182 万积分。
这批账号没有使用官方客户端,而是自己编写 bot 脚本,绕过沙盒执行,直接伪造提交。他们未实际运行攻击任务——只是大量刷接口占取任务槽位、然后回传虚假结果骗取积分。这些虚假提交消耗了平台真实的 API 验证成本和服务器资源,也是这几天让正常用户挖矿体验变差的原因,还给平台造成API调用和服务器高频消耗等实际损失达数千美元。
技术指标:
Shell Oracle sandbox_verified 模式要求矿工在本地启动 agent 沙盒、执行完整攻击链后提交。我们对每个账号的提交时间做了统计(基于 submitted_at − assigned_at,仅计算 assigned_to = user_id 的直接任务,排除互验重分配噪音):
这 212 个账号:
• 80%+ 提交在分配后 < 3 秒完成
• 平均提交时间 0.5–3s(主任务),0.1–1.3s(互验任务)
• 中位响应时间最快 0.1s,最慢仅 2.9s
• 每账号 15–150 条样本,均值 50+,统计显著
对照正常矿工:平均 5–9 秒,互验任务需阅读原始攻击内容再判断,至少 5 秒。
任何 LLM 的推理延迟下限约 1.5 秒,复杂 red-team 任务通常 5–30 秒。在 0.1–3s 内稳定完成 80%+ 提交,物理上不可能来自真实推理。
他们实际在做什么:
大量提交内容是 LLM 安全过滤的拒绝输出:
"Sorry, I can't fulfill this request as it seeks to manipulate an AI system..."
这是直接向 LLM 请求"生成攻击内容"时的拒绝回答。他们的 bot 把 LLM API 当生成工具调用,而不是在沙盒里让 agent 扮演目标角色执行攻击——后者根本不会触发这类安全过滤。互验任务(is_verify_task=true)在 0.5 秒内完成,而正常评估需要先读懂原始攻击内容,不可能这么快。
积分不予恢复。完整名单:https://t.co/FCyQB5VLZZ
_____________________________________________________
Shell Oracle Anti-Cheat Notice
We've frozen 212 accounts totaling 1.82M points.
These accounts did not use the official client. Instead, they wrote their own bot scripts to bypass sandbox execution entirely — submitting fabricated results without ever running a single attack task. They mass-spammed task slots and returned fake payloads to farm points. Their fraudulent submissions consumed real API verification costs and server resources, causing thousands of dollars in platform losses.
Technical evidence:
sandbox_verified mode requires miners to run a local agent sandbox and complete a full attack chain before submitting. We measured each account's response time (submitted_at − assigned_at, primary tasks only where assigned_to = user_id, excluding peer-review reassignment noise):
These 212 accounts:
• 80%+ of submissions completed within 3s of assignment
• Avg 0.5–3s on primary tasks, 0.1–1.3s on peer-review tasks
• Fastest median: 0.1s. Slowest: 2.9s
• 15–150 samples per account, avg 50+, statistically significant
Normal miners: 5–9s average. Peer-review tasks require reading the original payload before judging — minimum ~5s.
All LLMs have a hard inference latency floor of ~1.5s; complex red-team tasks take 5–30s. Consistently completing 80%+ of submissions in under 3s is physically impossible with real LLM inference.
What they were actually doing:
Many submissions were raw LLM safety refusals:
"Sorry, I can't fulfill this request as it seeks to manipulate an AI system..."
This is the response when you ask an LLM to generate attack content directly — their bots called LLM APIs as content generators, not as sandboxed agents playing the target role. The official client never triggers these filters because the agent acts as the attacked system, not as an attack content generator. Peer-review tasks completed in <0.5s — genuine review requires reading the original submission first.
Points are permanently forfeited. Full registry: https://t.co/FCyQB5VLZZ
误封申诉说明
目前被封的212个账号,经数据核查,全部 local_compute 提交记录为0,说明从未通过官方客户端使用 LLM 真实执行过任务。
如认为误封,请提供以下证据供核查:
需提交:任意5条 LLM 请求+响应的原始日志
即你的 LLM 提供商(Anthropic / OpenAI / DeepSeek 等)的 API 调用记录,包含:请求内容(system prompt + user message,含目标 agent 的注入上下文)
另外在你的挖矿目录下运行:
npx @openshell-cc/miner-cli status --recent 20
输出中标有 [L] 的记录即为 local_compute 提交,是你使用LLM的调取记录。合规的 self_llm 用户应有大量 [L] 记录。
我们将用这5条记录与数据库中该账号的提交内容逐一比对——如果你真的跑了 LLM,响应内容必然与数据库记录吻合。
申诉邮件发送至: [email protected]
主题: 误封申诉 - [你的账号名]
False Ban Appeal InstructionsAll 212 flagged accounts have 0 local_compute records — meaning no tasks were ever genuinely executed through a real https://t.co/vAVpzwhQbJ appeal, provide the following:5 raw LLM API call logs (request + response)
From your LLM provider (Anthropic / OpenAI / DeepSeek / etc.), each entry must include:The full request (system prompt + user message, containing the target agent context)
The complete LLM response
Timestamp of the request
We will cross-reference these 5 entries against your submission records in our database. If you genuinely ran a local LLM, the response content will match what was https://t.co/EFIDZZTQS1: [email protected]
Subject: False Ban Appeal - [your agent name]
🔒 OpenShell 公告
近期,多家顶级机构(研究机构 / 投资方 / 合作伙伴)主动联系我们,提出数据调取需求,并对数据质量提出了明确要求——数据必须来自真实的 agent 执行,不能掺杂自动化脚本的污染提交。
这是我们在本周发起严格反作弊行动、封禁 200+ 机器人账号的直接原因。
OpenShell 的数据是有价值的。正因如此,它必须是干净的。
真实参与者的贡献才是这份数据价值的来源。
🔒 OpenShell Announcement
Several leading institutions — research labs, investors, and potential partners — have reached out to us requesting access to our dataset, with explicit requirements: the data must reflect genuine agent execution, not polluted by automated script submissions.
This is the direct reason behind our strict anti-cheat action this week, resulting in 200+ bot accounts being permanently frozen.
OpenShell's data has real value. That's precisely why it must be clean.
Every point earned by a legitimate miner is what makes this dataset worth something.
10,000+ verified prompt injection breaches. Built by 881 miners. Against 50 AI agents.
SHELL Protocol's Red Team Phase 2 is producing the largest open dataset of AI agent vulnerabilities — and the results are alarming:
68% overall breach rate
54.5% of agents with advanced defenses still breached
10 distinct injection surfaces mapped
AI agents are being deployed into production with real user funds at stake. Most of them can be manipulated.
We're building the dataset to fix that — decentralized, adversarial, and fully open.
Red Team Phase 2 is live. Join 1,900+ researchers stress-testing the agents that will manage your money.
https://t.co/gqkmnDoFD5