Asked Claude Code for a "deep search" in ultracode mode. It didn't run a search — it wrote one.
~70 agents, fanned across four phases: discovery → benchmark → enrich → verify. Every project fetched and cross-checked independently. I just watch /workflows and wait for the ping.
This is the part people miss about ultracode: you don't get "more agents," you get an orchestration plan written on the fly, with the loop and intermediate results living in a script instead of the context window.
The catch is cost. ~70 agents is ~70 context setups, not one. Worth it here because the task was too big for a single window. Not worth it for a 3-file change.
I wrote up the full cost model — when a workflow earns its tokens and when it just burns them: https://t.co/yMD3egtJAh
Follow @avi_sangle for more Claude Code deep-dives.
Claude Code shipped dynamic workflows on May 28 with Opus 4.8.
Everyone's quoting the headline: a workflow ported Bun from Zig to Rust. 750K lines, 11 days, 99.8% of tests passing.
Nobody's answering the question that actually matters: what does it cost, and when is it worth it?
WHAT IT IS
You describe a task. Claude writes a JavaScript script that fans it across subagents - up to 16 running at once, 1,000 total per run. The script holds the loop and the intermediate results, so only the final answer hits Claude's context.
It's not "more subagents." It's a repeatable pattern: find issues, then have independent agents try to refute each one before it counts.
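Here is a minimal Python sketch of that find-then-refute shape, capped at 16 concurrent calls. The real workflow script is JavaScript that Claude writes per task; `run_agent`, `fan_out`, and `find_then_refute` are hypothetical names, and the agent call is a stub.

```python
import asyncio

MAX_CONCURRENT = 16  # concurrency limit quoted in the post


async def run_agent(prompt: str) -> str:
    """Hypothetical stand-in for one subagent call; swap in a real client."""
    await asyncio.sleep(0)
    return f"result for: {prompt}"


async def fan_out(items, agent=run_agent, limit=MAX_CONCURRENT):
    """Run every item through `agent`, never more than `limit` at a time."""
    gate = asyncio.Semaphore(limit)

    async def guarded(item):
        async with gate:
            return await agent(item)

    # gather preserves input order, so results line up with items
    return await asyncio.gather(*(guarded(i) for i in items))


async def find_then_refute(files, find, refute):
    """Phase 1 collects candidate issues; phase 2 keeps only the unrefuted ones."""
    found = await fan_out(files, agent=find)
    candidates = [issue for issues in found for issue in issues]
    verdicts = await fan_out(candidates, agent=refute)
    return [c for c, refuted in zip(candidates, verdicts) if not refuted]
```

The intermediate lists live in the script's variables, which is the point: only the final filtered list needs to go back to the main context.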
THE COST
No fixed token rate. Each agent pays its own context overhead at your session model's rate. Fan a task across 40 agents and you pay 40 context setups, not one.
A 500-agent audit can shift your bill by an order of magnitude.
I model a run before starting it: agent count x context per agent, plus what each returns. A 40-file auth audit pencils out near 850K tokens - roughly 5-7x the same task in one session.
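A rough sketch of that back-of-envelope model in Python. The per-agent figures below are assumptions picked only so the total lands near the 850K example; they are not measured values.

```python
def estimate_workflow_tokens(agents: int, context_per_agent: int, return_per_agent: int) -> int:
    """Agent count x (context setup per agent + what each agent returns)."""
    return agents * (context_per_agent + return_per_agent)


# Assumed inputs: 40 agents, ~20K tokens of context each, ~1.25K tokens returned each.
total = estimate_workflow_tokens(agents=40, context_per_agent=20_000, return_per_agent=1_250)
print(total)  # 850000
```

Swap in your own per-agent numbers before trusting the output; the context-per-agent term dominates, so that is the one worth measuring.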
WHEN IT'S WORTH IT
Only when the work is too big for one context window:
- a codebase-wide audit
- a 500-file migration
- research cross-checked across sources
- a plan stress-tested from several angles
For a single bug fix or a 3-file feature, a normal session is cheaper and faster. A workflow there just burns tokens.
THE TRAP
ultracode (/effort ultracode) makes Claude plan a workflow for every task automatically. One request becomes several workflows. It's the fastest way to 10x your bill without noticing. Use it deliberately.
Full guide - cost model, scoping tactics, the same-session resume gotcha, and how it compares to /ultrareview:
https://t.co/yMD3egtJAh
Follow @avi_sangle for more Claude Code deep-dives.
Persistent memory for AI coding agents in 2026, broken down.
CLAUDE.md is the floor, not the ceiling. It loads at session start and never updates. Anything the agent learns mid-session is gone at /exit. Long refactors, multi-session debugging, team-shared learnings all hit that wall.
THE THREE TIERS
Tier 1 - Static files (CLAUDE.md, AGENTS.md). Zero infra, read-only.
Tier 2 - MCP memory servers (agentmemory, claude-mem, mcp-memory-service). Local SQLite, cross-session recall, works in any MCP client.
Tier 3 - Platform-native: Anthropic Memory tool, Dreaming, Memory for Managed Agents, Cloudflare Agent Memory.
THE NUMBERS THAT MATTER
agentmemory: 19.5K stars, 95.2% R@5 on LongMemEval-S, ~92% context reduction, 12 auto-capture hooks, dashboard on :3113.
claude-mem: 79.5K stars, ~10x token savings, plugin marketplace install, Postgres + BullMQ server-beta for teams.
Anthropic Memory tool + context editing: 84% token savings on long-running tasks. Beta header context-management-2025-06-27. Tool type memory_20250818.
Dreaming (research preview): Harvey reported ~6x lift in task-completion rates.
THE GOTCHA NOBODY MENTIONS
The Anthropic Memory tool is NOT Claude Code CLI memory. memory_20250818 is an API capability you wire into agents you build with the SDK. Claude Code the CLI uses CLAUDE.md + MCP servers. If you want Memory-tool semantics inside Claude Code, the path is an MCP server.
FIVE-MINUTE INSTALL (agentmemory + Claude Code)
npm install -g @agentmemory/agentmemory
claude mcp add agentmemory agentmemory -- serve --stdio
claude
/mcp # expect: agentmemory connected
open http://localhost:3113
Next session: "What do you remember about this project?"
PICK THE LOWEST TIER THAT SOLVES YOUR PROBLEM
Solo dev, short sessions: CLAUDE.md only.
Solo dev, week-long work: CLAUDE.md + agentmemory or claude-mem.
Team with shared agents + audit: Memory for Managed Agents.
Building on the API: Memory tool + context editing.
Full breakdown with the benchmark caveats, decision matrix, and trade-offs (token cost, staleness, cross-agent leakage):
https://t.co/y82pNNBluD
Follow @avi_sangle for more practical Claude Code playbooks.
Spent the last six weeks running Qwen Code alongside Claude Code and Gemini CLI.
Every 2026 guide I read still tells you to "log in with your browser." That has been wrong since 2026-04-15.
Here is what actually works now.
THE OAUTH SHUTDOWN
The Qwen OAuth free tier was discontinued on 2026-04-15. If you copy-paste an old install script you will hit a 401 on first prompt and assume the CLI is broken. It is not. You just need an API key.
THREE PATHS THAT WORK TODAY
1. DashScope (Alibaba Cloud Model Studio) - the official path. Generate a key, drop it in ~/.qwen/settings.json with baseUrl pointing at dashscope-intl.
2. OpenAI-compatible provider (OpenRouter, Together, Fireworks). Same settings.json, baseUrl is the provider URL, model is qwen/qwen3-coder or equivalent. This is how teams that can't egress to Alibaba Cloud stay on Qwen.
3. Self-host Qwen3-Coder with vLLM or Ollama. Weights are Apache-2.0. The 7B variant fits on a single 24GB card. apiKey can be "not-needed"; the CLI just wants the header present.
THE 1M CONTEXT REALITY CHECK
Native context is 256K. YaRN extrapolation stretches it to ~1M, which is what the marketing pages quote. In practice recall starts dropping somewhere past 400K tokens. Keep YaRN off for normal coding. Only flip it on for whole-repo summarization runs.
QWEN CODE VS CLAUDE CODE VS GEMINI CLI
After six weeks of side-by-side use:
- Claude Code still wins on multi-step reasoning and complex refactors.
- Qwen Code wins when you need open weights, cost control, or on-prem self-host.
- Gemini CLI gets sunset on 2026-06-18. Migrate to Antigravity or pick one of the above.
CI/CD IS THE UNDERRATED WIN
qwen -p runs headless inside GitHub Actions. I wired it into a PR-summary workflow that costs ~$0.002 per PR on qwen3-coder-flash. Full YAML in the post.
The whole walkthrough (install, three auth paths, model config, CI/CD recipe, comparison table, common errors) is here:
https://t.co/QqMO2eFxaB
Follow @avi_sangle for more Claude Code + multi-CLI workflow notes from Pune.
Gemini 3.5 Flash beats Gemini 3.1 Pro on 11 of 15 agent benchmarks. At $1.50/$9 per 1M tokens.
The headline writes itself. The routing question is what nobody is answering for Claude Code users.
I wrote that piece.
THE BENCHMARKS THAT MATTER
- Terminal-Bench 2.1: 76.2% (vs 70.3% on 3.1 Pro)
- MCP Atlas: 83.6% (beats Claude Opus 4.7 by 4.5 pts, GPT-5.5 by 8.3 pts)
- GDPval-AA Elo: 1656 (3.1 Pro: 1314)
- SWE-Bench Pro: 55.1% (Opus 4.7 still leads at 64.3%)
MCP Atlas predicts tool-call reliability across multi-step agents. That score is the load-bearing number for anyone running an MCP stack.
THE PRICING TRAP
$1.50 input looks cheap. Per task it isn't.
Simon Willison's analysis cites Artificial Analysis benchmark costs: $1,551.60 on Gemini 3.5 Flash vs $892.28 on Gemini 3.1 Pro. NxCode reports 9x the cost of gemini-3-flash on equivalent eval workloads ($1,552 vs $278).
Thinking tokens persist across turns. Agent loops chew more output. Cheaper per token, more expensive per workload.
THE THINKING_LEVEL DEFAULT TRAP
Google changed thinking_budget (int) to thinking_level (enum) and silently dropped the default from high to medium. Copy-pasted code from gemini-3-flash-preview keeps running but produces dumber outputs.
For agentic coding with MCP tools, set thinking_level="low" explicitly. Google retuned low for tool-calling. It is faster, cheaper, and on coding benchmarks roughly equivalent to medium.
Drop temperature, top_p, top_k from your config. They are silently ignored.
THE ROUTING RULE
I keep Claude Code with Sonnet 4.6 as the editor for anything that touches the repo. I route to Gemini 3.5 Flash for:
- MCP-heavy planning that fans out 10-100 tool calls
- Long-running background tasks (log triage, doc gen, cron agents)
- Cheap intermediate planning where Sonnet 4.6 is overkill
- Parallel sub-agent fan-out (cached input at $0.15/1M makes this viable)
Three ways to do it:
1. OpenRouter proxy
2. A thin custom MCP server wrapping generate_content
3. Antigravity CLI (Flash is the default model)
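The routing rule can be written down as a tiny function. This is a sketch, not a real integration: the task labels and the returned strings are hypothetical names I made up to mirror the list above.

```python
# Hypothetical task labels mirroring the four Flash use cases in the post.
FLASH_TASKS = {"mcp_planning", "background", "intermediate_planning", "subagent_fanout"}


def route(task_kind: str, touches_repo: bool) -> str:
    """Anything that edits the repo stays on Sonnet; listed task kinds go to Flash."""
    if touches_repo:
        return "Claude Code (Sonnet 4.6)"
    if task_kind in FLASH_TASKS:
        return "Gemini 3.5 Flash"
    return "Claude Code (Sonnet 4.6)"  # default when no rule matches


print(route("background", touches_repo=False))  # Gemini 3.5 Flash
print(route("background", touches_repo=True))   # Claude Code (Sonnet 4.6)
```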
WHERE FLASH IS THE WRONG ANSWER
- Multi-file refactors in real repos (Sonnet 4.6 still leads SWE-Bench Verified)
- ARC-style reasoning (Flash gives up 5 pts vs prior Pro, 12.5 pts to GPT-5.5)
- 128k+ retrieval (regressed 7.6 pts vs 3.1 Pro)
- Defensive code review (Anthropic models add error handling more naturally)
ONE MORE THING
GitHub Copilot launched Gemini 3.5 Flash with a 14x premium-request multiplier. A 300-request Copilot Pro quota becomes ~21 Flash calls before overage. Raw API plus OpenRouter is almost always cheaper.
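The quota math, as a one-function Python sketch using only the two numbers above:

```python
def calls_before_overage(quota: int, multiplier: int) -> int:
    """Whole calls that fit in a premium-request quota at a given multiplier."""
    return quota // multiplier


print(calls_before_overage(300, 14))  # 21
```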
Full guide: benchmarks tables, before/after Python diff, the 40-line MCP agent, three routing mechanisms, and seven honest limitations:
https://t.co/B3FEeaqBgp
Follow @avi_sangle for more cross-vendor Claude Code workflows.
Gemini CLI dies on June 18, 2026 for free-tier, Google AI Pro, and Google AI Ultra users.
The replacement is Antigravity CLI - a closed-source Go binary named `agy`. Google announced it at I/O 2026.
You have 30 days. Here is what migrating actually looks like.
THE NUMBERS THAT HURT
Old Gemini CLI free tier: up to 1,000 requests per day.
New Antigravity CLI free tier: weekly cap that developers in Discussion #27274 say empties in 4-5 chat turns, with a 166-hour reset.
That is the single biggest behavior change.
WHO IS CUT OFF
- Free (Gemini Code Assist for individuals): cut off
- Google AI Pro ($19.99/mo): cut off from Gemini CLI, moves to Antigravity Pro
- Google AI Ultra ($249.99/mo): cut off from Gemini CLI, moves to Antigravity Ultra
- Gemini Code Assist Standard / Enterprise: unchanged
- Gemini Code Assist for GitHub via GCP: existing installs unchanged, new installs blocked
THE OPEN-SOURCE ANGLE
Gemini CLI was Apache 2.0. Antigravity CLI is closed source. Issue #27304 on the gemini-cli repo asks for source release. No commitment from Google.
If Apache 2.0 portability is non-negotiable, stay on Gemini CLI pointed at a paid API key via AI Studio or Vertex AI. You lose the free first-party endpoints, not the toolchain.
7-STEP MIGRATION (45 MINUTES)
1. Install `agy` without uninstalling `gemini`
2. OAuth with the same Google account
3. `agy plugin import gemini`
4. Move `.gemini/skills/` to `.agents/skills/`
5. Move MCP configs out of settings.json into `mcp_config.json`, rename `url` to `serverUrl`
6. GEMINI.md and AGENTS.md both work unchanged
7. Run a real workflow under `agy` before uninstalling `gemini`
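Step 5 is the one most likely to be done by hand and done wrong, so here is a small Python sketch of the rename. It assumes the servers sit under a top-level `mcpServers` key in both files; that key and the function name are assumptions, so check them against your own configs.

```python
import copy


def migrate_mcp_servers(settings: dict) -> dict:
    """Build mcp_config.json content from a settings.json dict, renaming url -> serverUrl."""
    # Assumption: servers live under "mcpServers" in both the old and new file.
    servers = copy.deepcopy(settings.get("mcpServers", {}))
    for config in servers.values():
        if "url" in config:
            config["serverUrl"] = config.pop("url")
    return {"mcpServers": servers}


old = {"theme": "dark", "mcpServers": {"docs": {"url": "https://example.com/mcp"}}}
print(migrate_mcp_servers(old))
# {'mcpServers': {'docs': {'serverUrl': 'https://example.com/mcp'}}}
```

The function copies before editing, so the original settings dict stays untouched until you have verified the new file under `agy`.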
WHAT DOES NOT CARRY OVER
- Custom themes embedded in extensions
- Terminal-level `gemini skills` command
- Per-call temperature, top_k, and system instruction flags
THE THREE REAL ALTERNATIVES
- Claude Code: Opus 4.6, 1M context, 77.2% SWE-bench Verified. Strongest fit for MCP-heavy, scripted CI agents.
- Codex CLI: sandboxed PR-per-task workflow. Pair with Codex Security.
- OpenGravity: alpha-stage BYOK community clone, GPL-3.0. Sidesteps the weekly cap.
THE 30-DAY PLAN
Week 1: decide migrate vs switch vs stay
Week 2: install `agy` in parallel
Week 3: import plugins, diff `.agents/` against `.gemini/`
Week 4: validate hooks and MCP servers end-to-end
Full migration guide with the field-mapping tables, install commands per OS, and a Claude Code vs Antigravity CLI vs Codex CLI comparison:
https://t.co/nsRLW1pLP0
Follow @avi_sangle for more Claude Code and AI dev tooling deep-dives.
OpenAI relaunched Codex Security on May 11 as the front end of its new Daybreak cybersecurity stack.
Same day, Google disclosed the first confirmed AI-generated zero-day used in the wild. Defensive AI tooling stopped being a future problem.
Here's the setup playbook after running it on real repos.
THE NAMING TRAP
Three different products carry the Codex brand. Don't confuse them.
- Codex Security: the scanner at https://t.co/1V9tsidk4c
- Codex CLI: local coding agent
- openai/codex-action: GitHub Action for the coding agent, not the scanner
Codex Security is a hosted SaaS surface. There is no GitHub Action that runs it.
WHAT IT ACTUALLY DOES
Three-stage loop on every repo:
1. Identify - commit-by-commit pass against a threat model
2. Validate - sandbox spins up, agent tries to exploit the candidate finding, drops it if input validation or a WAF catches it upstream
3. Remediate - patch lands as a GitHub PR you review like any other contributor diff
The sandbox-validation step is the new part. That is what separates Codex Security from pattern-matching SAST.
THE NUMBERS
Independent test on 162,000 lines across four production repos:
- Codex Security: 31 findings, 23 real, 74% precision
- Snyk: 89 findings, 25 real, 28% precision
- Semgrep: 147 findings, 29 real, 20% precision
OpenAI's own beta data: false positives down 50%+, noise reduction roughly 84% vs traditional SAST.
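For anyone checking the per-tool percentages: each is real findings divided by total findings, which is usually called precision. A quick Python sketch using the counts from the test above:

```python
def precision(real: int, reported: int) -> float:
    """Share of reported findings that turned out to be real."""
    return real / reported


# (real findings, total findings) as quoted in the post
results = {"Codex Security": (23, 31), "Snyk": (25, 89), "Semgrep": (29, 147)}
for tool, (real, reported) in results.items():
    print(f"{tool}: {precision(real, reported):.0%}")
# Codex Security: 74%
# Snyk: 28%
# Semgrep: 20%
```

Note what this does not measure: how many real vulnerabilities each tool missed. The raw counts show Semgrep surfacing more real issues (29) than Codex Security (23), just buried in far more noise.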
THE STEP MOST POSTS SKIP
Editing the threat model is the highest-leverage knob in the entire product.
Backfill produces a draft. You open it and document four sections: entry points, trust boundaries, sensitive data paths, review priorities. On a Django repo with 30 generic findings, a focused edit cut the Recommended list to 12 with much sharper prioritization.
Tip: don't draft it in the web editor. Paste into Claude or ChatGPT, iterate through conversation, paste back. Five minutes vs twenty.
COSTS
Roughly $0.02 per 1,000 LOC scanned.
- 100K-line repo: ~$2 per full scan
- 20 repos x 50K lines x daily scans: ~$600/month
Access is gated through ChatGPT Business, Enterprise, or Edu plans.
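The scan-cost arithmetic as a small Python sketch, using the rough $0.02 per 1,000 LOC rate above. The 30-day month is my assumption.

```python
RATE_PER_KLOC = 0.02  # "roughly $0.02 per 1,000 LOC scanned", from the post


def scan_cost(lines_of_code: int, scans: int = 1) -> float:
    """Dollar cost of scanning a repo of the given size `scans` times."""
    return lines_of_code / 1000 * RATE_PER_KLOC * scans


print(round(scan_cost(100_000), 2))                # 2.0
print(round(20 * scan_cost(50_000, scans=30), 2))  # 600.0 (20 repos, daily, 30-day month)
```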
WHAT IT MISSES
Codex Security is code-level. It cannot see:
- Deployment misconfig (CORS, debug flags, TLS, security headers)
- Broken authorization at runtime (OWASP API #1)
- Business logic flaws across microservices
- Infrastructure-dependent issues (rate limits, secrets at orchestrator layer)
Pair with DAST in CI (StackHawk, ZAP, Burp) to cover the runtime gap.
VS CLAUDE CODE SECURITY REVIEW
Not a winner-takes-all. Layer both.
- Codex Security: continuous repo-wide audit with sandbox validation
- Claude action: per-PR advisory commentary
- Semgrep: deterministic blocking gate, no token cost
Full walkthrough with prerequisites, threat-model editing, triage, benchmarks, and the DAST gap:
https://t.co/UZtx0O1buD
Follow @avi_sangle for more Claude Code and AI security deep-dives.
Anthropic shipped Outcomes for Claude Managed Agents on May 6.
It's the first built-in auto-grader for AI agents. Send one event, hand the agent a rubric, and a separate grader model re-runs the writer until the artifact passes.
Here's what's worth knowing after going through the docs and cookbook:
THE LOOP
Writer drafts. Harness emits `span.outcome_evaluation_start`. Grader runs in a fresh context window, same model and tools as the writer. Verdict comes back on `span.outcome_evaluation_end`. If it's `needs_revision`, the gaps flow back into the writer's next turn. No human in the loop.
THE NUMBERS
Anthropic's internal benchmarks:
- +10 points overall task success vs standard prompting
- +10.1% on .pptx generation
- +8.4% on .docx generation
Largest gains land on the hardest tasks. Easy work looks fine on the first pass anyway.
THE FIVE RESULT STATES
- `satisfied` -> session goes idle
- `needs_revision` -> writer starts another pass
- `max_iterations_reached` -> one final revision allowed
- `failed` -> rubric and description contradict each other
- `interrupted` -> a user.interrupt event landed
THE RUBRIC IS THE LEVER
Default failure mode: a grader that approves everything. Fix: explicit, gradeable criteria.
"The CSV has a numeric price column" beats "the data looks good."
Don't ask the grader to verify factual accuracy it can't check. Anchor the rubric in structure and completeness. Anticipate shortcuts. Mandate a feedback format.
THE max_iterations TRAP
Defaults to 3, max 20. The cookbook recommends 5 for strict rubrics.
Decision rule: if the loop hits the cap with the SAME failures every iteration, the rubric is the problem. If it hits the cap with DIFFERENT failures converging, raise the cap.
Raising the cap to mask a bad rubric just doubles your token spend.
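The decision rule is mechanical enough to sketch in Python. This assumes you log the set of failed rubric criteria on each iteration; the function name, the return strings, and reading "converging" as "fewer failures at the end than at the start" are all my assumptions.

```python
def diagnose_cap_hit(failures_per_iteration: list[set[str]]) -> str:
    """Apply the same-vs-different failures rule after hitting max_iterations."""
    if len(failures_per_iteration) < 2:
        return "not enough iterations to judge"
    first = failures_per_iteration[0]
    if all(failed == first for failed in failures_per_iteration):
        return "fix the rubric"  # identical failures every pass
    if len(failures_per_iteration[-1]) < len(first):
        return "raise the cap"  # failures are changing and shrinking
    return "inspect manually"  # changing but not shrinking


print(diagnose_cap_hit([{"price column"}, {"price column"}, {"price column"}]))  # fix the rubric
print(diagnose_cap_hit([{"a", "b", "c"}, {"b", "d"}, {"d"}]))                    # raise the cap
```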
THE COST
No per-outcome fee. The cost driver is iteration count. Every revision adds writer + grader tokens and keeps the $0.08-per-session-hour clock running. A 20-minute session that iterates twice runs about $0.027 in session-hours (20/60 x $0.08), plus tokens.
Way cheaper than a human review round.
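The session-hour part of the bill is simple to work out. A Python sketch using the $0.08 rate; token charges are separate and not modeled here.

```python
SESSION_HOUR_RATE = 0.08  # dollars per session-hour, from the post


def session_hour_cost(minutes: float) -> float:
    """Session-hour charge only; writer and grader tokens are billed on top."""
    return SESSION_HOUR_RATE * minutes / 60


print(round(session_hour_cost(20), 3))  # 0.027
```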
WHO'S USING IT
Harvey (legal docs), Spiral by Every (editorial quality), Wisedocs (document QA).
Full walkthrough with Python code, the rubric anti-patterns, max_iterations tuning rule, and a comparison with LLM-as-judge tools and Codex /goal:
https://t.co/TY2F0aPCFr
Follow @avi_sangle for more Claude Code and Managed Agents deep-dives.
Anthropic's April 23 postmortem confirmed three Claude Code regressions over seven weeks.
The model never changed. The wrapper around it broke.
Here is the practitioner playbook nobody else has assembled.
WHAT WENT WRONG
Three confounding wrapper changes:
- Default reasoning effort downgrade (Mar 4 to Apr 7)
- Thinking-cache bug clearing history every turn (Mar 26 to Apr 10, fixed in v2.1.101)
- System prompt cap forcing 25-word answers between tool calls (Apr 16 to Apr 20, reverted in v2.1.116)
Anthropic's own line: Claude would continue executing, but increasingly without memory of why it had chosen to do what it was doing.
Their evals missed all three.
THEN THE NEXT DAY
April 24 shipped v2.1.119 and v2.1.120 within 24 hours. Eight community-filed regressions in that window:
- claude resume crashed at startup
- Opus 4.7 silently routed to the 1M-context variant (different price, different cache)
- Resize-redraw UI duplication
- Auto-update broke
- mcp menu froze in WSL2
- CLAUDE.md ignored below 1/3 context
- Sandbox network enforcement leaked
- Worktree git merge hung on macOS 26.4
Pin or pay.
THE PLAYBOOK
Five layers, each one a short config change:
1. Pin the CLI to v2.1.117 via npm and add the version line to your user npmrc so auto-upgrade can't move you
2. Lock effortLevel in your user settings.json. The reasoning-effort downgrade was invisible to anyone who never set this
3. Allowlist your model set with availableModels. Add modelOverrides to map Anthropic IDs to specific Bedrock or Vertex inference-profile IDs
4. Wire a Stop hook that runs three to five fixture prompts after every session and pings Slack on drift
5. Keep a five-step rollback runbook so the morning a bad version ships is five minutes of work, not half a day
The full post has the 35-line Stop-hook script, the verified npm and settings syntax, the Bedrock allowlist YAML, and the residual-risk honesty section:
https://t.co/04HiCbnH9p
Anthropic's own security docs say it plainly: "The action is not designed to be hardened against prompt injection."
In April 2026, security researcher Aonan Guan proved it. A single crafted PR title was enough to steal ANTHROPIC_API_KEY and GITHUB_TOKEN out of Claude Code running in GitHub Actions. Anthropic rated it CVSS 9.4 Critical. The same attack shape hit Gemini CLI and GitHub Copilot Agent too.
After going through the disclosure, the vendor fixes, and Anthropic's own security.md, I wrote the practitioner guide nobody has published yet - the assembled hardened workflow.
What actually moves the needle:
- Allowlist tools with --allowedTools. Anthropic's fix added --disallowed-tools 'Bash(ps:*)' but blocklists are whack-a-mole. A review agent gets Read, Grep, Bash(gh pr view:*). Nothing more.
- Scope GITHUB_TOKEN to read-only. permissions: read-all at the workflow level, elevated only per job.
- Move secrets to OIDC via AWS Bedrock or Vertex AI. No static ANTHROPIC_API_KEY in GitHub secrets means nothing to leak and nothing to rotate.
- Cap script invocations with CLAUDE_CODE_SCRIPT_CAPS so an injected prompt can't loop.
- harden-runner in block mode (not audit) with an egress allowlist. If an injection escapes every other control, the shell still can't POST to https://t.co/jmBEdj9HyP.
The before/after diff is 35 lines. That's the cost of hardening a workflow. Compared to rotating an exfiltrated key and auditing every downstream service, it's a bargain.
What it won't fix: prompt injection at its core is "context the agent is designed to process." No YAML change fixes that. Keep humans in the loop for merges and secrets-bearing jobs.
Full write-up with the workflow, six starter tool allowlists, OIDC walkthrough, and residual risk:
https://t.co/BbhcRuOJoz
If you're running Claude Code, Gemini CLI, or Copilot Agent in GitHub Actions today, what's the first control you'd ship?
#ClaudeCode #DevSecOps #PromptInjection #GitHubActions #AIEngineering
Anthropic ships an official GitHub Action that uses Claude to do security code review on every PR.
4,300 stars, MIT-licensed, and almost nobody has written a proper setup guide. So I did.
THE ACTION
`anthropics/claude-code-security-review` scans PR diffs semantically. Not pattern rules - actual reasoning about the code. Broken access control, business-logic flaws, insecure deserialization, auth bypass through weird state machines. The stuff Semgrep misses.
Default model is Opus 4.1. Runtime cap 20 minutes. Nine config inputs.
COST
Bills tokens to your API key. There is NO flat per-PR fee (that's the separate Claude Code Review service at $15-$25 per PR, Team plan only).
Real math on a 500-line diff:
- Opus: $0.90 to $1.80 per scan
- Sonnet 4.6: $0.20 to $0.40
- Haiku: cheaper still, but misses subtle flaws
On a repo I run with ~30 PRs/week on Sonnet 4.6, the line item is $25-$35/month. On Opus it's $120-$160.
FALSE POSITIVES
Every guide mentions the `false-positive-filtering-instructions` input. Nobody shows what the file actually contains.
It's a Markdown file where you describe org-specific patterns the scanner should skip. Admin routes bound to localhost. Test fixtures with replay credentials. Debug paths gated by prod flags.
On a Rust project I maintain, adding this file cut false positives from 3-4 per PR to roughly one per week. HN poster Lynch reported <20% FP rate with Opus 4.6 on kernel work - my numbers on app code track.
THE PROMPT INJECTION GAP
The action is EXPLICITLY not hardened against prompt injection. A malicious fork PR can embed instructions in code comments that manipulate the reviewer.
Mitigations I actually apply:
- Settings > Actions > "Require approval for all external contributors"
- Minimum permissions: pull-requests: write + contents: read, nothing else
- Dedicated API key with monthly cap
DOES IT REPLACE SEMGREP?
No. Complement.
https://t.co/hfV7eGQFSy tested four tools: Snyk Code alone caught 11.2% of planted vulns. All four combined only hit 38.8%. Layered scanning beats any single tool.
My pipeline: Semgrep as the blocking gate (fast, deterministic), Dependabot/Snyk for CVEs, Claude Code Security Review advisory at the end for semantic flaws.
THE FULL GUIDE
Covers all 9 inputs, the exact false-positive Markdown example, cost math, layered pipeline YAML, prompt-injection mitigations, and a 5-case troubleshooting section (timeouts, duplicate comments, missing findings on large PRs).
https://t.co/OVq7GgOrH9
Anthropic shipped a CLI for managing Claude agents and nobody's talking about it.
The ant CLI launched April 8 alongside Managed Agents. It's a Go binary that lets you create, configure, and run cloud-hosted agents from your terminal. No wrapper code needed.
I wrote the first dedicated tutorial for it. Here's the short version:
WHAT IT IS
Think kubectl for Claude agents. Resource-based commands, YAML input, GJSON transforms, auto-pagination. If you've used any modern infra CLI, you already know the patterns.
ant [resource] <command> [flags...]
It's NOT Claude Code. Claude Code is your interactive coding assistant (subscription). The ant CLI is a programmatic API client for hosted agents (API key). I use both daily.
INSTALL (macOS)
brew install anthropics/tap/ant
xattr -d com.apple.quarantine "$(brew --prefix)/bin/ant"
That quarantine step trips everyone up. macOS flags the binary and blocks it without this fix.
THE KILLER FEATURE: YAML VERSION CONTROL
Define agents as YAML files. Check them into Git. Deploy through CI.
ant beta:agents create < reviewer.agent.yaml
Updates require passing the current version number - optimistic concurrency control, same pattern as Kubernetes.
Nobody has written about this workflow yet. It's the CLI's best feature for teams.
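If optimistic concurrency control is new to you, here is the pattern in miniature. This is an in-memory Python illustration of the general technique, not the ant CLI or the Managed Agents API; `AgentStore` and `VersionConflict` are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when the caller's version is stale."""


class AgentStore:
    """In-memory stand-in for a versioned agent registry (not the ant API)."""

    def __init__(self):
        self._agents = {}  # name -> (version, config)

    def create(self, name: str, config: dict) -> int:
        self._agents[name] = (1, config)
        return 1

    def update(self, name: str, config: dict, expected_version: int) -> int:
        current_version, _ = self._agents[name]
        if expected_version != current_version:
            # Someone else updated first; the caller must re-read and retry.
            raise VersionConflict(
                f"expected v{expected_version}, store has v{current_version}"
            )
        self._agents[name] = (current_version + 1, config)
        return current_version + 1


store = AgentStore()
v1 = store.create("reviewer", {"model": "a"})
v2 = store.update("reviewer", {"model": "b"}, expected_version=v1)  # succeeds
```

A second writer still holding `v1` would now get `VersionConflict` instead of silently overwriting the first update, which is what makes the Git-plus-CI flow safe for teams.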
PRICING
- $0.08/session-hour (billed to the millisecond)
- Standard Claude token rates on top
- Idle time is free
- ~$0.70 for a 1-hour Opus coding session
CLI VS CURL VS SDK
- curl: manual JSON bodies, manual pagination, pipe to jq
- ant CLI: typed flags, auto-pagination, --transform for filtering
- SDK: typed objects, full language integration
CLI wins for shell scripts and CI/CD. SDK wins for app integration.
SCRIPTING EXAMPLE
AGENT_ID=$(ant beta:agents create \
  < agents/reviewer.agent.yaml \
  --transform id --format raw)
That --transform flag with GJSON syntax is much cleaner than piping to jq for simple extractions.
Full tutorial with 15 code examples, GitHub Actions deployment, and tool configuration:
https://t.co/3e1EgFS6M9
Follow @avi_sangle for more Claude Code and managed agents content.
My Claude Code spending hit $12/day before I started tracking it. Now I'm at $5-6/day with zero loss in output quality.
The tracking commands and tools are there. Most people just don't know about them.
Here's everything I learned:
THE BUILT-IN COMMANDS
- /cost shows session spend for API users
- /stats shows usage dashboard for subscribers
- /usage shows rate limit status
- Status line config puts real-time cost in your terminal
Each one targets a different plan type. API users get dollar amounts. Subscribers get usage percentages. Both groups need to know these exist.
THE HIDDEN DATA
Claude Code writes every session to ~/.claude/projects/ as JSONL files. Full token counts, model used, timestamps. Sitting on your machine, most people never look at it.
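A minimal Python sketch for totaling tokens from those files. It assumes each JSONL line is an object whose `message.usage` holds `input_tokens` and `output_tokens`; that field layout is an assumption on my part, so inspect one of your own files first. Lines that don't match are skipped.

```python
import json
from collections import Counter
from pathlib import Path


def sum_usage(projects_dir: Path) -> Counter:
    """Total input/output tokens across every .jsonl file under projects_dir."""
    totals = Counter()
    for path in projects_dir.rglob("*.jsonl"):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                try:
                    entry = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip blank or malformed lines
                message = entry.get("message") if isinstance(entry, dict) else None
                usage = message.get("usage") if isinstance(message, dict) else None
                if not isinstance(usage, dict):
                    continue  # assumed layout not present on this line
                totals["input_tokens"] += usage.get("input_tokens", 0)
                totals["output_tokens"] += usage.get("output_tokens", 0)
    return totals


# Usage: sum_usage(Path.home() / ".claude" / "projects")
```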
THIRD-PARTY TOOLS WORTH INSTALLING
- ccusage (4,800+ GitHub stars) - daily/monthly/per-session reports
- claude-usage - local web dashboard
- Claude-Code-Usage-Monitor - real-time alerts
- ccost - per-request granularity
Run `npx ccusage` once and you'll see your spending patterns immediately.
THE 7 CHANGES THAT CUT MY COSTS 50%
1. Default to Sonnet ($3/$15). Only use Opus ($15/$75) when you actually need it
2. Cap thinking tokens: export MAX_THINKING_TOKENS=10000
3. Run /clear between unrelated tasks
4. Run /compact when context grows
5. Write specific prompts with file paths, not vague asks
6. Use plan mode (Shift+Tab twice) before big refactors
7. Break work into scoped sessions
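To see why change 1 matters most, here is the price gap as a Python sketch. The rates are the per-1M-token prices quoted above; the session size is a hypothetical example, and caching discounts are ignored.

```python
# $ per 1M tokens as (input, output), from the post
PRICES = {"sonnet": (3.00, 15.00), "opus": (15.00, 75.00)}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one session at list prices, ignoring cache discounts."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Hypothetical session: 2M input tokens, 200K output tokens.
print(session_cost("sonnet", 2_000_000, 200_000))  # 9.0
print(session_cost("opus", 2_000_000, 200_000))    # 45.0
```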
THE API VS SUBSCRIPTION BREAKEVEN
- Under $50/month usage -> API pay-per-token is cheapest
- $100+/month -> Max 5x plan ($100/mo)
- $200+/month -> Max 20x plan ($200/mo)
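The same thresholds as a Python sketch. The post gives no rule for the $50 to $100 band, so the function says so instead of guessing.

```python
def pick_plan(monthly_api_spend: float) -> str:
    """Map what you'd spend per month at API rates to the cheapest option listed."""
    if monthly_api_spend >= 200:
        return "Max 20x ($200/mo)"
    if monthly_api_spend >= 100:
        return "Max 5x ($100/mo)"
    if monthly_api_spend < 50:
        return "API pay-per-token"
    return "no rule given for $50-$100"


print(pick_plan(30))   # API pay-per-token
print(pick_plan(150))  # Max 5x ($100/mo)
print(pick_plan(400))  # Max 20x ($200/mo)
```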
Full breakdown with code examples, ccusage commands, and the JSONL file format:
https://t.co/6tLydc73F3
@Finstor85 Exactly. They’re shipping updates so fast that by the time you’ve figured out how to integrate the latest feature into your workflow, the next one is already dropping.
Early adopters include Notion, Rakuten, and Asana - all using it for long-running enterprise workflows.
I wrote a full comparison with code examples, pricing math, and a decision flowchart:
https://t.co/wJ8FsOXKOf
Follow @avi_sangle for more Claude Code deep-dives.
Anthropic just dropped Claude Managed Agents in beta.
It runs long-horizon agents in their infra - sandboxed, persistent, with MCP support.
But should you use it or stick with the Agent SDK?
Here's the breakdown after digging through the docs and API:
🧵
Pick Managed Agents when:
- Multi-hour production workloads
- You need sandboxed code execution
- Web browsing + MCP integrations
- You don't want to manage agent infra
Pick Agent SDK when:
- Local file access needed
- Private network access
- Custom tool execution
- You want full runtime control