Every tool available to an LLM is provided as part of the system prompt.
Each tool definition contains:
1. Tool name
2. What it does
3. Input arg JSON schema
4. Description of each arg
5. Output format JSON schema
This repeats for every tool and adds up quickly with 10 or 20 tools. In an agentic AI system, the count can reach the hundreds (mcp-atlassian alone exposes ~70 tools across JIRA and Confluence).
The more tools available, the more options the model has to work with, which improves reasoning, but **at the cost of invaluable context window**.
Found a smart solution in the Claude Agent SDK.
It exposes a tool called ToolSearch https://t.co/rGILA2oUQw. Yes, a Tool of Tools.
Instead of feeding all the tools upfront, feed only the basic, frequently used ones along with ToolSearch, and let the model search the available tools and load them dynamically.
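A rough sketch of the idea, not the Claude Agent SDK's actual API: `ToolDef`, `ToolRegistry`, `tool_search` and the three example tools are all hypothetical names, and the keyword matching is just an assumption about how such a search could work. It only shows that the model starts with a small core set and pulls other definitions into context on demand.

```python
from dataclasses import dataclass, field


@dataclass
class ToolDef:
    name: str
    description: str
    input_schema: dict


@dataclass
class ToolRegistry:
    """Holds every tool, but only exposes a small core set up front."""
    all_tools: dict = field(default_factory=dict)
    loaded: set = field(default_factory=set)

    def register(self, tool: ToolDef, core: bool = False) -> None:
        self.all_tools[tool.name] = tool
        if core:
            self.loaded.add(tool.name)

    def tool_search(self, query: str, limit: int = 3) -> list:
        """Hypothetical search tool: keyword match over names and descriptions."""
        words = query.lower().split()
        scored = []
        for tool in self.all_tools.values():
            text = f"{tool.name} {tool.description}".lower()
            score = sum(1 for w in words if w in text)
            if score > 0:
                scored.append((score, tool.name))
        # Highest score first; the name breaks ties so results are deterministic.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        names = [name for _, name in scored[:limit]]
        self.loaded.update(names)  # matched tools join the active set
        return names

    def active_definitions(self) -> list:
        """Definitions that would actually be sent to the model."""
        return [self.all_tools[n] for n in sorted(self.loaded)]


registry = ToolRegistry()
registry.register(ToolDef("read_file", "Read a file from disk", {"type": "object"}), core=True)
registry.register(ToolDef("jira_create_issue", "Create a JIRA issue", {"type": "object"}))
registry.register(ToolDef("confluence_search", "Search Confluence pages", {"type": "object"}))

found = registry.tool_search("create jira issue")
active_names = [t.name for t in registry.active_definitions()]
```

Only `read_file` is in context at the start; the JIRA tool's definition gets added once the search matches it, and the Confluence tool never costs any tokens in this run.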
@arpit_bhayani YouTube doesn't need to know who has watched a given video. The reverse is what's required.
Probably "users who liked an Insta post" would be the correct use case.
Btw, I would like to contribute to your video content research.
Cost optimization is a top priority for everyone building an agentic system.
The primary way to look at it: how can you reduce the iterations your agent performs? Each iteration resends the accumulated context, so your total token consumption grows polynomially with the number of iterations. (A little thanks to prompt caching.)
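To put rough numbers on that growth, here is a back-of-the-envelope sketch. It assumes every iteration resends the full history and ignores prompt caching entirely; the 5,000-token base prompt and 2,000 tokens added per step are made-up figures, not measurements.

```python
def cumulative_input_tokens(iterations: int, base: int, per_step: int) -> int:
    """Total input tokens when every iteration resends the full history.

    Iteration k (1-based) sends the base prompt plus the results of the
    k - 1 earlier steps, so the total is quadratic in `iterations`:
    iterations * base + per_step * iterations * (iterations - 1) / 2.
    """
    return sum(base + (k - 1) * per_step for k in range(1, iterations + 1))


# Hypothetical figures: 5,000-token base prompt, 2,000 tokens added per step.
two_step = cumulative_input_tokens(2, base=5000, per_step=2000)
ten_step = cumulative_input_tokens(10, base=5000, per_step=2000)
```

Under these assumptions, 2 iterations cost 12,000 input tokens and 10 iterations cost 140,000, so 5x the iterations is well over 10x the tokens.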
You need to analyze what tools your agent is calling, figure out the patterns, and optimize for them.
=============
The logs analysis agent I am running showed some interesting insights.
A lot of the time, it was invoking toolA which fetches the trace_id from the URL of the service whose failure the agent was debugging, using the ingress-* index.
After getting the trace_id, it would then query the logs-* index to fetch service logs, stack traces, etc., via toolB.
The pattern is that toolA was almost always followed by toolB. 2 iterations, every time.
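One way to spot a pattern like this from agent logs is to count how often one tool call is immediately followed by another. This is a minimal sketch: the session data is invented for illustration, and `summarize` is a hypothetical third tool.

```python
from collections import Counter


def follow_rates(sessions):
    """Map (first, second) -> fraction of calls to `first` that were
    immediately followed by a call to `second`."""
    pair_counts = Counter()
    call_counts = Counter()
    for calls in sessions:
        call_counts.update(calls)
        pair_counts.update(zip(calls, calls[1:]))  # consecutive pairs
    return {
        (first, second): count / call_counts[first]
        for (first, second), count in pair_counts.items()
    }


# Invented tool-call sequences, one list per agent session.
sessions = [
    ["toolA", "toolB", "summarize"],
    ["toolA", "toolB"],
    ["toolA", "toolB", "toolA", "toolB"],
]
rates = follow_rates(sessions)
```

A pair with a rate close to 1.0, like `("toolA", "toolB")` here, is a candidate for merging into a single tool.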
So I exposed a new tool called get_service_logs_by_url, which accepts the query and directly returns the service logs. Internally, it queries both indexes and returns the result in one shot.
Since my agent already carries significant data from logs, collapsing 2 iterations into 1 reduced token consumption by a good margin.
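A simplified sketch of what such a combined tool could look like. The in-memory lists stand in for the ingress-* and logs-* indexes, the document fields and helper names are assumptions, and I am assuming the tool takes the failing URL as its input (going by its name).

```python
# Toy in-memory stand-ins for the ingress-* and logs-* indexes.
INGRESS_INDEX = [
    {"url": "/api/orders/42", "trace_id": "trace-abc"},
]
LOGS_INDEX = [
    {"trace_id": "trace-abc", "level": "ERROR", "message": "order lookup failed"},
    {"trace_id": "trace-xyz", "level": "INFO", "message": "ok"},
]


def find_trace_id(url):
    """What toolA did: look up the trace_id for a URL in the ingress index."""
    for doc in INGRESS_INDEX:
        if doc["url"] == url:
            return doc["trace_id"]
    return None


def fetch_logs(trace_id):
    """What toolB did: fetch every log entry carrying that trace_id."""
    return [doc for doc in LOGS_INDEX if doc["trace_id"] == trace_id]


def get_service_logs_by_url(url):
    """Combined tool: both lookups happen inside one tool call."""
    trace_id = find_trace_id(url)
    if trace_id is None:
        return {"trace_id": None, "logs": []}
    return {"trace_id": trace_id, "logs": fetch_logs(trace_id)}


result = get_service_logs_by_url("/api/orders/42")
missing = get_service_logs_by_url("/api/unknown")
```

The agent makes one call and gets the logs back, so the intermediate trace_id never needs its own model round trip.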
It's not just about using agents; the data analytics on top of them is equally important.
Claude Tags needs another model right in between Haiku and Sonnet.
Sonnet - often too heavy to invoke on simple messages.
Haiku - too dumb to understand the actual intent.
This is a new paradigm for interacting with Claude that is significantly more "inline" with all the other human activity org-wide. Once you do all of the under the hood engineering work to make this "just work" (e.g. across tools, integrations, compute environments, memory, security, etc.), Claude basically joins the team in a seamless way - you can talk to it as you would talk to a person and it can help with a very large variety of workloads.
Imo this is the 3rd major redesign of LLM UIUX. The first paradigm was that the LLM is a website you go to, the second was that it is an app you download to your computer. This third one is that it is a self-contained, persistent, asynchronous entity with org-wide tools and context, working alongside teams of humans. It really takes a while to wrap your head around it, but it works and it is awesome.
Just start a new session when performance degrades; you don't need to worry about maintaining context across sessions.
Here's why:
You implement a project module by module: login/signup, landing page, controllers, middleware, ORM, etc.
If you continue with the same session, you are carrying almost the whole project's context into every iteration, which is not always required.
While writing a middleware, the agent doesn't need to know about the login module at all, just its exposed API.
Coding agents are quite capable of gathering a complex project's context. With simple tools like Glob and Grep, they read only what is required in a new session.
@arpit_bhayani The layered exception wrapping - isn't this something we can't avoid? Every layer expects different handling for the same error.
What's the elegant solution?
GPT-5.5-Cyber is our most capable cyber model yet, designed for advanced, authorized defensive work: tracing vulnerable code, validating issues, developing patches, and preparing evidence for human review.
Claude Code-like tools are the best place to execute dark patterns.
The simplest task took almost 4 minutes. Looks like Anthropic is throttling the requests in between.
The biggest IPO warning in 100 years.
This pattern has NEVER failed.
1926 - Goldman Sachs
1972 - Intel
1999 - AT&T Wireless
2026 - SpaceX
Each mega IPO arrived exactly on the peak of a major market bubble.
Then the S&P 500 crashed.
- 1926 crash: 86%
- 1972 crash: 53%
- 1999 crash: 51%
- 2026 crash: **%
And SpaceX is the final liquidity event before the AI bubble breaks.
Web scraping will never be the same.
(100% open-source visual search at scale)
PixelRAG is a retrieval system that skips HTML parsing completely.
Instead of scraping a page into text and embedding chunks, it screenshots the page and retrieves the image. A vision-language model reads the answer straight off the pixels.
Why that matters: parsing is where web RAG quietly loses information.
- A single HTML-to-text parser can drop 40%+ of a page.
- Tables, charts, and layout get flattened or thrown out.
- Swapping parsers alone can move accuracy ~10 points on the same docs.
PixelRAG indexes the page a person actually sees. The team built a visual index of all of Wikipedia, 30M+ screenshots, and it still beats the strongest text RAG baseline by 18.1% on text-only QA.
The repo also ships a Claude Code plugin that gives Claude eyes.
It lets Claude screenshot any URL and read the rendered page instead of scraping the DOM. So you can hand it a live page, an arXiv paper, or your local site and ask what it actually looks like.
One setup script. No MCP server, no backend.
How the pipeline works:
- Renders each document (web, PDF, image) to image tiles.
- Embeds them with Qwen3-VL-Embedding, LoRA fine-tuned on screenshots.
- Builds a FAISS index and serves a search API.
A stronger reader model lifts accuracy with no re-indexing, since the index is just pixels.
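A toy sketch of why the reader can be swapped without re-indexing. This is not PixelRAG's code: a brute-force cosine search stands in for the FAISS index, hand-written 3-dimensional vectors stand in for Qwen3-VL-Embedding outputs, and the tile names and reader functions are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TileIndex:
    """Brute-force stand-in for a vector index over screenshot tiles."""

    def __init__(self):
        self._entries = []  # (tile_id, embedding) pairs

    def add(self, tile_id, embedding):
        self._entries.append((tile_id, embedding))

    def search(self, query_embedding, k=1):
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine(query_embedding, entry[1]),
            reverse=True,
        )
        return [tile_id for tile_id, _ in ranked[:k]]


def answer(index, query_embedding, reader, k=1):
    """Retrieve tiles, then hand the images to whichever reader is plugged in."""
    return reader(index.search(query_embedding, k))


def weak_reader(tiles):    # placeholder for a smaller vision-language model
    return f"weak reader saw {tiles[0]}"


def strong_reader(tiles):  # placeholder for a stronger one
    return f"strong reader saw {tiles[0]}"


index = TileIndex()
index.add("wiki/Paris#tile0.png", [0.9, 0.1, 0.0])
index.add("wiki/Tokyo#tile0.png", [0.1, 0.9, 0.0])

query = [1.0, 0.0, 0.0]  # made-up query embedding
top = index.search(query)
first = answer(index, query, weak_reader)
second = answer(index, query, strong_reader)
```

The index is built once and only stores tile references and vectors; the reader is just an argument at query time, so upgrading it touches nothing on the indexing side.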
Everything is open-source under Apache-2.0.
GitHub repo: https://t.co/qun9TjAdmw
Talking about RAG, I recently wrote an article on a new approach that makes retrieval much more efficient by cutting corpus size by 40x, reducing tokens per query by 3x, and improving vector search relevance by 2.3x.
The article is quoted below.