Omkar

Verified account

@logLatency

Software Developer

India

Joined February 2024

76 Following

118 Followers

1.2K Posts

Pinned Tweet

5 days ago

Every tool available to an LLM is provided as part of the system prompt. Each tool definition contains:
1. Tool name
2. What it does
3. Input arg JSON schema
4. Description of each arg
5. Output format JSON schema

This repeats for every tool, and the prompt can explode with 10s or 20s of tools. In an agentic AI system, the count can reach 100s (mcp-atlassian alone exposes ~70 tools across JIRA and Confluence).

The more tools available, the more options the model has to work with, which brings better reasoning but **at the cost of invaluable context window**.

I found a smart solution in the Claude Agent SDK. It exposes a tool called ToolSearch https://t.co/rGILA2oUQw. Yes, a Tool of Tools. Instead of feeding all the tools upfront, feed only the basic, frequently used ones along with ToolSearch, and let the model reason about the available tools and load them dynamically.
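The "tool of tools" idea can be sketched in plain Python. This is not the Claude Agent SDK's API: `Tool`, `ToolRegistry`, `tool_search`, and the sample tool names are all hypothetical, and the keyword-overlap ranking is only a stand-in for whatever matching a real implementation uses. The point is that only the core tools sit in the prompt until a search pulls more in.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    description: str
    input_schema: dict = field(default_factory=dict)


class ToolRegistry:
    """Hypothetical registry: core tools are always loaded, the rest are searchable."""

    def __init__(self, core, deferred):
        self.loaded = {t.name: t for t in core}
        self.deferred = {t.name: t for t in deferred}

    def tool_search(self, query, limit=3):
        """Rank deferred tools by keyword overlap, load the top matches, return their names."""
        words = set(query.lower().split())
        scored = []
        for tool in self.deferred.values():
            text = f"{tool.name} {tool.description}".lower().replace("_", " ")
            score = len(words & set(text.split()))
            if score:
                scored.append((score, tool.name))
        # Highest score first; ties broken alphabetically for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        matches = [name for _, name in scored[:limit]]
        for name in matches:
            self.loaded[name] = self.deferred.pop(name)
        return matches

    def prompt_tools(self):
        """Names of the tool definitions that would be sent to the model right now."""
        return sorted(self.loaded)


registry = ToolRegistry(
    core=[Tool("read_file", "Read a file from disk")],
    deferred=[
        Tool("jira_create_issue", "Create a new JIRA issue"),
        Tool("jira_search", "Search JIRA issues with JQL"),
        Tool("confluence_get_page", "Fetch a Confluence page by id"),
    ],
)
found = registry.tool_search("create jira issue")
print(found)
print(registry.prompt_tools())
```

After the search, the two JIRA tools join `read_file` in the loaded set, while the Confluence tool stays deferred and costs no context.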

3

3

0

0

75

about 8 hours ago

@nikitabier @TechCrunch @Meta Would you join Meta if Zuck offers a hell of a lot more than Musk? @nikitabier

0

0

0

0

43

about 14 hours ago

@arpit_bhayani YouTube doesn't need to know who has watched this video. The reverse is required. Probably "users who liked an Insta post" would be the correct use case. Btw, I would like to contribute to your video content research.

0

0

0

0

106

3 days ago

Cost optimization has been everyone's top priority for every agentic system they are building.
The primary way to look at it: how can you reduce the iterations your agent performs? With each iteration, your token consumption grows polynomially. (Little thanks to prompt caching)

You need to analyze what tools your agent is calling, figure out the patterns, and optimize for them.

=============
The logs analysis agent I am running showed some interesting insights.

A lot of the time, it was invoking toolA, which fetches the trace_id from the URL of the service whose failure the agent was debugging, using the ingress-* index.
After getting the trace_id, it would then query the logs-* index to fetch service logs, stack traces, etc., via toolB.

The pattern is that toolA was almost always followed by toolB. 2 iterations, every time.

Hence I exposed a new tool called get_service_logs_by_url, which accepts the query and directly returns the service logs. Internally it queries both indexes and returns the result in one shot.

Since my agent already carries significant data from logs, collapsing 2 iterations into 1 reduced token consumption by a good margin.

It's not just about using agents; the data analytics on top of them is equally important.
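A rough sketch of both steps: spotting the toolA-then-toolB pattern in a call log, then collapsing the pair into one tool. Everything here is an assumption for illustration. The in-memory lists stand in for the ingress-* and logs-* indexes, `fetch_trace_id` and `fetch_logs` are hypothetical names for toolA and toolB, and the signature of `get_service_logs_by_url` is guessed from the description above.

```python
from collections import Counter

# In-memory stand-ins for the two indexes; a real setup would query a search backend.
INGRESS_INDEX = [
    {"url": "/checkout/pay", "trace_id": "t-101"},
    {"url": "/cart/add", "trace_id": "t-102"},
]
LOGS_INDEX = [
    {"trace_id": "t-101", "message": "PaymentError: card declined"},
    {"trace_id": "t-101", "message": "stack trace: pay() -> charge()"},
    {"trace_id": "t-102", "message": "item added"},
]


def tool_pair_counts(calls):
    """Count how often each tool is immediately followed by another in a call log."""
    return Counter(zip(calls, calls[1:]))


def fetch_trace_id(url):
    """toolA (hypothetical): look up the trace_id for a URL in the ingress index."""
    for doc in INGRESS_INDEX:
        if doc["url"] == url:
            return doc["trace_id"]
    return None


def fetch_logs(trace_id):
    """toolB (hypothetical): fetch all log messages for a trace_id from the logs index."""
    return [doc["message"] for doc in LOGS_INDEX if doc["trace_id"] == trace_id]


def get_service_logs_by_url(url):
    """Composite tool: runs toolA then toolB in one call, so the agent spends one iteration."""
    trace_id = fetch_trace_id(url)
    if trace_id is None:
        return {"url": url, "trace_id": None, "logs": []}
    return {"url": url, "trace_id": trace_id, "logs": fetch_logs(trace_id)}


pairs = tool_pair_counts(["toolA", "toolB", "toolC", "toolA", "toolB"])
result = get_service_logs_by_url("/checkout/pay")
print(pairs.most_common(1))
print(result)
```

The pair counter is the "analytics" half: whichever pair dominates the log is the first candidate for a composite tool.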

0

1

0

0

23

3 days ago

Claude Tags needs another model right in between Haiku and Sonnet. Sonnet - Too heavy to invoke on simple messages, many times. Haiku - too dumb to understand the actual intent.

Andrej Karpathy

4 days ago

This is a new paradigm for interacting with Claude that is significantly more "inline" with all the other human activity org-wide. Once you do all of the under the hood engineering work to make this "just work" (e.g. across tools, integrations, compute environments, memory, security, etc.), Claude basically joins the team in a seamless way - you can talk to it as you would talk to a person and it can help with a very large variety of workloads. Imo this is the 3rd major redesign of LLM UIUX. The first paradigm was that the LLM is a website you go to, the second was that it is an app you download to your computer. This third one is that it is a self-contained, persistent, asynchronous entity with org-wide tools and context, working alongside teams of humans. It really takes a while to wrap your head around it, but it works and it is awesome.

1K

22K

2K

13K

7M

0

1

0

0

42

4 days ago

Just start a new session when performance degrades, and you don't need to worry about maintaining context across sessions. Here's why: You implement a project module by module: login/signup, landing page, controllers, middleware, ORM, and so on. If you continue with the same session, you are carrying almost the whole project's context into every iteration, which is not always required. While writing a middleware, the agent doesn't need to know about the login module at all, just its exposed API. Coding agents are capable enough to gather a complex project's context. With simple tools like Glob and Grep, they read only what is required in the new session.
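A minimal sketch of the glob-then-grep lookup, assuming a toy project layout. `grep_project` and the file names are hypothetical and this is not how any particular coding agent implements its tools; it only shows how a fresh session can surface the few files that mention an exposed API instead of carrying the whole project.

```python
import re
import tempfile
from pathlib import Path


def grep_project(root, pattern, glob="**/*.py"):
    """Return {relative_path: [matching lines]} for files under root that match the glob."""
    regex = re.compile(pattern)
    hits = {}
    for path in sorted(Path(root).glob(glob)):
        if not path.is_file():
            continue
        lines = [ln for ln in path.read_text().splitlines() if regex.search(ln)]
        if lines:
            hits[path.relative_to(root).as_posix()] = lines
    return hits


# Tiny throwaway project: only files mentioning verify_token are surfaced.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "auth").mkdir()
    (root / "auth" / "login.py").write_text(
        "def verify_token(token):\n    return bool(token)\n"
    )
    (root / "middleware.py").write_text("from auth.login import verify_token\n")
    (root / "landing.py").write_text("def render():\n    return 'hello'\n")
    hits = grep_project(root, r"verify_token")

for name, lines in hits.items():
    print(name, lines)
```

Here the landing page never enters the result, which mirrors the middleware example: the session only needs the login module's exposed function, not the module's neighbours.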

@AvinashSingh_20

6 days ago

How do you maintain context across long coding sessions?

1

7

0

1

2K

1

3

0

0

734

4 days ago

@bhavintu In China and India it’s night, so at least 50% of the world is asleep.

0

0

0

0

58

4 days ago

@arpit_bhayani The layered exception wrapping: isn't this something we can't avoid? Every layer expects different handling for the same error. What's the elegant solution?

0

0

0

0

14

5 days ago

Everything, literally everything, will fail if the LLM decides not to output tool calls in the correct JSON format.
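One common guard is to validate the raw tool call before dispatching it, and to hand the error back to the model for a retry instead of crashing. A minimal sketch, assuming a hypothetical `{"name": ..., "arguments": {...}}` call shape; real providers define their own formats, and `parse_tool_call` is an illustrative name, not a library function.

```python
import json


def parse_tool_call(raw, known_tools):
    """Return (call, None) when raw is a usable tool call, else (None, error message)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(call, dict):
        return None, "tool call must be a JSON object"
    name = call.get("name")
    if not isinstance(name, str) or name not in known_tools:
        return None, f"unknown tool: {name!r}"
    if not isinstance(call.get("arguments"), dict):
        return None, "'arguments' must be a JSON object"
    return call, None


known = {"get_service_logs_by_url"}
ok, err_ok = parse_tool_call(
    '{"name": "get_service_logs_by_url", "arguments": {"url": "/pay"}}', known
)
# Truncated output, as a model might emit when it stops mid-object.
bad, err_bad = parse_tool_call(
    '{"name": "get_service_logs_by_url", "arguments": {', known
)
print(ok, err_ok)
print(bad, err_bad)
```

Returning the error string rather than raising lets the agent loop feed it back as the tool result, which gives the model a chance to correct its own output.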

0

1

0

0

16

5 days ago

Wait, Claude Agent SDK is exactly Claude Code....

5 days ago

Claude Agent SDK literally asking me to create one more buggy file.... while I already have a pile....

logLatency's tweet photo: https://t.co/6efNzs03Mv

0

1

0

0

205

0

1

0

0

141

5 days ago

Claude Agent SDK literally asking me to create one more buggy file.... while I already have a pile....

logLatency's tweet photo: https://t.co/6efNzs03Mv

0

1

0

0

205

5 days ago

@MehulFanawala Mythos sitting idle. No traffic either from people or from gov

1

1

0

0

15

5 days ago

But I was told Meta has frozen the hiring. 🥲🥲

0

1

0

0

13

5 days ago

@Cartidise That's a snackbar. No?

1

1

0

0

94

5 days ago

OpenAI just needs a better marketing team. They are equally capable.

6 days ago

GPT-5.5-Cyber is our most capable cyber model yet, designed for advanced, authorized defensive work: tracing vulnerable code, validating issues, developing patches, and preparing evidence for human review.

OpenAI's tweet photo: https://t.co/KcDoGGD2tx

61

2K

187

372

935K

0

1

0

0

21

6 days ago

Claude Code-like tools are the best place to execute dark patterns. The simplest task took almost 4 minutes. Looks like Anthropic is throttling the requests in between.

logLatency's tweet photo: https://t.co/evQvrN1gaM

0

1

0

0

30

6 days ago

Sometimes things are more complex to explain to Claude than to do myself. A Neuralink and Claude integration would be a game-changer.

1

1

0

0

22

7 days ago

I believe the trigger would be OpenAI and Anthropic IPO

7 days ago

The biggest IPO warning in 100 years. This pattern has NEVER failed. 1926 → Goldman Sachs 1972 → Intel 1999 → AT&T Wireless 2026 → SpaceX Each mega IPO arrived exactly on the peak of a major market bubble. Then the S&P 500 crashed. → 1926 crash: 86% → 1972 crash: 53% → 1999 crash: 51% → 2026 crash: **% And SpaceX is the final liquidity event before the AI bubble breaks.

228

3K

659

2K

2M

0

1

0

0

18

7 days ago

Think HTML-to-Markdown would be more efficient than this

@akshay_pachaar

8 days ago

Web scraping will never be the same. (100% open-source visual search at scale)

PixelRAG is a retrieval system that skips HTML parsing completely. Instead of scraping a page into text and embedding chunks, it screenshots the page and retrieves the image. A vision-language model reads the answer straight off the pixels.

Why that matters: parsing is where web RAG quietly loses information.
- A single HTML-to-text parser can drop 40%+ of a page.
- Tables, charts, and layout get flattened or thrown out.
- Swapping parsers alone can move accuracy ~10 points on the same docs.

PixelRAG indexes the page a person actually sees. The team built a visual index of all of Wikipedia, 30M+ screenshots, and it still beats the strongest text RAG baseline by 18.1% on text-only QA.

The repo also ships a Claude Code plugin that gives Claude eyes. It lets Claude screenshot any URL and read the rendered page instead of scraping the DOM. So you can hand it a live page, an arXiv paper, or your local site and ask what it actually looks like. One setup script. No MCP server, no backend.

How the pipeline works:
- Renders each document (web, PDF, image) to image tiles.
- Embeds them with Qwen3-VL-Embedding, LoRA fine-tuned on screenshots.
- Builds a FAISS index and serves a search API.

A stronger reader model lifts accuracy with no re-indexing, since the index is just pixels. Everything is open-source under Apache-2.0.

GitHub repo: https://t.co/qun9TjAdmw

Talking about RAG, I recently wrote an article on a new approach that makes retrieval much more efficient by cutting corpus size by 40x, reducing tokens per query by 3x, and improving vector search relevance by 2.3x. The article is quoted below.

130

7K

836

12K

922K

0

1

0

0

16

7 days ago

How helpful are such courses? 35T looks like too much no matter what the content is or how good it is. Or has the world changed, and people are actually buying this?

logLatency's tweet photo: https://t.co/WrBtD3Yhps

0

1

0

0

35
