Tushar Sonawane

@Tushkiz

Spreading tech insights and good vibes. Building epic stuff at @deel. Ex-crew @SlackHQ @getmaximai

India

Joined April 2009

1.7K Following

409 Followers

2.4K Posts

Tushar Sonawane

@Tushkiz

5 days ago

Running MCP without a gateway is the microservices problem, rediscovered.

Auth at every server. Logs scattered. Policy hard-coded per integration.

2 servers, manageable. 8, maintenance problem. 20, incident waiting to happen. https://t.co/uJqhtrn4Ba
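A minimal sketch of what centralizing those three concerns could look like, assuming a single in-process gateway in front of the tool servers. Every name here (`Gateway`, `register`, `call`, the token and policy maps) is hypothetical and not part of MCP or any real gateway product; real servers would sit behind network transports rather than plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Gateway:
    """Hypothetical gateway: one place for auth, policy and logs."""
    tokens: dict[str, str]        # token -> caller name (assumed auth scheme)
    policy: dict[str, set[str]]   # caller name -> tools it may call
    servers: dict[str, Callable[[dict], dict]] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def register(self, tool: str, handler: Callable[[dict], dict]) -> None:
        # Each "server" is just a callable in this sketch.
        self.servers[tool] = handler

    def call(self, token: str, tool: str, args: dict) -> dict:
        caller = self.tokens.get(token)
        allowed = caller is not None and tool in self.policy.get(caller, set())
        # One audit entry per request, whether it is allowed or denied.
        self.audit_log.append({"caller": caller, "tool": tool, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"call to {tool!r} denied")
        if tool not in self.servers:
            raise LookupError(f"no server registered for {tool!r}")
        return self.servers[tool](args)
```

Adding a ninth or twentieth server then means one more `register` call and one more policy entry, instead of another copy of the auth and logging code.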

Tushar Sonawane

@Tushkiz

8 days ago

Agent loops don't just need a success condition.
They need a way to stop safely.

Usually, the example shows the clean path:

The agent runs.
The task passes.
The loop ends.

Useful for a demo but dangerous as the whole design.

A real loop also has to answer:

- What if the agent keeps retrying the same bad plan?
- What if it spends 14 steps on a task that should take 3?
- What if it reaches a decision that needs a human?

I would design three exits before shipping the loop:

1. Success - The goal is met and verified.

2. Timeout - The run crosses a step, cost or time budget.

3. Escalation - The agent cannot resolve the next move, so it hands off context to a human.

A loop with only a success path is not finished.

Design for all three exits.
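The three exits can be sketched as a small loop runner. This is an illustration under assumptions, not a reference design: `StepResult`, `run_loop` and the budget parameters are hypothetical names, the step function stands in for a real agent call, and only step and cost budgets are modeled (a wall-clock budget would be one more check of the same shape).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Exit(Enum):
    SUCCESS = "success"
    TIMEOUT = "timeout"
    ESCALATION = "escalation"


@dataclass
class StepResult:
    plan: str                  # what the agent tried on this step
    done: bool = False         # goal met and verified
    needs_human: bool = False  # agent cannot resolve the next move
    cost: float = 0.0          # spend for this step, in any consistent unit


def run_loop(step: Callable[[int], StepResult], max_steps: int = 5,
             max_cost: float = 1.0, max_repeats: int = 2):
    """Run `step` until one of the three exits fires.

    Returns (exit, history) so the history can be handed to a human.
    """
    spent = 0.0
    last_plan: Optional[str] = None
    repeats = 0
    history: list[StepResult] = []
    for i in range(max_steps):
        result = step(i)
        history.append(result)
        spent += result.cost
        if result.done:
            return Exit.SUCCESS, history
        if result.needs_human:
            return Exit.ESCALATION, history
        # Count consecutive repeats of the same plan.
        repeats = repeats + 1 if result.plan == last_plan else 0
        last_plan = result.plan
        if repeats >= max_repeats:
            return Exit.ESCALATION, history  # stuck retrying the same bad plan
        if spent >= max_cost:
            return Exit.TIMEOUT, history     # cost budget crossed
    return Exit.TIMEOUT, history             # step budget crossed
```

Returning the history alongside the exit reason is the design choice that matters here: an escalation without context just moves the problem to a human who has to start over.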

Tushar Sonawane

@Tushkiz

13 days ago

The agent did exactly what you asked.

That was the problem.

You think you gave an instruction but the agent treats it like a spec.
And if the spec has gaps, it fills them with whatever seems reasonable.

I use this 5-line spec before giving coding agents anything non-trivial:

1. Goal: what should change?
2. Scope: what can it touch?
3. Non-goals: what must stay alone?
4. Done when: what proves it worked?
5. Verify with: test, command or manual check.

Bad:
"Fix invoice retries."

Better:
```
In POST /invoices, fix duplicate invoices on retries.
Only touch the invoice handler and test.
Do not change payments or schema.
Stop when the same idempotency key returns the existing invoice.
```

Let the agent vibe inside a tighter spec.
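One way to make the five lines hard to skip is to treat them as required fields. The sketch below is an assumption about how that might look, not the author's tooling: `TaskSpec` and `render` are hypothetical names, and the `verify_with` value in the usage example is invented for illustration because the post's example does not state one.

```python
from dataclasses import dataclass, fields

# Display labels for the five lines of the spec.
LABELS = {
    "goal": "Goal",
    "scope": "Scope",
    "non_goals": "Non-goals",
    "done_when": "Done when",
    "verify_with": "Verify with",
}


@dataclass(frozen=True)
class TaskSpec:
    goal: str
    scope: str
    non_goals: str
    done_when: str
    verify_with: str

    def render(self) -> str:
        """Return the spec as five labeled lines, refusing any blank field."""
        lines = []
        for f in fields(self):
            value = getattr(self, f.name).strip()
            if not value:
                raise ValueError(f"spec is missing '{LABELS[f.name]}'")
            lines.append(f"{LABELS[f.name]}: {value}")
        return "\n".join(lines)


spec = TaskSpec(
    goal="In POST /invoices, fix duplicate invoices on retries.",
    scope="Only touch the invoice handler and test.",
    non_goals="Do not change payments or schema.",
    done_when="The same idempotency key returns the existing invoice.",
    verify_with="A retry test with a repeated idempotency key (hypothetical).",
)
prompt = spec.render()
```

A blank field raises before anything reaches the agent, which is the point: the gap gets filled by the person writing the spec rather than by whatever seems reasonable to the agent.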

Tushar Sonawane

@Tushkiz

14 days ago

@CatGodSandHive that is tricky to detect coz the agent receiving the handoff has no ground truth to compare against


Tushar Sonawane

@Tushkiz

14 days ago

A single agent might beat your multi-agent setup.

In one study: 1 agent hit 90.7% accuracy and a 5-agent chain dropped to 22.5%.

The cost shows up in the handoffs.

Each agent receives a compressed version of the task: constraints softened, edge cases dropped, prior decisions summarized.

After 4 handoffs, I would stop treating another agent as extra coordination work.

Before adding the next agent, ask:

What does this handoff preserve?
What does it drop?
Can one agent with the right tools do this instead?
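The first two questions can be checked mechanically if the task context is kept as structured data rather than free text. A rough sketch, assuming constraints, edge cases and decisions are tracked as sets of exact strings (a big simplification, since real handoffs usually paraphrase); `TaskContext` and `handoff_diff` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskContext:
    goal: str
    constraints: frozenset[str]
    edge_cases: frozenset[str]
    decisions: frozenset[str]


def handoff_diff(before: TaskContext, after: TaskContext) -> dict[str, dict[str, list[str]]]:
    """Report what a handoff preserved and what it dropped, per category."""
    report = {}
    for name in ("constraints", "edge_cases", "decisions"):
        old, new = getattr(before, name), getattr(after, name)
        report[name] = {
            "preserved": sorted(old & new),
            "dropped": sorted(old - new),
        }
    return report
```

Running this at each handoff turns "what does it drop?" from a design question into a log line that can be reviewed after a bad run.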

Tushar Sonawane

@Tushkiz

14 days ago

A lot of loop engineering advice assumes unlimited tokens.

Design the loop, add the tools, let the agent observe, retry and continue.

Useful but real engineering judgement is knowing when to stop the loop.

The almost-right patch.
The eval score that improves while the misses still matter.
The retry that costs more than taking the keyboard back.

Loop engineering keeps the system moving.

Judgement is knowing when the next loop is no longer worth the cost.

Tushar Sonawane

@Tushkiz

16 days ago

Agent systems rarely break in one clean place.

They break where context, tools and other agents exchange state.

Three boundaries show up again and again:

1. Context: quality can degrade around 60-70% fill
2. Tool: valid call, wrong meaning
3. Handoff: compressed context, bad assumptions

Instead, instrument:
- Context fill alerts
- Versioned tool contracts
- Handoff schemas

Instrument the boundaries, not just the agent.
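A rough sketch of one check per boundary, under stated assumptions: the 0.6 threshold is simply the lower end of the 60-70% range quoted above, and `context_fill_alert`, `ToolContract`, `check_tool_call`, `HANDOFF_FIELDS` and `validate_handoff` are hypothetical names rather than any framework's API. A version check alone will not catch every "valid call, wrong meaning" case; it only flags calls written against a different contract.

```python
from dataclasses import dataclass

CONTEXT_ALERT_RATIO = 0.6  # assumed threshold, lower end of the 60-70% range


def context_fill_alert(used_tokens: int, window_tokens: int,
                       threshold: float = CONTEXT_ALERT_RATIO) -> bool:
    """True when the context window is filled past the alert threshold."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return used_tokens / window_tokens >= threshold


@dataclass(frozen=True)
class ToolContract:
    name: str
    version: int
    required_args: frozenset[str]


def check_tool_call(contract: ToolContract, caller_version: int, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call matches the contract."""
    problems = []
    if caller_version != contract.version:
        problems.append(
            f"{contract.name}: caller expects v{caller_version}, tool is v{contract.version}"
        )
    missing = contract.required_args - set(args)
    if missing:
        problems.append(f"{contract.name}: missing args {sorted(missing)}")
    return problems


# Fields every handoff payload must carry (an assumed minimal schema).
HANDOFF_FIELDS = frozenset({"task", "constraints", "decisions", "open_questions"})


def validate_handoff(payload: dict) -> list[str]:
    """Return the required handoff fields that are absent from the payload."""
    return sorted(HANDOFF_FIELDS - set(payload))
```

Each function returns data instead of raising, so the results can feed alerts and dashboards without interrupting the agent run.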

Tushkiz retweeted

Alex Bouaziz

@Bouazizalex

about 2 months ago

For years, we hit the same wall everyone else did. You automate a few steps. Then complexity wins. And your best people are back to doing work that should run itself.

That's what we set out to fix - not for our customers first, but for ourselves.

I'm proud to introduce Akai by Deel. Akai is an interconnected system of agents that learns your workflows, automates every step, and gets smarter with every run - auditably, on any system.

Today, 100% of Deel's operations teams, across Finance, Tax, Treasury, Benefits, and HR, run on Akai.

The results stopped us in our tracks:

→ 100,000+ cases handled automatically every month
→ 8,000+ hours of payment processing - now background tasks
→ Reconciliation that took 20+ days - now done in minutes
→ 100% automation in areas we couldn't reach after years of coding
→ 91,000+ manual hours saved every single month

Now, every team is building agents themselves. No developers. No IT tickets. Just the people who know the work, running it.

That's the ceiling we broke. Now we're opening it up. With built-in voice, transactions across any payment method, and the ability to process and analyse any document or dataset - out of the box.

Your first agents could be running by tomorrow! Get early access at https://t.co/mMBMmX4fMT


Tushar Sonawane

@Tushkiz

3 months ago

@SwapnilBhoite28 Yeah makes sense coz Gemma 4 is more of an on‑device reasoning/agentic model, while Qwen Coder is actually specialized for coding.

Tushar Sonawane

@Tushkiz

3 months ago

Wanted to try Gemma 4 E2B so I built a quick Wispr Flow clone. Speak messy, get clean text back. All on device. Tested: "groceries milk eggs butter chocolates actually remove chocolates." It dropped the chocolates. Did not ask it to. ~40-50s end to end. Slow but neat.

Tushar Sonawane

@Tushkiz

3 months ago

repo: https://t.co/N6sZBt8oj9

Tushar Sonawane

@Tushkiz

3 months ago

https://t.co/LhkPzcAxFG

Tushar Sonawane

@Tushkiz

3 months ago

Full job details here https://t.co/B8EIXTQEHR

Tushar Sonawane

@Tushkiz

3 months ago

What is the hardest AI-into-production problem you have ever solved? Asking because I am hiring for a Senior Backend Engineer (AI focus) at @deel and that is exactly the kind of engineer I want to find. Drop your best work below. I'll read every reply 👇


Tushkiz retweeted

Akshay Deo

@akshay_deo

11 months ago

🚀 Maxim’s Bifrost is live on Product Hunt 🚀 We're excited to share that Bifrost, the fastest and open-source LLM gateway, is live on Product Hunt. https://t.co/Z8YM0EvejM


Tushar Sonawane

@Tushkiz

11 months ago

@avikm744 @akshay_deo Yes, we are! 🚀 Your portfolio looks great, do check out the open positions and apply if anything fits: https://t.co/1IJiYgDu6k

Tushkiz retweeted

vg

@vaibhavi0601

about 1 year ago

Today, we are thrilled to announce a strategic partnership between @getmaximai and @Google Cloud's Vertex AI, a collaboration to enable developers with a comprehensive and robust solution to evaluate and observe complex agentic AI applications.

The journey to building truly reliable and effective AI requires a powerful infrastructure stack, and this partnership marks a significant step in that direction. By embedding Vertex AI's Gen AI evaluation service within Maxim's end-to-end AI engineering platform, we are enabling teams to build the next generation of applications, ensuring they are not just intelligent, but also robust, safe, and ready for the real world.

All in on AI teams accelerating development while keeping quality at the heart of their products 🚀⚡

You can read the official announcement here: https://t.co/9wH83pUSoe

Tushkiz retweeted

Maxim AI

@getmaximai

about 1 year ago

🚀 AI Evals: Your Key to Building Trustworthy AI Agents! 🚀

AI agents are everywhere, from support automation to travel booking assistants. But here’s the catch: building them is easy, making them work reliably in the real world is hard.

At Maxim AI, we believe evals are the backbone of high-quality AI products. We’ve just released a detailed guide to help you master agent evaluations.

What’s inside? 👇

✅ Evaluate agents – combining human and auto-evals, node-level to session-level, and balancing quality with efficiency.
✅ Test agents in the right context – using realistic, task-specific, and user-representative scenarios.
✅ Build a continuous evaluation loop – turning testing from a checklist into an ongoing feedback system.
✅ Use online and offline evals as a product accelerant – helping teams ship faster without sacrificing product taste.

Whether you’re building an LLM-based support automation or a complex multi-agent system, evals are your secret weapon to ship quality, fast. Don’t build blindly. Evaluate, iterate, and win user love.

👉 Grab your copy here: https://t.co/wRjzqfoqzB
Let’s make better AI, together.

#AI #AIAgents #AgentEvaluation #MaximAI #AgentQuality #AIEvals


Tushar Sonawane

@Tushkiz

about 1 year ago

⚡ Meet Bifrost: the open-source, drop-in LLM proxy that’s 40x faster than LiteLLM

Built for speed, scale, and observability:

✅ Unified API for all LLMs
✅ Native Prometheus metrics
✅ Plugin-first middleware

Check it out: https://t.co/L4W96MXLfV

#opensource #LLM #AIinfra #GoLang #DevTools


Tushkiz retweeted

Maxim AI

@getmaximai

about 1 year ago

This comprehensive guide will walk you through creating an intelligent event-discovery agent (an agent that discovers public events happening in the US) using @n8n_io (an open-source workflow automation platform) and rigorously testing it with @getmaximai (an agent testing platform).

What we'll build:

We’re going to create an AI agent that:

- Fetches public event information from a Google Sheets database
- Responds to user queries about events in the USA
- Maintains conversation history for seamless multi-turn interactions
- Provides detailed event information based on user preferences

Sounds interesting? You can follow this tutorial blog to learn more: https://t.co/XP89giM6jS

