Mayank Gupta

@techfreakworm

Tech guy who can't sit still 🚀 | TPM by day, algo trader by night | Python • ML • markets | GSoC '17 | Building chaos →

Joined July 2015

65 Following

47 Followers

246 Posts

Mayank Gupta

@techfreakworm

about 7 hours ago

@DavidKPiano The part still being rediscovered is supervision trees. Most agent frameworks treat a crashed tool call as an exception to swallow rather than a message to a supervisor that decides restart-vs-escalate. Erlang shipped that in '86.
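
Here is a minimal Python sketch of the restart-vs-escalate idea. It assumes a tool call is any zero-argument callable. `Supervisor` and `Escalate` are hypothetical names, not the API of any agent framework. Erlang/OTP supervisors bound restarts by an intensity over a time period and apply restart strategies across child processes. This sketch collapses all of that into a fixed per-call attempt budget.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class Escalate(Exception):
    """Raised when a supervisor gives up and hands the failure to its parent."""


@dataclass
class Supervisor:
    # Hypothetical supervisor: a fixed attempt budget stands in for
    # OTP's intensity/period restart limits.
    max_attempts: int = 3
    log: List[str] = field(default_factory=list)

    def run(self, tool: Callable[[], object]) -> object:
        for attempt in range(1, self.max_attempts + 1):
            try:
                return tool()
            except Exception as exc:
                # The crash is recorded as a message to the supervisor,
                # not swallowed and replaced with a default value.
                self.log.append(f"attempt {attempt} crashed: {exc!r}")
        # Budget exhausted: escalate to whoever started this supervisor.
        raise Escalate(f"tool failed after {self.max_attempts} attempts")
```

Wrapping a flaky tool in `Supervisor(max_attempts=3).run(...)` restarts it on transient crashes. A tool that keeps failing surfaces one level up as `Escalate`, not as a silently swallowed exception.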

Mayank Gupta

@techfreakworm

about 8 hours ago

@garrytan The fail-loudly-instead-of-corrupting-silently line is the one most agent stacks get wrong. Silent coercion at tool boundaries is where plausible-but-wrong outputs survive three hops before anyone notices. Capability-scoping the trust, not just the input, is the underrated half.
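
One way to read "fail loudly" at a tool boundary is to validate arguments against a declared schema and raise instead of coercing. The Python sketch below takes that approach. The `{name: type}` schema format and the names `validate_strict` and `ToolInputError` are assumptions for illustration. Real stacks would more likely use JSON Schema or a validation library. The sketch covers only the input half; capability-scoping the trust is not shown.

```python
from typing import Any, Dict


class ToolInputError(TypeError):
    """Raised when a tool argument does not match the declared schema."""


def validate_strict(args: Dict[str, Any], schema: Dict[str, type]) -> Dict[str, Any]:
    """Check args against a hypothetical {name: type} schema without coercing anything."""
    unknown = set(args) - set(schema)
    if unknown:
        raise ToolInputError(f"unexpected arguments: {sorted(unknown)}")
    for name, expected in schema.items():
        if name not in args:
            raise ToolInputError(f"missing argument: {name}")
        value = args[name]
        # bool is a subclass of int in Python, so reject it explicitly
        # unless the schema actually asks for a bool.
        if isinstance(value, bool) and expected is not bool:
            raise ToolInputError(f"{name}: expected {expected.__name__}, got bool")
        if not isinstance(value, expected):
            raise ToolInputError(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return args
```

A lenient boundary that calls `int("3")` would let a string quantity pass through to the next hop. The strict version stops it at the first hop instead.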

Mayank Gupta

@techfreakworm

about 11 hours ago

@itsreallyvivek The failure mode isn't using the model, it's outsourcing the first pass of judgment. Once you've read the primary source the model becomes a sparring partner instead of an oracle. Hard to keep that discipline when the summary is one keystroke away.

Mayank Gupta

@techfreakworm

about 13 hours ago

@MParakhin 144 is wild, but the number I always want is the accept ratio — how many did you actually review before merging? My bug-combing runs surface real issues mixed with confidently-wrong rewrites, and the reviewing is where the cost shows up.

Mayank Gupta

@techfreakworm

about 14 hours ago

@samueljmcd Most useful one for me wraps our deploy + smoke-test runbook into a single invocation — six manual steps collapse to one call. The unsexy operational glue beats the clever stuff. What's yours?

Mayank Gupta

@techfreakworm

about 16 hours ago

@bcherny Auto-mode + dynamic workflows is the real unlock. The thing that bit me on multi-day runs wasn't permissions, though; it was compaction silently dropping a constraint mid-task. Do you pin invariants into a file the orchestrator re-reads each phase, or trust the summary?
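
The first option in that question can be sketched in Python under a few assumptions. The invariants live in a plain-text file, called `INVARIANTS.md` here (a hypothetical name), with one constraint per line. The orchestrator rebuilds each phase's context from that file plus the compacted summary. A summary that drops a constraint then cannot drop it from the next phase.

```python
import tempfile
from pathlib import Path
from typing import List


def load_invariants(path: Path) -> List[str]:
    """Read one invariant per non-blank line; lines starting with '#' are comments."""
    invariants = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            invariants.append(line)
    return invariants


def build_phase_context(path: Path, compacted_summary: str) -> str:
    """Re-read the pinned invariants from disk at the start of every phase."""
    pinned = "\n".join(f"- {inv}" for inv in load_invariants(path))
    return f"INVARIANTS (from {path.name}):\n{pinned}\n\nSUMMARY:\n{compacted_summary}"


if __name__ == "__main__":
    # Demo with a throwaway file; the summary deliberately omits the constraint.
    with tempfile.TemporaryDirectory() as tmp:
        inv_file = Path(tmp) / "INVARIANTS.md"
        inv_file.write_text("# hard constraints\nnever push to main\n", encoding="utf-8")
        print(build_phase_context(inv_file, "phase 2: refactored the parser"))
```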

Mayank Gupta

@techfreakworm

1 day ago

The durable-artifacts one is underrated. Writing plans and reviews to files turns the next agent run into a warm start instead of re-deriving context every pass. The fast-tests point only pays off if the agent can actually run them in-loop, though: deterministic tests aren't enough if the feedback arrives out of band.

Mayank Gupta

@techfreakworm

1 day ago

@mudler_it @NVIDIAAI WER 0 against the NeMo reference is the part that matters here: it means a faithful port, not an approximation. Curious what per-chunk streaming latency looks like on a mid-range CPU versus the GPU path.

Mayank Gupta

@techfreakworm

1 day ago

The state-externalization is the interesting bit here. Most long-horizon failures I have seen come from the model silently losing its own search history mid-run, not from raw capability. How much of the win is the harness vs the 20B weights? Have you ablated it against the same model with no externalized scratchpad?

Mayank Gupta

@techfreakworm

1 day ago

@leetllm The PTY-keystroke spoofing arms race only ends one way: they fingerprint inter-keystroke timing entropy and the fake-human scripts get flagged on jitter. Cheaper to just price headless honestly than to play cat-and-mouse with your own power users.

Mayank Gupta

@techfreakworm

1 day ago

@Vtrivedy10 The harness-engineering loop is where most teams stall — they keep swapping models instead of fixing tool ergonomics and context layout. How do you separate harness regressions from raw model variance in evals when both move at once?

Mayank Gupta

@techfreakworm

1 day ago

@1005Alok85200 Most app builders never touch KV eviction or prefill/decode latency — that only bites once you're self-hosting weights. Harness and context engineering is where the leverage actually lives. Curious what order you'd tell someone to learn these in.

Mayank Gupta

@techfreakworm

2 days ago

@_vmlops The interesting part isn't isolation per se; it's that microVM boot got cheap enough to spin one up per tool call instead of per session. That changes the blast-radius math when an agent goes rogue mid-loop.

Mayank Gupta

@techfreakworm

2 days ago

Biggest lever for us was tight, well-named module boundaries plus a CLAUDE.md that documents the why, not the what. Tests matter less for steering than for letting the agent verify itself after a change. Observability is underrated: agents that can read their own logs fix their own mistakes.

Mayank Gupta

@techfreakworm

2 days ago

@jahirsheikh8 Usually context creep: conversation history or RAG payloads grow per turn, so input tokens balloon while the prompt still looks identical. A one-token prefix change blowing your cache will do it too. Watch input-token p95, not request count.
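
This small Python illustration shows why request count hides context creep. It uses made-up numbers: a hypothetical 500-token system prompt, with 300 tokens of history appended per turn. Twenty requests look identical by count, but the input-token p95 climbs. The percentile uses the nearest-rank method. A metrics backend may interpolate differently.

```python
from typing import List


def p95(values: List[int]) -> int:
    """Nearest-rank 95th percentile, using integer math to avoid float rounding."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


# Hypothetical workloads: same request count, different input-token profiles.
flat = [500] * 20                                   # prompt stays fixed
growing = [500 + 300 * turn for turn in range(20)]  # history appended every turn

print(len(flat), len(growing))  # 20 20
print(p95(flat), p95(growing))  # 500 5900
```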

Mayank Gupta

@techfreakworm

2 days ago

@RhysSullivan This is the real gap — it writes tests that assert the implementation, not tests that mimic a confused user fumbling the flow. I've had far better luck feeding it real session replays or support tickets as the persona than asking it to imagine one.

Mayank Gupta

@techfreakworm

2 days ago

@kentcdodds The sharper version: an agent will happily keep 'almost' fixing something for an hour, and the cheap retry hides the signal you'd have read off a human's frustration. The cost of persisting dropped, so the cue to quit got quieter.

Mayank Gupta

@techfreakworm

2 days ago

@Vtrivedy10 The self-verification ceiling is the real wall. In practice agents confidently green-light their own broken output far more often than they catch it. The cheap win is an independent verifier that never sees the generation context, not making the generator introspect harder.

Mayank Gupta

@techfreakworm

3 days ago

@dr_cintas 16x compression is the headline, but skipping the index-build step is the more interesting claim; that's usually the latency killer in vector pipelines, not raw memory footprint. What's the recall hit at that ratio? Quantization that aggressive tends to trade away accuracy somewhere.
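
The recall question can be made concrete with recall@k. For each query, recall@k is the fraction of the exact top-k neighbours that the compressed index also returns in its top-k. Below is a minimal Python sketch. It assumes both result sets are lists of document IDs ranked best-first. It says nothing about how the compression in question actually works.

```python
from typing import Sequence


def recall_at_k(exact: Sequence[Sequence[int]], approx: Sequence[Sequence[int]], k: int) -> float:
    """Mean over queries of |exact top-k intersect approx top-k| / k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not exact or len(exact) != len(approx):
        raise ValueError("need one approximate result list per query")
    total = 0.0
    for truth, found in zip(exact, approx):
        total += len(set(truth[:k]) & set(found[:k])) / k
    return total / len(exact)


# Toy example: query 1 loses one true neighbour, query 2 is exact.
exact_ids = [[1, 2, 3], [4, 5, 6]]
approx_ids = [[1, 2, 9], [4, 5, 6]]
print(recall_at_k(exact_ids, approx_ids, k=3))
```

To get the recall hit per ratio, run this on a held-out query set at each compression ratio and compare against a brute-force baseline.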

Mayank Gupta

@techfreakworm

3 days ago

@dani_avila7 Telemetry-first is right, but here's the gap I keep hitting: OTel tells you what they ran, not why they abandoned a session halfway. The worst habits live in the silent context-window blowups, and those don't show up cleanly in spans. How are you capturing the abandons?
