zero

@Zero_cool5

Joined January 2013

201 Following

7 Followers

36 Posts

Zero_cool5 retweeted

Suraj Sharma

@suraj_sharma14

3 days ago

If I had 6 months to become an Agentic AI Engineer, I'd do this:

Stage 1: Python + Async Foundations
asyncio, FastAPI, event-driven architecture, error handling, API integration patterns.

Stage 2: LLM Fundamentals for Agents
Context management, model routing, token economics, latency tradeoffs, failure modes.

Stage 3: Tool Calling + Structured Outputs
Pydantic validation, function calling schemas, error recovery, dynamic tool discovery.

Stage 4: Memory + State Management
Short-term buffers, long-term vector recall, context compression, cross-session sync.

Stage 5: Single Agent Workflows
ReAct loops, plan-and-execute, self-reflection, iteration limits, graceful degradation.

Stage 6: Multi-Agent Orchestration
LangGraph/CrewAI, supervisor patterns, message passing, conflict resolution, handoffs.

Stage 7: Human-in-the-Loop Systems
Uncertainty detection, approval gates, audit trails, resume logic, intervention points.

Stage 8: Evaluation + Quality Assurance
Automated eval harnesses, LLM-as-a-judge, regression testing, hallucination metrics.

Stage 9: Observability + Tracing
Distributed tracing (LangSmith/Arize), cost dashboards, latency monitoring, alerting.

Stage 10: Security + Guardrails
Prompt injection defense, output filtering, PII redaction, sandboxed execution, compliance.

Stage 11: Production Deployment
vLLM/SGLang, Kubernetes scaling, CI/CD for agents, canary releases, rollback strategies.

Stage 12: Open Source + Portfolio
Ship autonomous agents publicly, write architecture docs, record demos, contribute to libs.

Most people stay stuck watching tutorials. Builders get hired. (Bookmark it)
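
Stage 3 (tool calling, structured outputs, error recovery) is the easiest stage to make concrete. The sketch below is a minimal, hypothetical illustration, not Pydantic and not any vendor's function-calling API: it substitutes a hand-rolled type check from the standard library for Pydantic, and it assumes the model emits each tool call as JSON with `name` and `arguments` fields. `get_weather` and `TOOLS` are made up for the example.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    """A tool the agent may call, plus a minimal schema: argument name -> type."""
    fn: Callable[..., Any]
    schema: dict[str, type]


def get_weather(city: str, days: int) -> str:
    # Hypothetical tool, used only for illustration.
    return f"{days}-day forecast for {city}: sunny"


TOOLS = {"get_weather": Tool(get_weather, {"city": str, "days": int})}


def dispatch(raw_call: str) -> dict[str, Any]:
    """Parse and validate a JSON tool call. Never raises: errors come back as
    {"ok": False, "error": ...} so the agent loop can show them to the model
    as its next observation and let it retry (error recovery)."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid JSON: {exc.msg}"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "tool call must be a JSON object"}
    name = call.get("name")
    tool = TOOLS.get(name) if isinstance(name, str) else None
    if tool is None:
        return {"ok": False, "error": f"unknown tool: {name!r}"}
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}
    missing = [k for k in tool.schema if k not in args]
    unexpected = [k for k in args if k not in tool.schema]
    if missing or unexpected:
        return {"ok": False, "error": f"missing={missing} unexpected={unexpected}"}
    for key, expected in tool.schema.items():
        value = args[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            return {"ok": False, "error": f"{key} must be {expected.__name__}"}
    return {"ok": True, "result": tool.fn(**args)}


if __name__ == "__main__":
    print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo", "days": 3}}'))
    # {'ok': True, 'result': '3-day forecast for Oslo: sunny'}
    print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo", "days": "3"}}'))
    # {'ok': False, 'error': 'days must be int'}
```

The design choice that matters is returning errors as data instead of raising: the loop can append the error to the conversation and let the model correct its own call.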

Zero_cool5 retweeted

zaimiri

@zaimiri

3 days ago

https://t.co/a8ejOfyUQr

Zero_cool5 retweeted

elvis

@omarsar0

4 days ago

https://t.co/nV84ktpZBf

Zero_cool5 retweeted

Avi Chawla

@_avichawla

4 days ago

Karpathy said something you'll regret ignoring: "Remove yourself as the bottleneck. Maximize your leverage. Put in very few tokens, and a huge amount of stuff happens on your behalf."

Loop engineering does exactly that.

In a hand-run session, the operator handles two things:
- deciding what the agent runs next
- checking its output before the next step

Both are manual, and both determine how far the agent gets on its own without the operator. Loop engineering moves both steps into the system.

A core operating structure surrounds the loop, and the diagram below depicts it:
- A schedule decides what to run.
- The loop is the maker that produces the work.
- A separate checker agent grades the output.
- A file on disk holds the state they both read.

The loop runs until the work is done, it hits max iterations, or the budget is exhausted.

Here are some practical engineering considerations:

1) A model grading its own output justifies what it already did instead of catching where it failed. That's why a separate checker's findings return to the maker as the next instruction, and the cycle repeats until the checker finds nothing left to fix.

2) A loop with no stop condition burns tokens, and the cost climbs fast once sub-agents and long runs add up. That's why the exit must be set before the loop runs, not while it is running. A simple exit could be:
↳ fix only the major issues, run one final pass, and stop after two loops, with "all tests pass and lint clean" as the rule that ends it.

3) State has to live on disk, not in context. The model forgets everything between runs, so an MD file or a knowledge graph holds what is done and what is still open. Each run reads it and writes back to it, which lets a loop pick up again after days.

4) The lower the verification bar, the safer the loop. Boring, repetitive checks like a stale version string or a missing test are trivial to verify, so a loop can run them with little risk while the operator is away. Judgment-heavy work is loopable too, but only as far as the checker can confirm the result.

An unattended loop can fail in two ways:

1) It reports done when nothing is actually verified. The separate checker exists to prevent this, but the loop merges code faster than anyone reads it, so over weeks the team stops understanding its own codebase while every check stays green. Green tests say the code passed the tests, not that anyone knows what shipped. Someone still has to read what the loop merges.

2) The checker keeps a running loop honest, but it only catches failures inside a run. The harness around the loop (the prompts, tools, and checks wrapped around the model) still drifts and breaks in production as models change. That repair loop is usually run by hand, based on observability traces.

My co-founder wrote a detailed walkthrough (with code) on making that harness repair itself: a failing trace gets diagnosed, the fix is verified against the exact input that failed, and the failure is locked in as a regression test so it cannot recur. Read it below.
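
The maker/checker structure described above can be sketched in a few dozen lines. This is a minimal sketch, not the author's implementation: `maker` and `checker` are hypothetical callables standing in for two separate model calls, the file name `loop_state.json` is an assumption, and the budget is modeled as a plain token counter. The default of two iterations mirrors the post's example exit, and "the checker found nothing left to fix" counts as done.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical signatures (stand-ins for two separate model calls):
#   maker(task, open_issues) -> (work, tokens_used)
#   checker(work) -> list of open issues; an empty list means "nothing left to fix"
Maker = Callable[[str, list[str]], tuple[str, int]]
Checker = Callable[[str], list[str]]


def run_loop(task: str, maker: Maker, checker: Checker, state_path: Path,
             max_iterations: int = 2, token_budget: int = 10_000) -> dict:
    """Run maker/checker passes until done, max iterations, or budget exhausted.

    The exit conditions are fixed before the loop starts. The budget is checked
    before each pass, so the final pass may overshoot it slightly.
    """
    # State lives on disk, not in context: resume from the file if it exists.
    if state_path.exists():
        state = json.loads(state_path.read_text())
    else:
        state = {"iteration": 0, "tokens": 0, "open_issues": [],
                 "work": "", "status": "running"}

    while state["status"] == "running":
        if state["iteration"] >= max_iterations:
            state["status"] = "max_iterations"
        elif state["tokens"] >= token_budget:
            state["status"] = "budget_exhausted"
        else:
            # The checker's findings from the last pass are the maker's next instruction.
            work, used = maker(task, state["open_issues"])
            state["work"] = work
            state["tokens"] += used
            state["iteration"] += 1
            # A separate checker grades the work; the maker never grades itself.
            state["open_issues"] = checker(work)
            if not state["open_issues"]:
                state["status"] = "done"
        state_path.write_text(json.dumps(state))  # persist after every pass
    return state


if __name__ == "__main__":
    def toy_maker(task: str, open_issues: list[str]) -> tuple[str, int]:
        # Pretend the second pass fixes whatever the checker reported.
        work = "v2: fixed " + ", ".join(open_issues) if open_issues else "v1"
        return work, 100

    def toy_checker(work: str) -> list[str]:
        return [] if work.startswith("v2") else ["missing test"]

    path = Path(tempfile.mkdtemp()) / "loop_state.json"
    final = run_loop("add feature", toy_maker, toy_checker, path)
    print(final["status"], final["iteration"])  # done 2
```

Because the state file is rewritten after every pass, a second call with the same path resumes where the first stopped instead of starting over.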

Zero_cool5 retweeted

4 days ago

If you have: Hermes Agent Claude Code & Codex Handoffs Obsidian + QMD Memory System Run Agentic Loops Fleet Tailscale Mesh Cron Jobs + Kanban Board Agentic Workflows Congrats you are the top 1% of the AI god stack

Zero_cool5 retweeted

Peter Steinberger 🦞

@steipete

6 days ago

Here's a simple loop: Tell codex to maintain your repos, wake up every 5 minutes and direct work to threads. That makes it easy to parallelize and steer work as needed. I use an orchestrator skill combined with my triage+autoreview+computer use skills, so some work can land autonomously. https://t.co/FbBoJTIcfd https://t.co/8389roVnOm
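
The "wake up every 5 minutes and direct work to threads" pattern is essentially a periodic dispatcher. The sketch below is a generic illustration, not Codex's behavior or the author's orchestrator skill: `fetch_pending` and `handle` are hypothetical callables, and "threads" are modeled as worker threads from Python's `concurrent.futures`. Each wake-up first collects finished work, then starts only tasks that are neither running nor already done, so a slow task is never started twice.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Iterable


class Dispatcher:
    """Wakes up periodically and hands each new pending task to a worker thread."""

    def __init__(self, fetch_pending: Callable[[], Iterable[str]],
                 handle: Callable[[str], str], max_workers: int = 4) -> None:
        self.fetch_pending = fetch_pending  # hypothetical: e.g. list open issues/PRs
        self.handle = handle                # hypothetical: run an agent on one task
        self.pool = ThreadPoolExecutor(max_workers=max_workers)
        self.in_flight: dict[str, Future] = {}
        self.results: dict[str, str] = {}

    def tick(self) -> list[str]:
        """One wake-up: collect finished work, then start tasks not yet seen."""
        for task, fut in list(self.in_flight.items()):
            if fut.done():
                try:
                    self.results[task] = fut.result()
                except Exception as exc:  # record the failure, keep the loop alive
                    self.results[task] = f"error: {exc}"
                del self.in_flight[task]
        started = []
        for task in self.fetch_pending():
            if task not in self.in_flight and task not in self.results:
                self.in_flight[task] = self.pool.submit(self.handle, task)
                started.append(task)
        return started

    def run_forever(self, interval_s: float = 300.0) -> None:
        """Default interval matches the 'every 5 minutes' cadence in the post."""
        while True:
            self.tick()
            time.sleep(interval_s)

    def close(self) -> None:
        self.pool.shutdown(wait=True)
```

`tick()` is kept separate from `run_forever()` so a single wake-up can be exercised in a test without sleeping.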

Zero_cool5 retweeted

Viksit Gaur

@viksit

10 days ago

if you're not working with unlimited tokens like @steipete and @bcherny, you could do your loop with claude code + caveman.

event -> trigger -> action -> eval -> feedback

- event: create a "wiki" to render claude generated md files as context
- trigger: click "review with claude" on a page; it drops a line in a queue file
- action: claude cowork / code reads the queue and writes edits right into the page (green add, red cut, amber note) ~thanks @nbaschez for roughdraft syntax~
- evaluate: you read those marks in the wiki and judge
- feedback: accept/reject decisions; reply sends it back to claude to redo
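
The trigger/action handoff through a queue file can be sketched with the standard library. This is an illustration under assumptions, not the author's setup: the queue file name, the one-JSON-object-per-line format, and the `page`/`request` fields are hypothetical. The trigger appends a line per click; the action side renames the file before reading it, so clicks that arrive after the rename start a fresh queue file.

```python
import json
import os
import tempfile
from pathlib import Path


def enqueue(queue: Path, page: str, request: str = "review") -> None:
    """Trigger: append one job per click on 'review with claude'."""
    with queue.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"page": page, "request": request}) + "\n")


def drain(queue: Path) -> list[dict]:
    """Action: claim every queued job and leave the queue empty.

    Renaming first means clicks that arrive after the rename start a fresh
    queue file instead of being appended to the batch being processed.
    Assumes a single consumer and no crash recovery.
    """
    if not queue.exists():
        return []
    claimed = queue.with_name(queue.name + ".processing")
    os.replace(queue, claimed)  # atomic on POSIX when both paths share a filesystem
    lines = claimed.read_text(encoding="utf-8").splitlines()
    claimed.unlink()
    return [json.loads(line) for line in lines if line.strip()]


if __name__ == "__main__":
    q = Path(tempfile.mkdtemp()) / "review-queue.jsonl"
    enqueue(q, "wiki/roadmap.md")
    enqueue(q, "wiki/ideas.md", "tighten intro")
    for job in drain(q):
        print(job["page"], "->", job["request"])
```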

Zero_cool5 retweeted

Matt Van Horn

@mvanhorn

10 days ago

https://t.co/DM0CAuyprS

Zero_cool5 retweeted

Shann³

@shannholmberg

9 days ago

what is agent looping

for the last two years we prompted agents one task at a time. that is starting to change

instead of asking an agent to build the landing page and then driving every step yourself, you set up a loop that handles discovery, planning, the work, checking, and iterating until the goal is met

looping is a setup you build. almost any agent harness can run it, it just depends on how you wire it up

at its simplest, looping is one agent working on itself:

> researches
> drafts
> checks the draft against a goal
> fixes what is weak
> runs that cycle again until the work clears the requirements

you are not prompting each step anymore. the agent repeats the cycle for you

the bigger version is a fleet looping. you give an orchestrator agent a goal, it breaks the goal into pieces, hands each piece to a specialist agent, and those specialists hand smaller jobs to their own subagents

the whole tree keeps looping through discovery, planning, execution, and verification until the goal is met

one agent looping is like a person redoing their own draft. a fleet looping is a whole team running a project end-to-end

you create a goal, and the system runs the loop until it finishes within the reqs you set

open and closed looping:

OPEN LOOPING is exploratory. it still has conditions and a goal, but you give the agent or the fleet a wide space to move in. it can try different paths, discover things, build something you did not fully spec out

this is the exciting end, it is what Peter and others are doing, and tbh it is where I want to spend more time

the catch is cost: an open loop with real room to explore burns an insane amount of tokens. for the 90 percent of people without an unlimited budget it is not runnable yet, and pointed at projects with a loose standard it turns into a slop machine

CLOSED LOOPING is bounded. a human designs the end-to-end path first:

> clear goal
> defined steps
> an eval at each step
> a point where it stops or hands back to you (and feeds back performance data)

the agents still loop, but inside a framework you built. it gets better every run because each pass feeds the next, and it runs on a normal budget because the path is tight.

for most marketing work, closed is the one that pays off today.

> the orchestrator owns the goal
> the specialists own the steps
> the subagents do the narrow work
> an eval gate makes sure it's not slop
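
A closed loop as described (clear goal, defined steps, an eval at each step, a point where it stops or hands back) can be sketched as a fixed pipeline in which every step must clear its own eval gate before the next one runs. This is a hypothetical illustration: `Step`, the step names, and the retry limit are assumptions, and real specialists and gates would be model calls or scripted checks. Here `closed_loop` plays the orchestrator (it owns the goal and the fixed path), each `Step.run` plays a specialist, and each `Step.passes` is the eval gate.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[str, int], str]   # specialist: (input, attempt number) -> output
    passes: Callable[[str], bool]    # eval gate for this step


def closed_loop(goal: str, steps: list[Step], max_attempts: int = 3) -> dict:
    """Run fixed steps in order; retry a step until its gate passes, else hand back."""
    log = []  # performance data fed back to the human who designed the path
    current = goal
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            output = step.run(current, attempt)
            ok = step.passes(output)
            log.append((step.name, attempt, ok))
            if ok:
                current = output
                break
        else:
            # The gate never passed: stop and hand back instead of shipping slop.
            return {"status": "handed_back", "failed_step": step.name, "log": log}
    return {"status": "done", "output": current, "log": log}


if __name__ == "__main__":
    steps = [
        Step("research", lambda x, a: x + " | notes", lambda o: "notes" in o),
        # The draft only clears its gate on the second attempt.
        Step("draft", lambda x, a: x + (" | draft v2" if a > 1 else " | draft"),
             lambda o: "v2" in o),
    ]
    result = closed_loop("landing page copy", steps)
    print(result["status"], result["log"])
    # done [('research', 1, True), ('draft', 1, False), ('draft', 2, True)]
```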

Zero_cool5 retweeted

Kate Deyneka

@katedeyneka

22 days ago

best accounts to follow from each frontier lab to stay constantly up to date

Anthropic
@karpathy - must-follow account for AI; recently joined Anthropic
@bcherny - Claude Code creator, always shares great tips
@trq212 - also a Claude Code developer; writes amazing articles on CC

OpenAI
@polynoamial - works on reasoning research, shares a lot of technical details
@gabriel1 - Sora developer, great career path
@jxnlco - works on dev experience, shares a lot about Codex

Google AI
@OfficialLoganK - all the major Google Gemini and AI Studio updates
@ammaar - product and design; shares great things about vibe-coding in Google AI Studio
@fofrAI - cool use cases for generative models

Cursor
@leerob - the loudest voice behind Cursor updates
@ericzakariasson - shares great insights on using Cursor
@mntruell - Cursor’s CEO; major releases and usage updates

xAI
@milichab - recently joined xAI, shares updates on Grok
@skcd42 - also covers major Grok releases
@elonmusk - Elon does a great job reposting and hyping all xAI products

who else did I miss?

Zero_cool5 retweeted

aditya

@adxtyahq

27 days ago

“design a RAG pipeline for 10M docs with zero hallucination”

apparently this was asked in a Google L5 interview round. came across it somewhere on the internet and honestly it’s a way more interesting system design problem than most classic distributed systems questions

1. ingest + normalize docs
- remove duplicates, standardize formats, extract metadata, maintain version history

2. hybrid retrieval (BM25 + embeddings)
- BM25 handles exact keyword matching while embeddings capture semantic meaning
- semantic search alone usually struggles with precision at massive scale

3. ANN retrieval + reranking
- ANN (approximate nearest neighbor) search quickly pulls the top candidate chunks from millions of docs
- then a reranking step rescores those candidates, comparing the query against each retrieved chunk in depth to improve relevance

4. source confidence scoring
- every retrieved chunk gets scored based on freshness, trust level, overlap and retrieval consistency
- low-confidence context should never heavily influence generation

5. constrained generation
- the model is only allowed to answer from the retrieved context and must not invent anything beyond it

6. citation-backed responses
- every major claim links back to exact chunks, documents or timestamps

7. hallucination fallback layer
- if retrieval confidence drops below a threshold: “insufficient evidence found”

8. continuous evals
- run adversarial queries, retrieval recall benchmarks and hallucination tests continuously

9. caching + memory layer
- cache high-frequency enterprise queries and retrieval paths (improves latency and output)

10. observability everywhere
- trace retrieval paths, chunk rankings, token attribution and failure points

Also, at 10M docs, retrieval quality matters more than the frontier model itself.
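
Steps 2, 5, 6 and 7 can be made concrete with a small sketch. The post does not say how BM25 and embedding results are combined; reciprocal rank fusion (RRF) is one common choice and is swapped in here as an assumption, with the conventional constant k = 60. The retrievers themselves are out of scope: the function takes each one's ranked list of chunk IDs as input. Using the top fused score as a confidence signal (a crude form of the "retrieval consistency" in step 4) and the 0.03 threshold are hypothetical simplifications; a production system would more likely threshold a reranker or calibrated score.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: score(chunk) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sorted is stable).
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def build_grounded_context(bm25_ids: list[str], dense_ids: list[str],
                           chunks: dict[str, str], min_score: float = 0.03,
                           top_n: int = 3) -> dict:
    """Fuse both retrievers' rankings, then either abstain (step 7) or return
    ID-tagged context the generator must answer from and cite (steps 5 and 6)."""
    fused = rrf_fuse([bm25_ids, dense_ids])
    # Hypothetical confidence signal: with k=60, only a chunk ranked near the top
    # by BOTH retrievers clears 0.03 (a single first place gives 1/61, about 0.016).
    if not fused or fused[0][1] < min_score:
        return {"answerable": False, "message": "insufficient evidence found"}
    top_ids = [chunk_id for chunk_id, _ in fused[:top_n]]
    context = "\n".join(f"[{cid}] {chunks[cid]}" for cid in top_ids)
    return {"answerable": True, "context": context, "citations": top_ids}
```

RRF only uses ranks, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales; that is the usual reason to prefer it over adding raw scores.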

Zero_cool5 retweeted

Coin Bureau

@coinbureau

about 1 month ago

🚨LATEST: The US has officially lifted the chip ban on China, per Reuters.

Alibaba, Tencent, and ByteDance are among 10 Chinese firms now approved to buy Nvidia's H200 chips.

China previously represented a market worth up to $8B annually and nearly a quarter of Nvidia’s revenue before the October 2023 export controls crushed the company’s market to nearly zero.

$NVDA shares surged to a new 52-week high, up 8% today after the U.S. Department of Commerce's approval.

Zero_cool5 retweeted

Theo - t3.gg

@theo

about 1 month ago

Security things from the last few days:
- CopyFail (Linux pwn'd)
- CopyFail 2/Dirty Frag
- 13 advisories in Next.js
- Over 70 CVEs addressed in macOS 26.5
- ~50 CVEs addressed in iOS 26.5
- YellowKey (Windows BitLocker pwn'd entirely)
- GreenPlasma (Windows privilege escalation)
- CVE-2026-21510 and CVE-2026-21513 confirmed to be used by Russia for Windows RCE
- CVE-2026-32202 separately confirmed to be used by Russia for sensitive document access
- Mini-Shai Hulud (over 300 JS and Python packages compromised via GitHub Actions cache poisoning)
- Google confirms they have identified AI-powered exploitation of zero days in an unidentified "open-source, web-based system administration tool"
- Canvas (popular LMS used in most schools) pwn'd entirely
- PAN-OS (Palo Alto Networks) pwn'd with a 9.3 severity CVE-2026-0300

Are you scared yet?

Zero_cool5 retweeted

Philosophy Sage

@philosophysage

about 1 month ago

Men's Complete Health Bible: 40 things every man should know about his health before 35. Most men learn these too late:

Zero_cool5 retweeted

Kirill

@kirillk_web3

about 1 month ago

KIMI FOUNDER JUST DROPPED A 40-MINUTE MASTERCLASS. The exact architecture behind a $20B valuation — there's no faster way to learn how to build AI agents right now. Bookmark this for the weekend. 40 minutes. zero fluff. from the person who built it. Optimization → Linear Attention → Sub-Agents → Open Systems → Cash

Zero_cool5 retweeted

spidey

@lochan_twt

about 1 month ago

FRONTEND IS DEAD BACKEND IS DEAD CLOUD COMPUTING IS DEAD MOBILE DEV IS DEAD DEVOPS IS DEAD DATA SCIENCE IS DEAD UI/UX IS DEAD FULL STACK IS DEAD GAME DEV IS DEAD OPEN SOURCE IS DEAD STARTUPS ARE DEAD SAAS IS DEAD APIs ARE DEAD DATABASES ARE DEAD MICROSERVICES ARE DEAD SERVERLESS IS DEAD KUBERNETES IS DEAD DOCKER IS DEAD VERSION CONTROL IS DEAD DEBUGGING IS DEAD TESTING IS DEAD

Zero_cool5 retweeted

Puneet Patwari

@system_monarch

about 1 month ago

As an AI Engineer, please learn:
- Prompt caching & semantic caching tradeoffs
- KV cache management at scale
- Speculative decoding vs quantization
- RAG evaluation (RAGAS + human evals)
- Cost monitoring & hidden token leaks
- Agent guardrails & infinite loop detection
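
As one example of the last item, a cheap guardrail for infinite-loop detection is to flag an agent that keeps issuing the same tool call. The sketch below assumes each tool call can be reduced to a `(tool_name, arguments)` pair; `LoopDetector`, the window size, and the repeat threshold are hypothetical, not taken from the post.

```python
import json
from collections import Counter, deque


class LoopDetector:
    """Flags an agent that repeats the same tool call too often within a recent window."""

    def __init__(self, window: int = 10, max_repeats: int = 3) -> None:
        self.recent: deque[str] = deque(maxlen=window)  # oldest calls fall off automatically
        self.max_repeats = max_repeats

    def record(self, tool: str, args: dict) -> bool:
        """Record a call; return True if the agent looks stuck and should be stopped."""
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} the same fingerprint.
        key = tool + ":" + json.dumps(args, sort_keys=True)
        self.recent.append(key)
        return Counter(self.recent)[key] >= self.max_repeats
```

Exact-match fingerprints miss near-duplicate calls (for example, the same search with one word changed); catching those would need a fuzzier comparison, at the cost of more false alarms.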

Zero_cool5 retweeted

Maor Elkarat

@Maor_Elkarat

about 2 months ago

Stop buying more VRAM.

Everyone’s posting Qwen 3.6 configs running insanely fast on 12GB cards.

But do you actually understand the flags making it possible? Weights are only half the story. KV cache is eating your VRAM alive.

The secret isn’t just 4-bit weights; it’s the KV cache sorcery everyone’s missing.

Here’s the annotated command & real tricks explained:
@elonmusk @grok #Ai
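
The claim that the KV cache, not just the weights, eats VRAM is easy to check with arithmetic. For standard full attention, the cache holds one key and one value vector per layer, per KV head, per token, so its size is 2 × layers × kv_heads × head_dim × tokens × bytes per element. The model dimensions below are hypothetical placeholders, not Qwen 3.6's real configuration (the post does not give one), and the sketch ignores the small per-block scale overhead of quantized cache formats; read the real values from your model's config.

```python
GIB = 1024 ** 3


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   tokens: int, bytes_per_elem: float) -> float:
    """K and V cache size for one sequence; the leading 2 covers keys + values."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical full-attention config: 48 layers, 4 KV heads (GQA),
    # head_dim 128, 32k-token context. Replace with your model's config values.
    for label, bpe in [("fp16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
        size = kv_cache_bytes(48, 4, 128, 32_768, bpe)
        print(f"{label}: {size / GIB:.2f} GiB")
    # fp16: 3.00 GiB
    # 8-bit: 1.50 GiB
    # 4-bit: 0.75 GiB
```

The size grows linearly with context length and with the number of concurrent sequences, and halving the bytes per element halves it, which is why cache quantization can matter as much as weight quantization on a 12GB card.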

Zero_cool5 retweeted

elvis

@omarsar0

about 2 months ago

// Agentic Harness Engineering //

Pay attention to this one, AI devs.

(bookmark it)

Most coding-agent harnesses are still tuned by hand or through brittle trial-and-error self-evolution.

This new work introduces Agentic Harness Engineering, a framework that makes harness evolution observable. They do this through three layers: components as revertible files, experience as condensed evidence from millions of trajectory tokens, and decisions as falsifiable predictions checked against task outcomes.

Each edit becomes a contract you can verify or revert.
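
The idea of an edit as a contract you can verify or revert can be illustrated with a toy accept/revert rule. This is not the paper's algorithm, only a minimal sketch under stated assumptions: a harness is modeled as a dict of named components (prompts, tool configs), each `Edit` carries a predicted pass-rate gain, and a hypothetical `evaluate` function returns the pass rate on a fixed task set. The edit is kept only if the observed gain meets the prediction; otherwise the previous version stays in place.

```python
from dataclasses import dataclass
from typing import Callable

Harness = dict[str, str]  # component name -> contents, e.g. a prompt or tool config


@dataclass
class Edit:
    component: str
    new_value: str
    predicted_gain: float  # the falsifiable claim, e.g. +0.02 pass@1


def apply_if_verified(harness: Harness, edit: Edit,
                      evaluate: Callable[[Harness], float]) -> tuple[Harness, bool, float]:
    """Apply one edit, check its prediction against task outcomes, revert if falsified.

    Returns (harness to keep, whether the edit was kept, observed gain).
    """
    baseline = evaluate(harness)
    candidate = dict(harness)  # edit a copy so reverting is just "keep the old one"
    candidate[edit.component] = edit.new_value
    observed = evaluate(candidate) - baseline
    if observed >= edit.predicted_gain:
        return candidate, True, observed
    return harness, False, observed
```

Real pass rates are noisy, so a single before/after comparison like this would need repeated runs or a confidence interval before a prediction could fairly be called confirmed or falsified.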

Results: pass@1 on Terminal-Bench 2 climbs from 69.7% to 77.0% in ten iterations, beating human-designed Codex-CLI (71.9%) and self-evolving baselines like ACE and TF-GRPO.

The evolved harness also transfers across model families with +5.1 to +10.1 point gains, while using 12% fewer tokens than the seed on SWE-bench-verified.

Harness work is the biggest hidden cost in most agent systems. This is the first credible recipe for letting the harness improve itself without drifting into noise.

Paper: https://t.co/9fEgqwlTSf

Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

Zero_cool5 retweeted

shirish

@shiri_shh

about 2 months ago

scene in the Vercel office right now 😭
