Maithri @maithrivm - Twitter Profile

12 days ago

Satya’s point about AI ecosystems is directionally right, but for enterprises the missing piece is the harness. The real challenge isn’t just connecting a model to a workflow. It’s giving that workflow safe access to infrastructure, policy, telemetry, and enforcement so it can actually run in production and improve over time. That’s how I think about @Cisco Cloud Control: a secure harness for infrastructure that helps enterprises build loops safely. Not just agents that can act, but governed systems that can observe, decide, execute, and learn without losing control.

3

7

1

382

Maithri @MaithriVm

13 days ago

Very insightful 🔥

elvis

@omarsar0

14 days ago

Notes on the recent session we had related to autonomous long-running coding agents. (bookmark it) Topics: /goal, loop engineering, verifiers, dynamic workflows, and much more. So much to unpack, so I tried to quickly summarize the most relevant parts using my writer agent.

24

252

31

501

51K

0

1

0

23

MaithriVm retweeted

Satya Nadella

@satyanadella

13 days ago

https://t.co/vLmiBKTtX3

3K

41K

8K

57K

66M

MaithriVm retweeted

Avi Chawla

@_avichawla

14 days ago

Karpathy said something you'll regret ignoring: "Remove yourself as the bottleneck. Maximize your leverage. Put in very few tokens, and a huge amount of stuff happens on your behalf." Loop engineering is the exact thing that does that. In a hand-run session, the operator handles two things: - deciding what the agent runs next - and checking its output before the next step Both are manual, and both decide how far the agent gets on its own without the operator. Loop engineering moves both steps into the system. A core operating structure surrounds the loop, and the diagram below depicts it. - A schedule decides what to run - Loop is the maker that produces the work - A separate checker agent grades the output - A file on disk holds the state they both read. The loop runs until either done, max iterations, or an exhausted budget. Here are some practical engineering considerations: 1) A model grading its own output justifies what it already did instead of catching where it failed. That's why a separate checker's findings return to the maker as the next instruction. And the cycle repeats until the checker finds nothing left to fix. 2) A loop with no stop condition burns tokens, and the cost climbs fast once sub-agents and long runs add up. That's why the exit must be set before the loop runs, not while it is running. A simple exit could be: ↳ fix only the major issues, run one final pass, and stop after two loops, with "all tests pass and lint clean" as the rule that ends it. 3) State has to live on disk, not in context. The model forgets everything between runs, so an MD file or a knowledge graph holds what is done and what is still open. Each run reads it and writes back to it, which lets a loop pick up again after days. 4) The lower the verification bar, the safer the loop. Boring, repetitive checks like a stale version string or a missing test are trivial to verify, so a loop runs them with little risk while the operator is away. Judgment-heavy work is loopable too, but only as far as the checker can confirm the result. Let's look at how an unattended loop fails in two ways. 1) It reports done when nothing is actually verified. The separate checker exists to prevent it, but it merges code faster than anyone reads it, so over weeks, the team stops understanding its own codebase while every check stays green. Green tests say the code passed the tests, not that anyone knows what shipped. Someone still has to read what the loop merges. 2) The checker keeps a running loop honest, but it only catches failures inside a run. The harness around the loop, like the prompts, tools, and checks wrapped around the model, still drifts and breaks in production as models change. That repair loop is usually run by hand based on observability traces. My co-founder wrote a detailed walkthrough (with code) on making that harness repair itself, where a failing trace gets diagnosed, the fix is verified against the exact input that failed, and the failure is locked as a regression test so it cannot recur. Read it below.

85

4K

563

9K

675K

Who to follow

AmitAnitaRote ( NSAmi)

@roteamit

Lead iOS Developer #BostonScientific (@bostonsci) Mobility Consultant, Client Relations Strategy, CSM, Event Organizer, always love to explore and learn.

Nisar

@nisar_3s

Practical Exposure to Developing large scale AI based systems.

Basavraj Birajdar

@basavrajmb

The Digital Expert Solution Architect, Genpact

MaithriVm retweeted

Cisco AI

@CiscoAI

24 days ago

Watch Cisco Cloud Control in action. Discover the unified platform built for humans and AI agents to run critical IT infrastructure together, enabling customers and partners to build their own apps and agents in natural language, and extending to third-party tools.

41

3K

335

498

24M

MaithriVm retweeted

Thariq

@trq212

25 days ago

https://t.co/R6exTuF7P8

260

11K

1K

24K

3M

MaithriVm retweeted

Cisco @Cisco

25 days ago

Announcing 📣 Simplify your tech stack with Cisco Cloud Control. Manage less, innovate more. Learn more in today's #CiscoLive recap 👀

1

37

10

6

3K

MaithriVm retweeted

Rahul

@sairahul1

28 days ago

https://t.co/2HcA8LG2HT

16

429

58

2K

896K

MaithriVm retweeted

Boris Cherny

@bcherny

29 days ago

Salesforce published a detailed writeup on going agentic with Claude Code. A couple things jumped out. A migration they'd scoped at 231 days shipped in 13. One PR delivered 21 endpoints at 100% test coverage.

145

4K

180

2K

404K

MaithriVm retweeted

Daniel San

@dani_avila7

about 1 month ago

IMPORTANT! In Skills, Run vs Read makes a huge difference in how your script consumes tokens How you call a script inside your Skill matters, the verb decides whether Claude executes the code or reads it as context In this case: - Step 1 uses "Run" with an executable code block, the script runs in bash, the file never enters the context window, only the stdout from https://t.co/tXpH87lpLk does - The "Field extraction algorithm" section uses "Read" pointing at analyze_form.py, Claude opens the file and loads the full source into context, same as any reference markdown Same script, same location, completely different cost If you want deterministic code execution without paying tokens for large script files, you need to be deliberate with this. "Run" for execution, "Read" or "See" only when the code itself is the documentation Most utility scripts in a Skill should be Run, not Read!

dani_avila7's tweet photo. IMPORTANT!
In Skills, Run vs Read makes a huge difference in how your script consumes tokens

How you call a script inside your Skill matters, the verb decides whether Claude executes the code or reads it as context

In this case:

- Step 1 uses "Run" with an executable code block, the script runs in bash, the file never enters the context window, only the stdout from https://t.co/tXpH87lpLk does

- The "Field extraction algorithm" section uses "Read" pointing at analyze_form.py, Claude opens the file and loads the full source into context, same as any reference markdown

Same script, same location, completely different cost

If you want deterministic code execution without paying tokens for large script files, you need to be deliberate with this.

"Run" for execution, "Read" or "See" only when the code itself is the documentation

Most utility scripts in a Skill should be Run, not Read!

11

82

15

106

10K

MaithriVm retweeted

Suryansh Tiwari

@Suryanshti777

about 1 month ago

Andrej Karpathy just explained the future of software engineering without directly saying it. The best AI engineers are no longer “prompting.” They’re building systems around the agents. Karpathy’s biggest insight wasn’t: “Claude can code.” It was: LLMs become dramatically better when you force them into disciplined workflows. That’s why "CLAUDE.md" files are suddenly everywhere. Not because they’re prompts. Because they behave like an operating system for the agent. Karpathy called out the exact problems with AI coding: - models assume instead of asking - they overengineer simple tasks - they hide confusion - they rewrite unrelated code - they optimize for completion, not correctness So developers started encoding rules directly into the workflow: → Think before coding → Simplicity first → Surgical edits only → Goal-driven execution And the results are wild. People are now running multiple Claude Code agents in parallel like engineering teams: • one agent researching • one debugging • one writing tests • one optimizing code • one validating outputs Not “AI assistance.” Actual orchestration. And this part from Karpathy changes everything: “Don’t tell the model what to do. Give it success criteria and let it loop.” That is the shift. From: “write this function” To: “here’s the goal, constraints, tests, and verification system — now iterate until correct.” The craziest part? This already feels like a phase shift in engineering. A lot of developers quietly went from: 80% manual coding → to 80% agent-driven coding in just months. Not because AI became perfect. Because the leverage became impossible to ignore. We’re entering an era where the highest leverage engineers won’t necessarily be the best coders. They’ll be the people who build the best systems around AI agents.

Suryanshti777's tweet photo. Andrej Karpathy just explained the future of software engineering without directly saying it.

The best AI engineers are no longer “prompting.”

They’re building systems around the agents.

Karpathy’s biggest insight wasn’t:
“Claude can code.”

It was:
LLMs become dramatically better when you force them into disciplined workflows.

That’s why "CLAUDE.md" files are suddenly everywhere.

Not because they’re prompts.

Because they behave like an operating system for the agent.

Karpathy called out the exact problems with AI coding:

- models assume instead of asking
- they overengineer simple tasks
- they hide confusion
- they rewrite unrelated code
- they optimize for completion, not correctness

So developers started encoding rules directly into the workflow:

→ Think before coding
→ Simplicity first
→ Surgical edits only
→ Goal-driven execution

And the results are wild.

People are now running multiple Claude Code agents in parallel like engineering teams:

• one agent researching
• one debugging
• one writing tests
• one optimizing code
• one validating outputs

Not “AI assistance.”

Actual orchestration.

And this part from Karpathy changes everything:

“Don’t tell the model what to do. Give it success criteria and let it loop.”

That is the shift.

From:
“write this function”

To:
“here’s the goal, constraints, tests, and verification system — now iterate until correct.”

The craziest part?

This already feels like a phase shift in engineering.

A lot of developers quietly went from:
80% manual coding → to 80% agent-driven coding in just months.

Not because AI became perfect.

Because the leverage became impossible to ignore.

We’re entering an era where the highest leverage engineers won’t necessarily be the best coders.

They’ll be the people who build the best systems around AI agents.

86

4K

502

8K

735K

Maithri @MaithriVm

about 1 month ago

@sairahul1 @eng_khairallah1 This is very helpful! @singularity6770

0

1

0

26

Maithri @MaithriVm

about 1 month ago

@_sholtodouglas @trq212 You asked for it 🤓Though i love the spirit 🫡 sharing my critical concerns! 1. Subagehts -super slow in handoff and back merger 2. Better control over concurrent tool execution by agent (currently it is instruct and hope)

1

0

785

MaithriVm retweeted

Shubham Saboo

@Saboo_Shubham_

about 2 months ago

Codex /goal builds it. Claude Code /goal review and refines it. Hermes /goal manages the orchestration and handoff. All tracked on a single Kanban Board and agents keep running in the loop.