James @jamwithai - Twitter Profile

Pinned Tweet

8 months ago

Introducing https://t.co/eFXL4smK0t - a Claude "factory skill" that crawls your web app and generates a Playwright skill specific to your application that significantly speeds up QA flows Claude Code uses to debug or verify features and bug fixes. Will publish benchmark data soon

1

0

103

jamwithai retweeted

Anuj Patel

@byanujpatel

about 1 month ago

Great article on harness engineering. https://t.co/OBrPgi3Uwo

6

337

39

462

33K

jamwithai retweeted

Vaishnavi

@_vmlops

about 1 month ago

ANTHROPIC JUST DROPPED A ZERO TRUST PLAYBOOK FOR AI AGENTS

and it's not theory it's architecture

frontier AI compresses vulnerability-to-exploit timelines from months to hours

your agents face threats traditional access controls were never built to handle:
▫️ prompt injection through external data sources
▫️ tool poisoning via MCP server metadata
▫️ memory-based privilege retention across sessions
▫️ multi-agent pivot attacks

the framework breaks it into 3 tiers: Foundation, Enterprise, Advanced

https://t.co/uDuO9cq25H

42

1K

205

2K

133K

jamwithai retweeted

dharmesh

@dharmesh

about 2 months ago

I'm with @bchesky on this one. I think the future is not about apps, but about agents. But the shift to agents doesn't necessarily mean text-forward, chat-based UIs. That makes sense for some use cases -- but not all. The future is about agents that work on your behalf, often in the background, and let you interact in ways that make sense. Sometimes, that means typing text, but others it might be a personalized UI element. UI affordances are underrated. Sometimes humans need some guidance and nudges instead of an empty prompt box. I think hybrid agentic interfaces will be the future. And it's not just about B2C. Turns out, B2B users are people too. :)

59

384

28

256

73K

James

@jamwithai

3 months ago

@Tecinc7 @gauntlet_xyz No updates about this?

0

1

0

18

jamwithai retweeted

Essam Sleiman

@essamsleiman

3 months ago

tldr: everyone is converging on the same product shape: a general harness that takes a goal, uses tools, and does knowledge work. once every product is a harness, the next frontier is the feedback loop that improves it after deployment.

37

1K

66

2K

279K

jamwithai retweeted

albs—

@albfresco

3 months ago

there's a really interesting finding hiding in this picture, right? most of these tasks didn't get better over time. they got better on the first try

meaning: performance was left on the table in the past system. does the team have the strength to avoid the ego hit, and just make progress, knowing most old stuff should be thrown out. interesting findings. nice writeup

1

3

1

889

James

@jamwithai

4 months ago

@dzhng @morganlinton @denisyarats Curious why you say it should be split from MCP, if you would just end up recreating that part of MCP Apps. I could see political arguments for this if you want to own the standard, but I don't see the technical reason for it.

0

25

James

@jamwithai

4 months ago

@dzhng @morganlinton @denisyarats How do you think MCP Apps factors into this? Do you think that interactive embedded apps is one good use case for MCP since it is substantially more involved than a typical MCP tool?

1

0

179

jamwithai retweeted

Viv

@Vtrivedy10

4 months ago

Harness Design Notes: Decoupling Agent Storage from Agent Compute

TLDR: You can give each Agent/Subagent dedicated compute while sharing storage (repo/filesystem) to self-organize work between them. Shared Compute can be a bottleneck especially with long running code execution.

Started writing up some harness design patterns over a very long flight this weekend, might make this a series if there's interest!

We're on the edge of using a massive amount of compute to orchestrate agents across long horizon work

Ex: for Agent Teams, an orchestrator organizes potentially many agents that fan out and do work on a project (like a large repo)

For anyone who runs many agents locally, you see your CPU usage skyrocket for even moderate runs with code exec

But Sandboxes to the rescue :)

There's a nice pattern of shared filesystems via Volumes that all agents access while getting their own sandbox environment. The coordination happens via writing to the right place in the filesystem. And using git makes it so you can track and roll back changes over time

good Harness Engineering on self-organizing agents via filesystems requires thinking about infra too. Many patterns work but you have to measure them for your work!

Harness Engineering is Systems Engineering

21

380

33

526

20K
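
A minimal sketch of the coordination idea in the Harness Design Notes tweet above, where agents with separate compute self-organize by writing to a shared filesystem. This is not the author's implementation. The `claims/` directory layout, the `claim_task` helper, and the agent and task names are all hypothetical, and the sketch assumes the shared volume honors exclusive file creation (some network filesystems do not).

```python
import os
import tempfile
from pathlib import Path


def claim_task(shared_dir: Path, task_id: str, agent_id: str) -> bool:
    """Try to claim a task by atomically creating a claim file on the shared volume.

    Hypothetical helper: returns True if this agent won the claim, False if
    another agent had already created the claim file.
    """
    claim_path = shared_dir / "claims" / f"{task_id}.claim"
    claim_path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so at most one
        # agent can create a given claim file.
        fd = os.open(claim_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(agent_id)  # record the owner so other agents can inspect it
    return True


if __name__ == "__main__":
    # A temporary directory stands in for the shared volume.
    with tempfile.TemporaryDirectory() as d:
        shared = Path(d)
        print(claim_task(shared, "refactor-auth", "agent-a"))
        print(claim_task(shared, "refactor-auth", "agent-b"))
```

Each sandbox would mount the same volume and call `claim_task` before starting work. Committing the shared directory to git, as the tweet suggests, would add history and rollback on top of this.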

jamwithai retweeted

Latent.Space

@latentspacepod

4 months ago

🆕 How to Kill The Code Review

https://t.co/fzUMQxwVOS

the volume and size of PRs is skyrocketing. @simonw called out StrongDM’s “Dark Factory” last month: no human code, but *also* no human review (!?)

in this week’s guest post, @ankitxg makes a 5 step layered playbook for how this can come true.

50

787

103

2K

611K

jamwithai retweeted

Matt Stockton

@mstockton

4 months ago

An interesting aspect of these models and foundation model companies:

- Their internal teams know *a ton* about how to best use these models
- They are publishing things (e.g. skills) that let you essentially leverage that knowledge for free
- You should never really 'hand craft' a context at this point. It's much better for you to 'find the existing' bootstrapped context (or context generator) and use that instead, or have the model 'prompt it' out of you (e.g. AskUserQuestion tool all day)
- The skill-creator skill is a perfect example of this. It's essentially leading-edge knowledge of people *at foundation labs knowing what works* just available to you, for free
- It's kind of weird, but there's actually just incredible 'alpha' by finding existing skills that work versus trying to do your own thing.
- With the right set of skills loaded, I would make a bet that a large proportion (maybe a majority) of white-collar work could be accomplished by purely typing the key '/' followed by a word into a terminal, over and over again
- It still helps to have good taste, know what good looks like, take incremental approaches, and just generally be curious -- but the shape of what it even means 'to work' has totally shifted - and it's going to continue to shift even faster than it is now.

It's certainly weird, but we are here.

9

551

20

891

89K

jamwithai retweeted

Thariq

@trq212

4 months ago

Prompt caching can be surprisingly easy to regress. Read more on why prompt caching is so important for agents and how to design your agent around it here: https://t.co/Gnd7v2XGTr

18

503

27

606

161K

jamwithai retweeted

Simon Willison

@simonw

4 months ago

New chapter of my Agentic Engineering Patterns guide. This one is about having coding agents build custom interactive and animated explanations to help fight back against cognitive debt https://t.co/F5mqcUSp3f

62

1K

98

2K

84K

jamwithai retweeted

Massimo

@Rainmaker1973

4 months ago

This turtle behavior, often called "claw fluttering", is a courtship ritual where a male turtle rapidly vibrates or waves his long front claws (or "jazz hands") near a female's face to attract her.

199

11K

765

2K

793K

jamwithai retweeted

Thariq

@trq212

5 months ago

https://t.co/oqBZifW4GG

127

3K

300

6K

870K

jamwithai retweeted

Harrison Chase

@hwchase17

5 months ago

🧵 Context Management for DeepAgents

We wrote an in depth blog on how we do context management in DeepAgents, our open source agent harness

16

351

39

494

54K

jamwithai retweeted

Thariq

@trq212

5 months ago

https://t.co/eHD0AmjSRM

323

6K

430

6K

2M

jamwithai retweeted

kitze the 🐐

@thekitze

6 months ago

vibe coding in 2026

73

2K

183

287

209K

jamwithai retweeted

Connor Davis

@connordavis_ai

6 months ago

Holy shit… this paper might be the most important shift in how we use LLMs this entire year.

“Large Causal Models from Large Language Models.”

It shows you can grow full causal models directly out of an LLM: not approximations, not vibes, but actual causal graphs, counterfactuals, interventions, and constraint-checked structures.

And the way they do it is wild: instead of training a specialized causal model, they interrogate the LLM like a scientist:

→ extract a candidate causal graph from text
→ ask the model to check conditional independencies
→ detect contradictions
→ revise the structure
→ test counterfactuals and interventional predictions
→ iterate until the causal model stabilizes

The result is something we’ve never had before: a causal system built inside the LLM using its own latent world knowledge.

Across benchmarks (synthetic, real-world, messy domains), these LCMs beat classical causal discovery methods because they pull from the LLM’s massive prior knowledge instead of just local correlations.

And the counterfactual reasoning? Shockingly strong. The model can answer “what if” questions that standard algorithms completely fail on, simply because it already “knows” things about the world those algorithms can’t infer from data alone.

This paper hints at a future where LLMs aren’t just pattern machines. They become causal engines: systems that form, test, and refine structural explanations of reality.

If this scales, every field that relies on causal inference (economics, medicine, policy, science) is about to get rewritten.

LLMs won’t just tell you what happens. They’ll tell you why.
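
The propose, check, revise, and iterate loop described in the tweet above can be sketched roughly as follows. This is an illustration of the loop's shape only, not the paper's method. `refine_graph`, `stub_propose`, and `stub_critique` are hypothetical names, and the two stub functions stand in for what would be LLM calls in a real system.

```python
from typing import Callable

Edge = tuple[str, str]  # (cause, effect)


def refine_graph(
    propose: Callable[[], set[Edge]],
    critique: Callable[[set[Edge]], set[Edge]],
    max_rounds: int = 10,
) -> set[Edge]:
    """Iterate until the critic stops rejecting edges (or max_rounds is hit)."""
    graph = set(propose())
    for _ in range(max_rounds):
        rejected = critique(graph) & graph  # only drop edges actually present
        if not rejected:
            break  # the graph has stabilized
        graph -= rejected
    return graph


def stub_propose() -> set[Edge]:
    # Placeholder for "extract a candidate causal graph from text".
    return {("rain", "wet"), ("wet", "rain"), ("sprinkler", "wet")}


def stub_critique(graph: set[Edge]) -> set[Edge]:
    # Placeholder for contradiction detection: when both directions of an
    # edge are present, arbitrarily reject the lexicographically larger one.
    return {(a, b) for (a, b) in graph if (b, a) in graph and (a, b) > (b, a)}


if __name__ == "__main__":
    print(sorted(refine_graph(stub_propose, stub_critique)))
```

In a real pipeline the critic would also cover the independence checks and counterfactual tests listed in the tweet. Here it only resolves two-way edges, to keep the example small.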