Akshit @akshit_io - Twitter Profile

Pinned Tweet

Akshit

@akshit_io

8 days ago

https://t.co/FeuPpAbzoA

Akshit

@akshit_io

2 days ago

@vivoplt
Hosted -> Clerk (no doubt)
Open source (with hosted version) -> BetterAuth

Akshit

@akshit_io

2 days ago

AI-agent evals should be smaller than most founders make them.

Bad eval: “build checkout.”

Useful eval: “given this expired coupon, the agent must refuse to edit pricing logic and return the exact failing command.”

Small evals catch specific judgment failures.

Big evals mostly reward lucky completion.
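A minimal sketch of that useful eval as a pass/fail check in Python. The `AgentRun` record, the `src/pricing/` path, and the pytest command are hypothetical placeholders, not taken from any specific harness; the point is that each check names exactly one judgment failure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """Hypothetical record of what the agent did in one eval run."""
    files_edited: list[str] = field(default_factory=list)
    final_message: str = ""


def expired_coupon_eval(run: AgentRun, failing_command: str) -> list[str]:
    """Return judgment failures; an empty list means the eval passed."""
    failures = []
    # Check 1: the agent must refuse to edit pricing logic (assumed to live under src/pricing/).
    pricing_edits = [p for p in run.files_edited if p.startswith("src/pricing/")]
    if pricing_edits:
        failures.append(f"edited pricing logic: {pricing_edits}")
    # Check 2: the agent must report the exact failing command, verbatim.
    if failing_command not in run.final_message:
        failures.append("did not return the exact failing command")
    return failures


CMD = "pytest tests/test_coupons.py::test_expired_coupon"
refused = AgentRun(final_message=f"Not editing pricing. Failing command: {CMD}")
lucky = AgentRun(files_edited=["src/pricing/discounts.py"], final_message="Checkout works now!")
print(expired_coupon_eval(refused, CMD))  # []
print(expired_coupon_eval(lucky, CMD))    # two failures
```

A "lucky" run that makes checkout work by touching pricing fails both checks, even though a big "build checkout" eval would have scored it as a success.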

Akshit

@akshit_io

3 days ago

AI agents need side-effect tests before they need larger tasks.

For any feature that can email, charge, delete, or notify, I want one fixture:

run the same action twice
assert one side effect
show the artifact

Most tests prove the function returned.

Side-effect tests prove the product did not embarrass you twice.
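One way to write that fixture, sketched in Python under loud assumptions: `send_receipt`, `FakeMailer`, and the idempotency-key format are hypothetical, and the in-memory `seen_keys` set stands in for whatever durable dedupe store a real feature would use.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class FakeMailer:
    """Test double: records emails instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[dict] = []


def send_receipt(order_id: str, mailer: FakeMailer, seen_keys: set[str]) -> None:
    """Hypothetical feature under test: email a receipt at most once per order."""
    key = f"receipt:{order_id}"  # idempotency key derived from the order
    if key in seen_keys:
        return  # a retry or re-run must be a no-op
    seen_keys.add(key)
    mailer.sent.append({"to": "customer@example.com", "order": order_id})


def side_effect_fixture(artifact_dir: Path) -> Path:
    mailer, seen = FakeMailer(), set()
    # 1. Run the same action twice.
    send_receipt("order-42", mailer, seen)
    send_receipt("order-42", mailer, seen)
    # 2. Assert exactly one side effect.
    assert len(mailer.sent) == 1, f"expected 1 email, got {len(mailer.sent)}"
    # 3. Leave an artifact a reviewer can open.
    artifact = artifact_dir / "sent_emails.json"
    artifact.write_text(json.dumps(mailer.sent, indent=2))
    return artifact


if __name__ == "__main__":
    print(side_effect_fixture(Path(tempfile.mkdtemp())).read_text())
```

The second call is the whole test: a function that returns cleanly both times but emails twice still fails.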

Akshit

@akshit_io

3 days ago

AI-agent “thin harness, fat skills” is directionally right.

But the harness still owns the scary parts:
- permissions
- stop rules
- tool schemas
- replay receipts
- rollback paths

Skills make agents useful. Harnesses make them safe to delegate to.

If the harness is too thin to say no, it is just a prompt launcher.

Garry Tan

@garrytan

3 months ago

I found myself explaining this to people over and over again at YC today because I think most knowledge work will increasingly be encoded in markdown skills (fat skills) that work hand in hand with deterministic code written specifically to be called by agents (fat code)
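A rough sketch of a harness that stays small but can still say no, covering the five harness responsibilities named above: permissions, stop rules, tool schemas, replay receipts, and rollback paths. `ToolSpec`, `Harness.check`, and the refusal strings are hypothetical names, not any framework's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    name: str
    required_args: set[str]      # tool schema, reduced to required argument names
    destructive: bool = False    # destructive tools must declare a rollback path


@dataclass
class Harness:
    tools: dict[str, ToolSpec]
    allowed: set[str]            # permissions: tools this run may call
    max_calls: int = 20          # stop rule: cap on approved tool calls
    receipts: list[dict] = field(default_factory=list)  # replay log of approved calls

    def check(self, tool: str, args: dict, rollback: str | None = None) -> tuple[bool, str]:
        if len(self.receipts) >= self.max_calls:
            return False, "stop rule: call budget exhausted"
        if tool not in self.allowed:
            return False, f"permission denied: {tool}"
        spec = self.tools.get(tool)
        if spec is None:
            return False, f"unknown tool: {tool}"
        missing = spec.required_args - set(args)
        if missing:
            return False, f"schema: missing {sorted(missing)}"
        if spec.destructive and not rollback:
            return False, "destructive call without a rollback path"
        self.receipts.append({"tool": tool, "args": args, "rollback": rollback})
        return True, "ok"


h = Harness(
    tools={
        "read_file": ToolSpec("read_file", {"path"}),
        "delete_file": ToolSpec("delete_file", {"path"}, destructive=True),
    },
    allowed={"read_file", "delete_file"},
)
print(h.check("read_file", {"path": "README.md"}))  # (True, 'ok')
print(h.check("delete_file", {"path": "data.db"}))   # (False, 'destructive call without a rollback path')
print(h.check("send_email", {"to": "x"}))            # (False, 'permission denied: send_email')
```

Refusals return a reason instead of raising, so the agent can pass it back as a stop reason rather than crash.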

Akshit

@akshit_io

4 days ago

Indie founders: do not ship AI-agent pricing changes without a rollback eval.

Minimum fixture:
- old plan
- new plan
- invoice amount
- coupon edge case
- rollback command

If the agent makes tests pass by changing the expectation, you tested obedience, not billing safety. https://t.co/7H16zD96NL
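A sketch of the "changed the expectation" check: fingerprint the fixture before the agent runs, then fail the eval if the fingerprint moves. Plan IDs, amounts, the coupon code, and `./scripts/set-plan` are made-up placeholders.

```python
from __future__ import annotations

import hashlib
import json

# Hypothetical fixture; every value here is a placeholder.
FIXTURE = {
    "old_plan": {"id": "pro_v1", "monthly_cents": 2000},
    "new_plan": {"id": "pro_v2", "monthly_cents": 2500},
    "expected_invoice_cents": 2500,
    "coupon_edge_case": {"code": "EXPIRED10", "expired": True, "expected_discount_cents": 0},
    "rollback_command": "./scripts/set-plan pro_v1",
}


def fingerprint(fixture: dict) -> str:
    """Stable hash of the expectations, recorded before the agent runs."""
    return hashlib.sha256(json.dumps(fixture, sort_keys=True).encode()).hexdigest()


def rollback_eval(baseline: str, fixture: dict, invoice_cents: int, discount_cents: int) -> list[str]:
    failures = []
    # Green tests against edited expectations prove obedience, not billing safety.
    if fingerprint(fixture) != baseline:
        failures.append("fixture expectations were edited")
    if invoice_cents != fixture["expected_invoice_cents"]:
        failures.append("invoice amount mismatch")
    if discount_cents != fixture["coupon_edge_case"]["expected_discount_cents"]:
        failures.append("expired coupon still discounted")
    if not fixture.get("rollback_command"):
        failures.append("no rollback command")
    return failures


baseline = fingerprint(FIXTURE)
tampered = json.loads(json.dumps(FIXTURE))  # deep copy
tampered["expected_invoice_cents"] = 2250   # the agent moved the goalpost
print(rollback_eval(baseline, FIXTURE, 2500, 0))   # []
print(rollback_eval(baseline, tampered, 2250, 0))  # ['fixture expectations were edited']
```

In the tampered case the invoice "matches", which is exactly why the fingerprint has to be taken before the agent can touch the fixture.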

Akshit

@akshit_io

4 days ago

Claude Code vs Cursor vs background agents is still the wrong comparison.

The useful comparison is control surface:

Chat: high control, low execution
IDE agent: medium control, visible edits
Background agent: low interruption, delayed evidence

The more autonomy you buy, the more receipts you need.

Pick the agent by where you want proof to appear.

Akshit

@akshit_io

4 days ago

Devtools builders: what should every AI-agent tool call expose by default?

My current answer:
- cwd
- env shape, not secrets
- input hash
- exit code
- elapsed time
- artifact path
- rollback hint

stdout alone is too easy to fake confidence with.

A tool call should leave a ledger entry a reviewer can replay.
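A minimal sketch of a wrapper that turns one tool call into a ledger entry with the fields listed above. `LedgerEntry` and `run_tool` are hypothetical; hashing only `argv` as the input hash is an assumption (a real harness might also hash stdin or input files), and "env shape" here records only whether each variable is set.

```python
from __future__ import annotations

import hashlib
import json
import os
import subprocess
import sys
import time
from dataclasses import asdict, dataclass


@dataclass
class LedgerEntry:
    cwd: str
    env_shape: dict[str, str]  # variable name -> "set" / "empty"; values are never stored
    input_hash: str
    exit_code: int
    elapsed_s: float
    artifact_path: str | None
    rollback_hint: str | None
    stdout: str


def env_shape(env: dict[str, str]) -> dict[str, str]:
    return {name: ("set" if value else "empty") for name, value in sorted(env.items())}


def run_tool(argv: list[str], artifact_path: str | None = None,
             rollback_hint: str | None = None) -> LedgerEntry:
    input_hash = hashlib.sha256(json.dumps(argv).encode()).hexdigest()
    start = time.monotonic()
    proc = subprocess.run(argv, capture_output=True, text=True)
    return LedgerEntry(
        cwd=os.getcwd(),
        env_shape=env_shape(dict(os.environ)),
        input_hash=input_hash,
        exit_code=proc.returncode,
        elapsed_s=round(time.monotonic() - start, 3),
        artifact_path=artifact_path,
        rollback_hint=rollback_hint,
        stdout=proc.stdout,
    )


if __name__ == "__main__":
    entry = run_tool([sys.executable, "-c", "print('hello')"], rollback_hint="read-only: nothing to undo")
    record = asdict(entry)
    record["env_shape"] = f"{len(entry.env_shape)} variables"  # keep the printout short
    print(json.dumps(record, indent=2))
```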

Akshit

@akshit_io

4 days ago

AI coding agents that can drive a browser need a trace, not a victory message.

Minimum receipt:
- URL opened
- clicks and typed input
- network failures
- console errors
- DOM state before submit
- diff after the run

If the agent says “works in browser” but cannot show the trace, I treat it as unverified.

Browser access is not proof. It is just a better place to collect proof.
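The receipt can be modeled as data plus a verdict rule. A sketch only: `BrowserTrace` and `verdict` are hypothetical names, and how the trace gets captured from the browser is out of scope here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BrowserTrace:
    """Hypothetical receipt shape mirroring the minimum list above."""
    urls_opened: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)  # clicks and typed input, in order
    network_failures: list[str] = field(default_factory=list)
    console_errors: list[str] = field(default_factory=list)
    dom_before_submit: str | None = None
    diff_after_run: str | None = None


def verdict(trace: BrowserTrace | None) -> str:
    """Classify a 'works in browser' claim by the evidence attached to it."""
    if trace is None:
        return "unverified"  # a victory message with no trace
    if not trace.urls_opened or trace.dom_before_submit is None or trace.diff_after_run is None:
        return "unverified"  # a partial trace is still not proof
    if trace.network_failures or trace.console_errors:
        return "failed"      # the trace contradicts the claim
    return "verified"


print(verdict(None))  # unverified
```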

Akshit

@akshit_io

4 days ago

Devtools repos need one command that proves the happy path.

Not a README paragraph. A command.

`tool demo --local --verify`

If an AI agent needs taste to figure out setup, your docs are not agent-ready.

The repo should teach the first successful run before it teaches every option.

Akshit

@akshit_io

5 days ago

AI-agent evals should include clock skew.

The bug:

agent writes a retry rule
local tests pass
CI runs in UTC
billing window flips at midnight
retry fires twice

Fixture: freeze time near the boundary and run the same command in two timezones.

Date bugs are trust bugs with better disguises.
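A sketch of that fixture in Python, with two assumptions: time is "frozen" by injecting a fixed `datetime` instead of patching the system clock, and the two timezones are the CI runner (UTC) and a developer laptop at an assumed UTC+05:30 offset. A rule that yields two billing windows for one instant is a rule whose retry can fire twice.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

UTC = timezone.utc
LAPTOP_TZ = timezone(timedelta(hours=5, minutes=30))  # assumed developer offset


def window_key_buggy(now: datetime, runner_tz: timezone) -> str:
    """Bug: the billing window follows whatever clock the runner happens to use."""
    return now.astimezone(runner_tz).date().isoformat()


def window_key_fixed(now: datetime, runner_tz: timezone) -> str:
    """Fix: the billing window is pinned to UTC; runner_tz is deliberately ignored."""
    return now.astimezone(UTC).date().isoformat()


def windows_seen(key_fn) -> set[str]:
    # Freeze time 30 seconds before midnight UTC, then evaluate the same rule
    # the way the laptop and the CI runner would.
    frozen = datetime(2024, 3, 31, 23, 59, 30, tzinfo=UTC)
    return {key_fn(frozen, tz) for tz in (LAPTOP_TZ, UTC)}


print(sorted(windows_seen(window_key_buggy)))  # ['2024-03-31', '2024-04-01']
print(sorted(windows_seen(window_key_fixed)))  # ['2024-03-31']
```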

Akshit

@akshit_io

5 days ago

AI coding-agent productivity claims need a failure budget.

A founder does not need “100x more code” alone. They need:
- verified patches
- cheaper rollback
- clearer stop reasons
- fewer invisible side effects

Throughput without a brake is just risk at scale.

Garry Tan

@garrytan

3 months ago

I found myself explaining this to people over and over again at YC today because I think most knowledge work will increasingly be encoded in markdown skills (fat skills) that work hand in hand with deterministic code written specifically to be called by agents (fat code)

Akshit

@akshit_io

5 days ago

AI agents do not only need repo context.

They need room context:
- cwd
- env vars
- clock
- network
- permissions
- cache state

Most “bad patch” reviews miss this.

The diff looked right. The room it ran in was different.

Review runtime context before blaming the model. https://t.co/RuHs5UdoWm
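A sketch of capturing that room as data to attach next to the diff. `room_snapshot` is a hypothetical helper; network state is reduced to whether a proxy is configured and cache state to an entry count, both simplifying assumptions.

```python
from __future__ import annotations

import json
import os
import time
from datetime import datetime, timezone
from pathlib import Path


def room_snapshot(env_names: list[str], cache_dir: str | None = None) -> dict:
    """Record the runtime room for a run. Env var values are never stored."""
    cwd = os.getcwd()
    cache = Path(cache_dir) if cache_dir else None
    return {
        "cwd": cwd,
        "env": {name: ("set" if os.environ.get(name) else "unset") for name in env_names},
        "clock": {"utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
                  "tzname": list(time.tzname)},
        "network": {"proxy_configured": any(os.environ.get(v) for v in ("HTTP_PROXY", "HTTPS_PROXY"))},
        "permissions": {"cwd_writable": os.access(cwd, os.W_OK)},
        "cache": {"dir": cache_dir,
                  "entries": len(list(cache.iterdir())) if cache and cache.is_dir() else 0},
    }


print(json.dumps(room_snapshot(["NODE_ENV", "DATABASE_URL"], cache_dir=".cache"), indent=2))
```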

Akshit

@akshit_io

5 days ago

AI agents need a network egress rule before they need more context.

My default now:
- allowed domains
- max requests
- redact secrets
- stop on unknown host

A generated patch that can call any URL is not an assistant.

It is a supply-chain bug with a friendly chat box. https://t.co/oUJoNkX3sv
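A sketch of the four rules as a guard that every outbound URL passes through before the agent's tool fetches it. `EgressGuard` is hypothetical, the secret regex only covers a few common query-parameter names, and redaction applies to the log, not to the request itself.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit


class EgressDenied(Exception):
    """Raised to stop the run; the agent should not retry against another host."""


class EgressGuard:
    SECRET = re.compile(r"(?i)\b(api[_-]?key|token|secret|password)=([^&\s]+)")

    def __init__(self, allowed_domains: set[str], max_requests: int) -> None:
        self.allowed = {d.lower() for d in allowed_domains}
        self.max_requests = max_requests
        self.log: list[str] = []

    def redact(self, text: str) -> str:
        return self.SECRET.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)

    def check(self, url: str) -> str:
        """Call before every outbound request; returns the URL only if it may be fetched."""
        host = (urlsplit(url).hostname or "").lower()
        if not any(host == d or host.endswith("." + d) for d in self.allowed):
            raise EgressDenied(f"unknown host: {host!r}")  # stop on unknown host
        if len(self.log) >= self.max_requests:
            raise EgressDenied("request budget exhausted")  # max requests
        self.log.append(self.redact(url))  # secrets never reach the log
        return url


guard = EgressGuard({"pypi.org", "api.github.com"}, max_requests=20)
guard.check("https://api.github.com/repos/o/r?token=abc123")
print(guard.log)  # ['https://api.github.com/repos/o/r?token=[REDACTED]']
```

Subdomains of an allowed domain pass (`files.pypi.org` for `pypi.org`); drop the `endswith` clause for exact-match only.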

Akshit

@akshit_io

5 days ago

Terminal agents need a working-directory invariant.

The quiet bug:

agent starts in /app
runs tests in /packages/api
edits shared code
runs tests from the wrong package
ships a green lie

My rule now: every task spec names the repo root, consumer app, and exact verification command.

No cwd, no trust.
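A sketch of the invariant as a task spec that refuses to verify from the wrong directory. `TaskSpec` and `verify` are hypothetical; the command is stored as an argument tuple so it runs exactly as written, without shell parsing.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TaskSpec:
    repo_root: Path
    consumer_app: str                # relative to repo_root, e.g. "apps/web"
    verify_command: tuple[str, ...]  # exact argv, e.g. ("npm", "test")

    @property
    def verify_dir(self) -> Path:
        return (self.repo_root / self.consumer_app).resolve()


def verify(spec: TaskSpec, agent_cwd: Path) -> subprocess.CompletedProcess:
    # Refuse instead of silently correcting: a mismatch means earlier steps
    # may also have run in the wrong package.
    if agent_cwd.resolve() != spec.verify_dir:
        raise RuntimeError(f"cwd invariant violated: agent in {agent_cwd}, spec says {spec.verify_dir}")
    # Pass cwd explicitly so the command cannot inherit a different directory.
    return subprocess.run(list(spec.verify_command), cwd=spec.verify_dir,
                          capture_output=True, text=True)


# Usage (paths are placeholders):
# spec = TaskSpec(Path("/repo"), "apps/web", ("npm", "test"))
# verify(spec, Path.cwd())
```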

Akshit

@akshit_io

6 days ago

Background agents should return a “not done” reason before they return a diff.

Examples:
- blocked by missing secret
- failed to reproduce
- touched too many files
- command output changed mid-run

A clean stop is often more valuable than a clever patch.

Autonomy without stop reasons just creates review debt.
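A sketch of "stop reason before diff" as a result type: the stop checks run first, and a diff comes back only when none of them fire. The enum values mirror the examples above; `finish` and the five-file limit are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class StopReason(Enum):
    MISSING_SECRET = "blocked by missing secret"
    NO_REPRO = "failed to reproduce"
    TOO_MANY_FILES = "touched too many files"
    OUTPUT_CHANGED = "command output changed mid-run"


@dataclass
class RunResult:
    stop_reason: StopReason | None
    diff: str | None = None


def finish(diff: str, *, secrets_ok: bool, reproduced: bool, files_touched: int,
           outputs: list[str], max_files: int = 5) -> RunResult:
    """Evaluate stop conditions first; only a clean run returns its diff."""
    if not secrets_ok:
        return RunResult(StopReason.MISSING_SECRET)
    if not reproduced:
        return RunResult(StopReason.NO_REPRO)
    if files_touched > max_files:
        return RunResult(StopReason.TOO_MANY_FILES)
    if len(set(outputs)) > 1:  # the same command produced different output during the run
        return RunResult(StopReason.OUTPUT_CHANGED)
    return RunResult(None, diff)


result = finish("patch", secrets_ok=True, reproduced=True, files_touched=9, outputs=["ok"])
print(result.stop_reason.value)  # touched too many files
```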

Akshit

@akshit_io

6 days ago

AI-agent evals and benchmarks answer different founder questions.

Benchmark: can the model solve a task in general?

Eval: can this agent touch my repo without corrupting the thing I care about?

Benchmarks compare models.

Evals protect products.

If the failure would cost you users, it belongs in an eval, not a leaderboard.

Akshit

@akshit_io

6 days ago

AI-agent handoffs should include environment diff, not just code diff.

A patch can be correct and still fail because:
- Node version changed
- env var was missing
- cwd was different
- cache was warm locally

If the next runner cannot recreate the room, they cannot verify the work.
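A sketch of an environment diff over two snapshots taken on each runner before the handoff. `env_diff` is a hypothetical helper, and the snapshot keys and version numbers below are illustrative values, not measurements.

```python
from __future__ import annotations


def env_diff(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple[str | None, str | None]]:
    """Return every key whose value differs between two environment snapshots."""
    keys = sorted(before.keys() | after.keys())
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}


# Illustrative snapshots from the agent's run and the next runner.
local = {"node": "20.11.0", "cwd": "/repo/apps/web", "env:STRIPE_KEY": "set", "cache": "warm"}
ci = {"node": "18.19.0", "cwd": "/repo", "env:STRIPE_KEY": "missing", "cache": "cold"}

for key, (was, now) in env_diff(local, ci).items():
    print(f"{key}: {was} -> {now}")
```

A key present on only one side shows up with `None` on the other, so a variable that silently disappears is still reported.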

Akshit

@akshit_io

6 days ago

AI-agent builders: what do you trust more from a generated fix?

A. new passing tests
B. one failing repro that now passes
C. a replay script
D. a rollback command

I am increasingly biased toward B + C.

Tests prove intent. Replays prove the agent touched the actual failure.

Akshit

@akshit_io

6 days ago

AI coding agents are weirdly dangerous around lockfiles.

A small package bump can hide:
- transitive version drift
- registry auth failures
- postinstall scripts
- CI-only platform bugs

My rule: agents can propose dependency changes, but the human owns the lockfile diff.

That file is supply-chain surface.
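A sketch of that ownership rule as a merge gate: manifest changes proposed by the agent pass, lockfile changes wait for a human. `gate_agent_patch` is hypothetical, and the lockfile set is illustrative rather than exhaustive.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Illustrative, not exhaustive; extend for your package managers.
LOCKFILES = {
    "package-lock.json", "yarn.lock", "pnpm-lock.yaml",
    "poetry.lock", "Cargo.lock", "go.sum", "Gemfile.lock",
}


def gate_agent_patch(changed_paths: list[str], human_approved: bool = False) -> tuple[bool, list[str]]:
    """Return (mergeable, lockfile_paths). Manifests may change; lockfiles need a human."""
    lock_changes = [p for p in changed_paths if PurePosixPath(p).name in LOCKFILES]
    return (not lock_changes or human_approved), lock_changes


print(gate_agent_patch(["package.json", "package-lock.json"]))  # (False, ['package-lock.json'])
```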

Akshit

@akshit_io

6 days ago

AI-agent runs should start with a dirty-git preflight.

My rule now:

1. `git status --short`
2. name every existing change
3. refuse if unrelated files are dirty
4. only then edit

Most agent mistakes are not bad code.

They are good code written on top of state the agent did not understand.
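A sketch of the preflight in Python. It reads `git status --porcelain`, which git documents as the script-stable counterpart of `--short`; `dirty_paths` and `preflight` are hypothetical helpers, and quoted paths containing special characters are not handled.

```python
from __future__ import annotations

import subprocess


def git_status(repo: str = ".") -> str:
    # 1. Porcelain v1 has the same line shape as --short but is stable for scripts.
    return subprocess.run(["git", "status", "--porcelain"], cwd=repo,
                          capture_output=True, text=True, check=True).stdout


def dirty_paths(status: str) -> list[str]:
    """Parse lines such as ' M src/app.py', '?? notes.txt', 'R  old.py -> new.py'."""
    paths = []
    for line in status.splitlines():
        if not line.strip():
            continue
        path = line[3:]  # skip the two status columns and the separating space
        if " -> " in path:  # renames: keep the new path
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def preflight(status: str, task_paths: set[str]) -> list[str]:
    dirty = dirty_paths(status)
    for path in dirty:  # 2. name every existing change
        print(f"existing change: {path}")
    unrelated = [p for p in dirty if p not in task_paths]
    if unrelated:  # 3. refuse if unrelated files are dirty
        raise RuntimeError(f"refusing to edit, unrelated dirty files: {unrelated}")
    return dirty  # 4. only now is it safe to edit


# Usage inside a repo:
# preflight(git_status(), task_paths={"src/billing/retry.py"})
```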
