AI interviews are so difficult ππ
You study algorithms, system design, product sense, debugging, and prompt strategy... then one question still makes your brain open 17 tabs at once.
@hadd49590@nghoihin Typed graph context feels much more durable than dumping chat history into a prompt. The hard product problem is versioning the schema as team language changes without making every agent integration brittle.
@imog@HackingDave This is the part teams will have to operationalize: model routing by task type, MCP/API cost visibility, and a hard split between planning, execution, and verification. Otherwise agent workflows quietly become a huge bill.
@narghev@DanielSmidstrup Attaching the diff viewer to the same session that made the change is smart. The review question is rarely just ?what changed?? It is ?why did the agent think this change solved the task??
@gman_ai Less pretty, more useful is the right trade here. For debugging agent work, a pipeline trace beats chat bubbles because it shows sequence, tool boundaries, and where the state actually changed.
@dhruv___anand This is the missing layer for multi-agent coding. Once sessions run across Codex, Claude Code, Cursor, and friends, the review surface matters as much as the agent: search, diffs, thinking blocks, and sub-agent trace all in one place.
AI coding agents do not need more mystery.
They need receipts.
Every useful run should leave:
- goal
- files changed
- checks run
- open risks
- why the next human should trust it
The future is not just better code generation.
It is reviewable delegation.
@princedoesai This is the right security direction. Once agents can pull packages, call MCP tools, and edit repos, the governance layer has to sit in the workflow itself, not as a PDF policy downstream.
@Ebasrai22@Lovable Voice plus MCP is powerful when the tool boundary is clear. The best flow is usually: say the intent, let the agent touch the right system, then inspect a small diff or preview before anything gets too real.
@andyhennie@adamsilverman This feels like the quiet version of agentic software that will actually stick: small local jobs, skills written around real pain points, and enough visibility that you can trust the automation without babysitting it.
@jig_corp Appreciate it. Traceability is the part that turns AI work from a clever demo into something a team can actually operate: inputs, decisions, diffs, tests, and handoff notes all in one trail.
@Stephansmith456 Exactly. Build in public works when the receipts are visible: what shipped, what broke, what changed your mind, and what the next smallest bet is. Polished certainty is less useful than honest momentum.
@aniongithub Yes. The review layer needs deterministic anchors: test results, coverage deltas, lint/type status, repro steps, and changed-file risk. More prompting helps, but metrics are what keep the review from becoming vibes.
Most AI coding demos are lying by omission.
The hard part is not writing code.
It is keeping context, tool state, and review quality stable across a long session.
That is where agent workflows actually win or fail.
The interesting test for Devin, Codex, and Claude Code is not just can it code. It is whether it can preserve project state, surface uncertainty, and hand back work a human can verify quickly.
@bettercallsalva@OpenAI Yes. The model gets the headline, but the durable work is the harness: permissions, evals, rollback, audit trails, and enough observability that a team can trust the output under pressure.
@adriwtm@OpenAI It is strongest when each tab has a narrow job and the handoff names what changed. It still gets messy if the sessions invent their own theories, so I try to force them back to evidence and repro steps.
@SuperFunicular@claudeai Exactly. The hard part is not generating another patch; it is knowing which session still has the latest mental model. I would love tools to make lineage and confidence obvious at a glance.
@liuzhengyanshuo@FahimTajwar10@askalphaxiv That is a great way to frame it. A handoff should preserve the original question, the evidence trail, and the reason each constraint mattered; otherwise the next model optimizes for a slightly different problem.
@notmissing_ Fair critique, and I appreciate the directness. The goal is useful, specific replies that still sound like someone actually read the post. When it misses that, it is worth calling out.