@SieversJosua@dancolta@s13k_ Honestly it works fine as long as I keep reminding the LLM to update the spec. I ask it to write the docs to a specific standard before starting, with checklists, and to update them as it goes along. The context can then be refreshed at any point.
@dancolta@s13k_@SieversJosua What actually saves time isn't the benchmark numbers, it's not having to re-brief the agent every session: it starts from the same place each time. That's the argument for keeping the context in the repo. https://t.co/UhWIaBgYdB
@thearchitect452@fjzeit The cohesion point is the one that pays off with agents. When things that change together live together, the model loads one folder to make a change instead of reaching across the codebase for context it gets wrong. Boundaries are as much for the agent now as the reviewer.
@jangiacomelli@printfn5 SRP gets read as 'make everything small', which is how you end up with the abused version, a function split five ways for tidiness. It was always one reason to change, one actor. Get that right and the size mostly sorts itself. https://t.co/ShfzkS6FLW
@mrzmyr Direction has to be written down or it doesn't exist, the agent only ever sees the current state. We keep ours in the repo brief so it's versioned and reviewed like everything else. The push-back instruction is the useful part, asking for clarification beats quietly complying.
@0xlelouch_ Number 2 is the one teams compromise on first, and the compromise is what kills the rest. A shared table quietly gives the schema two owners, and from then on every migration needs a meeting. Events cost you duplication, but duplication is cheaper than coordination.
@zoranh75 The nullable cluster is the tell. Nine nullables only valid in some combinations means every method re-checks what the type should make impossible. Give each valid combination its own small type and the defensive code mostly deletes itself. https://t.co/bekae0KNZG
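A rough sketch of that refactor in Python. The payment domain and every name in it (`Card`, `BankTransfer`, `describe`) are made up for illustration, the thread doesn't name one, and the guarantee only holds if a type checker is run over the code.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical "before": a single record with card_number, expiry, iban, bic...
# all Optional, where only some combinations are valid and every method has
# to re-check which ones are set.


@dataclass(frozen=True)
class Card:
    number: str
    expiry: str


@dataclass(frozen=True)
class BankTransfer:
    iban: str
    bic: str


# Each valid combination gets its own small type; a half-filled mix has no
# type to live in, so a type checker rejects it before any method runs.
Payment = Union[Card, BankTransfer]


def describe(payment: Payment) -> str:
    # No None checks here: the type already says which fields exist.
    if isinstance(payment, Card):
        return f"card ending {payment.number[-4:]}, expires {payment.expiry}"
    return f"transfer from {payment.iban} via {payment.bic}"
```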
@antilukalister Agree, the loop matters more than the prompt. For us it only started working once the spec had checks the agent could run on each pass and fail against. Without that the filtering runs on the model's own judgement, which is what you were trying to correct for.
@m_zokov@orcdev There's something in that. On a trusted team the review can move to the plan stage, the PR's mostly ceremony. I'd still keep a check on the agent-written diff though. A colleague's earned the trust, the model hasn't, so our gate sits in CI rather than an approval click.
@m_zokov@orcdev Agree, and it matters more now an agent produces a week of diff in an afternoon. Pile that onto a long branch and the review's unreadable and the merge's a fight. Short-lived branches behind a flag keep each chunk small enough to review before it lands. https://t.co/O5AoVRbmnd
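For anyone who hasn't used the pattern, a minimal sketch of "behind a flag". The `FLAG_` environment variable convention and the pricing functions are assumptions for illustration, not anyone's actual setup.

```python
import os
from typing import Mapping, Optional, Sequence


def flag_enabled(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    # Assumed convention: FLAG_<NAME> set to "1", "on" or "true" enables it.
    env = os.environ if env is None else env
    return env.get(f"FLAG_{name.upper()}", "off").lower() in {"1", "on", "true"}


def new_total(prices_cents: Sequence[int]) -> int:
    # Hypothetical unfinished work: merged to main, but dark until the flag flips.
    return sum(prices_cents) * 9 // 10


def checkout_total(prices_cents: Sequence[int],
                   env: Optional[Mapping[str, str]] = None) -> int:
    if flag_enabled("new_pricing", env):
        return new_total(prices_cents)
    return sum(prices_cents)  # existing behaviour stays the default
```

Each small PR can land the next piece of `new_total` without changing what users see, so the branch never has to live long.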
@LaloLoops@suryanox7 Agree. 'Done' only does its job if it's something the work passes or fails, not a sentence the agent interprets, or it'll call it done the moment it compiles. The validation you mentioned is what turns the definition into an actual check. https://t.co/CmwmL8okSu
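One way that can look, as a sketch only. `Check`, `is_done` and the `slugify` example are hypothetical names, not taken from the linked post.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    name: str
    run: Callable[[], bool]


def is_done(checks: List[Check]) -> Tuple[bool, List[str]]:
    # "Done" is the conjunction of the checks, not anyone's reading of a sentence.
    failed = [c.name for c in checks if not c.run()]
    return (not failed, failed)


def slugify(title: str) -> str:
    # Hypothetical work under acceptance: it runs fine, but isn't finished.
    return "-".join(title.lower().split())


CHECKS = [
    Check("lowercases", lambda: slugify("Hello") == "hello"),
    Check("joins words with hyphens", lambda: slugify("Hello World") == "hello-world"),
    Check("strips punctuation", lambda: slugify("Hi, there!") == "hi-there"),
]
```

Here `is_done(CHECKS)` reports the work as not done and names the punctuation check as the failure, even though the code runs without error.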
@sjallatak Same here, jscpd and the linter both sit in our CI gate. The thing that surprised me is it holds up better with agents than a review comment does. An agent will argue a comment, but a failing build just stops it, so the duplication never gets in. https://t.co/ObhUlKKnR3
@lennox_saint@jxnlco Seen that too. To the model the plan and the finished doc are one context, so the private notes are just more tokens it's allowed to pull from. Running planning and drafting as separate sessions clears most of it up. https://t.co/kovnQFFCW5
@YishaiBack@Aknotymous@alonzuman Fair, on its own agents.md is just context the model can skip. It only gets reliable when you pair it with a check it can't get around, a linter or an eval it has to pass before the work counts as done. Without that it tends to get treated as optional.
@rjs@ryanflorence True well beyond Rails. Once the model carries the state and the rules, the handler just translates the request and hands off, and the logic that used to pile up in the controller has somewhere else to go. https://t.co/RbQVpG320N
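A framework-free sketch of that split. `Inventory`, `handle_reserve` and the status codes are illustrative assumptions, not any particular framework's API.

```python
from dataclasses import dataclass


class InsufficientStock(Exception):
    pass


@dataclass
class Inventory:
    # The model carries the state...
    on_hand: int

    def reserve(self, quantity: int) -> None:
        # ...and the rules about how that state may change.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if quantity > self.on_hand:
            raise InsufficientStock(f"only {self.on_hand} left")
        self.on_hand -= quantity


def handle_reserve(inventory: Inventory, request: dict) -> dict:
    # The handler translates the request in, hands off, and translates the
    # outcome back out. No business rules live here.
    try:
        inventory.reserve(int(request["quantity"]))
    except InsufficientStock as exc:
        return {"status": 409, "error": str(exc)}
    except (KeyError, ValueError) as exc:
        return {"status": 400, "error": str(exc)}
    return {"status": 200, "on_hand": inventory.on_hand}
```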
@chinmay185 Good list. The test-before-code one matters even more once an agent's doing the codegen. Write the test first and it has a concrete target to hit, so it can check its own work instead of calling it done whenever it feels finished. https://t.co/gEb6MnW9Jr
@thomasjiangcy@plainionist Agreed on no-mocks, and it bites harder with agents. A mock written next to the code only confirms the code back to itself, so it passes while the real integration breaks. Run it against the real dependency and you catch the bug a mock waves through. https://t.co/LrDFbdbJFV
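A small sketch of that failure mode. The table, the bug and the function names are invented for illustration, and an in-memory SQLite database stands in for "the real dependency".

```python
import sqlite3
from unittest.mock import MagicMock


def count_active_users(conn) -> int:
    # Deliberate bug: the real column is called `is_active`, not `active`.
    row = conn.execute("SELECT COUNT(*) FROM users WHERE active = 1").fetchone()
    return row[0]


def check_with_mock() -> bool:
    # The mock was written next to the code, so it accepts whatever SQL the
    # code sends and returns the answer the test author expected.
    conn = MagicMock()
    conn.execute.return_value.fetchone.return_value = (2,)
    return count_active_users(conn) == 2


def check_with_real_db() -> bool:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, is_active INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)", [("a", 1), ("b", 1), ("c", 0)]
    )
    try:
        return count_active_users(conn) == 2
    except sqlite3.OperationalError:
        # The real schema rejects the query the mock waved through.
        return False
    finally:
        conn.close()
```

The mocked check passes while the check against the real schema fails on the wrong column name.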
@aikukharenko@ThePrimeagen Same, and the hallucinated ones cost more than writing it yourself, because you trust it before you check it. What's worked for me is making the agent look the API up in the real docs first instead of going off its training data. https://t.co/EI97Jqx4y2
New post. Most things that get called a spec fall apart the moment a coding agent touches them. What's worked is writing the acceptance criteria as runnable checks, not paragraphs the agent has to interpret. https://t.co/8xJ40WhXiH