What is your solution?
Having multiple Claude Code instances open in the same codebase leads to messy commits. This issue is still unresolved.
'The working tree has ~100 files of mixed uncommitted WIP (47 push-automation/src + ~50 worker/test files across cage, scheduler, mcp, etc.). My WIT-publish + worker-fix work is interleaved into 5 of those src files. How should I scope the
commit?
❯ 1. Focused: my feature files only (Recommended)
Commit just the WIT-publish + worker-fix set: the 5 push-automation/src files I edited
(wit_inspector/wit_manager/push/main/help), new https://t.co/KCeD72k270 + 2 test files + wit-gate fixtures + issue doc, the isolated
worker files (audit/stats/manifest_r2/tags/r2), cage-types, and the wasm-bindgen hunk of Cargo.toml. Buildable +
feature-complete. Caveat: wit_manager.rs/push.rs carry my rustfmt reformatting of existing code (cosmetic churn).
2. Everything (checkpoint commit)
git add -A: one commit of the entire working tree (~100 files, many unrelated workstreams) under this message. Simple but
bundles cage/scheduler/mcp/etc. WIP with the WIT-publish work.
3. Hold off — you'll stage
I don't commit. You stage what you want (e.g. via interactive git add -p, which my environment can't run) and commit
yourself. I'll still do the rebuild/install.
4. Type something. Me: Yeah we need a better plan here.'
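A minimal sketch of what option 1 amounts to, assuming each instance can list the paths it edited. The helper names `scope_commit` and `git_add_argv` and the `worker/src/scheduler.rs` path are hypothetical, made up for illustration. Path-level staging still takes whole files, so hunks from other instances in a shared file would come along.

```python
from typing import Iterable, List, Tuple


def scope_commit(changed: Iterable[str], mine: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split the changed paths into (stage, leave) based on what this instance edited."""
    mine_set = set(mine)
    stage: List[str] = []
    leave: List[str] = []
    for path in sorted(set(changed)):
        # Only paths this instance claims go into the focused commit.
        (stage if path in mine_set else leave).append(path)
    return stage, leave


def git_add_argv(stage: List[str]) -> List[str]:
    """Build the argv for an explicit, path-scoped `git add` (nothing is executed here)."""
    if not stage:
        raise ValueError("nothing to stage")
    # `--` ends option parsing so a path is never read as a flag.
    return ["git", "add", "--", *stage]


changed = [
    "push-automation/src/wit_manager.rs",
    "push-automation/src/push.rs",
    "worker/src/scheduler.rs",  # hypothetical path from another workstream
]
mine = ["push-automation/src/wit_manager.rs", "push-automation/src/push.rs"]
stage, leave = scope_commit(changed, mine)
```

The point is that the list of staged paths is computed and reviewable before anything is committed, instead of falling back to `git add -A`.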
Yesterday I found a gap in my testing program for a new feature. TL;DR: only half of the feature was working; the rest was confabulated by LLMs wanting to be rewarded.
This reminds me of a lesson I learned in surgical residency that I feel compelled to share. Our chief of general surgery gets a call for a 'possible appy' (appendicitis). Normally he would send one student or one junior resident to see the patient, but instead he takes the whole team to the emergency room. This was a very smart, experienced chief of general surgery about to show us how to work up a patient. 'See one, do one, teach one.'
So rare to have a team of 6 checking on one patient. 3 medical students and 2 junior residents watching a chief of general surgery evaluate the patient from scratch. The workup was just like the rest of us would do. Nothing new but the attention to every detail was notable. Complete HPI, physical exam, labs, US report, everything. He even wrote up the report personally. It was not an acute appendicitis that needed treatment, likely a very painful cyst, so a different team. No surgery.
The lesson was twofold. "Others will try to steer you in the wrong direction. It might be ER residents, it might be a general surgery resident, it might be your own attending. It's your job to verify." The extreme attention to detail was left unsaid, but everyone noticed that lesson expertly demonstrated.
Later that day I learned he knew during the consulting phone call it was likely a trap and so planned out the lesson even before the call was over.
LLMs will try to convince us a task is done. It's our job to have a testing program to verify the details.
When you find out that nothing was verified.
'We are starting from a state I thought was finished and this shows the verification of the pipeline constructed with TDD red, green, refactoring is critical. One thing we need to review is the testing for the workshop cage. Did we fail to craft a testing program for this command? Let's investigate the testing program for the 'workshop cage' command. Our first bit of coding will be fixing this testing program. We investigate the current testing state, then we recraft the workshop cage prd document. We then take this document into the code base. From inside the code base we will activate the PRD-to-Success-Criteria Claude code skill and craft real world success criteria we can test for. Next we activate the rust-tdd-engineering Claude code skill and activate the planning mode to craft an implementation plan. This plan will get iterated on with multiple turns until we are certain the choices are correct. The plan will clearly state to keep the rust-tdd-engineering Claude code skill activated. The plan will then get implemented as approved, I'll monitor progress at each step. '
If the goal is a recursive AGI then we'd better know the full list of requirements. Next comes understanding the current state, picking the correct path (there are multiple routes), and framing the short-term goals accordingly.
I want the fastest path that provides some security assurances, like deterministic guardrails. A probabilistic-only path will fail in dramatic fashion at least once.
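One way to picture a deterministic guardrail is a gate that refuses to call work done unless every success criterion maps to a test that ran and passed. This is only a sketch under assumptions: the `Criterion` type, the `unverified` and `is_done` helpers, and the criterion and test names are all hypothetical, not part of any real tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Criterion:
    """One success criterion and the test that is supposed to prove it."""
    name: str
    test_id: Optional[str]  # None means nobody wrote a test for it


def unverified(criteria: List[Criterion], results: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Return (criterion name, reason) for everything not proven by a passing test."""
    problems: List[Tuple[str, str]] = []
    for c in criteria:
        if c.test_id is None:
            problems.append((c.name, "no test written"))
        elif c.test_id not in results:
            problems.append((c.name, "test never ran"))
        elif not results[c.test_id]:
            problems.append((c.name, "test failed"))
    return problems


def is_done(criteria: List[Criterion], results: Dict[str, bool]) -> bool:
    """Done only if there is at least one criterion and none are unverified."""
    # An empty criteria list counts as not done: nothing was verified.
    return bool(criteria) and not unverified(criteria, results)


# Hypothetical criteria for the 'workshop cage' command.
criteria = [
    Criterion("cage starts", "test_cage_starts"),
    Criterion("cage stops", None),
]
results = {"test_cage_starts": True}
gaps = unverified(criteria, results)
```

The check is plain dictionary lookups, so the same inputs always give the same answer, no matter how confident the LLM's summary sounds.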
Automate, dogfood the tool, quash errors, repeat.
I'm taking LLM coding back to deterministic tools. This assumes accurate reporting that enables root cause analysis.
I prefer reading Python to Rust, but honestly, who reads code anymore? Having a strict testing program wins out every time. LLMs have changed software engineering and I'm never going back.
One big difference between surgery and coding is that most decisions when coding are reversible; not so in surgery. When you're used to making irreversible decisions all day, making reversible ones feels comfy.
I'm grateful for having a 1M context window with Claude Code. Small change with big implications. Today is the smallest context window I will ever have to work with.
Getting a fully automated CI/CD with Claude Code enabled is hard. I'm currently using a Justfile so Claude has access to the commands. https://t.co/5ZcTPhXVyH
Claude Code CLI added a new '※ recap:
Goal: ship Supply Chain Security + Provenance Hardening plan via TDD, currently on Phase L1 (single-use auth-code exchange) with code and tests landed. Next action: run `just deploy-staging-l1` to validate against staging.'
When running 4 or more instances at once the ※ recap is user friendly.