Focus on the hallucinations and the serialization-to-disk issues - we see these regularly across multiple agentic interfaces. We use Grok 4 Fast extensively for spec writing, data model generation and other high-level tasks. We don’t trust it for coding beyond demonstrations, because it will inexplicably sometimes fail to persist changes to disk or alter things unexpectedly. My AI Christmas wish is Grok 4 Fast ready as a daily driver. For now we rely on GPT5 and GPT5-Codex for high-complexity code, Claude Sonnet 4.5 for main tasks and GLM 4.6 for a range of dev work. Grok Code Fast is only for unit tests and small refactors. I feel like Grok 4 Fast could do it all, but it has to move beyond benchmarks. Others have said the same in the comments. Set up a problem and run it head to head - you will see. Fantastic potential, and already our go-to for many arch tasks. Sharpen the code generation and stop the hallucinations!
Bringing up Omarchy (https://t.co/05btNwBSIz) today on a laptop. I hadn't used tiling on Hyprland before - it took a few minutes to get the hang of the keys. Seriously fast, crisp screen updates. Techies, give it a try!
I use it a fair bit also and get good results generally. Find responses oddly terse compared to interactions through Cline. Only issue I see is difficulty with editing sometimes - repeated errors on replace that I don’t see with Anthropic models. Results are very good though across refactoring, test coverage and complex design.
@_catwu I have two requests for your consideration in the Claude Code UX. We use a lot of windows at the same time, and switch models particularly at certain stages of development with our projects.
1) How about if we added the current model in the lower ribbon next to where you show the "auto" or plan mode?
2) What about a "/title" or similar command that would let you set the window title to fixed text or to a template variable?
Claude Code is awesome!
Similar experiences - just create a task breakdown structure and a plan file that has assignments. You can start by just copy-pasting between windows or using a file as the SSOT. The interplay is also fascinating. I sometimes end up with a kind of simplistic specialization. Assistants working in tandem from a shared framework will literally recommend delegating some tasks or bugs while taking others. Then I have the integrator give performance assessments to the other channels, and they respond in turn. This seems to have lowered some of the acceptance gaps we get at the end of large tasks. Probably spending a little extra in tokens to drive the assessments :). My advice is don’t even worry about automating at first, other than having a shared plan file structure. Had two front-end assistants running today, and by the end of phase one, one was the self-appointed mobile expert while the other was the e2e test expert. What matters is the code is better, sooner.
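For anyone who wants a starting point, here is a minimal sketch of a shared plan file, assuming a single JSON file that every assistant window reads and writes. The file layout and the function names (new_plan, claim, tasks_for) are made up for illustration and are not part of any tool mentioned here.

```python
import json
from pathlib import Path


def new_plan(path, titles):
    """Write a fresh plan file; every task starts open and unassigned."""
    plan = {
        "tasks": [
            {"id": i + 1, "title": title, "owner": None, "status": "open"}
            for i, title in enumerate(titles)
        ]
    }
    Path(path).write_text(json.dumps(plan, indent=2))
    return plan


def claim(path, task_id, owner):
    """Assign an unowned task to an assistant and save the plan file."""
    plan_file = Path(path)
    plan = json.loads(plan_file.read_text())
    for task in plan["tasks"]:
        if task["id"] == task_id:
            if task["owner"] is not None:
                raise ValueError(f"task {task_id} already owned by {task['owner']}")
            task["owner"] = owner
            task["status"] = "in_progress"
            plan_file.write_text(json.dumps(plan, indent=2))
            return task
    raise KeyError(task_id)


def tasks_for(path, owner):
    """List the task titles currently assigned to one assistant."""
    plan = json.loads(Path(path).read_text())
    return [t["title"] for t in plan["tasks"] if t["owner"] == owner]
```

In practice you can skip the functions entirely and have each assistant edit the file by hand. The point is only that assignments live in one place that every window can read.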
Some feedback on Teams. I ended up switching from Teams to Pro because you were rolling features to Pro ahead of Teams. With the change in Pro Ultimate, how will previews, context and other aspects of the service differ between the two? Seems like organizations should have incentives to use Teams.
Great updates - looking forward to the auto-context save. Have built some processes around compaction with task files as I think context is lost on the save to some degree (though recent updates help). Over last 3 to 4 days seems like token usage is higher. Have been using on a medium complexity data proc tool from first plan and getting pretty good results. Appreciate the rapid updates!
I recommend giving Claude Code a try even if you are a Cursor, Continue or Windsurf user. I have found a number of tasks that seem to work better in this shell-based interface. Even for those who are not experienced developers, the text interface is easy to navigate. The tool works equally well on really complex design tasks - I'll post some observations on those flows when completing some other work.
For the techies and vibe coders working on projects, I have some observations on the agentic workflow for project setup. When you get beyond a one shot development and need a multi-step project setup, it takes a bit of work to get that environment up. What package manager? What process? etc. etc. This setup can be frustrating even for experienced developers. With some of our current work, I thought it would be good to test how well we could automate package / repo setup from a plan doc. Here are some initial results from Claude Code (@anthropic) and @windsurf_ai
Then I used Claude Code on the same project. Work is progressing so fast in this space - things can literally change in a week's time. I found Claude Code actually better at this setup step for several reasons. By this time I had experimented with manually creating the folder first. When doing this, Claude Code automatically thought to look one level above. Running `/init`, the tool used my project file (one level up) and wrote a very good https://t.co/eWkWiyXOhA. It then sorted out the bootstrap without any issues, even adding some extra validation steps I had not documented.
@windsurf_ai Is the Claude Sonnet interface really hitting 20241022? Doesn’t seem like it. That would be good to remedy if not. And BYOK. Think you can work out a model for billing that allows this.
In complex coding work, I find it very difficult to keep context relative to other tools. I'll post / share files and then 2-3 messages later find out that the context is lost. Probably there is something I am missing. The actual interface is great - rules support refined well with recent changes. At present, whether because of my ignorance on what to do or because of actual gaps, I work through harder problems with https://t.co/f0tSTyMI3V interface.
For developers out there, I would like to recognize the @windsurf_ai team for great work over the last few releases. I've been using v1.3.4 in agentic mode to work on a small-ish Bun/TypeScript project we are releasing at the end of the month. The interface is getting much better every week! I'm finding the diagnostic loops work better with Claude. Have had to nudge a few times on unit tests and some syntax issues. Overall, though, this is great!
@ScottPresler Scott, you and your colleagues set the standard for diligence, vision, passion and commitment to your principles. Ignore the haters and keep your eye and your heart on what you know to be right!
@jarredsumner I am going to deploy some scripts using new S3 in the next couple of weeks. Great idea! More SSH and SFTP also good as others have noted.
New threads, broader mission: helping transactional businesses and their partners wrangle their scaling problems into submission. We've leveled up our game to turn your business challenges into solutions. Want to see what we're cooking? Hop over to our new website at https://t.co/LWr24NVqQG. Call or message us - we would love to explore how we can help you! #3LEAPS #ScaleUp