Gonna delete X from my phone for a little bit. I may access from other clients, but I just need to reduce ease of access. The recent algorithm changes are negatively affecting my mentality, and I’m not adept enough to figure out how to change my own timeline.
I find that a lot of my hesitation about building cool new things with AI is just fear. I fear being surprised by how good it is, how fast it is, and what it means for my profession of writing code, which has now just been turned into a hobby.
Use case request: When reviewing an agent's plans or output, I want to directly edit OR comment on a specific text selection, similar to the review functionality on GitHub Pull Requests. Current pain point: When our AI overlords flood us with amazing, comprehensive plans, I have to keep scrolling up long conversations to find the small detail I need changed, click back down into the text input (which often resets the scroll position, so I can't look at the text anymore), and ask the AI to change that specific text via a prompt, which sets off yet another long chain of agent thinking and response. Like no, I just need to make a tiny edit or comment on your long wall of text, which is otherwise perfect.
And here's how it all came together, piece by piece, over a few hours, and launching within 24. @MichelleHarjani with the play-by-play, including internal conversations in Basecamp and Fizzy:
https://t.co/KEtLy9C2mZ
We need a new Superhuman app, but instead of managing emails, it's for managing agents. Keyboard shortcuts to launch/respond to agents, fast optimistic UI, surface key info with focus on brevity. Maybe in the software dev context first, but can expand to OS level.
@gdb This is one of the main, unspoken benefits of cloud agents! I have to keep my laptop open (and unlocked!) while waiting for local agentic calls to run.
@bcherny I am totally curious how you guys manage evals with this method. Is it usually effective enough that you don’t have to track overall performance as you continue to add add add instructions, at the risk of providing conflicting instructions?
I think this is highly dependent on @damengchen's family context. It’s true, the Taiwan govt does not usually give visas to Chinese citizens, unless the Chinese citizen has lived abroad for over a year. In that case they can apply through a Taiwanese consulate outside mainland China (not possible from within China).
Thanks @v0 !
Though I can't download the PNG on Vercel's platform yet - they need Puppeteer locally to run headless Chrome to generate the PNG. But this was easy enough to set up with v0!
AI agent use case request: use voice input to create an event poster that I can save as PNG. I expect it to have as beautiful aesthetics as posters from Canva. I should be able to provide specific details with text input, such as location address or registration URL. I should also be able to input other images as layers, such as a QR code, so it places the QR code in a nice place for people to scan the poster.
When you have a good PRD with a list of code changes to make, LLMs can reliably convert PRDs to good code. The PRD doc becomes the “architectural” thinking, which is where LLMs often struggle with mid- to large-sized codebases.
The LLM basically becomes your expensive “compiler” of English-to-code.
What’s great about focusing on PRDs is that they make it so much easier for LLM agents to make code changes that are in line with your expectations.
The next phase of this is, as you make edits to the PRD, the agent automatically makes the code changes, so you can watch your app change in near real time.
So how are people managing PRDs these days?
If I were to create a wishlist for how I want to work, I would like:
- markdown: just so easy to focus on content, and LLMs are fluent.
- diff history: these PRD docs will be edited by both humans and agents. Something similar to how git does it.
- not tracked directly in codebase, as it doesn’t seem like the right place to be saving and editing PRDs. These documents are ephemeral, and if accidentally left in, might pollute the context for future humans and agents.
- but still should be “tied” to a codebase somehow, just like github issues are tied to a codebase
- git versioning, but “branches” and “merging” seem unnecessary: Just a way to track diff history and revert seems enough (for now).
- patching: humans and agents should be able to “patch” the file, i.e. edit inline, instead of rewriting the whole file on every save. For example, a current pitfall of GitHub issues is that their API only allows creating or replacing an issue description, not an inline patch. If PRDs (or any doc for that matter) become tens of thousands of tokens long, we wouldn’t want an LLM to have to rewrite the whole doc just to make a single small edit.
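To make the last two wishlist items concrete, here is a rough sketch of an inline patch plus a linear diff history with revert and no branches. Everything in it is hypothetical: `PrdDoc`, `patch`, and `revert` are made-up names for illustration, not an existing tool or API, and the history lives in memory only.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class PrdDoc:
    """Hypothetical in-memory PRD with a linear, append-only revision history."""
    text: str
    history: list[str] = field(default_factory=list)    # unified diffs, oldest first
    snapshots: list[str] = field(default_factory=list)  # full text before each patch

    def patch(self, old: str, new: str) -> str:
        """Replace exactly one occurrence of `old` with `new` and return the diff.

        The caller only sends the small fragment being changed, never the
        whole document.
        """
        count = self.text.count(old) if old else 0
        if count != 1:
            raise ValueError(f"expected exactly 1 match for patch target, found {count}")
        updated = self.text.replace(old, new, 1)
        diff = "".join(difflib.unified_diff(
            self.text.splitlines(keepends=True),
            updated.splitlines(keepends=True),
            fromfile="prd.md (before)",
            tofile="prd.md (after)",
        ))
        self.snapshots.append(self.text)
        self.history.append(diff)
        self.text = updated
        return diff

    def revert(self) -> None:
        """Undo the most recent patch."""
        if not self.snapshots:
            raise ValueError("nothing to revert")
        self.text = self.snapshots.pop()
        self.history.pop()
```

Requiring exactly one match is a deliberate choice in this sketch: an ambiguous patch target fails loudly instead of silently editing the wrong line, which matters when both humans and agents are writing to the same doc.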
There’s more, but I can’t recall it off the top of my head. I’ll follow up with another post.
“The button should be in outline format, not primary color” —> agent updates PRD, codebase gets updated on a new branch, and we see the preview just seconds after we saved this updated PRD.
The flow could be:
1. PATCH update to a PRD
2. Webhook receives the PRD update event and sends it to a coding agent in the cloud.
3. Agent context would include the current PRD, the PRD diff, access to the codebase, etc.
4. Agent would think through the PRD diff, assess what needs to change in the codebase, then make those changes on a new branch
5. New branch has preview deployments through CI/CD
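The flow above could be sketched roughly like this. It is only an illustration under assumptions: the event shape (`PrdUpdateEvent`), the branch naming scheme, and the `run_agent` callable are all hypothetical stand-ins, and no real webhook framework or agent API is used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PrdUpdateEvent:
    """Hypothetical webhook payload sent after a PRD PATCH (steps 1-2)."""
    repo: str
    prd_id: str
    revision: int
    prd_text: str
    prd_diff: str


@dataclass
class AgentTask:
    """Context handed to the coding agent (step 3)."""
    repo: str
    branch: str
    prompt: str


def build_task(event: PrdUpdateEvent) -> AgentTask:
    """Bundle the current PRD and its diff into a task on a fresh branch."""
    branch = f"prd/{event.prd_id}-r{event.revision}"  # assumed naming scheme
    prompt = (
        "The PRD below was just edited. Work out what the diff implies for the "
        "codebase and make only those changes.\n\n"
        f"## PRD diff\n{event.prd_diff}\n\n## Current PRD\n{event.prd_text}\n"
    )
    return AgentTask(repo=event.repo, branch=branch, prompt=prompt)


def handle_prd_webhook(
    event: PrdUpdateEvent,
    run_agent: Callable[[AgentTask], None],
) -> str:
    """Steps 2-4: receive the event, dispatch the agent, return the branch name.

    Step 5 (preview deployment) is assumed to be triggered by CI/CD when the
    agent pushes the branch, so it is not modeled here.
    """
    if not event.prd_diff.strip():
        return ""  # nothing changed, so skip the agent run
    task = build_task(event)
    run_agent(task)
    return task.branch
```

Passing `run_agent` in as a plain callable keeps the sketch independent of any particular cloud agent; in practice it would wrap whatever agent service you use.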