@GaryMarcus Gary, curious if you feel the same if we apply this framework to whatever the next wave of AI models is (world models / neurosymbolic AI)?
Here's the link (still EXTREMELY early, only try it if you're willing to deal with issues):
https://t.co/UZ37ICk6Pj
Shout out to @mattpocockuk and @nicopreme for /grill-with-docs and /visual-explainer. This heavily leans on the foundation provided by both.
I think I've cracked generative UI...
(gh link in comment)
The secret is to leverage AI at runtime to generate state, not HTML/CSS.
The skill below is my first foray into this approach, targeted specifically for replacing plan.md documents when pair programming with AI.
In other words... give the AI a pre-written HTML layout template. Then, give it a JSON file to populate. The HTML parses the JSON into a JS variable and renders the UI accordingly.
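A minimal sketch of that idea, not the skill's actual code: the `PlanState` shape, the `renderPlan` / `renderStep` helpers, and the sample data are all hypothetical names made up for illustration. The layout lives in pre-written functions; the AI's only job is the JSON string at the bottom.

```typescript
// Hypothetical state shape; the post does not specify a schema.
type PlanStep = { title: string; status: "todo" | "doing" | "done" };
type PlanState = { heading: string; steps: PlanStep[] };

// Escape text so AI-written strings cannot inject markup into the template.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Pre-written "component": the markup is fixed, only the data varies.
function renderStep(step: PlanStep): string {
  return `<li class="step step--${step.status}">${escapeHtml(step.title)}</li>`;
}

// Pre-written layout template, populated from the parsed JSON.
// The cast trusts the input; a real version would validate it first.
function renderPlan(json: string): string {
  const state = JSON.parse(json) as PlanState;
  const items = state.steps.map(renderStep).join("");
  return `<section><h1>${escapeHtml(state.heading)}</h1><ul>${items}</ul></section>`;
}

// What the AI would write: data only, no HTML or CSS.
const aiOutput = JSON.stringify({
  heading: "Migration plan",
  steps: [
    { title: "Add <users> table", status: "done" },
    { title: "Backfill data", status: "doing" },
  ],
});

const html = renderPlan(aiOutput);
```

Each entry in `steps` maps to one pre-built component, so the AI can add, remove, or reorder items without ever touching markup.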
Add a lightweight service that polls for changes to the .json file and rebuilds, and you now essentially have AI as the "backend" of your UI.
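One way the polling piece could look, as a rough sketch under assumptions: `createPoller`, `read`, and `rebuild` are hypothetical names, and `read` stands in for however you load the .json file (it is injected here so the sketch does not depend on a specific runtime).

```typescript
type Poller = { tick: () => boolean };

// Hypothetical helper: `read` returns the current file contents,
// `rebuild` re-renders the UI from the parsed state.
function createPoller(read: () => string, rebuild: (state: unknown) => void): Poller {
  let lastSeen: string | null = null;
  return {
    // Returns true when a change was detected and a rebuild was triggered.
    tick(): boolean {
      const raw = read();
      if (raw === lastSeen) return false; // nothing changed since the last rebuild
      let parsed: unknown;
      try {
        parsed = JSON.parse(raw);
      } catch {
        // File is mid-write or invalid: keep the last good UI, retry next tick.
        return false;
      }
      lastSeen = raw;
      rebuild(parsed);
      return true;
    },
  };
}

// Simulated file contents that the AI session would overwrite.
let fileContents = '{"heading":"v1"}';
const rebuilds: unknown[] = [];
const poller = createPoller(() => fileContents, (state) => rebuilds.push(state));

poller.tick();                      // first read: rebuild
poller.tick();                      // unchanged: no rebuild
fileContents = '{"heading":';       // partial write: skipped
poller.tick();
fileContents = '{"heading":"v2"}';  // completed write: rebuild
poller.tick();
```

In a dev server this would run on a timer, something like `setInterval(poller.tick, 500)`. Skipping unparseable reads matters because the AI may still be writing the file when a poll lands.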
The skill currently works with vanilla JS and HTML, but I don't see any reason why this approach can't work with any JS framework.
HONEST CAVEATS
There are lots of places where this approach will NOT be useful.
It's still slow for most users. Some UX approaches will make it feel better but at the end of the day you are still hostage to how long the LLM turn is.
Since you still need to write the custom components in your framework of choice, this doesn't get rid of actually designing the UI components.
There is real security work to consider before you put this in front of untrusted users. This approach does limit some attacks, especially if you validate the JSON that the AI spits out, but isolating the session is an important part of getting something like this into production.
And of course, there are the cost concerns of GenUI.
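For the "validate the JSON" part, here is a hedged sketch of what a hand-rolled check might look like. The `PlanState` schema and `parsePlanState` are hypothetical, invented for illustration; a schema library would do the same job. This covers only the shape of the data and does nothing for session isolation.

```typescript
type StepStatus = "todo" | "doing" | "done";
type PlanStep = { title: string; status: StepStatus };
type PlanState = { heading: string; steps: PlanStep[] };

const STATUSES: string[] = ["todo", "doing", "done"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a validated PlanState, or null if the AI output is malformed.
// Unknown extra fields are dropped rather than passed through to the UI.
function parsePlanState(raw: string): PlanState | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (!isRecord(data)) return null;
  const heading = data.heading;
  const rawSteps = data.steps;
  if (typeof heading !== "string" || !Array.isArray(rawSteps)) return null;

  const steps: PlanStep[] = [];
  for (const item of rawSteps) {
    if (!isRecord(item)) return null;
    const title = item.title;
    const status = item.status;
    if (typeof title !== "string" || typeof status !== "string") return null;
    if (STATUSES.indexOf(status) === -1) return null; // reject unknown enum values
    steps.push({ title, status: status as StepStatus });
  }
  return { heading, steps };
}

const good = parsePlanState('{"heading":"Plan","steps":[{"title":"A","status":"done","extra":1}]}');
const badStatus = parsePlanState('{"heading":"Plan","steps":[{"title":"A","status":"hacked"}]}');
const notJson = parsePlanState("<script>alert(1)</script>");
```

Rejecting the whole document on any bad field is the conservative choice: the UI keeps showing the last valid state instead of rendering something half-trusted.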
That said...
I think this approach gives you the best of all worlds:
* It only generates the things that really need to be generated. Layout and programmatic logic can all still be predefined.
* Since the AI only generates component state, and only receives the minimal amount of user intent data (more on that shortly), it has much less context bloat than pure text output, primarily because the AI doesn't have to waste tokens on CSS or HTML. It just writes the data.
* Of course, fewer tokens also means faster iterations of the page. You're still going to have LLM-style wait times, but token spend can be less than 1/10th of what it would be writing the entire HTML/CSS from scratch.
* Reliable Dynamic Layout: Every item in a list in your JSON could represent the state for a component, meaning the result can still feel extremely generative while staying constrained to well-tested UI components.
But where it really gets to the next level is when you start sending messages to the LLM directly from the UI itself.
My current approach is simple - the dev server has a `POST /intent` endpoint that takes either structured or unstructured user intents.
Request bodies get put in an inbox queue that triggers the AI session.
Structured Intents are for user interactions that are known in advance and that the AI should be informed of. Things like form submissions or clicks that should trigger the AI to do some work. These calls include an IntentKind enum so we can
Unstructured Intents are for anything else. Think right-click-and-drag + "Send comment to AI", but for anything, on any page.
With just this single endpoint, you get an interactive experience where the AI gets infused into the runtime of your SPA. It's pretty neat.
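To make the inbox idea concrete, here is a rough sketch under assumptions. The post names an IntentKind enum but not its members, so the kinds below are invented, and it is modeled as a string union for brevity. `createInbox`, `summarize`, and the payload fields are hypothetical too. The HTTP layer is left out: `post` is what a `POST /intent` handler would call with the parsed request body.

```typescript
// Hypothetical kinds; the real IntentKind members are not given in the post.
type IntentKind = "form_submit" | "click";

type StructuredIntent = { type: "structured"; kind: IntentKind; payload: Record<string, unknown> };
type UnstructuredIntent = { type: "unstructured"; comment: string };
type Intent = StructuredIntent | UnstructuredIntent;

type Inbox = { post: (intent: Intent) => void; drain: () => Intent[] };

// `wake` stands in for whatever triggers the AI session.
function createInbox(wake: () => void): Inbox {
  let queue: Intent[] = [];
  return {
    post(intent: Intent): void {
      queue.push(intent);
      wake();
    },
    // Called by the AI session to take everything that has accumulated.
    drain(): Intent[] {
      const pending = queue;
      queue = [];
      return pending;
    },
  };
}

// Turn an intent into the compact text the AI session would receive.
function summarize(intent: Intent): string {
  switch (intent.type) {
    case "structured":
      return `[${intent.kind}] ${JSON.stringify(intent.payload)}`;
    case "unstructured":
      return `[comment] ${intent.comment}`;
  }
}

let wakeups = 0;
const inbox = createInbox(() => { wakeups += 1; });
inbox.post({ type: "structured", kind: "form_submit", payload: { field: "email" } });
inbox.post({ type: "unstructured", comment: "Make this step more detailed" });
const prompts = inbox.drain().map(summarize);
```

Draining in batches means a burst of clicks becomes one AI turn instead of several, which helps with the latency and cost caveats above.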
Overall, generative state seems like a very promising way for us to move beyond AI chat applications and towards AI-native SaaS experiences.
@ZachSDaniel1 The A in OAuth is supposed to be capitalized, which was already triggering me earlier, and then you had to go ahead and add a typo. My eye is twitching lol
Great work though!
@DomGalamini @Sabremetrix I hope the org reconsiders the salary of these roles. This is roughly 50% of market value.
Good talent is worth the investment.
@omarsar0 @dair_ai I want to incorporate more of this myself. Can you share what your interop layer is between the AI layer generating markdown and the HTML? Is it an HTML template that AI writes into, or is it a client app that is reading from local MD files, or is it full generative UI?
Can you elaborate on this piece?
> every chunk carries its ancestor headings with it
> which means an H3 that says "conclusion" gets indexed with ZERO context
These things can't both be true, unless I'm misunderstanding. The word "conclusion" would have the ancestor headings as context.