As an ex-Viv (w/ Siri team) eng, let me help ease everyone's future trauma as well with the Fundamentals of Assisted Intelligence.
Make no mistake, OpenAI is building a new kind of computer, beyond just an LLM for a middleware / frontend. Key parts they'll need to pull it off:
Persistent User Preferences:
- The biggest unlock of assistants has always been to deeply understand what someone wants in the most specific way.
- This is the "wow" moment where computers stop being scary and start feeling truly helpful.
- We did this in 2016 on Viv (https://t.co/aQSbFfRNde) when our AI knew what you liked for each and every service you used via Viv and mixed that in with context like what kind of flowers you told us your mom liked.
- This will need to include access to your personal information to infer preference as well.
External, Real-time Data:
- 50% of the utility of an LLM comes from base training and RLHF fine-tuning, but the rest comes from extending its available data with external sources.
- Zapier, Airbyte and others will help, but expect deep integration with 3rd party apps / data pipelines.
- "Chat w/ PDF" is a tiny, tiny part of this. If you're only building that, think much bigger.
Actual Computing on Virtual Machines:
- Context windows are limiting, so AI providers will keep benefiting from running tasks directly in a Python or Node/Deno virtual env, where the model can consume huge amounts of data just like a computer can today.
- Today these are short-lived envs used by Data Analyst / Julius, but over time they'll become a new type of Dropbox where your data is persisted long term for additional processing or cross-file inference / insights.
Agent Task / Flow Planning:
- Planning can't function without intent. Understanding intent has always been a holy grail, and LLMs finally helped us unlock what we spent years approximating at Viv with NLP tricks.
- Once intent is accurate, planning can start. Creating an agent planner is incredibly nuanced and will take significant integration with user preferences, 3rd party data sets, knowledge of compute capabilities, etc.
- The bulk of the real magic of Viv was the dynamic planner / mixer that would pull all these data and APIs together and generate both a workflow AND dynamic UI on top of them for a normal consumer to execute.
An App Store of Experts:
- Apple initially made the mistake of building a closed app store; then realized they could monetize a cornucopia of creativity if they opened it.
- Regardless of OpenAI saying they're focused on ChatGPT and only ChatGPT, it's inevitable they'll rescope it and enable a long tail of specialized assistants.
- Builders will be able to compose multiple tools into specialized workflows.
- And AIs over time will be able to auto-compose these tools together as well, learning from the builders that came before them.
Persistent, Contextual Memory:
- Embeddings are helpful, but they are missing fundamental parts like context switching, conversational centroids, summarization, enrichment, etc.
- Most of the cost of LLMs today comes from prompts, but as history and persistence are embedded and inference is cached, this will unlock long-term memory with pointers to critical subjects, topics, feelings, tone, etc.
- Core memory is just the beginning. We still need all the rich information our minds conjure when we think about a past sunset, a breakup, a scientific understanding, or sensitive context for people we interact with.
Long Polling Tasks:
- "Agent" is a loaded word, but part of the intent is to have tasks that can be scheduled and self-completing regardless of the time horizon required.
- E.g. "Let me know when flights from Montréal to Hawaii are less than $500"
- This will require coordination of compute across API providers, as well as virtual envs in the cloud.
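A minimal sketch of the flight example above, assuming a scheduler that calls a price source on an interval. Everything here (`PriceWatch`, `run_watch`, the fake price feed) is a hypothetical illustration, not any provider's actual API; a real version would need durable storage and a real fare source.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PriceWatch:
    """A standing task: notify once a fare drops below a threshold."""
    origin: str
    destination: str
    max_price: float


def run_watch(
    watch: PriceWatch,
    fetch_price: Callable[[str, str], float],  # hypothetical price source
    notify: Callable[[str], None],             # hypothetical notification hook
    max_polls: int,
    sleep: Callable[[float], None] = lambda seconds: None,  # injectable for testing
    interval_s: float = 3600.0,
) -> Optional[float]:
    """Poll until the fare is below the threshold or the poll budget runs out."""
    for _ in range(max_polls):
        price = fetch_price(watch.origin, watch.destination)
        if price < watch.max_price:
            notify(f"{watch.origin} -> {watch.destination} is now ${price:.2f}")
            return price
        sleep(interval_s)
    return None


# Fake price feed standing in for a real fare API (an assumption for the demo).
fake_prices = iter([620.0, 540.0, 480.0])
messages: list = []

result = run_watch(
    PriceWatch("Montréal", "Hawaii", 500.0),
    fetch_price=lambda origin, destination: next(fake_prices),
    notify=messages.append,
    max_polls=5,
)
```

The sleep function and price source are passed in so the same loop could run in a short-lived env today or be resumed by a long-running scheduler later.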
Dynamic UI:
- Chat is not the final, end-all interface. There's a reason apps have affordances like buttons, date pickers, images. It simplifies, clarifies.
- AI will be a copilot, but to be a copilot it'll need to adjust to what works best for a given user. The future is personalized as optimizations require it, so UI will be dynamic.
API & Tool Composition:
- Expect AIs to generate custom "apps" in the future where we can build our own workflows and compose together APIs, without waiting for a big startup to do so.
- Fewer apps and startups will be needed to generate frontends, and AI will be better at composing an array of tools and APIs together coupled with a gas fee / tax.
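To make tool composition concrete, here is a rough sketch assuming each tool is just a function that reads a shared state dict and returns new keys. The names (`compose`, `find_florist`, `pick_bouquet`, `place_order`) are made up for illustration and do not describe how Viv or OpenAI actually do it.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
Tool = Callable[[State], State]


def compose(*tools: Tool) -> Tool:
    """Chain tools into one workflow; each tool reads the state and adds keys."""
    def workflow(state: State) -> State:
        for tool in tools:
            # Merge the tool's output into the running state.
            state = {**state, **tool(state)}
        return state
    return workflow


# Hypothetical tools; real ones would call third-party APIs.
def find_florist(state: State) -> State:
    return {"florist": f"florist near {state['city']}"}


def pick_bouquet(state: State) -> State:
    # Uses a stored user preference, e.g. the flowers your mom likes.
    return {"bouquet": state["preferences"]["mom_flowers"]}


def place_order(state: State) -> State:
    return {"order": f"{state['bouquet']} from {state['florist']}"}


send_flowers = compose(find_florist, pick_bouquet, place_order)
outcome = send_flowers(
    {"city": "Montréal", "preferences": {"mom_flowers": "tulips"}}
)
```

Because every tool has the same shape, a builder (or eventually an AI planner) can reorder or swap steps without touching the others.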
Assistant-to-Assistant Interaction:
- There will be countless assistants in the future, with each assisting humans and other assistants towards some greater intent.
- Alongside this, assistants will need to learn to interface across text, APIs, file systems, and other modalities used both by agents / startups and humans as integration flows deeper into our world.
Plugin / Tool Stores:
- Specialized assistants can only be made possible by composing tools, APIs, prompts, data, preferences, and much more.
- The current plugin store is super early days, so expect much more work to come, and expect many of those plugins to be rolled in-house as they become more mission critical.
And this is just a 10-minute brain dump; much, much more is needed behind the scenes, including internet search and scraping, community (for intent, building, RLHF, etc.), dynamic API generators and connectors, gas fees, tool builders, ingestion via glasses / earbuds / etc. If you think it's too late to be in AI, just know the above is about 25% of what it'll actually take, with much more to come as we iterate and get even more creative.
We're in the early days of building parts of this at @FastlaneAI but with a different understanding: OpenAI will never be the best at everything. So we want to let you use the best AIs in the world, regardless of who builds them (that could be you!). Come join the fun!
I don’t know why there is any surprise.
Here’s OpenAI’s product strategy for the next 2 years:
- you will be able to upload anything to ChatGPT
- you will be able to link any external service like Gmail, Slack
- ChatGPT will have persistent memory, no more multiple chats unless you want it
- ChatGPT will have a consistent, user customizable personality including political bias
- ChatGPT will be able to respond by text, voice, images (diagrams and video are still uncertain in this timeframe)
- ChatGPT will become much, much faster until you feel it’s a real person (<50ms response time)
- Hallucinations and non factual errors will decline rapidly
- as self moderation improves, question rejection will decline
Anthropic being unprepared for this scenario makes me wonder what else the labs are unprepared for
Anyone working in gov/military knows this is the most basic of basic scenarios to support
Building a superintelligence requires much, much more
@apples_jimmy Not unexpected (which makes it more concerning that it still happened)
Also concerning they'd refuse to patch a simpler jailbreak though
https://t.co/frHiEz7GA1
@Miles_Brundage Except Anthropic seems to have rejected requests to patch the jailbreak. If true, not a good precedent as AI gets stronger
https://t.co/frHiEz7GA1
@martin_casado I'd read the tea leaves more on this one... comes down to trust when the stakes are still low / easy
If they can't do the right thing by fixing a simpler jailbreak now, can they be trusted when models are much stronger?
https://t.co/frHiEz7GA1
@ClementDelangue Using classifiers to reject is a lazy solution (and LLMs are essentially fancy classifiers)
Need to start being truly serious on safety research
https://t.co/FFDZ2bBB7m
Anthropic’s argument is flawed. If you jailbreak Fable, you get a weapon.
If you jailbreak GPT 5.5, you don’t.
A true safety-oriented lab would appreciate that.
I’ve had a number of conversations with folks inside and outside government about the current situation with Anthropic, and here is what I believe to be true:
— As we know, Anthropic publicly released its Mythos class models earlier this week under the commercial name Fable.
— Fable is Mythos with guardrails. But if those guardrails fail, then you’ve exposed Mythos and its advanced cyber capabilities to people who shouldn’t have them. (Keep in mind that Anthropic itself widely promoted the idea that Mythos was a cyberweapon and needed to be regulated as such. They asked for government regulation of Mythos and championed the guardrails on Fable. If there is a vulnerability — big or small — it is Anthropic’s responsibility to patch.)
— A highly credible trusted partner of both Anthropic and the USG who was testing Fable came forward with a jailbreak of those guardrails. The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused.
— In their blog post, Anthropic defended its decision by saying the jailbreak isn’t serious. That is not what the trusted partner and the USG believe; nor is that kind of minimizing language consistent with Anthropic’s brand as the AI safety company. It’s difficult to fathom how they could claim a jailbreak allowing operability of a cyber weapon could be defined as not “serious.”
— In the past, Anthropic has always said that safety must be top priority and taken super seriously. In this case, Anthropic prioritized the continued offering of the consumer model over safety.
— In reaction, the Admin issued the export control. The Admin did this reluctantly. It’s been very surprised that Anthropic hasn’t wanted to cooperate with a reasonable safety request (ie fixing the jailbreak issue). Anthropic’s reaction is very much at odds with their branding and ethos as a safe AI research community.
— The Admin’s hope now is that Anthropic remediates the safety issue, the export control is lifted, and Fable goes back into general release. The Admin wants all of this to happen as soon as possible. It is frankly bewildered that Anthropic hasn’t wanted to comply with safety requests that it previously said were its highest priority.
— Those trying to misdirect and tie this action to the prior DoW/Anthropic issues are wrong. The Admin values Anthropic’s technical capabilities and feels that this issue, while serious, should be easily resolved. The ball is in Anthropic’s court.