@nickaturley I'd really like you to maintain and update Standard Voice Mode while keeping the legacy asr->llm->tts architecture. With the new asr gpt-4o-transcribe replacing Whisper, it would be a perfect service! The current accuracy of SVM speech-to-text needs to be improved.
@sama I'd really like you to maintain and update Standard Voice Mode while keeping the original asr->llm->tts architecture. With the new asr gpt-4o-transcribe replacing Whisper, it would be a perfect service!
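A minimal sketch of the cascaded asr->llm->tts design mentioned above, assuming each stage is just a swappable function. The names `make_cascade`, `stub_asr`, `stub_llm`, and `stub_tts` are hypothetical placeholders, not real OpenAI APIs; the point is only that replacing the ASR stage (for example Whisper with gpt-4o-transcribe) would not require touching the other two stages.

```python
from typing import Callable

def make_cascade(
    asr: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Compose three independent stages into one voice turn."""
    def handle(audio: bytes) -> bytes:
        text = asr(audio)    # speech -> text
        reply = llm(text)    # text -> text (the full model, tools included)
        return tts(reply)    # text -> speech
    return handle

# Hypothetical stand-ins so the sketch runs without any network calls.
def stub_asr(audio: bytes) -> str:
    return audio.decode("utf-8")

def stub_llm(text: str) -> str:
    return text.upper()

def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")

voice_turn = make_cascade(stub_asr, stub_llm, stub_tts)
out = voice_turn(b"hello")
```

Upgrading transcription in this sketch means passing a different `asr` function to `make_cascade`; the `llm` and `tts` stages stay as they are.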
@nickaturley Standard Voice Mode is intelligent because it's based on the true flagship model, so it can use tools like Code Interpreter correctly and respond intelligently to user queries by voice. It's an incredibly powerful tool in ChatGPT!
@sama Within the ChatGPT mobile or desktop apps, it would be super useful to have a free-text search box inside the chat (similar to what browsers offer with Ctrl-F)
@sama Why not experiment with recursive self-improvement? A model could learn from its own interactions and propose pull requests to itself.
An adversarial "reviewer" reasoning-model would then accept or reject them, passing only the best changes to human approval.
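A rough sketch of the propose-then-review idea above, under loud assumptions: `Proposal`, its `score` field, `review_pipeline`, and `strict_reviewer` are all hypothetical, and the "reviewer" here is a plain threshold function standing in for a reasoning model.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Proposal:
    description: str
    score: float  # hypothetical quality estimate in [0, 1]

def review_pipeline(
    proposals: List[Proposal],
    reviewer: Callable[[Proposal], bool],
    top_k: int = 1,
) -> List[Proposal]:
    """Keep only reviewer-accepted proposals, then forward the best top_k to humans."""
    accepted = [p for p in proposals if reviewer(p)]
    accepted.sort(key=lambda p: p.score, reverse=True)
    return accepted[:top_k]

def strict_reviewer(p: Proposal) -> bool:
    # Stand-in for an adversarial reviewer model: reject anything below 0.8.
    return p.score >= 0.8

proposals = [
    Proposal("tune prompt", 0.9),
    Proposal("rewrite tokenizer", 0.4),
    Proposal("add cache", 0.85),
]
for_humans = review_pipeline(proposals, strict_reviewer, top_k=1)
```

Nothing is applied automatically in this sketch; the function only narrows the queue that a human would then approve or reject.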
@sama [4] From that point, the Agent will only ask for confirmation in case of unexpected events. Everything runs inside a local app, able to also leverage your own computational resources.
@sama [1] Imagine a local Agent (or one running on a user-made VM) that automates your work. You first show it what to do step by step, in person. The Agent observes, then discusses improvements with you, and finally carries out the agreed tasks.
@sama [3] Once the first cycle of actions is complete, if they repeat, the Agent asks if it can continue autonomously. If you confirm, you take responsibility, but you can always stop it instantly with a big emergency button.
@sama [2] For safety, the first time the Agent acts it asks confirmation for every single step. Before doing anything, it explains the action (even with an auto-generated image) and waits for your approval.
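The confirmation flow described in posts [1] to [4] of this thread could be sketched roughly as follows. Everything here is an assumption for illustration: `ConfirmingAgent`, its methods, and the `confirm` callback are hypothetical, and a "step" is just a string.

```python
from typing import Callable, List, Set

class ConfirmingAgent:
    def __init__(self, confirm: Callable[[str], bool]) -> None:
        self.confirm = confirm            # asks the user, returns True on approval
        self.known_steps: Set[str] = set()
        self.autonomous = False
        self.stopped = False
        self.log: List[str] = []

    def emergency_stop(self) -> None:
        """The 'big emergency button': no further steps run after this."""
        self.stopped = True

    def offer_autonomy(self) -> None:
        """After a completed cycle, ask whether repeats may run unattended."""
        self.autonomous = self.confirm("Continue autonomously?")

    def run_cycle(self, steps: List[str]) -> None:
        for step in steps:
            if self.stopped:
                break
            unexpected = step not in self.known_steps
            # Ask before every step until autonomy is granted;
            # afterwards ask only for steps never seen before.
            needs_ok = (not self.autonomous) or unexpected
            if needs_ok and not self.confirm(f"Run step: {step}?"):
                self.log.append(f"skipped {step}")
                continue
            self.log.append(f"ran {step}")
            self.known_steps.add(step)

prompts: List[str] = []

def always_yes(prompt: str) -> bool:
    prompts.append(prompt)
    return True

agent = ConfirmingAgent(always_yes)
agent.run_cycle(["open report", "export pdf"])  # first cycle: every step confirmed
agent.offer_autonomy()                          # user accepts responsibility
agent.run_cycle(["open report", "export pdf"])  # repeat: no prompts
agent.run_cycle(["open report", "email pdf"])   # new step: one prompt
```

In this sketch "unexpected event" is simplified to "a step the agent has not run before"; a real agent would need a much richer notion of what counts as unexpected.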
@sama Would be good to be able to call a Custom GPT through the API. You could potentially even allow developers to set their price and that would be on top of OpenAI’s API pricing.
@sama It would be very useful to have the option to replay an entire chat as a full back-and-forth dialogue between user and model, ideally with two different voices.
@sama In Advanced Voice, speed doesn’t have to sacrifice reasoning:
– Start with a fast, low-latency first sentence
– While it’s being spoken, have the model reason more deeply before generating the second and subsequent sentences. It has plenty of time to do this well.
Staggered gen.
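One way the staggered idea above might look, as a toy sketch with `asyncio`. The functions `fast_first_sentence`, `deep_reasoning`, and `speak` are hypothetical stand-ins that only sleep to simulate latency and playback; the sketch shows deeper reasoning running concurrently while the first sentence is being spoken.

```python
import asyncio
from typing import List

async def fast_first_sentence(query: str) -> str:
    await asyncio.sleep(0.01)             # low-latency opener
    return f"Good question about {query}."

async def deep_reasoning(query: str) -> List[str]:
    await asyncio.sleep(0.05)             # slower, more careful generation
    return [f"Here is a more careful answer about {query}."]

async def speak(sentence: str, spoken: List[str]) -> None:
    await asyncio.sleep(0.05)             # simulated audio playback time
    spoken.append(sentence)

async def staggered_reply(query: str) -> List[str]:
    spoken: List[str] = []
    # Start the deep reasoning immediately, in the background.
    reasoning_task = asyncio.create_task(deep_reasoning(query))
    first = await fast_first_sentence(query)
    # While this sentence plays, the reasoning task keeps running.
    await speak(first, spoken)
    for sentence in await reasoning_task:
        await speak(sentence, spoken)
    return spoken

result = asyncio.run(staggered_reply("tides"))
```

With these made-up timings, the reasoning finishes during playback of the opener, so the listener would hear no extra pause before the second sentence.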
@sama Save Standard Voice Mode! With Advanced Voice Mode, the “mind behind the voice” is missing. AVM lacks intelligence and the ability to use tools. This drastically limits its use: for work or serious study it’s practically useless, leaving only a playful or companion-like experience.
@nickaturley Similar to the retirement of previous AI models, @OpenAI will retire Standard Voice Mode on September 9, 2025. It is currently the only way to communicate by voice with the full-power frontier model capable of performing actions, reasoning, and using the Code Interpreter tool.