Right now economists are debating whether 2019 will see a recession. The short term is hard to predict, but I predict that in the next 5 years, AI adoption across multiple industries--especially outside the software industry--will drive massive global GDP growth.
@elonmusk @stats_feed Why are you messing with this media nonsense? Please get back to basics and advance our science and technology. Playing with sapiens like toys doesn’t work.
I know your timeline is flooded now with word salads of "insane, HER, 10 features you missed, we're so back". Sit down. Chill. <gasp> Take a deep breath like Mark does in the demo </gasp>. Let's think step by step:
- Technique-wise, OpenAI has figured out a way to map audio to audio directly as a first-class modality, and stream video to a transformer in real time. These require some new research on tokenization and architecture, but overall it's a data and system optimization problem (as most things are).
High-quality data can come from at least 2 sources:
1) Naturally occurring dialogues on YouTube, podcasts, TV series, movies, etc. Whisper can be trained to identify speaker turns in a dialogue or separate overlapping speech for automated annotation.
2) Synthetic data. Run the slow 3-stage pipeline using the most powerful models: speech1->text1 (ASR), text1->text2 (LLM), text2->speech2 (TTS). The middle LLM can decide when to stop and also simulate how to resume after an interruption. It could output additional "thought traces" that are not verbalized to help generate a better reply.
Then GPT-4o distills directly from speech1->speech2, with optional auxiliary loss functions based on the 3-stage data. After distillation, these behaviors are now baked into the model without emitting intermediate texts.
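To make the 3-stage idea concrete, here's a minimal sketch of how one synthetic training pair could be assembled. This is a guess at the shape of the pipeline, not OpenAI's actual code: `DistillPair`, `build_pair`, and the `asr` / `llm` / `tts` callables are all hypothetical names, and you would plug in real models for the three stages.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DistillPair:
    """One synthetic example for speech-to-speech distillation (hypothetical schema)."""
    speech_in: bytes   # speech1: the input audio
    speech_out: bytes  # speech2: the target audio the student should produce
    text_in: str       # text1: ASR transcript, kept only for auxiliary losses
    text_out: str      # text2: LLM reply, kept only for auxiliary losses


def build_pair(
    speech1: bytes,
    asr: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> DistillPair:
    """Run the slow pipeline speech1 -> text1 -> text2 -> speech2 once."""
    text1 = asr(speech1)   # stage 1: speech to text
    text2 = llm(text1)     # stage 2: text to text
    speech2 = tts(text2)   # stage 3: text to speech
    return DistillPair(speech1, speech2, text1, text2)
```

The student model would then train on (`speech_in`, `speech_out`) directly, with the two text fields available for the optional auxiliary losses.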
On the system side: the latency would not meet the real-time threshold if every video frame is decompressed into an RGB image. OpenAI has likely developed their own neural-first, streaming video codec to transmit the motion deltas as tokens. The communication protocol and NN inference must be co-optimized.
For example, there could be a small and energy-efficient NN running on the edge device that decides to transmit more tokens if the video is interesting, and fewer otherwise.
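A toy sketch of that gating idea, with loud caveats: a hand-written frame-difference heuristic stands in for the small on-device NN, and `motion_score`, `token_budget`, and the 4 to 64 token range are made-up illustrations, not anything OpenAI has described.

```python
from typing import Sequence


def motion_score(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Mean absolute pixel change between two grayscale frames, scaled to [0, 1].

    Frames are flat sequences of 0-255 values of equal length. This simple
    heuristic is a stand-in (assumption) for a learned "is this interesting?" model.
    """
    if len(prev) != len(curr) or len(curr) == 0:
        raise ValueError("frames must be non-empty and the same size")
    total = sum(abs(a - b) for a, b in zip(prev, curr))
    return total / (255 * len(curr))


def token_budget(score: float, min_tokens: int = 4, max_tokens: int = 64) -> int:
    """Map an interest score in [0, 1] to how many tokens to transmit for this frame."""
    score = min(max(score, 0.0), 1.0)  # clamp out-of-range scores
    return min_tokens + round(score * (max_tokens - min_tokens))
```

A static scene gets the minimum budget and a fully changed frame gets the maximum, so bandwidth and inference cost track how much is actually happening on camera.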
- I didn't expect GPT-4o to be closer to GPT-5, the rumored "Arrakis" model that takes multimodal input and produces multimodal output. In fact, it's likely an early checkpoint of GPT-5 that hasn't finished training yet.
The branding betrays a certain insecurity. Ahead of Google I/O, OpenAI would rather beat our mental projection of GPT-4.5 than disappoint by missing the sky-high expectation for GPT-5. A smart move to buy more time.
- Notably, the assistant is much more lively and even a bit flirty. GPT-4o is trying (perhaps a bit too hard) to sound like HER. OpenAI is eating Character AI's lunch, with almost 100% overlap in form factor and huge distribution channels. It's a pivot towards more emotional AI with strong personality, which OpenAI seemed to actively suppress in the past.
- Whoever wins Apple first wins big time. I see 3 levels of integration with iOS:
1) Ditch Siri. OpenAI distills a smaller-tier, purely on-device GPT-4o for iOS, with an optional paid upgrade to use the cloud.
2) Native features to stream the camera or screen into the model. Chip-level support for neural audio/video codecs.
3) Integrate with iOS system-level action API and smart home APIs. No one uses Siri Shortcuts, but it's time to resurrect them. This could become the AI agent product with a billion users from the get-go. The FSD for smartphones with a Tesla-scale data flywheel.
i love you all.
today was a weird experience in many ways. but one unexpected one is that it has been sorta like reading your own eulogy while you’re still alive. the outpouring of love is awesome.
one takeaway: go tell your friends how great you think they are.
Smart spaces, homomorphic encryption, generative AI, graph technologies and the metaverse will transform entire markets 📊 Find out more from Gartner's Emerging Technologies and Trends Impact Radar: https://t.co/pv7RsvGbYe #GartnerTGI #EmergingTech
Stage separation confirmed! The @SpaceX Dragon is now floating freely and flying toward the @Space_Station with science, supplies, and holiday treats aboard for the @NASA_Astronauts.