People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way.
We share our approach, early results, and a quick look at our model in action.
https://t.co/AFJZ5kH7Ku
We are offering grants of $100,000 + Tinker credits to researchers advancing the field of human-AI interactivity. Submit your proposals by June 19th!
https://t.co/907HfBy7g3
P.S. The demo is basically my life at thinky: I've started cutting coffee, @liliyu_lili is visually prompt-injecting my human intelligence with sweet snacks every day, and I've gained weight since joining TML.
Congrats Rowan and Thinky team on the cool release!
I remember you mentioned having a v different vision of multimodal interactions a few weeks ago @rown so this is what that looks like! 🆒
It’s exciting to see this release going beyond just a single model, showcasing truly different native multimodal interactions too.
A couple things from the nicely written blog really resonate with me:
1. people are most effective when they can collaborate with AI the same way they do with other people
2. existing interfaces limit human inputs (esp multimodal ones) to the model, and this input limit needs to be lifted to unlock much better interactivity
The blog also reminds me of the fun and challenging discussions with @shannonzshen and others on what “scaling collaboration” can look like. we made an initial attempt describing our vision: https://t.co/YEHvWeH7LR
It’d be great to see more human centric evaluations of the model/system/interface too — looking forward to it🥂
We started Thinking Machines to advance human-AI collaboration, and this is our first bet on what that looks like. Most labs treat autonomy as the goal and interactivity as scaffolding around a turn-based core. We think the way we work with AI matters as much as how smart it is. Interactivity has to be in the model, and it has to scale with intelligence rather than trail behind it.
https://t.co/U4c0uC7tnT
In the past few months, we had a lot of fun (and stress 😅) producing 12 versions (+ many subversions) and 137 pages in our training run log book.
Turns out human-human collaboration is important to improving human-AI collaboration. 😊
My first share since joining @thinkymachines. Fun working with this team on real-time multimodal interaction. Vision in turn-based models felt like flipping through photos — continuous video is a different problem.
Visual proactivity is essential. Grateful to have worked on this alongside @liliyu_lili, @rown, and the rest of the team!
I'm excited to share some of our work at @thinkymachines. As models get more intelligent, the bottleneck is increasingly how quickly and seamlessly we can access their intelligence, and today we are sharing a preview of how we think about human-AI collaboration.
@liliyu_lili @saurabh_garg67 @AndreaMadotto If you're interested in working on realtime video+speech specifically, or human-AI collaboration more generally, please reach out!
Our interaction model is the first general video+speech model that's visually proactive. It was super fun working on this with @liliyu_lili / @saurabh_garg67 / @AndreaMadotto and others - after countless versions it was amazing when visual interruptions suddenly worked!
We’re interested in AI systems that can collaborate in real time, without relying only on artificial turn boundaries.
For audio, this feels natural: listen, speak, interrupt, update.
For video, we think an important version of this is visual proactivity — models that respond when something happens visually:
“Tell me when I start slouching.”
“Count my pushups.”
“Say stop when the person stops doing X.”
vision🍌 is here https://t.co/Ued6GGk4Et
if you got into computer vision the way I did, starting with pixel-level labeling tasks like segmentation, edges, depth, or surface normals, you’ll probably feel the same seeing these results -- something big has quietly shifted, and it’s going to change how we approach these problems for good 🧵