To cut those 2-3 seconds shorter, you need word by word incremental processing that would implement real-time understanding of the input; meaning that some thinking/tool calls happen before the end of an utterance.
Still an under-explored area of research! 🤖
Reasoning LLMs typically take 2-3 seconds to start emitting tokens. In a voice agent, that's 2-3 seconds of silence after the user finishes speaking.
The @MiniMax_AI team just shipped a community contribution to Gradbot with two models running in parallel. MiniMax-M2-her produces a short acknowledgement that starts streaming to TTS immediately, while MiniMax-M2.7 runs in the background reasoning and tool calls.
Thanks to @davidtaoweiji for this contribution. Checkout our readme for more details.
https://t.co/gxSTdrCiAm
Excited to share our CVPR 2026 paper, ARC Is a Vision Problem! 🖼️
The Abstraction and Reasoning Corpus (ARC) is often approached as a language reasoning problem, despite being an inherently visual puzzle for humans.
🧩Introducing Vision ARC (VARC)🧩: we reframe abstract reasoning as an image-to-image translation problem, solved by a plain Vision Transformer.
AI agents are advancing research-level math. 🚀
I’m thrilled to share @GoogleDeepMind’s AlphaProof Nexus - an agentic framework for formal proof search powered by Gemini.
When applied to a set of open formal math problems, our agent autonomously solved:
✅ 9 open Erdős problems (including two open for 56 years!)
✅ 44 Online Encyclopedia of Integer Sequences (OEIS) problems
✅ A 15-year-old open problem in algebraic geometry ✅ A 7-year-old open question in min-max optimization
We are collaborating with mathematicians across disciplines - from combinatorics and graph theory to quantum optics. Ultimately, these results show the massive potential of even simple agentic loops powered by Gemini.
Read the paper here: https://t.co/c5M9ZjRXU1
The social phenomenon of certain words becoming trendy because of AI can/should be studied.
I can think of "harness" and "bespoke" as some recent ones.
Attention @arxiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. 1/
@TessWOfficial And that's beside compositional generalisation, data efficiency, reasoning & planning, interpretability & explainablity, compute power, safety & controllability (all under active research, but also "given" in symbolic models.
2/2
Just remembered this: I confess it's shortsighted to call LLMs stochastic parrots really.
But no doubt they are inefficient and intractable.
🤞🏻I get to contribute to fixing it.
@TessWOfficial They are very good at certain (many but not all) complicated tasks, say olympiad math questions and coding.
But at the same time, they have unstable performance, therefore unreliable. 1/2
The social phenomenon of certain words becoming trendy because of AI can/should be studied.
I can think of "harness" and "bespoke" as some recent ones.
If you use GitHub (especially if you pay for it!!) consider doing this *immediately*
Settings -> Privacy -> Disallow GitHub to train their models on your code.
GitHub opted *everyone* into training. No matter if you pay for the service (like I do). WTH
https://t.co/vcSkhM5yLV