Anurag Kumar

@AcouIntel

Research Scientist, @GoogleDeepMind | Prev: @AIatMeta | CMU @SCSatCMU | @IITKanpur | Audio/Speech, Multimodal AI

Cambridge, MA

Joined June 2016

292 Following

2.1K Followers

216 Posts

Pinned Tweet

Anurag Kumar @AcouIntel

over 1 year ago

Looking forward to @NeurIPSConf #NeurIPS2024 next week; I am there from Dec 11th-15th. Join our Audio Imagination Workshop on Dec 14th for engaging discussions on all things in the audio generation space. We have an exciting list of papers and speakers. https://t.co/AwFxXg4kZ8

AcouIntel retweeted

Sundar Pichai

@sundarpichai

18 days ago

Gemini Omni doesn't just build scenes that look real, it reasons about what should happen next. It combines an intuitive understanding of physics with Gemini's knowledge of history, science, and cultural context. Rolling out today starting with video outputs to Google AI Plus, Pro and Ultra subscribers globally through the @Geminiapp + Google Flow, and @YouTube Shorts this week.

AcouIntel retweeted

koray kavukcuoglu

@koraykv

18 days ago

Today at Google I/O, we introduced Gemini 3.5 Flash! It has become an integral part of our daily research cycle and works with all the tools we have at Google. We used a team of agents in Antigravity 2.0 to recreate the original AlphaZero research paper and build a playable version. They coded the reinforcement learning pipeline in JAX/Flax, trained a ResNet model from scratch via self-play on multi-TPU pods, and shipped a full-stack web app so you can play against it, from just 2 prompts. Here’s what else makes 3.5 Flash special 🧵

AcouIntel retweeted

Jeff Dean

@JeffDean

18 days ago

1/ Today at #GoogleIO, we’re releasing Gemini 3.5, our latest family of models combining frontier intelligence with action.

We’re starting by releasing 3.5 Flash, which is built to help you execute complex, long-horizon agentic workflows.

Gemini 3.5 Flash is our strongest model for coding and agent https://t.co/m62cBJhIjJ outscores 3.1 Pro on agentic and coding benchmarks like Terminal-Bench and MCP Atlas, while running 4x faster than other frontier models.

Used in Google Antigravity, 3.5 Flash is even further optimized to be up to 12x faster. It’s a powerful engine to deploy sub-agents that collaborate, run high-frequency iterative loops, and solve real-world problems at scale.

Some highlights we’re excited about 🔽

Anurag Kumar @AcouIntel

8 months ago

We are looking for reviewers for @ieeeICASSP 2026 in the AASP areas. We received many more papers this cycle. If you don't currently review for ICASSP, please consider doing so. Fill out the form below: https://t.co/Vbv98lHogX

AcouIntel retweeted

Chenda @chenda54

9 months ago

🚀 Join the ICASSP 2026 URGENT Challenge! Advance Universal, Robust & Generalizable Speech Enhancement. 🗣 Track 1: Universal Speech Enhancement 🎧 Track 2: Speech Quality Assessment 🔗 https://t.co/bZ3edVhIGM #ICASSP2026 #SpeechEnhancement #AI #AudioProcessing

AcouIntel retweeted

Google DeepMind @GoogleDeepMind

11 months ago

An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇

It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵

Anurag Kumar @AcouIntel

12 months ago

Check out the paper on Fri, Jun 13, ExHall D, evening session @CVPR #CVPR2025. Paper https://t.co/8L0NLvv6jY

Anurag Kumar @AcouIntel

12 months ago

A couple of papers at @CVPR #CVPR2025: (1) VisAH: Learning to Highlight Audio by Watching Movies. How do you transform poorly mixed audio into well-balanced audio? VisAH learns to leverage visual cues by training on movies, which naturally provide the required supervision.

Anurag Kumar @AcouIntel

12 months ago

(2) XRIR: Hearing Anywhere in Any Environment. A key problem in neural room impulse response (RIR) estimation has been cross-room generalization. We make an attempt to address this and introduce a large-scale dataset, ACOUSTICROOMS, with 300,000 high-fidelity RIRs simulated from 260 diverse rooms.

AcouIntel retweeted

Nando de Freitas

@NandoDF

about 1 year ago

RL is not all you need, nor attention nor Bayesianism nor free energy minimisation, nor an age of first person experience. Such statements are propaganda.

You need thousands of people working hard on data pipelines, scaling infrastructure, HPC, apps with feedback to drive benchmarks and data, tons of research and engineering on generative models, data mixtures, ablations, RL/self-training, etc etc and we will probably need lots of people working hard to figure out safety, causal world models, awareness, models that create abstractions comparable to infinity and zero and use these to predict the existence of things like black holes and suggest experiments to verify such hypotheses, or come up with novel engineering designs to generate energy more efficiently, robotics, etc etc. It takes thousands of people and many ideas.

In the end some simple ideas might become obvious but such obviousness only happens in retrospect. Yes, there is a bitter lesson but if we had followed it, we’d still be doing linear regression with RL. Let’s not oversimplify, but rather honour the research and engineering of thousands of people.

Also, people keep rewriting history. When our language understanding startup (darkbluelabs) was acquired by Google about 10 years ago, we joined DeepMind, where the AGI documents were all about concepts, RL, episodic memories and made it clear that there was no room for language. To be honest, back then such a position wasn’t so crazy. Now it seems silly, but only because of the benefit of hindsight.

There’s no 1 or 10 heroes in the history of AI. There are many 1000s of hard working students, profs, engineers, operations and support people, product folks, managers, even hedge funds among others. Let’s honour the whole community and not just ceos or the philosophers of Bayes, RL, deep learning, etc.

I look forward to learning from the next generation and seeing what they will achieve. To them: Don’t buy the existing narratives blindly, innovate.
Remember that just like mathematics, AI will advance one grave at a time.

Anurag Kumar @AcouIntel

about 1 year ago

(2) Reexamining the Efficacy of MetricGAN for Speech Enhancement. Led by @realHaibinWu. Shows some crucial limitations of MetricGAN and proposes training tricks to address them. (Already presented, but check out the paper.) https://t.co/lNouyTwyka (3/3)

Anurag Kumar @AcouIntel

about 1 year ago

@ieeeICASSP is finally happening at a place for which I don’t need a visa to travel 😀, but I am not able to attend this year #ICASSP2025. If you are there, check out these two papers I co-authored. (1/3)

Anurag Kumar @AcouIntel

about 1 year ago

(1) Advancing Active Speaker Detection for Egocentric Videos. Led by @huh_jaesung. SOTA for active speaker detection in challenging egocentric videos. Session: Machine Learning for Multimodal Data I, Apr 11, 11:30 am to 1:00 pm. https://t.co/CCvoJ53nT6 (2/3)

Anurag Kumar @AcouIntel

about 1 year ago

Career Update: Excited to join Google DeepMind (@GoogleDeepMind) to continue working on audio/speech/multimodal AI. I left Meta (@Meta) after more than 6 years, and I will definitely miss working with some amazing friends and colleagues. Super thankful for all the fun collaborations.

AcouIntel retweeted

Shrestha Mohanty @shremoha

about 1 year ago

So happy to share that our work has been accepted to @SIGIRConf. Thank you to my amazing collaborators! @NegarEmpr, Andrea Tupini, Yuxuan Sun, @Tviskaron, @artemZholus, @Cote_Marc and @julia_kiseleva Pre-print: https://t.co/ctdcHZTVqw

AcouIntel retweeted

arXiv Sound @ArxivSound

over 1 year ago

"Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment," Joanna Hong, Sanjeel Parekh, Honglie Chen, Jacob Donley, Ke Tan, Buye Xu, Anurag Kumar, https://t.co/29Yg1je6zB

Anurag Kumar @AcouIntel

over 1 year ago

The paper explores how LLMs can be used to effectively contextualize excerpts from conversations to improve understandability, readability, and other factors and reduce misinterpretations.

Anurag Kumar @AcouIntel

over 1 year ago

Exciting new work focusing on comprehension of long-form social conversations @coling2025 #COLING2025. https://t.co/hpO8TFn4Bg. All thanks to the hard work of @shremoha.

AcouIntel retweeted

Shrestha Mohanty @shremoha

over 1 year ago

Excited to share our work at @coling2025! While I couldn’t attend in person, @jad_kabbara will be presenting today at the 1:30 PM poster session. Come by to learn how we’re using LLMs to improve understanding in social conversations! #COLING2025 #NLProc
