Aviv Navon

@avivnav

ML Researcher

Joined April 2020

107 Following

65 Followers

21 Posts

avivnav retweeted

doubleAI

@_doubleAI_

about 1 month ago

We ran WarpSpeed, our autonomous optimization agent, on @NVIDIA's new SOL-ExecBench for a single day. It took first place by a wide margin, beating the optimized kernels on 90% of problems, with an average speedup of 2.24x. ExecBench gathers 235 of the hardest CUDA kernels in production today, lifted from real workloads in DeepSeek, Qwen, Gemma and Kimi. Blackwell kernels are notoriously hard to write. But we find that verification is just as hard. We have a story to tell. https://t.co/aQX9XXCz4z

avivnav retweeted

Amnon Shashua

@AmnonShashua

4 months ago

DoubleAI’s AI system just beat a decade of expert GPU engineering

WarpSpeed just beat a decade of expert-engineered GPU kernels — every single one of them.

cuGraph is one of the most widely used GPU-accelerated libraries in the world. It spans dozens of graph algorithms, each written and continuously refined by some of the world’s top performance engineers.

@_doubleAI_'s WarpSpeed autonomously rewrote and re-optimized these kernels across three GPU architectures (A100, L4, A10G). Today, we released the hyper-optimized version on GitHub — install it with no change to your code.

The numbers:
- 3.6x average speedup over human experts
- 100% of kernels benefit from speedup
- 55% see more than 2x improvement

But hasn’t AI already achieved expert-level status — winning gold medals at IMO, outperforming top programmers on CodeForces? Not quite. Those wins share three hidden crutches: abundant training data, trivial validation, and short reasoning chains. Where all three hold, today’s AI shines. Remove any one of them and it falls apart (as Shai Shalev Shwartz wrote in his post).

GPU performance engineering breaks all three. Data is scarce. Correctness is hard to validate. And performance comes from a long chain of interacting choices — memory layout, warp behavior, caching, scheduling, graph structure. Even state-of-the-art agents like Claude Code, Codex, and Gemini CLI fail dramatically here, often producing incorrect implementations even when handed cuGraph’s own test suite.

Scaling alone can’t break this barrier. It took new algorithmic ideas — our Diligent framework for learning from extremely small datasets, our PAC-reasoning methodology for verification when ground truth isn’t available, and novel agentic search structures for navigating deep decision chains.

This is the beginning of Artificial Expert Intelligence (AEI) — not AGI, but something the world needs more: systems that reliably surpass human experts in the domains where expertise is rarest, slowest, and most valuable.

If AI can surpass the world’s best GPU engineers, which domain falls next?

For the full blog: https://t.co/sCF033hb28

cuGraph:
https://t.co/jqxrcuhfs4

Winning Gold at IMO 2025:
https://t.co/fAdIT2mTkI

Codeforces benchmarks:
https://t.co/UhRAUieWFi

@shai_s_shwartz post:
https://t.co/1WAGIXfiqh

From Reasoning to Super-Intelligence: A Search-Theoretic Perspective
https://t.co/iX625p57NT

Artificial Expert Intelligence through PAC-reasoning
https://t.co/Hq3wWsmidw
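
The figures quoted in the post above (an average speedup, the share of kernels that benefit, the share above 2x) are summary statistics over per-kernel speedups. Below is a minimal sketch of how such figures can be computed. The speedup values are made up for illustration, and the use of an arithmetic mean is an assumption, since the post does not say which average is reported.

```python
from statistics import mean

# Hypothetical per-kernel speedups (baseline time / optimized time).
# Made-up values for illustration only; these are not WarpSpeed results.
speedups = [1.1, 1.5, 2.4, 3.0, 6.0]


def summarize(values):
    """Return (arithmetic mean, share faster than baseline, share above 2x)."""
    n = len(values)
    average = mean(values)
    share_faster = sum(s > 1.0 for s in values) / n
    share_over_2x = sum(s > 2.0 for s in values) / n
    return average, share_faster, share_over_2x


average, share_faster, share_over_2x = summarize(speedups)
print(f"average speedup: {average:.2f}x")
print(f"kernels faster than baseline: {share_faster:.0%}")
print(f"kernels above 2x: {share_over_2x:.0%}")
```

A geometric mean is a common alternative for averaging ratios and is less dominated by a few very large speedups, so the choice of average changes the headline number.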

avivnav retweeted

Shai Shalev-Shwartz

@shai_s_shwartz

4 months ago

1/ Software was eating the world - and now AI is eating software.
AI already beats humans at math/coding (IMO, CodeForces). Right?

So let's test the strongest coding agents on a real domain: optimizing cuGraph (GPU graph analytics kernels).

Spoiler:
* The strongest coding agents crash.
* And @_doubleAI_ built WarpSpeed - an AI that beat a decade of expert-engineered GPU kernels.
🧵

avivnav retweeted

Haggai Maron @HaggaiMaron

7 months ago

📄 GradMetaNet: An Equivariant Architecture for Learning on Gradients w/ @yoav_gelberg, @ytn_ym, @avivnav, @AvivShamsian, @moeputterman, @mmbronstein https://t.co/f3PUbelRAY

avivnav retweeted

Christopher Morris @chrsmrrs

7 months ago

I respect that @iclr_conf had to respond to the OR leak, but I disagree with resetting scores. Many students worked hard on rebuttals and improved their papers in good faith. I hope the organizers reconsider and revert the reset. If you agree, feel free to retweet.

avivnav retweeted

@_akhaliq

10 months ago

Beyond Transcription: Mechanistic Interpretability in ASR

avivnav retweeted

Dvir Samuel @dvir_samuel

almost 2 years ago

🚀 High-quality inversion of text-to-image models in real time! Now you can do interactive image editing! 🎨 📄 Paper: https://t.co/iGh4ZCywGb 🌐 Project Page & Demo: https://t.co/sWOEE8eTVB

avivnav retweeted

Derek Lim @dereklim_lzh

almost 2 years ago

Our new work won best paper at the ICML HiLD workshop! more details to come, but briefly, this work on parameter symmetries touches on many cool directions in understanding neural network loss landscapes and optimization!

avivnav retweeted

Michael Bronstein @mmbronstein

over 2 years ago

@HaggaiMaron talking about equivariance in NN weight spaces. At this pace, we will not be training NNs anymore but running an equivariant generative model on the weights

avivnav retweeted

Haggai Maron @HaggaiMaron

almost 3 years ago · Israel

A new blog post about our Deep Weight Space Networks! A must read if you want to learn how to apply neural networks to (weights of) other neural networks in a principled way. https://t.co/XwPnCbQhJv

avivnav retweeted

@_akhaliq

about 3 years ago

Point-Cloud Completion with Pretrained Text-to-image Diffusion Models

paper page: https://t.co/AVwq8BQssZ

Point-cloud data collected in real-world applications are often incomplete. Data is typically missing due to objects being observed from partial viewpoints, which only capture a specific perspective or angle. Additionally, data can be incomplete due to occlusion and low-resolution sampling. Existing completion approaches rely on datasets of predefined objects to guide the completion of noisy and incomplete point clouds. However, these approaches perform poorly when tested on Out-Of-Distribution (OOD) objects that are poorly represented in the training dataset. Here we leverage recent advances in text-guided image generation, which lead to major breakthroughs in text-guided shape generation. We describe an approach called SDS-Complete that uses a pre-trained text-to-image diffusion model and leverages the text semantics of a given incomplete point cloud of an object to obtain a complete surface representation. SDS-Complete can complete a variety of objects using test-time optimization without expensive collection of 3D information. We evaluate SDS-Complete on incomplete scanned objects captured by real-world depth sensors and LiDAR scanners. We find that it effectively reconstructs objects that are absent from common datasets, reducing Chamfer loss by 50% on average compared with current methods.
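
The abstract above reports results in terms of Chamfer loss without defining it. Below is a minimal sketch of one common form of the Chamfer distance between two point sets. It assumes squared Euclidean nearest-neighbour distances averaged in each direction and then summed; conventions vary (squared or unsquared, sum or mean), and the variant used in the paper is not stated here. The point sets are made up.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance with squared nearest-neighbour terms."""

    def one_way(src, dst):
        # Mean over src of the squared distance to the closest point in dst.
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


# Made-up 3D point sets standing in for a completed shape and a reference scan.
completed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0)]

print(chamfer_distance(completed, reference))
```

The distance is zero when the two sets coincide and grows as points in either set move away from their nearest neighbour in the other.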

avivnav retweeted

Weights & Biases

@wandb

about 3 years ago

Our next Community Spotlight focuses on DWSNets, a public repo demonstrating a novel network architecture for learning in deep weight spaces. Congrats to the authors of DWSNets for recently being accepted at @icmlconf 🎉 👏 🔗 https://t.co/v9OVbdPWOm

avivnav retweeted

Idan Achituve @IdanAchituve

about 3 years ago

Check out our new paper which was accepted to UAI 2023! Guided deep kernel learning: https://t.co/zQKXgKRtUU. (1/6)

avivnav retweeted

Weights & Biases

@wandb

about 3 years ago

"Improving MTL optimization algorithms is, therefore, an important task with significant implications for many systems." Find out how Nash-MTL achieves state-of-the-art results on various benchmarks across multiple domains. 🎲 https://t.co/dGIVFbioC3

Aviv Navon @avivnav

over 3 years ago

@SebastianCygert @HaggaiMaron We provide additional details in Appendix J of the paper. We use 4000 training examples. The DWSNet gets v as its input and outputs Δv. The loss is then computed using the target domain images and the net v-Δv.
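
The setup described in the reply above (weights v in, update Δv out, loss evaluated with the net v-Δv on target-domain data) can be illustrated with a toy sketch. Everything here is a hypothetical stand-in: the "network" is a 1-D linear model, `weight_space_model` is a placeholder for the DWSNet, and mean squared error replaces whatever loss the paper uses on images.

```python
def apply_net(weights, x):
    """Toy 'network': a 1-D linear model y = w * x + b."""
    w, b = weights
    return w * x + b


def weight_space_model(v, theta):
    """Hypothetical stand-in for the DWSNet: maps weights v to an update delta_v."""
    return [theta[0] * v[0], theta[1] * v[1]]


def target_loss(v, theta, target_batch):
    """Mean squared error of the adapted net (v - delta_v) on target-domain data."""
    delta_v = weight_space_model(v, theta)
    adapted = [vi - di for vi, di in zip(v, delta_v)]
    errors = [(apply_net(adapted, x) - y) ** 2 for x, y in target_batch]
    return sum(errors) / len(errors)


v = [2.0, 1.0]                            # source-domain weights (made up)
target_batch = [(1.0, 1.5), (2.0, 2.5)]   # target-domain (x, y) pairs (made up)

loss_unadapted = target_loss(v, [0.0, 0.0], target_batch)  # delta_v = 0
loss_adapted = target_loss(v, [0.5, 0.5], target_batch)    # adapted weights [1.0, 0.5]
print(loss_unadapted, loss_adapted)
```

In training, the parameters of the weight-space model (`theta` in this sketch) would be the quantity optimized, with the loss above as the objective.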

Aviv Navon @avivnav

over 3 years ago

@MaxNiroomand @HaggaiMaron @AvivShamsian @GalChechik @IdanAchituve We considered only permutation symmetries in this work. Extending to other symmetry types is a great future research direction.

avivnav retweeted

Haggai Maron @HaggaiMaron

over 3 years ago

(1/10) New paper! A deep architecture for processing (weights of) other neural networks while preserving equivariance to their permutation symmetries. Learning in deep weight spaces has a wide potential: from NeRFs to INRs; from adaptation to pruning https://t.co/7ELrkEIo4G 👇
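
The permutation symmetry mentioned in the post above can be seen in a small example: reordering the hidden units of an MLP (rows of the first layer and the matching columns of the second) changes the weights but not the function. Below is a minimal sketch with made-up weights; it only demonstrates the symmetry and is not the architecture from the paper.

```python
def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP with ReLU: y = W2 @ relu(W1 @ x + b1) + b2."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units: rows of W1 and b1, and the matching columns of W2."""
    W1p = [W1[i] for i in perm]
    b1p = [b1[i] for i in perm]
    W2p = [[row[i] for i in perm] for row in W2]
    return W1p, b1p, W2p


# Made-up weights: 2 inputs, 3 hidden units, 1 output.
W1 = [[1.0, -2.0], [0.5, 0.25], [-1.0, 3.0]]
b1 = [0.0, 1.0, -0.5]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.25]
x = [1.0, 2.0]

W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm=[2, 0, 1])
original = mlp(x, W1, b1, W2, b2)
permuted = mlp(x, W1p, b1p, W2p, b2)
print(original, permuted)
```

A model that takes such weights as input and respects this symmetry treats `(W1, b1, W2, b2)` and `(W1p, b1p, W2p, b2)` consistently, since both describe the same function.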

avivnav retweeted

Aviv Shamsian @AvivShamsian

almost 4 years ago

Our approach, named Nash-MTL, outperforms multiple MTL baselines on various domains. This is joint work with @avivnav @idanachituve @HaggaiMaron @GalChechik @EthanFetaya Kenji Kawaguchi. Paper: https://t.co/I6Icv8sufX Project Page: https://t.co/6e6HCDQDMc #ICML2022

avivnav retweeted

Aviv Shamsian @AvivShamsian

almost 4 years ago

Interested in Multi-task learning? Check out our paper "Multi-task Learning as a Bargaining Game" from ICML 2022. We proposed a principled approach for combining tasks' gradients incorporating ideas from game theory.
