Marc Molina - ICML 🛩️ @marcm_77 - Twitter Profile

marcm_77 retweeted

@ZyphraAI

1 day ago

Zyphra is sharing our first work in continual learning where we study: Can LLMs learn forever from new data?

Many see continual learning as a path to AGI through recursive self-improvement (RSI).

The first obstacle is plasticity loss. We derive a scaling law for its onset 🧵 https://t.co/bd46YfF7XH

10

590

73

625

992K

marcm_77 retweeted

LMCache Lab

@lmcache

2 days ago

𝐍𝐨 𝐆𝐏𝐔? 𝐍𝐨 𝐩𝐫𝐨𝐛𝐥𝐞𝐦. We just published a starter guide for developing vLLM + LMCache on a MacBook. LMCache's multi-platform design decouples the GPU from most core data paths, so a single laptop is enough to clone, build, run unit tests, and verify a real cache hit on CPU. The guide walks through the environment setup in ~10 minutes and points to four concrete areas where you can start contributing. If not having a GPU was your only blocker, it is not anymore. Read the guide and join us in building the KV cache layer for faster LLM inference: https://t.co/l22JWaH22F #LMCache #vLLM #LLM #AIInfrastructure #OpenSource

11

595

78

1K

207K

marcm_77 retweeted

Tianqi Chen

@tqchenml

3 days ago

We taught a brand-new mini-series this year at @SCSatCMU on Modern GPU Programming for ML Systems, as part of the ML Systems course, touching on fun questions like what data layout swizzling is, how to use 3D TMA, and state-of-the-art Blackwell programming. We released a curated online book based on the materials: https://t.co/5ZJg2lySNO check it out

20

2K

231

2K

135K

marcm_77 retweeted

News from Google

@NewsFromGoogle

13 days ago

Today, we filed a lawsuit to permanently dismantle a group of organized cybercriminals accused of using AI tools, including Gemini, to scam Americans via fake text campaigns. Here’s what to know:
◾ Our suit targets core software developers in a cybercrime operation known as the “Outside Enterprise.” The group has allegedly weaponized AI to quickly generate highly convincing fake government and brand websites intended to steal victims’ credit card numbers and personal information.
◾ The group used AI and different Google products, including our trademarks and logos, as part of these phishing campaigns.
◾ The scale of the operation is massive: More than 100,000 victims have been scammed, with losses estimated in the millions.

108

3K

338

621

385K

marcm_77 retweeted

Joseph Suarez 🐡

@jsuarez

15 days ago

Now that I have your attention, any suggestions on our ~200 line CUDA implementation of Muon would be greatly appreciated https://t.co/K4Ub7JT93K. In the 5.0 branch on the same file, I played with a small change to preserve LR across model sizes, but there have not been any major improvements otherwise.

3

57

1

18

5K

marcm_77 retweeted

Google Gemma

@googlegemma

15 days ago

Meet DiffusionGemma! An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license. Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇

166

5K

810

2K

956K

marcm_77 retweeted

Tim Sneath

@timsneath

16 days ago

One of my personal favorite features announced at WWDC will, I suspect, be a sleeper hit: container machines, allowing your Mac to run a lightweight, persistent Linux environment with your home directory and repos automatically mounted: https://t.co/dOBdfOOVxC

228

10K

813

6K

737K

marcm_77 retweeted

Markus J. Buehler

@ProfBuehlerMIT

21 days ago

We've made a breakthrough in self-evolving AI scientists moving from "search" to "principled discovery": Scientific discovery requires that the search space itself changes, and an AI scientist must perceive this shift without intervention. We built an AI that achieves this for the first time with the ability to discover the scientific vocabulary it reasons in. Evidence, tools, artifacts, verifiers, failures & claims become typed provenance.

We show three distinct modalities: 1) retrieval, adding known objects; 2) search, exploring a fixed schema; and critically: 3) discovery, a verified regime transition.

We solve the open-endedness evaluation problem by lifting agentic workflows into a typed copresheaf and proving, via a Kan obstruction, that true discovery is not unbounded generation but a verifiable schema expansion: old evidence is transported by Left Kan extension, and genuine novelty is mathematically quantified by the pointwise residual beyond the transported image - separating discovery from mere search and making novelty objective and measurable rather than a subjective judgment or benchmark delta.

Our AI scientist is built in a way that does not pre-conceive the approach it chooses; instead, we endow the system with formal power to adapt, evolve, and reason from first principles.

Case studies include:
1⃣ Builder/Breaker model that discovers mode-conditioned compliance in proteins;
2⃣ CategoryScienceClaw that finds anisotropic fiber-network stiffness rules.

Great work in collaboration with my graduate student @fwang108_ @MITdeptofBE

F.Y. Wang & M.J. Buehler, Self-Revising Discovery Systems for Science: A Categorical Framework for Agentic Artificial Intelligence, arXiv:2606.01444, 2026

105

3K

379

3K

785K

marcm_77 retweeted

Aoden Teo

@AodenTeoMT

22 days ago

Today, we’re excited to introduce Miso One, the most emotive voice model in the world. Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency. We’ve open-sourced the model weights, with API access coming soon. Hear how Miso One sounds in the thread below.

558

10K

722

11K

5M

marcm_77 retweeted

Gouki Minegishi @ICML

@GoukiMinegishi

about 1 month ago

Our paper was accepted as a #ICML2026 Spotlight! Reasoning in LLMs has improved largely by chaining local steps. But is that the whole story? Humans occasionally make inferential "leaps" across domains, a faculty known as analogy. We design a synthetic task to show how small Transformers acquire analogical reasoning, and find that the same signatures appear in pretrained LLMs. arxiv: https://t.co/1WCizIKWly code: https://t.co/82kOKCtJo7

30

1K

161

1K

87K

marcm_77 retweeted

Google DeepMind

@GoogleDeepMind

about 1 month ago

We want to help scientists discover their next breakthrough with AI. Gemini for Science is our new suite of experimental tools to help them explore more hypotheses, validate work at scale, unpack literature with ease, and more 🧵

203

2K

366

883

25M

marcm_77 retweeted

Ryan Peters

@ryanpirl

about 1 month ago

Biological networks too :) Here is the neural geometry of mice navigating a figure-8 maze.

17

642

81

305

66K

marcm_77 retweeted

Ishaan Watts (ICML’26 🇰🇷)

@IshaanWatts18

about 2 months ago

Spending billions to train the "best" base model? You might be optimizing the wrong thing! 🎯

We show that controlling sharpness during mid-training leads to over 35% less forgetting after fine-tuning / quantization... even when the base model itself gets worse.

🧵 Takeaways for pretraining:
- Use SAM (Sharpness-Aware-Minimization) in the final steps (~10%)
- Try much higher learning rates (yes, even ~10× larger)

1/9

31

622

91

440

591K
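For readers unfamiliar with SAM, here is a minimal sketch of the standard two-pass update, switched on only for the last ~10% of steps as the tweet suggests. It is not the authors' code: the toy quadratic loss, the plain gradient-descent baseline, and the names `train`, `sam_fraction`, `lr`, and `rho` are all assumptions made for illustration.

```python
import math


def loss(w):
    # Toy quadratic (an assumption): sharp along w[0], flat along w[1].
    return 0.5 * (10.0 * w[0] ** 2 + 0.1 * w[1] ** 2)


def grad(w):
    # Analytic gradient of the toy loss above.
    return [10.0 * w[0], 0.1 * w[1]]


def sgd_step(w, lr):
    # Plain gradient-descent update.
    g = grad(w)
    return [wi - lr * gi for wi, gi in zip(w, g)]


def sam_step(w, lr, rho):
    # Pass 1: gradient at the current weights.
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)
    # Ascent: step of L2 length rho along the gradient direction.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Pass 2: gradient at the perturbed point, applied to the original weights.
    g_adv = grad(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def train(w0, steps, lr, rho, sam_fraction=0.1):
    # Use plain steps first, then SAM for the final `sam_fraction` of training.
    w = list(w0)
    sam_start = steps - int(steps * sam_fraction)
    for step in range(steps):
        if step >= sam_start:
            w = sam_step(w, lr, rho)
        else:
            w = sgd_step(w, lr)
    return w


w_final = train([1.0, 1.0], steps=100, lr=0.05, rho=0.05)
```

Each SAM step costs two gradient evaluations, which is one practical reason to confine it to a short final phase rather than the whole run.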

marcm_77 retweeted

Rosinality

@rosinality

about 1 month ago

Looped Transformer and MoE are a natural combination (https://t.co/ToBNezLUTb, https://t.co/CLwbrhaage). And this would lead to more sparsity (https://t.co/evpLzdVdzg). https://t.co/u9sKLo4wWh

5

216

29

179

11K

marcm_77 retweeted

Francesco Bertolotti

@f14bertolotti

about 1 month ago

The authors introduce Kaon, a Muon variant with random noise replacing SVs. Kaon matches Muon, suggesting Muon’s gains don’t depend on geometry. They also show Muon has a stable opt. step size, yielding a more effective learning rate during training.
🔗https://t.co/jqwp3534c2 https://t.co/ZfKvIVPSz6

3

185

19

169

38K

marcm_77 retweeted

Thinking Machines

@thinkymachines

about 2 months ago

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. https://t.co/AFJZ5kH7Ku

465

16K

2K

12K

8M

marcm_77 retweeted

hardmaru

@hardmaru

about 2 months ago

The human brain🧠 is incredibly efficient because it only activates the specific neurons needed for a thought. Modern LLMs naturally try to do this too (> 95% of neurons in feedforward layers stay silent for any given word), but our hardware punishes them for it.

One of the most frustrating paradoxes in deep learning: making a model do less math often makes it run slower. Why? Because unstructured sparsity introduces irregular memory access, and GPUs are built for predictable, dense blocks of math.

We teamed up with @NVIDIA to try to fix this hardware mismatch. Instead of forcing the GPU to adapt to the sparsity, we built a "Hybrid" format that reshapes the sparsity to fit the GPU. Our sparsity format (TwELL) dynamically routes the 99% of highly sparse tokens through a fast path, and uses a dense backup matrix as a safety valve for the rare, heavy tokens.

Through TwELL and a new set of custom CUDA kernels for both LLM inference and training, we translated theoretical sparsity into actual wall-clock speedups: >20% faster training and inference on H100 GPUs, while also cutting energy consumption and memory requirements.

Paper: https://t.co/rqIY9SYBDe
Blog: https://t.co/oRjNbpJKha
Code: https://t.co/FAFaJwpxAJ
⚡️

52

3K

509

3K

433K
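The tweet above does not spell out TwELL's actual layout, so the following is only a rough sketch of the general idea it describes: rows (tokens) with few nonzeros go into fixed-width padded slots, and the rare heavy rows fall back to dense storage. The padded index/value layout and the names `pack_hybrid`, `matmul_hybrid`, and `width` are assumptions, and this pure-Python version shows the routing logic only, not GPU performance.

```python
def pack_hybrid(rows, width):
    """Split activation rows into a padded sparse path and a dense fallback."""
    sparse = {}  # row id -> (column indices, values), both padded to `width`
    dense = {}   # row id -> full dense row, for rows with too many nonzeros
    for r, row in enumerate(rows):
        nonzeros = [(j, v) for j, v in enumerate(row) if v != 0.0]
        if len(nonzeros) <= width:
            pad = width - len(nonzeros)
            # Padding slots use index 0 with value 0.0, so they add nothing.
            idx = [j for j, _ in nonzeros] + [0] * pad
            val = [v for _, v in nonzeros] + [0.0] * pad
            sparse[r] = (idx, val)
        else:
            dense[r] = list(row)
    return sparse, dense


def matmul_hybrid(sparse, dense, weight, n_rows):
    """Compute rows @ weight using the fast path where possible."""
    n_out = len(weight[0])
    out = [[0.0] * n_out for _ in range(n_rows)]
    # Fast path: every row touches exactly `width` slots, a regular access pattern.
    for r, (idx, val) in sparse.items():
        for j, v in zip(idx, val):
            for k in range(n_out):
                out[r][k] += v * weight[j][k]
    # Fallback path: ordinary dense row-times-matrix product.
    for r, row in dense.items():
        for j, v in enumerate(row):
            for k in range(n_out):
                out[r][k] += v * weight[j][k]
    return out


rows = [
    [0.0, 0.0, 2.0, 0.0],  # sparse
    [1.0, 2.0, 3.0, 4.0],  # heavy: goes to the dense fallback
    [0.0, 0.0, 0.0, 0.0],  # empty
    [0.0, 5.0, 0.0, 1.0],  # sparse
]
weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
sparse, dense = pack_hybrid(rows, width=2)
result = matmul_hybrid(sparse, dense, weight, n_rows=len(rows))
```

The fixed slot width is what makes the sparse path predictable: every row does the same amount of work, at the cost of a few wasted multiplications on padding.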

marcm_77 retweeted

Flapping Airplanes

@flappyairplanes

about 2 months ago

(4/5) One thing we’ve built is a “kittens” virtual machine that takes over the whole GPU and allows new kinds of co-optimization. We can go past the traditional sequential kernel model – for example, fusing entire training runs into a single kernel and even weirder stuff. https://t.co/5lQAy1Qa7Z

28

675

56

293

248K

marcm_77 retweeted

Zechen Zhang

@ZechenZhang5

about 2 months ago

1/ For nearly 350 years, science has communicated itself through one object: the paper. A linear narrative, frozen as a PDF, written for a human reader. We've come to treat that format as the medium of science itself.
It doesn't have to be. It's a historical artifact. 🧵 https://t.co/P5UUhceLkC

28

1K

134

853

142K

marcm_77 retweeted

lodestone-rock

@LodestoneRock

about 2 months ago

seems like architectural research can be done ridiculously cheaply by measuring the time to overfit into 1 sample!

RMSnorm (https://t.co/zyfZ4aFmdh) vs softclamp (https://t.co/2INxdcAcDs) (4000 steps fit) https://t.co/SoyJr29iDR

10

355

17

315

33K
