Roberto Garcia @garctrob - Twitter Profile

garctrob retweeted

7 days ago

The dominant story in AI has been the growing cloud: bigger clusters, larger models, more gigawatts. We believe the future is in the opposite direction: on-device inference, smaller models, watts instead of gigawatts. Today we're releasing @OpenJarvisAI v1.0: a personal AI assistant that lives, learns, and works on your device.

49

596

91

566

144K

garctrob retweeted

Jon Saad-Falcon

@JonSaadFalcon

about 2 months ago

Say hi to @OpenJarvisAI 👋 If you have issues, want to make a PR, or simply chat, just @OpenJarvisAI in a tweet! This account is itself an OpenJarvis instance: running 24/7 on an NVIDIA DGX Spark, triaging issues + PRs on the repo and serving as a personal assistant for the lab! For personal AI on personal devices, check out: https://t.co/40LjG2h0AR https://t.co/Dn2v4g2MbR

1

25

8

1

2K

garctrob retweeted

Dan Fu

@realDanFu

about 2 months ago

📢 Super excited to announce Parcae! We've been thinking about scaling laws and the "right" way to get more FLOPs. Turns out layer looping - with the right parameterization - gives you a new axis to scale! Parcae matches Transformers 2x their size (w/ the same data), and outperforms prior formulations of looped models.

But - you need the right parameterization to get these gains against strong Transformer baselines. Looped models are famously unstable to train, with tons of loss spikes and hyperparameter sensitivity. The main technical challenge with looped models is residual explosion - if you're passing the activations through the same layers over and over, some otherwise benign parameterizations cause huge instability.

Our key idea: we can think of the residual stream of a model as a time-varying dynamical system - the same fundamentals behind SSMs like Mamba and S4. Then a few modest modifications to classic Transformers (stable diagonalization of injection params, LN before embeddings) can stabilize the looped models. The resulting models are more stable to train, but also reach higher quality.

It's strong enough to start to derive new scaling laws. Classically - we know you need to scale parameters with data to be FLOP-optimal. With Parcae, we find a third axis - given fixed parameters, you additionally want to scale FLOPs by looping as you scale data.

Super excited to see how these ideas hold, and what we can do with looped models! Check out @hayden_prairie's great explainer thread below, and see links for our paper, blog, and models. Joint w/ @zacknovack and @BergKirkpatrick, and a fun collab between @togethercompute and my lab at @ucsd_cse. Enjoy!

2

125

26

65

22K

garctrob retweeted

Stuart Sul

@stuart_sul

2 months ago

Happy to share new ThunderKittens attention kernels for B300 GPUs -- faster than FA4! Check it out:

2

150

13

34

15K

Roberto Garcia @garctrob

2 months ago

Extremely useful read for any ML/AI researcher out there!

Neel Guha @NeelGuha

2 months ago

I wrote a blogpost about writing machine learning research papers (e.g., NeurIPS, ICML, ICLR, etc.). The core idea is that most papers follow one of a predetermined set of templates. The post talks about each template, describes their rules, and offers examples...

6

620

82

827

81K

0

4

1

532

garctrob retweeted

Cursor @cursor_ai

3 months ago

Composer 2 is now available in Cursor.

647

10K

880

2K

5M

garctrob retweeted

Jon Saad-Falcon

@JonSaadFalcon

3 months ago

Personal AI should run on your personal devices. So, we built OpenJarvis: a personal AI that lives, learns, and works on-device. Try it today and top the OpenJarvis Leaderboard for a chance to win a Mac Mini! Collab w/ @Avanika15, John Hennessy, @HazyResearch, and @Azaliamirh. Details in thread.

38

326

90

230

106K

garctrob retweeted

Stuart Sul

@stuart_sul

3 months ago

(1/7) We're releasing ThunderKittens 2.0! Faster kernels, cleaner code, industry contributions, and new state-of-the-art BF16 / MXFP8 / NVFP4 GEMMs that match or surpass cuBLAS! Alongside this release, we’re equally excited to share some insights we learned while squeezing every last TFLOP out of Blackwell: (with @hazyresearch & generously supported by @cursor_ai)

13

538

87

268

62K

garctrob retweeted

Flapping Airplanes

@flappyairplanes

4 months ago

Announcing Flapping Airplanes! We’ve raised $180M from GV, Sequoia, and Index to assemble a new guard in AI: one that imagines a world where models can think at human level without ingesting half the internet.

338

4K

256

1K

2M

garctrob retweeted

Shizhe He @shizhehe

5 months ago

Holiday read from @HazyResearch 🎄: How should you mix and match LLMs in an agentic system? How many bits of information about the context does an agent carry? We use information theory to understand how to choose and scale these models.

8

361

52

378

38K

garctrob retweeted

Jerry Liu @jerrywliu

6 months ago

Really enjoyed our conversation with @alex_damian_ , check it out! Lots of interesting thoughts about the role of theory in modern ML and what questions to explore next.

0

19

1

6

2K

garctrob retweeted

Yasa Baig @BaigYasa

6 months ago

🎙️ First time doing this 🙂 — I filled in for François on a one-off podcast episode with @alex_damian_ and @jerrywliu! We had a really fun, wide-ranging conversation about AI, theory, and how research actually gets done. Watch here 👇 https://t.co/vo6zX5nlFN

0

17

6

2

4K

garctrob retweeted

Jerry Liu @jerrywliu

6 months ago

We explicitly construct MLPs that implement key–value fact mappings and, as a proof-of-concept, demonstrate modular fact editing inside a 1-layer transformer. (Joint work with @OwenDugan, @garctrob, @ronnygjunkins and team!) https://t.co/b8YzumU97H

0

6

2

0

613

garctrob retweeted

Jerry Liu @jerrywliu

6 months ago

Curious how to cook up your own fact-storing MLPs? We wrote up a simple recipe… just in time for the holiday season 🎁👨‍🍳✨🧠 Check it out! I’ll be at NeurIPS this week — happy to talk about MLPs & more!

0

10

3

0

1K

garctrob retweeted

Owen Dugan @OwenDugan

6 months ago

Part 2 of our MLPs blog post is out! 👀 This time, we’re here to tell you the story 📖 of our quest for a construction that: ✅ Handles general embeddings 🌐 ✅ Asymptotically matches the information-theoretic limit 📊📈 ✅ Is usable within transformers 🤖✨

1

31

11

12

9K

Roberto Garcia @garctrob

6 months ago

Very excited to introduce our fact-storing MLP construction and the insights we learned from it and from plugging it into a Transformer block! Really fun work with an amazing team 🙌. Can’t wait to see the new directions this could unlock: can MLP constructions help us pack more knowledge into smaller models or speed up pre-training and inference in LLMs?

Owen Dugan @OwenDugan

6 months ago

Happy 🦃 Thanksgiving weekend! 🍂 This year, we cooked up a new recipe for juicy fact-storing MLPs. Instead of picking apart trained models, we asked: Can we construct fact-storing MLPs from scratch? 🤔 Spoiler: we can & we figured out how to slot these hand-crafted MLPs into Transformer blocks as modular fact stores! 🧩 New work with @garctrob @ronnygjunkins @jerrywliu @dylan_zinsley @EyubogluSabri Atri Rudra @HazyResearch! 🧵👇

8

336

47

244

65K

0

1

0

198

garctrob retweeted

Owen Dugan @OwenDugan

6 months ago

Happy 🦃 Thanksgiving weekend! 🍂 This year, we cooked up a new recipe for juicy fact-storing MLPs. Instead of picking apart trained models, we asked: Can we construct fact-storing MLPs from scratch? 🤔 Spoiler: we can & we figured out how to slot these hand-crafted MLPs into Transformer blocks as modular fact stores! 🧩 New work with @garctrob @ronnygjunkins @jerrywliu @dylan_zinsley @EyubogluSabri Atri Rudra @HazyResearch! 🧵👇

8

336

47

244

65K

garctrob retweeted

Avanika Narayan

@Avanika15

6 months ago

The U.S.–China AI race won’t be decided by who builds the most datacenters, but by who deploys the most intelligence. We call this Gross Domestic Intelligence (GDI): intelligence per watt × usable power. If the U.S. activates its dense installed base of local AI accelerators in a hybrid local–cloud system, it could add ~30–40% inference capacity and ≈2-4× GDI for single-turn chat and reasoning queries without building any new datacenters or grid infrastructure. Winning the GDI race means treating local compute as critical infrastructure and making hybrid inference the default. (1/N)

9

136

41

97

69K

garctrob retweeted

Yasa Baig @BaigYasa

7 months ago

This is a satirical image but I would actually love this feature.

0

1

0

247

garctrob retweeted

Mayee Chen

@MayeeChen

7 months ago

Thrilled to have contributed to Olmo 3! The best fully open 32B model (data, training recipes, checkpoints and more!) As an intern at AI2 these last 8 months, I’ve grown to deeply appreciate the careful science, iteration, and collaboration that go into models like this and have learned so much from the team. I am more optimistic than ever about the future of open-source and data-centric research right now. My particular contribution was working on the Dolma 3 data mix 👩‍🍳 I was able to apply ideas from some of my earlier mixing work, explore new problem settings, and see firsthand the data challenges that arise when building datasets intended for real models at scale. More on this coming soon!

16

271

34

67

70K
