Jonas Jarutis @jjarutis - Twitter Profile

jjarutis retweeted

about 2 years ago

@Rascist_Killer @DefenceU @ZelenskyyUa I am here for those men who AREN’T in the cafes. I am here for the soldiers in the trenches.

4

329

6

0

10K

Jonas Jarutis @jjarutis

about 2 years ago

@mdonatas 24

0

1

0

9

jjarutis retweeted

Andrej Karpathy

@karpathy

about 2 years ago

A few new CUDA hacker friends joined the effort and now llm.c is only 2X slower than PyTorch (fp32, forward pass) compared to 4 days ago, when it was at 4.2X slower 📈 The biggest improvements were: - turn on TF32 (NVIDIA TensorFLoat-32) instead of FP32 for matmuls. This is a new mathmode in GPUs starting with Ampere+. This is a very nice, ~free optimization that sacrifices a little bit of precision for a large increase in performance, by running the matmuls on tensor cores, while chopping off the mantissa to only 10 bits (the least significant 19 bits of the float get lost). So the inputs, outputs and internal accumulates remain in fp32, but the multiplies are lower precision. Equivalent to PyTorch `torch.set_float32_matmul_precision('high')` - call cuBLASLt API instead of cuBLAS for the sGEMM (fp32 matrix multiply), as this allows you to also fuse the bias into the matmul and deletes the need for a separate add_bias kernel, which caused a silly round trip to global memory for one addition. - a more efficient attention kernel that uses 1) cooperative_groups reductions that look much cleaner and I only just learned about (they are not covered by the CUDA PMP book...), 2) the online softmax algorithm used in flash attention, 3) fused attention scaling factor multiply, 4) "built in" autoregressive mask bounds. (big thanks to ademeure, ngc92, lancerts on GitHub for writing / helping with these kernels!) Finally, ChatGPT created this amazing chart to illustrate our progress. 4 days ago we were 4.6X slower, today we are 2X slower. So we are going to beat PyTorch imminently 😂 Now (personally) going to focus on the backward pass, so we have the full training loop in CUDA.

karpathy's tweet photo. A few new CUDA hacker friends joined the effort and now llm.c is only 2X slower than PyTorch (fp32, forward pass) compared to 4 days ago, when it was at 4.2X slower 📈

The biggest improvements were:
- turn on TF32 (NVIDIA TensorFLoat-32) instead of FP32 for matmuls. This is a new mathmode in GPUs starting with Ampere+. This is a very nice, ~free optimization that sacrifices a little bit of precision for a large increase in performance, by running the matmuls on tensor cores, while chopping off the mantissa to only 10 bits (the least significant 19 bits of the float get lost). So the inputs, outputs and internal accumulates remain in fp32, but the multiplies are lower precision. Equivalent to PyTorch `torch.set_float32_matmul_precision('high')`
- call cuBLASLt API instead of cuBLAS for the sGEMM (fp32 matrix multiply), as this allows you to also fuse the bias into the matmul and deletes the need for a separate add_bias kernel, which caused a silly round trip to global memory for one addition.
- a more efficient attention kernel that uses 1) cooperative_groups reductions that look much cleaner and I only just learned about (they are not covered by the CUDA PMP book...), 2) the online softmax algorithm used in flash attention, 3) fused attention scaling factor multiply, 4) "built in" autoregressive mask bounds.

(big thanks to ademeure, ngc92, lancerts on GitHub for writing / helping with these kernels!)

Finally, ChatGPT created this amazing chart to illustrate our progress. 4 days ago we were 4.6X slower, today we are 2X slower. So we are going to beat PyTorch imminently 😂

Now (personally) going to focus on the backward pass, so we have the full training loop in CUDA.

102

4K

350

1K

1M

jjarutis retweeted

Gryn ✌ @gryn_a

over 2 years ago

This is me as a little pup, aged 5. My little sister is almost 3. It’s January 1991, and we are in Vilnius, Lithuania 🇱🇹 Something scary is about to happen on this day. Walk with me on this journey through old memories… 🧵

gryn_a's tweet photo. This is me as a little pup, aged 5. My little sister is almost 3.
It’s January 1991, and we are in Vilnius, Lithuania 🇱🇹
Something scary is about to happen on this day.
Walk with me on this journey through old memories… 🧵 https://t.co/FIAm40juUc

167

4K

1K

412

457K

Who to follow

Juozas

@zuozas

Raising kids, making designs, and playing games

pragmatic and ambitious

jjarutis retweeted

Tony Zhao

@tonyzzhao

over 2 years ago

Introducing 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 -- Hardware! A low-cost, open-source, mobile manipulator. One of the most high-effort projects in my past 5yrs! Not possible without co-lead @zipengfu and @chelseabfinn. At the end, what's better than cooking yourself a meal with the 🤖🧑‍🍳

219

5K

1K

2K

2M

jjarutis retweeted

Physics In History

@PhysInHistory

over 2 years ago

10 Brilliant Mathematics Lectures (free) that you must watch. A Thread 👇

45

5K

1K

8K

942K

jjarutis retweeted

Jonathan Gorard @getjonwithit

over 2 years ago

The Rabin-Scott theorem is one of the (philosophically) deepest mathematical results I know. When properly understood, I claim that it can't help but alter your view of reality in a fairly foundational way. Yet its typical textbook presentation obscures much of this depth. (1/8)

getjonwithit's tweet photo. The Rabin-Scott theorem is one of the (philosophically) deepest mathematical results I know. When properly understood, I claim that it can't help but alter your view of reality in a fairly foundational way. Yet its typical textbook presentation obscures much of this depth. (1/8) https://t.co/xH5njZDEuW

114

4K

593

4K

774K

jjarutis retweeted

Kevin Patrick Murphy

@sirbayes

almost 3 years ago

The trilogy is complete! My "Advanced Topics" book is officially released today. Buy it on Amazon, or get it for free at https://t.co/yWK8zMbMRA.

sirbayes's tweet photo. The trilogy is complete! My "Advanced Topics" book is officially released today. Buy it on Amazon, or get it for free at https://t.co/yWK8zMbMRA. https://t.co/90cM7LXnpX

61

5K

733

2K

445K

jjarutis retweeted

Nat Friedman

@natfriedman

about 3 years ago

Daniel and I have setup a cluster for startups: https://t.co/FMmfA7MQm3

195

4K

374

1K

2M

Jonas Jarutis @jjarutis

about 3 years ago

@mdonatas Added some on top :)

0

1

0

12

jjarutis retweeted

Josh Marshall

@joshtpm

about 3 years ago

On @elonmusk’s claim that everybody does it, actually in 2014 Erdoğan made the same demands of Twitter. Twitter refused and he did briefly ban the site. But the whole thing backfired on Erdoğan. Twitter also told Turkish users ways to evade the ban. https://t.co/h7tbENNDK0

152

11K

3K

291

1M

jjarutis retweeted

George Mack

@george__mack

about 3 years ago

The best habit I made in 2023... I have 2 phones: Cocaine Phone & Kale Phone. Here's how it fixed my smartphone addiction:

238

11K

942

8K

10M

jjarutis retweeted

Mig Greengard

@chessninja

about 3 years ago

NATO is a threat to Russia the way an electrified fence is a threat to a rabid dog.

6

667

122

11

48K

jjarutis retweeted

2R4U🇺🇦

@2r4uu

about 3 years ago

According to the 1926 all-Soviet census 55%-67,5% (depending on definition) of the Kuban Region's (today part of russia) population identified as Ukrainians. In 2021 only 0,5% identified as such. So what happened?

2r4uu's tweet photo. According to the 1926 all-Soviet census 55%-67,5% (depending on definition) of the Kuban Region's (today part of russia) population identified as Ukrainians. In 2021 only 0,5% identified as such.
So what happened? https://t.co/55QbO10eQk

64

3K

675

249

282K

jjarutis retweeted

Eric Jang

@ericjang11

about 3 years ago

Instead of finding the perfect prompt for an LLM (let's think step by step), you can ask LLMs to critique their outputs and immediately fix their own mistakes. Here's a fun example:

59

1K

204

1K

1M

jjarutis retweeted

Bad Baltic Takes @BadBalticTakes

about 3 years ago

My favourite thing on Twitter (genuinely) is people discovering the Baltic Way. No mobiles, no social media, limited freedoms. Yet, on 23 August 1989, 2 million formed a nearly 700km chain across 3 countries demanding freedom. However, the key context is often missed… 🧵

BadBalticTakes's tweet photo. My favourite thing on Twitter (genuinely) is people discovering the Baltic Way.

No mobiles, no social media, limited freedoms. Yet, on 23 August 1989, 2 million formed a nearly 700km chain across 3 countries demanding freedom.

However, the key context is often missed… 🧵 https://t.co/aDWKKb6PIk

90

6K

1K

420

547K

jjarutis retweeted

Billy Oppenheimer

@bpoppenheimer

about 3 years ago

What is your brain doing when you are doing absolutely nothing? A neuroscientist wanted an answer to that question. So she conducted the first brain-imaging study of brain activity when the brain is in a "resting state"—free of inputs/free to wander. The results are interesting... The neuroscientist Dr. Nancy Andreasen used a positron emission tomography (PET) scan to study brain activity when the brain is at "wakeful rest" When you are doing absolutely nothing: "We found activations in multiple regions of the association cortex," Dr. Andreasen wrote. "We were not [seeing] a passive silent brain during the ‘resting state,’ but rather a brain that was actively connecting thoughts and experiences.” Essentially, Dr. Andreasen demonstrated that the brain defaults to creativity. When you are doing absolutely nothing, the brain engages in what the researchers termed "random episodic silent thought" or REST. And during REST, the brain "uses its most human & complex parts." Takeaway 1: Doing nothing promotes creativity. Raymond Chandler, Neil Gaiman, Dr. Suess, George Lucas, and many other writers all have/had a similar habit. They gave themselves two options: a) You don’t have to write. b) You can’t do anything else. @neilhimself explains: "I go down to my lovely little gazebo at the bottom of the garden, sit down, and I’m absolutely allowed not to do anything...I'm not allowed to do a crossword, read a book, phone a friend...all I’m allowed to do is absolutely nothing, or write. But writing is actually more interesting than doing nothing after a while." Takeaway 2: REST Resting, Dr. Andreasen wrote, “allows the association cortices of the brain to converse in a free & uncensored manner.” This is why ideas come to us in the shower, on walks, and when daydreaming. A favorite example of this: @Lin_Manuel had the idea for “Hamilton” on vacation: "I was on a pool float with a margarita in my hand and had a moment when my brain could unplug. [It was his first vacation in 7 years]. The moment my brain got a moment's rest, the best idea I've ever had in my life walked into it." --- “If a plant only gets sunlight, it’s very harmful. It needs darkness too…In the darkness, it converts oxygen into carbon dioxide. We are like that too. We need periods of doing & periods of non-doing.” — Robert Pirsig Follow me @bpoppenheimer for more content like this!

bpoppenheimer's tweet photo. What is your brain doing when you are doing absolutely nothing?

A neuroscientist wanted an answer to that question.

So she conducted the first brain-imaging study of brain activity when the brain is in a "resting state"—free of inputs/free to wander.

The results are interesting...

The neuroscientist Dr. Nancy Andreasen used a positron emission tomography (PET) scan to study brain activity when the brain is at "wakeful rest"

When you are doing absolutely nothing:

"We found activations in multiple regions of the association cortex," Dr. Andreasen wrote. "We were not [seeing] a passive silent brain during the ‘resting state,’ but rather a brain that was actively connecting thoughts and experiences.”

Essentially, Dr. Andreasen demonstrated that the brain defaults to creativity.

When you are doing absolutely nothing, the brain engages in what the researchers termed "random episodic silent thought" or REST.

And during REST, the brain "uses its most human & complex parts."

Takeaway 1: Doing nothing promotes creativity.

Raymond Chandler, Neil Gaiman, Dr. Suess, George Lucas, and many other writers all have/had a similar habit.

They gave themselves two options:

a) You don’t have to write.
b) You can’t do anything else.

@neilhimself explains:

"I go down to my lovely little gazebo at the bottom of the garden, sit down, and I’m absolutely allowed not to do anything...I'm not allowed to do a crossword, read a book, phone a friend...all I’m allowed to do is absolutely nothing, or write. But writing is actually more interesting than doing nothing after a while."

Takeaway 2: REST

Resting, Dr. Andreasen wrote, “allows the association cortices of the brain to converse in a free & uncensored manner.”

This is why ideas come to us in the shower, on walks, and when daydreaming.

A favorite example of this: @Lin_Manuel had the idea for “Hamilton” on vacation:

"I was on a pool float with a margarita in my hand and had a moment when my brain could unplug. [It was his first vacation in 7 years]. The moment my brain got a moment's rest, the best idea I've ever had in my life walked into it."

---

“If a plant only gets sunlight, it’s very harmful. It needs darkness too…In the darkness, it converts oxygen into carbon dioxide. We are like that too. We need periods of doing & periods of non-doing.” — Robert Pirsig

Follow me @bpoppenheimer for more content like this!

65

4K

924

2K

1M

jjarutis retweeted

Jeremy Howard

@jeremyphoward

almost 4 years ago

After 2 years, Practical Deep Learning for Coders v5 is finally ready! 🎊 This is a from-scratch rewrite of our most popular course. It has a focus on interactive explorations, & covers @PyTorch, @huggingface, DeBERTa, ConvNeXt, @Gradio & other goodies 🧵 https://t.co/nzv7pek0iq

107

5K

1K

2K

0

jjarutis retweeted

Jeremy Howard

@jeremyphoward

about 3 years ago

Everybody I know in the NLP community has been watching this extraordinary project closely. Amazing achievement. RNNs are not dead!

7

1K

127

403

271K

jjarutis retweeted

Jimmy Lin

@lintool

about 3 years ago

GPT-4 and its ilk are awesome for rapid prototyping and one-offs, but at the end of the day, enterprises will deploy far smaller distilled models in production. Here's my contrarian take -

28

712

144

454

207K

Jonas Jarutis

@jjarutis

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users