Bert Maher

@tensorbert

I’m a software engineer building high-performance kernels and compilers at Anthropic! Previously at Facebook/Meta (PyTorch, HHVM, ReDex)

Joined December 2022

401 Following

2.8K Followers

432 Posts

Bert Maher @tensorbert

3 months ago

@headinthebox It seems like in theory reorgs could solve problems (centralizing decision making in one person reducing communication and consensus costs) but in practice it doesn’t seem like they ever do

0

0

0

0

56

tensorbert retweeted

Matt Durrant @mgdurrant

4 months ago

I am a proud American, and I’m proud to work at Anthropic. Labeling Claude a supply chain risk will only harm America’s lead in AI. It’s not un-American to support AI guardrails that protect our civil liberties.

145

3K

198

41

45K

Bert Maher @tensorbert

4 months ago

I am deeply proud to work for this company.

4 months ago

A statement on the comments from Secretary of War Pete Hegseth. https://t.co/Gg7Zb09IMR

3K

42K

7K

5K

18M

8

552

12

3

7K

Bert Maher @tensorbert

7 months ago

@difficultyang Claude and I have so much in common 😂

0

4

0

0

286

Bert Maher @tensorbert

7 months ago

@ViepliveeLee @giffmana 2.71828: Geminie

0

1

0

0

32

tensorbert retweeted

7 months ago

@fermatslibrary One of the most common flaws of math textbooks is that they present only the logic, without the intuition. They give you the later, cleaned up version of the idea, which hides the way it was discovered.

123

3K

247

433

137K

Bert Maher @tensorbert

7 months ago

This is all true, but Soumith is also one of the most brilliant strategic thinkers in the world. Some of us just fail a lot, dust ourselves off, and keep hacking the next day ☺️

7 months ago

If you feel like giving up, you must read this never-before-shared story of the creator of PyTorch and ex-VP at Meta, Soumith Chintala.

> from hyderabad public school, but bad at math
> goes to a "tier 2" college in India, VIT in Vellore
> rejected from all 12 universities for US masters despite 1420 on the GRE
> fuckit.jpg
> goes to the US anyway on a J-1 visa to CMU with no plan
> applies for masters (again) to 15 universities
> rejected from all except USC and with late admissions, NYU in 2010
> finds this guy called Yann LeCun (before he was famous)
> starts getting into open source
> rejected from all jobs including DeepMind
> only job is Amazon as test engineer
> his PhD mentor helps him get a job at a small startup (MuseAmi)
> rejected from DeepMind
> couldn't get H-1B because of J-1 home return issue; gets waiver through months of approval with USCIS and US State Dept
> very low on confidence
> In 2011/12 builds one of the fastest AI inference engines on phones
> rejected from DeepMind
> emailed Yann again and joins FAIR because of Torch7 open-source work
> scrapes through bootcamp at Facebook, struggling on an HBase task
> L8/L9 engineers at Facebook struggle to get ImageNet working
> figures out numerics / hyperparam issue as an L4
> first big win!
> FAIR goes well, runs 3 person torch7 team and co-creates PyTorch
> because of politics, management wants to shut down PyTorch
> cries-at-bar.jpg, literally
> eventually some people save PyTorch and it launches in 2017
> gets a EB-1 green card!
> the rest is history...

Think about that. He went to a tier 2 college. Was rejected from all Masters programs 2x. Rejected from every single job except Amazon test engineering. Rejected from DeepMind 3x. Nearly had his baby project shut down. Struggled with visa issues. After 12 years of failures (2005-17), he eventually rose to become a VP at Meta and one of the most influential people in AI!

Soumith's story is one of resilience and he's living proof that no matter how down in the dumps you are, there's always hope.

279

11K

1K

6K

2M

0

62

1

7

7K

Bert Maher @tensorbert

7 months ago

This got me thinking that both int and FP math are “emulated” via a pretty complex set of transistors. I wonder how many gates/transistors it takes to implement an int8 fma versus an fp8 (e4m3) fma

7 months ago

@CernBasher As the number of bits drops, the difference between floating point and integer decreases until they are the same thing at 1 bit. “Floating point” is not real. It is emulated with 2 integers and a lot of complexity.

119

1K

90

159

397K

1

5

0

1

1K
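To make the "two integers" framing in the quoted post concrete, here is a minimal sketch that interprets the same 8 bits either as an int8 or as an fp8 e4m3 value. It assumes the common E4M3 layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, and exponent 1111 with mantissa 111 reserved for NaN); the function names are hypothetical. It says nothing about gate or transistor counts, only about how much more decoding logic the floating-point reading needs.

```python
def decode_int8(byte):
    """Interpret 8 bits as a two's-complement integer."""
    return byte - 256 if byte >= 128 else byte


def decode_e4m3(byte):
    """Interpret 8 bits as fp8 e4m3 (assumed layout: bias 7, no infinities)."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF   # 4-bit integer exponent field
    man = byte & 0x7          # 3-bit integer mantissa field
    if exp == 0xF and man == 0x7:
        return float("nan")   # the only NaN pattern (per sign) in this layout
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 1 - bias = -6.
        return sign * (man / 8.0) * 2.0 ** -6
    # Normal: implicit leading 1, exponent stored with a bias of 7.
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


if __name__ == "__main__":
    for bits in (0x01, 0x38, 0x7E, 0xC0):
        print(f"{bits:#04x}: int8={decode_int8(bits):5d}  e4m3={decode_e4m3(bits)}")
```

The integer reading is a single conditional subtraction, while the floating-point reading needs field extraction, a special case for subnormals, and a special case for NaN before any arithmetic starts.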

Bert Maher @tensorbert

7 months ago

@tqchenml Fair point! My first guess would be “this path has regressed” but it’s also true that expectations are high, the hw is fast, and 10us can actually be substantial (depending on the work). If it’s the latter that’s rough, triton.jit is decently fast (need c++ launch maybe)

0

2

0

1

402

Bert Maher @tensorbert

7 months ago

I’ve heard this complaint from a couple people recently, and I’m surprised because we optimized the launch path like a year ago and got it down to ~10us. There’s a now closed GitHub issue I filed with a microbenchmark - someone should run it, profile, and bring it down

7 months ago

why is triton’s kernel launch cpu overhead so freaking high? the actual kernel takes 10x less execution time than to launch it and i can’t use cuda graphs because the shapes are dynamic.

11

127

3

30

42K

3

17

0

3

18K

Bert Maher @tensorbert

7 months ago

@soumithchintala ❤️ It was great to have the chance to work with you, Soumith. I can’t wait to see what you do next

0

0

0

0

218

Bert Maher @tensorbert

8 months ago

@headinthebox Might it be more tractable to verify that two implementations match, than to come up with an optimized implementation of a simpler one?

1

1

0

0

375

Bert Maher @tensorbert

8 months ago

It would be kind of cool if torch.compile could be used as a context manager, like:

```
some_custom_kernels()
with torch.compile():
    # do a bunch of easy pointwise stuff
more_custom_kernels()
```

0

6

0

0

495

Bert Maher @tensorbert

9 months ago

@ScottWolchok @marksaroufim Sometimes I think the rows-vs-columns framing is kind of unhelpful. I sometimes think about matmul with rhs transposed, so you have an [m,k] matrix and an [n,k] matrix, and you end up with an [m,n] of all the dot products over k. (Which is kind of what nn.Linear does)

0

2

0

0

61
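The rhs-transposed framing in the post above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than how any library implements it: `matmul_rhs_transposed` is a hypothetical name, and nested lists stand in for tensors. Each output entry `[i][j]` is simply the dot product of row `i` of the `[m, k]` input with row `j` of the `[n, k]` input, which matches the way `nn.Linear` stores its weight as `[out_features, in_features]` (the bias term is left out here).

```python
def matmul_rhs_transposed(a, b):
    """a is [m, k], b is [n, k]; returns [m, n] of dot products over k."""
    k = len(a[0])
    if any(len(row) != k for row in a) or any(len(row) != k for row in b):
        raise ValueError("every row of a and b must have length k")
    # out[i][j] = sum over k of a[i][k] * b[j][k]
    return [[sum(x * y for x, y in zip(a_row, b_row)) for b_row in b]
            for a_row in a]


a = [[1, 2, 3],
     [4, 5, 6]]            # m=2, k=3
b = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [1, 1, 1]]            # n=4, k=3
out = matmul_rhs_transposed(a, b)
print(out)                 # [[1, 2, 3, 6], [4, 5, 6, 15]]
```

Both operands are walked along their rows, so there is no need to reason about which one is read by column.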

Bert Maher @tensorbert

9 months ago

Read this to the end — the last section is mind-blowing

Thinking Machines

@thinkymachines

9 months ago

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”.

We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to prompt engineering. Here we share what we are working on and connect with the research community frequently and openly.

The name Connectionism is a throwback to an earlier era of AI; it was the name of the subfield in the 1980s that studied neural networks and their similarity to biological brains.

https://t.co/lrJioBmpbT

230

8K

1K

5K

3M

2

22

1

5

3K

Bert Maher @tensorbert

9 months ago

I am really enjoying using Claude Code with Sonnet 4.5! It's super smart, and super fast!

9 months ago

Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.

1K

20K

3K

3K

5M

0

6

0

0

1K

tensorbert retweeted

9 months ago

Introducing Claude Sonnet 4.5—the best coding model in the world. It's the strongest model for building complex agents. It's the best model at using computers. And it shows substantial gains on tests of reasoning and math.

1K

20K

3K

3K

5M

Bert Maher @tensorbert

9 months ago

😍

9 months ago

Tri Dao says Claude Code makes him 1.5x more productive and that it's quite helpful at writing Triton kernels

8

454

22

112

272K

0

13

0

2

3K

Bert Maher @tensorbert

9 months ago

@matt_dz @davorVDR Oh man can’t believe I forgot Helion in the list. And it compiles to triton (or at least did last I looked) so it’s turtles all the way down

1

2

0

0

143

Bert Maher @tensorbert

9 months ago

lol, there is quite the explosion of kernel DSLs lately (triton, tilelang, gluon, TLX, cuteDSL, cuTile, …) And honestly as much as I love TLX and want it to succeed, I think the next big kernel programming language might be… natural, human language

9 months ago

Just one more DSL bro. I promise bro just one more DSL and we'll fix hardware adoption. It's just a better DSL bro. Please just one more. One more DSL and we'll port all the kernels. I just need one more DSL

3

132

5

15

19K

7

114

9

51

17K

Bert Maher @tensorbert

9 months ago

@vinodg Indeed - but I think that language can increasingly become more intuitive and flexible

0

0

0

0

139
