LLMWildling @LLMWildling - Twitter Profile

LLMWildling @LLMWildling

1 day ago

Laptop died 🤷‍♂️

0

2

LLMWildling @LLMWildling

2 days ago

@SakanaAILabs Hiring in the Bay Area?

0

67

LLMWildling @LLMWildling

2 days ago

almost at 125B... MXAR way better than Adam 8 bit. Way faster. Much Speed. Learning go BRRRRRRRR

0

12

LLMWildling @LLMWildling

2 days ago

@sashadem wahhh lol I've had this at home for a year, I'm printing models. what is this LOL

0

8

LLMWildling @LLMWildling

2 days ago

@__alpoge__ I did... and I've had this at home for about a year now... Much faster and much more efficient than y'all

0

58

LLMWildling @LLMWildling

2 days ago

@sama I know the person who worked on this, you are very lucky to have them

0

3

LLMWildling @LLMWildling

2 days ago

Gemma 4 250B tomorrow? Started training

0

15

LLMWildling @LLMWildling

2 days ago

@googlegemma Gonna drop Gemma 4 250B this weekend. https://t.co/EVj8sRVntA

0

1

0

1

747

LLMWildling @LLMWildling

2 days ago

@Mayhem4Markets @TheAhmadOsman @nvidia @NVIDIAAI If I had 1 more or 2 more. I could really crush it.

0

15

LLMWildling @LLMWildling

2 days ago

@Mayhem4Markets @TheAhmadOsman @nvidia @NVIDIAAI If needed I can put my auto researcher on it, but I'm using up all my compute on my new 30x KV cache and I'm training Gemma 4 250B on my GPUs.

1

0

26

LLMWildling @LLMWildling

2 days ago

@Mayhem4Markets @TheAhmadOsman @nvidia @NVIDIAAI I should not get distracted, but should I grow Nemotron 500B to 600B to get their attention on my two GPUs? Would be faster if they gave us kernels.

1

0

35

LLMWildling @LLMWildling

2 days ago

@thsottiaux @ah20im wait really? does this count towards our current codex usage?

0

712

LLMWildling @LLMWildling

3 days ago

@JustinLin610 Why Gemma 4 12b when you can have Gemma 120b https://t.co/EVj8sRVntA

0

2

0

2

662

LLMWildling @LLMWildling

4 days ago

@demishassabis This one is more fun https://t.co/EVj8sRVntA

0

1

0

40

LLMWildling @LLMWildling

4 days ago

@AlexiGlad Better than mine?

0

57

LLMWildling @LLMWildling

6 days ago

@TheAhmadOsman It’s already here for some of us. I dropped a model without the Adam 8-bit optimizer yesterday.

0

56

LLMWildling @LLMWildling

7 days ago

LLMWildling's tweet photo. https://t.co/rlmv6xuuB4

0

33

LLMWildling @LLMWildling

7 days ago

@willdepue https://t.co/M3dZpqX6Kz 🎉

0

4

LLMWildling @LLMWildling

10 days ago

@willdepue It's not bad, but there are much faster ways to do that. Mine is a different approach but gives a much faster result. I'm in my last round of new optimization testing, about to replace the Adam optimizer for a 3x speedup https://t.co/3200r2RGHQ

1

0

67

LLMWildling @LLMWildling

7 days ago

Hi all, today I'm releasing https://t.co/M3dZpqX6Kz. It was made with a method I call MXAR. This is a 13x speedup in pre/post training. No LoRA, just regular full-weight modification. Also, an update: the current KV cache replacement sits at 72x compression. Soon.

0

39
