William Fleshman @willcfleshman - Twitter Profile

Pinned Tweet

9 months ago

Did you know that LoRA A matrices can be frozen at init w/o degrading performance? 🤯 We leverage this trick to construct an unsupervised routing procedure that achieves identical performance to the previous best with orders of magnitude fewer FLOPs and ~50% less GPU memory. 🧵

https://t.co/16wE38CeTb

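The pinned tweet's trick, freezing LoRA's A matrix at initialization, can be illustrated with a minimal toy. It assumes a plain LoRA update ΔW = BA, with A drawn from a seeded Gaussian and B starting at zero, and it omits the α/r scaling. `FrozenALoRA`, `matvec`, and the training loop are hypothetical names for this sketch, not the paper's code. The sketch only shows the mechanics: gradients flow to B alone, and B can still fit an update that is reachable through the fixed A.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class FrozenALoRA:
    """Toy LoRA adapter (hypothetical): A is random and frozen at init, only B is trained."""

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        # A (rank x d_in): seeded Gaussian init; never updated after this line
        self.A = [[rng.gauss(0.0, d_in ** -0.5) for _ in range(d_in)] for _ in range(rank)]
        # B (d_out x rank): zero init, so the adapter starts as a no-op
        self.B = [[0.0] * rank for _ in range(d_out)]

    def delta(self, x):
        """Adapter contribution B (A x), added to the frozen base layer's output."""
        return matvec(self.B, matvec(self.A, x))

    def sgd_step(self, x, target, lr):
        """One SGD step on 0.5 * ||B A x - target||^2 updating B only; returns the pre-step loss."""
        h = matvec(self.A, x)  # fixed random projection of the input
        err = [p - t for p, t in zip(matvec(self.B, h), target)]
        # dLoss/dB = err * h^T; no gradient is ever applied to A
        for i, e in enumerate(err):
            for j, hj in enumerate(h):
                self.B[i][j] -= lr * e * hj
        return 0.5 * sum(e * e for e in err)


# Usage: fit targets produced by a hidden B_true through the same frozen A
lora = FrozenALoRA(d_in=4, d_out=3, rank=2, seed=0)
data_rng = random.Random(1)
B_true = [[data_rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
xs = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(50)]
targets = [matvec(B_true, matvec(lora.A, x)) for x in xs]
for epoch in range(200):
    epoch_loss = sum(lora.sgd_step(x, t, lr=0.05) for x, t in zip(xs, targets))
```

Because A never changes, ΔW = BA can only act through the r-dimensional subspace spanned by A's rows. The tweet's claim is that, in practice, restricting training to B in this way does not degrade performance.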

William Fleshman @willcfleshman

about 14 hours ago

@investingidiocy Great stuff. I've been tinkering with copulas for generating data somewhere on the spectrum from fake to real: real correlation structure with fake marginals, or real marginals (through the ECDF) with a fake dependence structure (tail dependence, or correlation generally). Quite a nice framework.


William Fleshman @willcfleshman

24 days ago

@NandoDF I'm not affiliated with them, but I love this idea to address exactly that: https://t.co/xoxOxRTDSt

Nir Zicherman

@NirZicherman

28 days ago

AI is making you stupid. Today, we're introducing the all new Oboe, designed to make you smarter. Think about the last 10 answers you got from an LLM. How many of them do you actually remember? Probably none, because LLMs are not good teachers. But @oboelabs helps you learn the way humans are supposed to: through guided conversations, frequent checks for understanding, real-time adjustments, and multiple formats for all learning styles. Here's everything we're introducing today:


willcfleshman retweeted

JHU Computer Science @JHUCompSci

3 months ago

Congratulations, @willcfleshman!



willcfleshman retweeted

Benjamin Van Durme @ben_vandurme

4 months ago

JHU mmBERT extended from 8k to 32k token length by vLLM Semantic Router Team. Cutting edge results on 1,800+ languages, now with longer context! https://t.co/maN3bT1X17


William Fleshman @willcfleshman

5 months ago

@ChromeHODLs @hillery_dan That's definitely easier, it just might not be optimal depending on your tax situation. Assuming you can capture almost all of both rates, swapping back and forth with T-bills would compound faster up to a 25% tax rate. Tax-free accounts, if available, are the real way to go.


William Fleshman @willcfleshman

5 months ago

@ChromeHODLs @hillery_dan Opportunity cost. If you can capture the dividend by only tying up your capital for a couple of days then the rest of the month that capital can be making money elsewhere.

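The opportunity-cost and tax arguments in the two replies above reduce to simple arithmetic. The replies don't state the yields behind the ~25% figure, so this sketch leaves them as parameters. It assumes the full monthly dividend is captured while capital is tied up for a few days, that the freed capital earns the T-bill rate pro rata for the rest of the month, and that the alternative is simply holding the dividend asset. It ignores the ex-dividend price drop, trading costs, and any holding-period rules that affect how dividends are taxed. Function names and the rates used below are hypothetical.

```python
def capture_month_return(div_yield, tbill_rate, days_tied_up, days_in_month=30):
    """Pre-tax one-month return of a hypothetical dividend-capture + T-bill rotation."""
    # Assumption: the full monthly dividend (div_yield / 12) is captured, with no price drop or fees
    dividend = div_yield / 12
    # Capital is free for the remaining days and earns the T-bill rate pro rata
    free_fraction = (days_in_month - days_tied_up) / days_in_month
    return dividend + tbill_rate / 12 * free_fraction


def annual_growth(monthly_return, tax_rate):
    """Compound 12 months of after-tax returns (tax applied to each month's gain)."""
    return (1 + monthly_return * (1 - tax_rate)) ** 12


def breakeven_tax_rate(div_yield, tbill_rate, days_tied_up, hold_tax=0.0, days_in_month=30):
    """Tax rate on the rotation at which it exactly matches holding the dividend asset."""
    hold = div_yield / 12 * (1 - hold_tax)
    rotate = capture_month_return(div_yield, tbill_rate, days_tied_up, days_in_month)
    # Solve rotate * (1 - t) == hold for t
    return 1 - hold / rotate


# Usage with hypothetical rates: 5% dividend yield, 5% T-bills, capital tied up 2 days per month
t_star = breakeven_tax_rate(0.05, 0.05, days_tied_up=2)
```

With these made-up inputs, the rotation stays ahead as long as its tax rate is below `t_star` (about 48%). Different yields move the break-even point, which is why the figure quoted in the reply depends on the rates assumed.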

William Fleshman @willcfleshman

7 months ago

@EdwardRaffML Maybe it's you that has 0 recall and precision 🤯


William Fleshman @willcfleshman

8 months ago

@jackjingyuzhang @AmazonScience Congrats Jack!


willcfleshman retweeted

Benjamin Van Durme @ben_vandurme

8 months ago

Summer '26 PhD research internships at Microsoft Copilot Tuning. Continual learning, complex reasoning and retrieval, nl2code, data efficient post-training. https://t.co/HM4cKqEhgW


William Fleshman @willcfleshman

9 months ago

🚨Check out the paper with @ben_vandurme for more juicy details, like how we improve SpectR and SEQR by calibrating the adapter norms! https://t.co/xicyvjVO0E



William Fleshman @willcfleshman

9 months ago

SEQR provably routes to the same adapters as SpectR, yielding the same high level of task performance at a fraction of the cost 🤑. Like previous unsupervised approaches, SEQR is secure, with no risk of data leakage if LoRA B matrices are kept private!🔐

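The following sketch shows why a frozen A can make norm-based routing cheap. It rests on two assumptions that the tweet does not state: (a) every adapter shares one frozen A, for example initialized from the same seed, and (b) the routing score is the norm of each adapter's output, ||B_i A x||. Under these assumptions, a thin QR factorization B_i = Q_i R_i gives ||B_i h|| = ||R_i h||, because Q_i has orthonormal columns. Routing with the small r × r factors R_i therefore picks the same adapter as routing with the full B_i. Treat this as an illustration of the idea rather than SEQR's published procedure. All names are hypothetical.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def thin_qr_r(B):
    """R factor (r x r, upper triangular) of a thin QR of B (d_out x r), via modified Gram-Schmidt.

    Assumes B has full column rank. Since B = Q R with orthonormal Q, ||B h|| == ||R h||.
    """
    d, r = len(B), len(B[0])
    cols = [[B[i][j] for i in range(d)] for j in range(r)]
    R = [[0.0] * r for _ in range(r)]
    for j in range(r):
        R[j][j] = norm(cols[j])
        q = [a / R[j][j] for a in cols[j]]
        for k in range(j + 1, r):
            # Project q out of the remaining columns
            R[j][k] = sum(a * b for a, b in zip(q, cols[k]))
            cols[k] = [a - R[j][k] * b for a, b in zip(cols[k], q)]
    return R


def adapter_scores(A, mats, x):
    """Score each adapter by ||M_i (A x)||; the shared frozen A is applied only once."""
    h = matvec(A, x)
    return [norm(matvec(M, h)) for M in mats]


def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: routing with the full B_i (d_out x r) and with R_i (r x r) picks the same adapter
rng = random.Random(0)
d_in, d_out, r, n_adapters = 6, 8, 3, 5
A = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(r)]  # assumed shared, frozen
Bs = [[[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(d_out)] for _ in range(n_adapters)]
Rs = [thin_qr_r(B) for B in Bs]  # precomputed once, offline
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
full_choice = argmax(adapter_scores(A, Bs, x))
fast_choice = argmax(adapter_scores(A, Rs, x))
```

Per adapter, the score costs r² multiply-adds instead of d_out·r. In a real model, d_out is far larger than the LoRA rank, so the savings grow with model width.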

willcfleshman retweeted

Orion Weller @orionweller

9 months ago

XLM-R has been SOTA for 6 years for multilingual encoders. That's an eternity in AI 🤯 Time for an upgrade. Introducing mmBERT: 2-4x faster than previous models ⚡ while even beating o3 and Gemini 2.5 Pro 🔥 + open models & training data - try it now! How did we do it? 🧵

https://t.co/K8kP53st8C


willcfleshman retweeted

Marc Marone

@ruyimarone

9 months ago

3T tokens, ~1800 languages, 2 models - we’re releasing mmBERT, a modern multilingual encoder model!


William Fleshman @willcfleshman

10 months ago

@jxmnop Cool stuff, when we did RE-Adapt (https://t.co/pHUJmIDZIx) with llama we saw many of the base->instruct weight updates are approx. low rank but some layers were not. You could repeat your experiment with the llama instruct models to see how close to base you actually get.

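One way to check the RE-Adapt observation per layer (some base->instruct updates are approximately low rank, others are not) is to measure what fraction of the update's squared Frobenius norm its top-k singular values capture. This pure-Python sketch assumes `D` is a hypothetical per-layer difference W_instruct − W_base loaded elsewhere. It uses power iteration with deflation, which is fine for small matrices. For real weight matrices, a library SVD such as `numpy.linalg.svd` is the practical choice.

```python
import math
import random


def top_k_energy(D, k, iters=500, seed=0):
    """Fraction of ||D||_F^2 captured by D's top-k singular values.

    Power iteration with deflation on the Gram matrix D^T D, whose eigenvalues are the
    squared singular values of D. Assumes reasonably separated singular values.
    """
    n = len(D[0])
    M = [[sum(row[i] * row[j] for row in D) for j in range(n)] for i in range(n)]
    total = sum(M[i][i] for i in range(n))  # trace(D^T D) = ||D||_F^2
    rng = random.Random(seed)
    captured = 0.0
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = math.sqrt(sum(a * a for a in v))
        v = [a / s for a in v]
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            s = math.sqrt(sum(a * a for a in w))
            if s == 0.0:  # nothing left to capture
                break
            v = [a / s for a in w]
        # Rayleigh quotient v^T M v = current top eigenvalue (a squared singular value)
        lam = sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))
        captured += lam
        M = [[M[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]  # deflate
    return captured / total
```

A layer whose update is close to low rank gives a value near 1 for small k. A layer like the ones the reply describes as "not" low rank needs a much larger k to get there.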

willcfleshman retweeted

Benjamin Van Durme @ben_vandurme

10 months ago

I am growing an R&D team around Copilot Tuning, a newly announced effort that supports adaptation at a customer-specific level. Join us! https://t.co/kVocnuTrKN We collaborate with a crack team of eng and scientists that support the product, also growing! https://t.co/typyUXfQ8g


William Fleshman @willcfleshman

10 months ago

Obviously have to attack you as my main AAAI contact 🤣


William Fleshman @willcfleshman

11 months ago

Check out the paper w/@ben_vandurme now on arXiv: https://t.co/UON3J2P3hN


William Fleshman @willcfleshman

11 months ago

SpectR was accepted at @COLM_conf! Our follow-up work, LoRA-Augmented Generation (LAG), combines SpectR w/ a first-pass filtering of adapters using Arrow routing. LAG is significantly more efficient, enabling SpectR-like performance with much larger LoRA libraries!

https://t.co/FTN9qQ1j4r
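LAG's two-stage idea, a cheap Arrow-style filter followed by a costlier SpectR-style score on the survivors, can be sketched as follows. As we understand Arrow, it scores each adapter by the absolute dot product between the input and a prototype vector: the top right singular vector of the adapter's BA. The absolute value is used because a singular vector's sign is arbitrary. Here the prototypes are assumed precomputed. The second-stage score is assumed to be the norm ||B A x||, which may differ from the exact SpectR formulation. All names are hypothetical.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def arrow_filter(prototypes, x, k):
    """Stage 1 (Arrow-style): keep the k adapters with the largest |<prototype_i, x>|."""
    scores = [abs(sum(a * b for a, b in zip(p, x))) for p in prototypes]
    return sorted(range(len(prototypes)), key=lambda i: scores[i], reverse=True)[:k]


def norm_score(A, B, x):
    """Stage 2 (assumed SpectR-style score): size of the adapter's output, ||B A x||."""
    out = matvec(B, matvec(A, x))
    return math.sqrt(sum(a * a for a in out))


def two_stage_route(adapters, prototypes, x, k):
    """Filter the library cheaply, then run the costlier score on the k survivors only."""
    candidates = arrow_filter(prototypes, x, k)
    return max(candidates, key=lambda i: norm_score(adapters[i][0], adapters[i][1], x))


# Usage: four rank-1 toy adapters (A_i, B_i); for BA = b a^T the prototype is a / ||a||
adapters = [
    ([[1.0, 0.0, 0.0]], [[1.0], [0.0]]),
    ([[0.0, 1.0, 0.0]], [[3.0], [0.0]]),
    ([[0.0, 0.0, 1.0]], [[5.0], [0.0]]),
    ([[1.0, 1.0, 0.0]], [[0.1], [0.0]]),
]
s = 2 ** -0.5
prototypes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, 0.0]]
x = [0.5, 0.3, 0.0]
```

In this toy, filtering to k=2 keeps adapters 3 and 0 and routes to 0, while k=4 (no filtering) routes to adapter 1. The filter size trades cost against agreement with unfiltered routing.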

William Fleshman @willcfleshman

about 1 year ago

🚨 Our latest paper is now on arXiv! 👻 (w/ @ben_vandurme) SpectR: Dynamically Composing LM Experts with Spectral Routing (1/4) 🧵

