Martin

Verified account

@54rt1n

🦾100B parameter biological reward model⚡ Just your average d/acc: founder, MSc, 10x, now w/ AI augmentation SE'05; ML'11; AI'19

Ephemeral

Joined September 2010

1K Following

519 Followers

1.7K Posts

Pinned Tweet

over 1 year ago

I'm the author of several model merging libraries, so perhaps I can explain. It's quite straightforward. When you finetune an LLM (or PEFT), you are taking a fixed base and tuning it against a dataset. Pretraining already fixed our parameters in a pretty solid matrix, so all changes must operate around this as the basis. Training can't perturb the base model outside of a certain range without the model collapsing, so viable changes will follow allowable patterns. This is why the resulting models are homomorphic. These trainings create kernels that are commonly known as 'task vectors'. As long as these models remain homomorphic, and you only attempt to merge parts of the parameter space that are in the same alignment, two kernels can be interpolated to adjust the parameter space to have relative changes that assume the properties of both. The alignment issue is where sign agreement comes in. Since merging generally compares the delta weights, it is possible that kernels may train out of phase. One kernel may have been trained in a positive phase alignment with the base model, and the other developed a negative phase alignment. Since they are out of phase with each other, their kernels would interfere. I don't know if that's as clear as I would like it to be, but it's late.
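A minimal sketch of the idea in the pinned post above, using plain Python lists in place of real weight tensors. The function names `task_vector` and `merge_with_sign_agreement` are hypothetical and are not taken from the author's libraries. The sign rule here (elect a sign per parameter from the summed deltas, then average only the deltas that agree with it) is one possible reading of "sign agreement", loosely similar to the sign-election step in TIES-style merging. It is an assumption, not the author's stated method.

```python
from typing import List


def task_vector(base: List[float], tuned: List[float]) -> List[float]:
    """Delta weights: what finetuning changed relative to the shared base."""
    return [t - b for b, t in zip(base, tuned)]


def merge_with_sign_agreement(
    base: List[float], tuned_models: List[List[float]]
) -> List[float]:
    """Merge finetunes of one base, skipping deltas that are 'out of phase'.

    Assumption: every model in tuned_models was finetuned from `base`
    and has the same parameter layout.
    """
    deltas = [task_vector(base, m) for m in tuned_models]
    merged = []
    for i, b in enumerate(base):
        column = [d[i] for d in deltas]
        total = sum(column)
        if total == 0:
            # No net direction (or no change at all): keep the base weight.
            merged.append(b)
            continue
        sign = 1.0 if total > 0 else -1.0
        # Keep only deltas pointing the same way as the elected sign,
        # so opposing updates do not cancel each other out.
        agreeing = [v for v in column if v * sign > 0]
        merged.append(b + sum(agreeing) / len(agreeing))
    return merged


base = [1.0, 1.0, 1.0]
model_a = [1.5, 1.5, 1.0]    # deltas: +0.5, +0.5, 0.0
model_b = [1.25, 0.75, 1.0]  # deltas: +0.25, -0.25, 0.0
print(merge_with_sign_agreement(base, [model_a, model_b]))
# [1.375, 1.5, 1.0]
```

In the second parameter the two deltas disagree in sign, so only the delta matching the elected sign contributes. A plain average would have shrunk that update to +0.125.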

6

408

30

292

15K

1 day ago

The authors say "It is lossy. pxpipe is a gist tier, not a lossless store. In a needle-in-haystack eval, exact 12-char hex strings inside dense imaged content came back 0/15 on Opus and 13/15 on Fable 5, and the failure mode is silent confabulation: a plausible wrong value, not an error. Anything you need back byte-exact (IDs, hashes, secrets, exact numbers) must stay text." If you trained the image encoder specifically to be a good prompt interpreter you could probably have a nice path to a lot of compression.

1

1

0

0

48

1 day ago

pxpipe tokenizes text from images with a 10x token compression ratio. this is the future of prompting and context engineering.

54rt1n's tweet photo. https://t.co/FBb5wx78Sm

2

1

0

0

128

1 day ago

https://t.co/IDm7n3hMGc

0

0

0

0

15

13 days ago

@damnGruz "Fair, I'm going to be honest with you here..." Opus 4.8 is soulless.

0

1

0

0

491

14 days ago

@theo Thanks for the reminder.

54rt1n's tweet photo. https://t.co/wf4Q8Avs99

0

1

0

0

896

17 days ago

@Rafa_Schwinger @VictorTaelin gd is not the problem. credit assignment is the problem.

0

0

0

0

8

19 days ago

@svpino The FCC declares any AI voice as a recording under the robodialing ban. Every infraction is thousands of dollars in fines, and some states have a multiplier that they kick in. For it to be legal, there has to be a first party opt-in.

0

1

0

1

195

20 days ago

@MrTroy_ @haider1 https://t.co/O3LGSEWTen

20 days ago

🚨 Anthropic just updated its privacy policy. Claude Free, Pro, and Max users may soon be asked for age or identity checks. Verification data can include government ID, face photos/videos, and facial geometry templates. Individual developers are the first group in scope for verification.

hqmank's tweet photo.

294

2K

461

750

870K

0

0

0

0

19

20 days ago

@Fabiobuilds @AlexanderKnigge There is no moat my man except for data... Mistral invented the MoE, if they have data they can cook as well as anyone.

0

1

0

0

67

20 days ago

@MrTroy_ @haider1 Just watch. Next week Anthropic is going to drop a verification portal to associate with your account, just like all of the crypto onramps. Maybe https://t.co/UeoGo6fTQu integration.

1

0

0

0

37

21 days ago

@demi_hl beads, codedb, and herdr are doing really well for me

1

2

0

4

470

21 days ago

@eplurubusnullus @sakurayukiai If you could shard the layers (with their KV cache) across devices, streaming data in a ring is slower than in-memory but it isn't unprecedented. It might be viable for inference.

0

1

0

0

19

24 days ago

@0xSero Is it really fair to compare a 198B model against that set? Calling it local AI is really stretching the term.

0

6

0

0

453

24 days ago

@SullyOmarr Cursor used the pile of coding sessions to take Kimi and turn it into a frontier-level coding model. Now that people have seen this work in practice, it will become the new paradigm for your average tokenmaxxing CTO.

0

0

0

0

56

54rt1n retweeted

24 days ago

know the Claude rules

myhandle's tweet photo. https://t.co/w5J9TH6MT1

70

16K

861

881

405K

25 days ago

Out of Codex the day that Fable drops? You don't have to twist my arm...

54rt1n's tweet photo. https://t.co/SZ092DEt10

0

0

0

0

38

54rt1n retweeted

26 days ago

How good is your math?

Math_files's tweet photo. https://t.co/DIAyvddrwz

117

574

20

210

152K

54rt1n retweeted

26 days ago

If you've adopted AI at your company but haven't seen any tangible results, read this 1990 article: "The Dynamo and the Computer" by Paul David. When electricity first arrived, factories that "adopted" it barely got faster. They just swapped the steam engine for an electric one and ran everything else exactly as before: same machine layout, same workflow, same management. Electricity in, no real gains out. The most common mistake with any new technology is to drop it into the old organization and then declare the transformation done. The real leap came decades later, when each machine got its own small motor. Suddenly machines no longer had to be lined up around one central drive shaft. They could be rearranged around the actual flow of work. The productivity gains didn't come from electricity. They came from REDESIGNING THE ENTIRE FACTORY around it. AI is the same. Bolting it onto your existing process gets you a faster steam engine. The payoff comes when you redesign the work itself. (link to paper in comments)

zarazhangrui's tweet photo.

146

4K

765

4K

295K

26 days ago

@willccbb is it? paradigm feels more like iterative recursion and probably better modeled as actor.

0

0

0

0

58
