I'm the author of several model merging libraries, so perhaps I can explain.
It's quite straightforward. When you finetune an LLM (fully or with PEFT), you are taking a fixed base and tuning it against a dataset. Pretraining has already settled the parameters into a fairly stable configuration, so all changes must operate around this as the basis.
Training can't perturb the base model outside of a certain range without the model collapsing, so viable changes will follow allowable patterns. This is why the resulting models are homomorphic. These trainings create kernels, the deltas between the finetuned weights and the base weights, that are commonly known as 'task vectors'.
As long as these models remain homomorphic, and you only attempt to merge parts of the parameter space that are in the same alignment, two kernels can be interpolated to adjust the parameter space to have relative changes that assume the properties of both.
The alignment issue - this is where sign agreement comes in. Since merging generally compares the delta weights, it is possible that kernels may train out of phase. One kernel may have been trained in a positive phase alignment with the base model, and the other developed a negative phase alignment. Since they are out of phase with each other, their kernels would interfere.
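To make the sign-agreement idea concrete, here is a minimal sketch in NumPy. It is not taken from any particular merging library: the function name `sign_agreement_merge`, the toy weight arrays, and the choice to drop parameters whose deltas cancel exactly are all assumptions made for illustration. The sign election (take the sign of the summed deltas, then average only the deltas that agree with it) is similar in spirit to what TIES-style merges do.

```python
import numpy as np

def sign_agreement_merge(base, finetuned_models, weight=1.0):
    """Merge finetunes of the same base, keeping only sign-agreeing deltas.

    Hypothetical helper for illustration; not a real library API.
    """
    # Task vectors: finetuned weights minus base weights, one row per model.
    deltas = np.stack([m - base for m in finetuned_models])
    # Elect one sign per parameter from the summed deltas.
    elected = np.sign(deltas.sum(axis=0))
    # A delta participates only if it points the same way as the elected sign.
    agree = (np.sign(deltas) == elected) & (elected != 0)
    counts = agree.sum(axis=0)
    # Average the agreeing deltas; parameters with no agreement get no change.
    summed = (deltas * agree).sum(axis=0)
    merged_delta = np.where(counts > 0, summed / np.maximum(counts, 1), 0.0)
    return base + weight * merged_delta

# Toy weights, chosen to be exactly representable in floating point.
base = np.array([1.0, 1.0, 1.0, 1.0])
model_a = np.array([1.25, 0.875, 1.5, 1.0])  # deltas: +0.25, -0.125, +0.5, 0
model_b = np.array([1.5, 1.25, 0.5, 1.0])    # deltas: +0.5,  +0.25,  -0.5, 0

# Plain interpolation of the two task vectors, for comparison.
naive = base + (model_a - base + model_b - base) / 2
merged = sign_agreement_merge(base, [model_a, model_b])

print("naive :", naive)
print("merged:", merged)
```

On the second parameter the two deltas are out of phase, so the plain average mostly cancels, while the sign-agreeing merge keeps the delta that matches the elected direction. On the third parameter the deltas cancel exactly, and this sketch simply leaves the base weight untouched.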
I don't know if that's as clear as I would like it to be, but it's late.
The authors say "It is lossy. pxpipe is a gist tier, not a lossless store. In a needle-in-haystack eval, exact 12-char hex strings inside dense imaged content came back 0/15 on Opus and 13/15 on Fable 5, and the failure mode is silent confabulation: a plausible wrong value, not an error. Anything you need back byte-exact (IDs, hashes, secrets, exact numbers) must stay text."
If you trained the image encoder specifically to be a good prompt interpreter you could probably have a nice path to a lot of compression.
@svpino The FCC declares any AI voice as a recording under the robodialing ban. Every infraction is thousands of dollars in fines, and some states apply a multiplier on top.
For it to be legal, there has to be a first party opt-in.
🚨 Anthropic just updated its privacy policy.
Claude Free, Pro, and Max users may soon be asked for age or identity checks.
Verification data can include government ID, face photos/videos, and facial geometry templates.
Individual developers are the first group in scope for verification.
@MrTroy_ @haider1 Just watch. Next week Anthropic is going to drop a verification portal to associate with your account, just like all of the crypto onramps. Maybe https://t.co/UeoGo6fTQu integration.
@eplurubusnullus @sakurayukiai If you could shard the layers (with their KV cache) across devices, streaming data in a ring is slower than in-memory, but it isn't unprecedented. It might be viable for inference.
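A toy sketch of what that ring could look like, in plain Python. Everything here is hypothetical: the `Device` class, `shard_layers`, and `decode` are made-up names, the "layer" is a stand-in that just adds 1, and no real networking or attention is involved. The only point is the data flow: each device owns a contiguous slice of layers plus the KV cache for those layers, and only the hidden state travels around the ring.

```python
from dataclasses import dataclass, field

@dataclass
class Device:
    name: str
    layers: list                                   # indices of the layers this device owns
    kv_cache: dict = field(default_factory=dict)   # layer index -> cached entries

    def forward(self, hidden, position):
        for layer in self.layers:
            # Stand-in for attention: the KV entry stays local to this device.
            self.kv_cache.setdefault(layer, []).append((position, hidden))
            # Stand-in for the layer's real computation.
            hidden = hidden + 1
        return hidden

def shard_layers(num_layers, num_devices):
    """Split layers into contiguous slices, one slice per device."""
    per_device = -(-num_layers // num_devices)  # ceiling division
    return [
        Device(f"dev{i}", list(range(i * per_device, min((i + 1) * per_device, num_layers))))
        for i in range(num_devices)
    ]

def decode(ring, steps, hidden=0):
    """Pass the hidden state once around the ring per generated position."""
    outputs = []
    for position in range(steps):
        for device in ring:  # in a real system, each hop is a network transfer
            hidden = device.forward(hidden, position)
        outputs.append(hidden)
    return outputs

ring = shard_layers(num_layers=8, num_devices=4)
outputs = decode(ring, steps=3)
print(outputs)
print({d.name: d.layers for d in ring})
```

The per-hop transfer is small (one hidden state), which is why the approach is plausible for inference even though each hop is slower than staying in memory.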
@SullyOmarr Cursor used the pile of coding sessions to take Kimi and turn it into a frontier-level coding model. Now that people have seen this work in practice, it will become the new paradigm for your average tokenmaxxing CTO.
If you've adopted AI at your company but haven't seen any tangible results, read this 1990 article: "The Dynamo and the Computer" by Paul David.
When electricity first arrived, factories that "adopted" it barely got faster. They just swapped the steam engine for an electric one and ran everything else exactly as before: same machine layout, same workflow, same management. Electricity in, no real gains out.
The most common mistake with any new technology is to drop it into the old organization and then declare the transformation done.
The real leap came decades later, when each machine got its own small motor. Suddenly machines no longer had to be lined up around one central drive shaft. They could be rearranged around the actual flow of work.
The productivity gains didn't come from electricity. They came from REDESIGNING THE ENTIRE FACTORY around it.
AI is the same. Bolting it onto your existing process gets you a faster steam engine. The payoff comes when you redesign the work itself.
(link to paper in comments)