@KouichiYan61008 Многие русские искренне любят Японию и её культуру. Поэтому особенно странно видеть враждебность со стороны японцев — мы, например, не припоминаем им Цусиму.
@elonmusk Nope. For people who struggling it literally can. Money can buy you a house, a decent life. Rest is up to you.
But without money you will expell all your time just to stay afloat.
DeepSeek just fixed one of AI's oldest problems.
(using a 60-year-old algorithm)
Here's the story:
When deep learning took off, researchers hit a wall. You can't just stack layers endlessly. Signals either explode or vanish. Training deep networks was nearly impossible.
ResNets solved this in 2016 with residual connections:
output = input + what the layer learned
That "+" creates a direct highway for information. This is why we can now train networks with hundreds of layers.
Recently, researchers asked: what if we had multiple highways instead of one?
Hyper-Connections (HC) expanded that single lane into 4 parallel lanes with learnable matrices that mix information between streams.
The performance gains were real. But there was a problem:
Those mixing matrices compound across layers. A tiny 5% amplification per layer becomes 18x after 60 layers. The paper measured amplification reaching 3000x. Training collapses.
The usual fixes? Gradient clipping. Careful initialization. Hoping things work out.
These are hacks. And hacks don't scale.
DeepSeek went back to first principles. What mathematical constraint would guarantee stability?
The answer was sitting in a 1967 paper: the Sinkhorn-Knopp algorithm.
It forces mixing matrices to be "doubly stochastic," where rows and columns each sum to 1.
The results:
- 3000x instability reduced to 1.6x
- Stability guaranteed by math, not luck
- Only 6.7% additional training overhead
No hacks. Just math.
I've shared link to the paper in the next tweet.
🚨 MIT proved you can delete 90% of a neural network without losing accuracy.
Five years later, nobody implements it.
"The Lottery Ticket Hypothesis" just went from academic curiosity to production necessity, and it's about to 10x your inference costs.
Here's what changed (and why this matters now):