Zhenya Ptitsyn 🐦 @eptitsyn - Twitter Profile

@KouichiYan61008 Многие русские искренне любят Японию и её культуру. Поэтому особенно странно видеть враждебность со стороны японцев — мы, например, не припоминаем им Цусиму.

0

5

Zhenya Ptitsyn 🐦 @EPtitsyn

2 months ago

@UKYoshiOk Да, не забывай, что ВСУ убили гражданский народ для фальш-флаг операции, которая провалилась.

0

6

2

0

225

Zhenya Ptitsyn 🐦 @EPtitsyn

2 months ago

Newer leaving this app

0

11

Zhenya Ptitsyn 🐦 @EPtitsyn

2 months ago

#Moscow screwed me over and left me without my medication, so I’m getting the hell out of here. I’m not a Muscovite anymore.

0

17

Zhenya Ptitsyn 🐦 @EPtitsyn

2 months ago

Будь проклята Москва где бордюры важнее лечения граждан. Потемкинские деревни.

0

1

0

15

Zhenya Ptitsyn 🐦 @EPtitsyn

3 months ago

@BrianMcDonaldIE Some kind of hearse

0

437

Zhenya Ptitsyn 🐦 @EPtitsyn

3 months ago

@GunterFehlinger You are a Nazi

0

1

Zhenya Ptitsyn 🐦 @EPtitsyn

5 months ago

@elonmusk Nope. For people who struggling it literally can. Money can buy you a house, a decent life. Rest is up to you. But without money you will expell all your time just to stay afloat.

0

6

EPtitsyn retweeted

Akshay 🚀

@akshay_pachaar

6 months ago

DeepSeek just fixed one of AI's oldest problems. (using a 60-year-old algorithm) Here's the story: When deep learning took off, researchers hit a wall. You can't just stack layers endlessly. Signals either explode or vanish. Training deep networks was nearly impossible. ResNets solved this in 2016 with residual connections: output = input + what the layer learned That "+" creates a direct highway for information. This is why we can now train networks with hundreds of layers. Recently, researchers asked: what if we had multiple highways instead of one? Hyper-Connections (HC) expanded that single lane into 4 parallel lanes with learnable matrices that mix information between streams. The performance gains were real. But there was a problem: Those mixing matrices compound across layers. A tiny 5% amplification per layer becomes 18x after 60 layers. The paper measured amplification reaching 3000x. Training collapses. The usual fixes? Gradient clipping. Careful initialization. Hoping things work out. These are hacks. And hacks don't scale. DeepSeek went back to first principles. What mathematical constraint would guarantee stability? The answer was sitting in a 1967 paper: the Sinkhorn-Knopp algorithm. It forces mixing matrices to be "doubly stochastic," where rows and columns each sum to 1. The results: - 3000x instability reduced to 1.6x - Stability guaranteed by math, not luck - Only 6.7% additional training overhead No hacks. Just math. I've shared link to the paper in the next tweet.

akshay_pachaar's tweet photo. DeepSeek just fixed one of AI's oldest problems.

(using a 60-year-old algorithm)

Here's the story:

When deep learning took off, researchers hit a wall. You can't just stack layers endlessly. Signals either explode or vanish. Training deep networks was nearly impossible.

ResNets solved this in 2016 with residual connections:

output = input + what the layer learned

That "+" creates a direct highway for information. This is why we can now train networks with hundreds of layers.

Recently, researchers asked: what if we had multiple highways instead of one?

Hyper-Connections (HC) expanded that single lane into 4 parallel lanes with learnable matrices that mix information between streams.

The performance gains were real. But there was a problem:

Those mixing matrices compound across layers. A tiny 5% amplification per layer becomes 18x after 60 layers. The paper measured amplification reaching 3000x. Training collapses.

The usual fixes? Gradient clipping. Careful initialization. Hoping things work out.

These are hacks. And hacks don't scale.

DeepSeek went back to first principles. What mathematical constraint would guarantee stability?

The answer was sitting in a 1967 paper: the Sinkhorn-Knopp algorithm.

It forces mixing matrices to be "doubly stochastic," where rows and columns each sum to 1.

The results:

- 3000x instability reduced to 1.6x
- Stability guaranteed by math, not luck
- Only 6.7% additional training overhead

No hacks. Just math.

I've shared link to the paper in the next tweet.

69

3K

461

2K

188K

EPtitsyn retweeted

Alex Prompter

@alex_prompter

6 months ago

🚨 MIT proved you can delete 90% of a neural network without losing accuracy. Five years later, nobody implements it. "The Lottery Ticket Hypothesis" just went from academic curiosity to production necessity, and it's about to 10x your inference costs. Here's what changed (and why this matters now):