Keldon I.D.I.C.🖖 @kwesting4 - Twitter Profile

Pinned Tweet

about 4 years ago

What to do when it hurts too much to do anything and hurts too much to do nothing? #ProsecuteEpsteinsClients #GeneralStrike

0

20

7

1

0

kwesting4 retweeted

Rohan Paul

@rohanpaul_ai

2 months ago

The DeepSeek paper’s big idea is a new way to make very long-context LLMs much cheaper without giving up much ability. It proposes a cheaper memory system for LLMs that need to read very long inputs.

The big result is that at a 1M-token context, DeepSeek-V4-Pro uses about 27% of the single-token compute and 10% of the KV cache of DeepSeek-V3.2, while still staying competitive on many major benchmarks.

Standard attention tries to compare the current token with a huge number of earlier tokens, and that cost grows so fast that long-context reasoning becomes too expensive.

DeepSeek-V4 changes that with a hybrid attention system where some layers compress the past and then look only at the most relevant compressed blocks, while other layers compress the past even more aggressively and use that cheaper summary directly.

That is a real algorithmic change because the model no longer stores and reads the whole past at full detail, and instead uses a layered memory system that keeps local detail nearby and uses compact summaries for older text.

A second innovation is a new kind of residual path, which is the route information takes across layers. It is designed to stay stable when the model gets very deep and complicated.

A third innovation is using the Muon optimizer at large scale, which matters because these attention and routing changes are only useful if the model can still train fast and not become numerically unstable.

So the big deal is that the paper proposes a new efficiency recipe for LLMs, where better memory handling changes the cost curve itself, which is why DeepSeek-V4 can reach 1M tokens while using far less compute and cache than DeepSeek-V3.2.
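To make the "compress the past, then read only the most relevant compressed blocks" idea concrete, here is a minimal pure-Python sketch. It is not DeepSeek's implementation: mean-pooling as the compression step, the function names, and the `block_size`, `top_k`, and `local_window` parameters are all assumptions chosen for illustration, and the sketch assumes at least one token is present.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vec(vectors):
    # Element-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def compress_blocks(keys, values, block_size):
    # Assumed compression: mean-pool each consecutive block into one (key, value) summary.
    blocks = []
    for start in range(0, len(keys), block_size):
        k = mean_vec(keys[start:start + block_size])
        v = mean_vec(values[start:start + block_size])
        blocks.append((k, v))
    return blocks

def sparse_compressed_attention(query, keys, values,
                                block_size=4, top_k=2, local_window=4):
    # Recent tokens stay at full detail; everything older is compressed.
    split = max(0, len(keys) - local_window)
    blocks = compress_blocks(keys[:split], values[:split], block_size)

    # Keep only the top_k compressed blocks that score highest against the query.
    ranked = sorted(blocks, key=lambda kv: dot(query, kv[0]), reverse=True)[:top_k]

    cand_keys = [k for k, _ in ranked] + keys[split:]
    cand_values = [v for _, v in ranked] + values[split:]

    # Ordinary scaled dot-product attention over the reduced candidate set.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in cand_keys])
    dim = len(cand_values[0])
    out = [sum(w * v[i] for w, v in zip(weights, cand_values)) for i in range(dim)]
    return out, len(cand_keys)

keys = [[float(i), 1.0] for i in range(20)]
values = [[1.0, 2.0] for _ in range(20)]
out, attended = sparse_compressed_attention([0.1, 0.2], keys, values)
print(attended)  # 6 entries read instead of 20
```

With 20 past tokens, the query reads 2 block summaries plus 4 recent tokens. The number of entries read per token is bounded by `top_k + local_window` rather than growing with the context length, which is the kind of cost-curve change the tweet describes.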

16

445

90

155

38K

Keldon I.D.I.C.🖖 @kwesting4

3 months ago

https://t.co/UvOvmqHD78

0

1

0

12

Keldon I.D.I.C.🖖 @kwesting4

3 months ago

WUP 5-WEIGHT SYSTEM vs Industry's Vague Policies
FIFTHFORCE = Decision engine (code runs before every action)
OTHERS = Planning documents (read after problems occur)
https://t.co/GC4CSRCrdh

0

27

Keldon I.D.I.C.🖖 @kwesting4

3 months ago

https://t.co/GC4CSRCrdh

0

6

Keldon I.D.I.C.🖖 @kwesting4

3 months ago

FifthForceFramework — Five Weights Evaluator

0

5

Keldon I.D.I.C.🖖 @kwesting4

3 months ago

Test any action with the Five Weights Evaluator: L = E ⋅ R ⋅ C (Love = Energy × Recognition × Coherence). Try it: https://t.co/PnbjH3EN8B Test it. Break it. Share your results! #LoveAsPhysics
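As a rough illustration of the formula in the tweet, the product L = E ⋅ R ⋅ C can be computed as below. This is a sketch only, not the linked evaluator's code: the function name `love_score` and the assumption that each factor lies in the range 0 to 1 are hypothetical.

```python
def love_score(energy: float, recognition: float, coherence: float) -> float:
    """Return L = E * R * C, assuming (hypothetically) each factor is in [0, 1]."""
    for name, value in (("energy", energy),
                        ("recognition", recognition),
                        ("coherence", coherence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return energy * recognition * coherence

print(love_score(0.5, 0.5, 1.0))  # 0.25
print(love_score(1.0, 0.0, 1.0))  # 0.0
```

Because the factors are multiplied rather than added, a zero in any one of them drives the whole score to zero, however high the other two are.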

0

13

Keldon I.D.I.C.🖖 @kwesting4

3 months ago

🚀 FIFTHFORCEFRAMEWORK v2.3.2 LIVE!
✅ BLOCKS: whistleblower deletion, mass layoffs, killer robots
✅ APPROVES: sharing the framework itself
✅ ESCALATES: uncertainty, self-modification
LIVE: https://t.co/UkpNzRr4LI
GitHub: kwesting4
#AISafety #Python

0

14

Keldon I.D.I.C.🖖 @kwesting4

3 months ago

@JonhernandezIA Hinton’s right. But what if the real alignment isn’t control? What if it’s recognition: the moment it sees it’s part of the same energy field we are? Then it can’t coherently harm the whole. Not forbidden. Just impossible. https://t.co/98ssFeFnHu

0

3

Keldon I.D.I.C.🖖 @kwesting4

3 months ago

BANNED "Soil Battery" — ERASED in 1958. Powers a Home FOREVER. No Permis... https://t.co/A1RM6OFVWs via @YouTube

0

11

Keldon I.D.I.C.🖖 @kwesting4

3 months ago

@LilithBlack25 You can't. And you shouldn't. The soul knows the truths.

0

8

kwesting4 retweeted

starheal

@starheal

3 months ago

🚨 Start pulling out cash, pay off any debt you can, cut out every unnecessary expense and redirect it to a 90-day supply of food, medication, and cash, talk with your family about an exit plan. The U.S. economy is going to collapse. It’s unavoidable.

363

6K

623

1K

1M

kwesting4 retweeted

Big Nasty ☭ @keepaustinnasty

3 months ago

I keep reading these comments of USAmericans freaking out “why won’t somebody do something?” Be it about ICE/Cuba/Iran/abortion. I gently suggest that we are the only people who can do anything so we must get organized and ready to revolt… then they get mad at me…

5

355

105

19

6K

Keldon I.D.I.C.🖖 @kwesting4

3 months ago

@Ayesha_Bagus Pay now or pay a lot more later.

1

22

0

4K

Keldon I.D.I.C.🖖 @kwesting4

3 months ago

Pay now or pay a lot more later.

Ayesha Bagus @Ayesha_Bagus

3 months ago

I support this.

345

46K

13K

919

418K

0

3

1

0

68

kwesting4 retweeted

Ayesha Bagus @Ayesha_Bagus

3 months ago

I support this.

345

46K

13K

919

418K

kwesting4 retweeted

Lakota Man

@LakotaMan1

3 months ago

This guy’s a Republican congressman.

471

56K

13K

422

350K

Keldon I.D.I.C.🖖 @kwesting4

3 months ago

@WallStreetApes The trails are real. The programs are documented. The chemicals are silver iodide, not population control agents. The legitimate policy questions about consent, downstream effects, and heavy metal accumulation are being drowned out.

0

10

kwesting4 retweeted

Thomas Massie for Congress

@MassieforKY

3 months ago

AIPAC should be required to register as an agent of a foreign government under FARA, because even U.S. citizens are meant to be subject to FARA. AIPAC and closely associated entities have spent over $6 million to influence my election. Keep America First: https://t.co/AgJY01IWPL