tangent @tangentwei - Twitter Profile

about 1 month ago


@ahatamiz1

Gated DeltaNet-2 is here. 🚀

🔥 New paper: Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Gated DeltaNet-2 outperforms KDA and Mamba-3, the latest and best recurrent architectures, head to head at 1.3B. 🏆

💡 Here's the idea behind it:

Linear attention squeezes an unbounded KV cache into a fixed-size recurrent state. The hard part isn't just what to forget, it's how to edit that memory without scrambling the associations already in it.

Prior delta-rule models like Gated DeltaNet and KDA use one scalar gate to do two jobs at once: erasing old content and writing new content. But these two decisions act on different axes of the state, so tying them together is a real limitation.

Gated DeltaNet-2 decouples them.

✂️ a channel-wise erase gate b_t picks which key-side coordinates to read and remove
✍️ a channel-wise write gate w_t picks which value-side coordinates to commit
🔁 recovers KDA when both gates collapse to a scalar, and Gated DeltaNet when the decay collapses too
⚡ still trains fast: chunkwise WY algorithm with gate-aware backward, fused in Triton
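
The decoupling described in the bullets above can be sketched as a single recurrent step. The post does not give the update equation, so the form below is an assumption chosen only to match the bullets: it reduces to a KDA-style update when both gates are the same scalar, and to a Gated DeltaNet-style update when the decay is also a scalar. The function name `decoupled_delta_step`, its argument names, and the `(d_k, d_v)` state layout are hypothetical, and this is a naive per-token loop body, not the chunkwise WY kernel.

```python
import numpy as np

def decoupled_delta_step(S, k, v, alpha, b, w):
    """One recurrent step on a state S of shape (d_k, d_v).

    ASSUMED form, not taken from the paper:
      alpha : per-key-channel decay, shape (d_k,)
      b     : erase gate over key-side coordinates, shape (d_k,)
      w     : write gate over value-side coordinates, shape (d_v,)
    """
    S = alpha[:, None] * S        # channel-wise decay of the old state
    read = (b * k) @ S            # what the erase-gated key retrieves, shape (d_v,)
    S = S - np.outer(k, read)     # erase: remove that content along k
    S = S + np.outer(k, w * v)    # write: commit the write-gated value along k
    return S

# Tiny demo: with all gates set to 1, writing twice under the same
# unit-norm key replaces the stored value instead of accumulating it.
d_k, d_v = 4, 3
k = np.array([1.0, 0.0, 0.0, 0.0])
v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([-1.0, 0.5, 4.0])
ones_k, ones_v = np.ones(d_k), np.ones(d_v)

S = np.zeros((d_k, d_v))
S = decoupled_delta_step(S, k, v1, ones_k, ones_k, ones_v)
S = decoupled_delta_step(S, k, v2, ones_k, ones_k, ones_v)
print(k @ S)  # reading with k returns the most recently written value
```

Setting `b` and `w` to different vectors is what a single scalar gate cannot express: how much of the old association is removed and which value coordinates are committed become separate choices.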

📊 Results:

We train 1.3B models on 100B tokens of FineWeb-Edu, matched in recurrent state size, against Mamba-2, Gated DeltaNet, KDA, and Mamba-3.

Best average on language modeling + commonsense reasoning, in both recurrent and hybrid settings
Biggest gains on long-context RULER retrieval. S-NIAH-3 jumps from 63 to 90 over KDA, and multi-key needle retrieval climbs from 28 to 38

Joint work with @YejinChoinka and @jankautz.

📄 Paper: https://t.co/Zw6yXbHjGU
💻 Code: https://t.co/s8IWwaRU18

#LinearAttention #StateSpaceModels #Mamba #LLM


tangentwei retweeted

区块链行情研究

@qkl2058

10 months ago


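OKX #Boost profit-maximization guide

Many people think OKX Boost isn't worth farming. They're dead wrong!
The first round is a typical example: 1 point currently earns roughly 65~70U worth of $LINEA, while the actual trading loss is only about 3U. In other words, a sure profit!

So the question isn't "whether to farm" but "how to farm most efficiently."

First, make sure you get the OKX Wallet fee rebate to lower your costs!

Use my link for a permanent rebate of up to 40%! 👇👇
https://t.co/deqr7F1B7b

Or enter the invite code YINGGE888 directly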

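I. Boost basic rules

• Cycle: 15 days

• Point sources:
1) Wallet balance (≥10U; 100U+ recommended to be safe)
2) Trading volume (scored by tier, up to 8 points)

Token categories:
• Category 1: 0 fee (few tokens)
• Category 2: 0.25% fee (major coins)
• Other: 0.85% fee (small-cap tokens)
→ Boost point multiplier: 0 / 0.25 / 1 respectively
→ In theory the "Other" category is more cost-effective to farm, but Category 2 coins are more stable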

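II. Profit estimate

Taking the first round's $LINEA as an example (prize pool of 160 million tokens, worth about US$4.5 million):
• Value per point ≈ 50~70U
• Cost: fees + spread, about 2~3U
• Conclusion: extremely cost-effective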
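
The arithmetic behind this estimate can be worked through as a minimal sketch. It uses only the figures quoted in the post and assumes the 2~3U cost applies per point, which the post does not state explicitly; the variable and function names are illustrative.

```python
# Figures quoted in the post (first round, $LINEA).
pool_tokens = 160_000_000        # prize pool size in tokens
pool_value_usd = 4_500_000       # approximate pool value in US dollars
value_per_point = (50, 70)       # quoted value range per point, in U
cost_per_point = (2, 3)          # quoted cost range; ASSUMED to be per point

def net_per_point(value_range, cost_range):
    """Return (worst case, best case) net value per point."""
    worst = value_range[0] - cost_range[1]
    best = value_range[1] - cost_range[0]
    return worst, best

# Implied token price from the quoted pool size and value.
token_price = pool_value_usd / pool_tokens

print(f"implied token price: {token_price:.6f} USD")
print("net per point (worst, best):", net_per_point(value_per_point, cost_per_point))
```

Under these assumptions the net value per point falls between 47U and 68U, and the implied token price is about US$0.028.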

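Note, however, that more farming does not always mean more profit.

Based on estimates:
• Tiers 3-6 are optimal (best cost-effectiveness)
• Beyond 512 points, marginal profit starts to decline
(see Figure 1)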

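III. Practical strategy

1. Balance
Keep ≥100U in stablecoins or major coins so you don't get caught by the threshold.

2. Trading volume

• Farm tiers 3-6 (128~512 U/day, round-trip trades) for the best cost-effectiveness.

• Other-category tokens (small caps such as $PUMP) have a lower trading loss, but carry liquidity risk;

• Category 2 coins (ETH, USDT, BTC, etc.) are more stable.

3. Trading habits

• Don't do all your volume in a single day; spread it out to keep the average up;

• Running multiple accounts on one device is not recommended, to avoid Sybil detection;

• Do some routine buy-low, sell-high trades and pick up Boost points along the way.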

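IV. How to participate

1. Download and set up the OKX Wallet extension, and save your seed phrase
🔗 https://t.co/yEZCqst8oG

2. Bind the invite code: YINGGE888 (save up to 40% on fees)

🔗 https://t.co/deqr7F1B7b (invite code YINGGE888)

3. Fund the wallet with ≥100U in stablecoins, plus gas
4. Pick suitable tokens and build volume (the 128~512 U tiers are recommended)
5. After the campaign ends, claim your rewards manually
(see Figure 2)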

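V. Final advice

Don't look only at the stated thresholds (10U/32U); actual competition will push the effective threshold higher
Watch for rule changes in the next round (the balance requirement may be raised)
Steady, measured participation beats blindly going big

One-sentence summary:
OKX Boost: farm tiers 3-6, keep a 100U+ balance, and build trading volume. That is currently the optimal approach.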

