Shom

@ShomLinEd

language model | sequence modeling | education | HCI

Joined September 2021

2.3K Following

368 Followers

684 Posts

ShomLinEd retweeted

@TheDawningRoad

about 13 hours ago

Introducing Nex-N2: true Agentic Thinking, built with @NexEcosystem 🚀

Thinking is now standard in foundation models, but it sits in an awkward position in Agent tasks: either the performance gains aren't significant, or it's verbose, and switching scenarios means readapting all over again. The root cause is that thinking from the o1/R1 era was built around RLVR for math and code tasks, not for long-horizon Agent tasks; there's a layer of separation between thinking and action.

Nex-N2 introduces a complete Agentic Thinking framework, split into two parts: Adaptive Thinking and Coherent Thinking. The former achieves adaptive reasoning intensity, improving speed (which really matters in long-horizon tasks spanning hundreds of steps) and saving unnecessary token expenditure. The latter unifies thinking patterns across different tasks, making actions more stable, consistent, and robust.

- Adaptive Thinking, auto-scales reasoning depth per step. Saves ~20% tokens, zero performance loss.
- Coherent Thinking, one thinking paradigm across search, coding, and tool use. No more fragile mode-switching.

On coding and Agent tasks, Nex-N2 ranks in the top tier of open-source models. The model is fully open-sourced and available simultaneously on Hugging Face, ModelScope, and SiliconFlow. We welcome everyone to try it out.

Official website: https://t.co/8j4ZvLAQZt
Hugging Face: https://t.co/UEx20Th3wQ

0

15

5

3

214

Shom @ShomLinEd

8 days ago

@_m0se_ It seems that in hybrid models, linear and full attention take on different roles: full attention captures long-term dependencies more easily, leaving linear attention to focus on local mixing.

0

1

0

0

112

Shom @ShomLinEd

12 days ago

@can some of sqlite's tests are public tho https://t.co/oPmnMMSJGE

0

5

0

0

2K

Shom @ShomLinEd

12 days ago

@jarredsumner Is this fuzzer a library or made by you?

1

1

0

0

880

Shom @ShomLinEd

22 days ago

@boshen_c it's from zig presumably

0

3

0

0

3K

ShomLinEd retweeted

@KaichaoYou

28 days ago

This is growth-hacking dressed up in open-source language, @radixark please stop doing it immediately.

Paying people in platform credits to star a GitHub repo and repost a marketing tweet isn't "fueling the community"; it's laundering paid promotion through the trust signals open source depends on. Stars are supposed to mean someone found a project useful. Attach a $200 bounty and the number means nothing. GitHub's own policies prohibit this for exactly that reason.

5

285

18

44

45K

Shom @ShomLinEd

about 1 month ago

@zephyr_z9 full attention also has linear scaling...

0

0

0

0

181

ShomLinEd retweeted

@kellerjordan0

about 1 month ago

New modded-NanoGPT optimization benchmark result: @wen_kaiyue has improved upon both the Muon and AdamW baselines, by replacing their weight decay with hyperball optimization. The new record is 3325 steps. https://t.co/8hitPuZmU7

7

425

42

161

61K

Shom @ShomLinEd

about 1 month ago

@kalomaze if you spam on book scans you can probably get even more tokens

0

1

0

0

439

Shom @ShomLinEd

about 1 month ago

@teortaxesTex i don't think they solved long agentic context reasoning, only some basic hashhop stuff...

0

0

0

0

288

Shom @ShomLinEd

about 1 month ago

The massive KV cache reduction of DeepSeek may unlock agent scaling as an economical choice... Imagine defaulting to 4 parallel agents solving one of your problems, with each agent calling 10~20 subagents in parallel to explore different choices.

0

0

0

0

86

ShomLinEd retweeted

Yu Zhang 🐙🌘

about 2 months ago

It's just a small piece of our bigger puzzle, to build a solid ecosystem for linear attention, and to make KDA as plug-and-play as flash-attn.

3

119

9

10

11K

Shom @ShomLinEd

about 2 months ago

@_ueaj it may be due to increased parameters instead of increased kv cache rank from enlarged projections. What if you enlarge MLP layers for smaller kv cache rank models to balance the two models' params? Or do an mlp style expansion in projections?

1

0

0

0

121

Shom @ShomLinEd

about 2 months ago

@fleetwood___ maybe it's too small?

0

0

0

0

248

Shom @ShomLinEd

about 2 months ago

@facontidavide I suggest checking the code rigorously for hacking, perhaps by asking another coding agent like Codex to do it. Agents have very creative ways to game the benchmark and get high scores.

1

7

0

0

398

Shom @ShomLinEd

about 2 months ago

@Dorialexander isn't GPT-4 rumored to have 200B activated params?

1

2

0

0

2K

Shom @ShomLinEd

2 months ago

@TommyGun_AB @honeylemon0124 i am not able to find mentions of bee in IHNMAIMS hmm

1

3

0

0

226

Shom @ShomLinEd

2 months ago

@bigeagle_xd code is cheap, verification is not

1

4

0

0

279

Shom @ShomLinEd

2 months ago

@ClementDelangue https://t.co/qZU0l1vrdp We published 70k high-quality agent traces :)

0

0

0

0

41
