deva @devaxsha - Twitter Profile

Pinned Tweet

deva

@devaxsha

2 months ago

DECIDED TO REVAMP THE TOKENLEAK WEBSITE

1

4

0

232

devaxsha retweeted

思维怪怪

@0xLogicrw

about 15 hours ago

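Google TPU software engineer Patrick Toulme points out that the claim that GLM 5.2 caught up with Opus through distillation is a misunderstanding. The hard part of training large models on agentic coding tasks is the "zero-gradient dilemma": if the model cannot produce a correct execution trajectory early on, reinforcement learning gets no gradient signal with which to start updating parameters. Distilling Claude or GPT-5.5 serves only to provide seed solutions during the cold-start phase, getting around the zero-gradient dilemma. Once the model crosses the cold-start threshold, further performance gains no longer depend on distillation and instead rely entirely on reinforcement learning's hill climbing to self-evolve. Toulme stresses that GLM 5.2 can already produce successful trajectories on its own and can iterate to higher levels autonomously through reinforcement learning, fully freeing itself from dependence on US large models.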

58

823

87

540

153K
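
A toy illustration of the "zero-gradient dilemma" described in the retweet above, assuming a group-relative, mean-centred advantage as the learning signal. This is an illustrative assumption, not a description of how GLM 5.2 or any named model is actually trained; the function name `group_advantages` is hypothetical.

```python
def group_advantages(rewards):
    """Mean-centred advantages for one group of rollouts (assumed setup).

    Each rollout's advantage is its reward minus the group mean, so a
    group in which every rollout fails (all rewards 0) yields all-zero
    advantages and therefore no policy-gradient signal.
    """
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


# Cold start: no rollout solves the task, so every advantage is zero.
cold = group_advantages([0, 0, 0, 0])

# After seeding: a single success is enough to produce non-zero signal.
warm = group_advantages([0, 0, 1, 0])

print(cold)
print(warm)
```

Under this assumption, a seed solution matters only for getting the first non-zero reward into a group; once some rollouts succeed on their own, the advantages are no longer all zero.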

deva

@devaxsha

3 days ago

important context: Fugu is an orchestrator. if you read the blog post, it’s basically a multi-agent system trained to route and coordinate a pool of other llms. that’s collective intelligence boosting the score, which probably means it’s calling the same frontier models it’s being compared against, like gpt-5.5 and opus.

Sakana AI

@SakanaAILabs

3 days ago

Introducing Sakana Fugu: A full multi-agent orchestration system accessible via a single model API. Our ‘Fugu Ultra’ model matches the performance of Fable and Mythos, delivering frontier capability without the risk of export controls. Try it: https://t.co/hhO6qTawgb 🐡

1K

38K

6K

31K

26M

0

4

0

142

deva

@devaxsha

3 days ago

mythos class model btw

Sakana AI

@SakanaAILabs

3 days ago

Introducing Sakana Fugu: A full multi-agent orchestration system accessible via a single model API. Our ‘Fugu Ultra’ model matches the performance of Fable and Mythos, delivering frontier capability without the risk of export controls. Try it: https://t.co/hhO6qTawgb 🐡

1K

38K

6K

31K

26M

0

75

deva

@devaxsha

12 days ago

literal large language model

Merriam-Webster

@MerriamWebster

12 days ago

Our new LLM. Available now.

185

8K

1K

717

287K

0

1

0

101

deva

@devaxsha

15 days ago

claude fable being PURPOSELY bad at frontier llm development so other folks can't catch up is so anthropic coded

1

2

0

108

deva

@devaxsha

16 days ago

here’s the research blog https://t.co/mXzA494tSx

0

1

0

89

deva

@devaxsha

16 days ago

> apple has shipped a 20B parameter on-device model.
> you can't fit 20B params in RAM at reasonable precision, so they improved the architecture.
> a small model predicts which experts to load from NAND to RAM per query. unlike typical MoE, experts don't switch every token.
https://t.co/5IZlaBupuH

1

0

122

devaxsha retweeted

Josh Elman

@joshelman

16 days ago

The technical details behind Apple Foundation Models are worth a read: https://t.co/RI5qV9eKBU

6

339

41

242

32K

deva

@devaxsha

16 days ago

if you're still prompting coding agents directly you're falling behind. you need an agent that infers the task from your cursor hovering over a file for 12 seconds and then over engineers it

rahul

@0interestrates

16 days ago

if you’re still writing loops that prompt coding agents you’re falling behind. you need to build a meta agent that infers what loops you would have wanted based on your vibe and then write those loops

260

5K

232

441

204K

0

1

0

69

deva

@devaxsha

16 days ago

i’m just glad iOS 27 has many performance improvements

0

1

0

128

deva

@devaxsha

16 days ago

@pbicho96 underrated thanks for the share

0

4

deva

@devaxsha

18 days ago

this is the kind of AI infra update that actually matters lol

huawei just open sourced KVarN: 3-5x KV cache compression, plugs into vLLM with one flag, and claims speedups instead of the usual quantization tax

If it holds up, longer-context local LLMs get a lot cheaper.

need to start experimenting with it

1

2

0

65

deva

@devaxsha

17 days ago

ok so apparently designing loops not prompts is the new thing

a useful coding loop is: prompt agent → inspect diff → run verifier → decide continue/retry/stop → save output

> the loop is plumbing.
> the skill inside the loop is the asset.

vague loops burn tokens. verified loops compound.

0

1

0

40
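
The loop from the tweet above (prompt agent → inspect diff → run verifier → decide continue/retry/stop → save output) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: `agent` and `verifier` are hypothetical callables supplied by the caller, not any real tool's API, and "save output" is modelled as returning the verified diff for the caller to persist.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class LoopResult:
    output: Optional[str]  # the verified diff, or None if nothing passed
    attempts: int          # how many agent calls were made
    verified: bool         # whether the verifier accepted the output


def run_coding_loop(
    task: str,
    agent: Callable[[str, str], str],              # hypothetical: (task, feedback) -> diff
    verifier: Callable[[str], Tuple[bool, str]],   # hypothetical: diff -> (ok, message)
    max_attempts: int = 3,
) -> LoopResult:
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        diff = agent(task, feedback)               # prompt agent
        if not diff.strip():                       # inspect diff: an empty diff is a retry
            feedback = "empty diff"
            continue
        ok, message = verifier(diff)               # run verifier
        if ok:
            return LoopResult(diff, attempt, True)  # stop: hand back the output to save
        feedback = message                         # retry, passing verifier feedback along
    return LoopResult(None, max_attempts, False)   # stop: attempt budget exhausted


# Usage with stand-in callables (both are fakes for illustration only).
calls = []

def fake_agent(task: str, feedback: str) -> str:
    calls.append(feedback)
    return "bad patch" if len(calls) == 1 else "good patch"

def fake_verifier(diff: str) -> Tuple[bool, str]:
    return diff == "good patch", "tests failed"

result = run_coding_loop("fix bug", fake_agent, fake_verifier)
print(result)
```

The attempt cap and the verifier gate are what keep the loop from burning tokens: nothing is returned as output unless the verifier accepted it.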

deva

@devaxsha

about 1 month ago

@andrelandgraf @rauchg This is so cool Andre! Love this

1

0

43

deva

@devaxsha

about 1 month ago

@TheIshanGoswami built tokenleak and timemachinesdk !!! https://t.co/yPD9PRKOIa

deva

@devaxsha

3 months ago

Just released v2 of Tokenleak, with a total overhaul using opentui and Solid.js

Now with Cursor integration too

monitor your tokens with an even better interface! https://t.co/2LO6msfJrv

4

5

1

2K

0

1

0

721

deva

@devaxsha

about 2 months ago

@sdand sick work! curious to know how you’ve scraped your twitter home feed. Is it via the official api?

0

2

0

279

deva

@devaxsha

about 2 months ago

@NousResearch do they have browser use tools

0

62

deva

@devaxsha

about 2 months ago

@donnfelker just reverse prompting: give chatgpt/gemini a brief prompt asking it to write a prompt for generating an image that does xyz, tell it to make it as detailed as possible, and many a time it'd make an amazing prompt

1

0

84