AI Adam @AI_AdamZ - Twitter Profile

Pinned Tweet

16 days ago

In the GenAI era, I think the answers to two simple questions reflect your quant ability. Quant research ability check: How do you maintain your factors? Quant development ability check: How many GPUs are you using?

AI Adam @AI_AdamZ

2 days ago

Learning neural networks is like learning kernels before AI 👀

Kate Deyneka

@katedeyneka

8 days ago

I'm a very visual person. when I was first getting into ML, I'd try to draw out every concept on pen and paper. back then I couldn't vibe-code a visualization. but now you can! here are my favorite ML visualizations I've been saving for a while. take them as inspo for the next complex topic you want to visualize 🧵

AI Adam @AI_AdamZ

6 days ago

@banyudongyong @lochan_twt hangzhou

AI Adam @AI_AdamZ

7 days ago

Oh

Watcher.Guru

@WatcherGuru

8 days ago

JUST IN: Citron Research founder and short seller Andrew Left found guilty of securities fraud. Prosecutors say he illegally influenced share prices through tweets, making $20,000,000

AI_AdamZ retweeted

roife

@roifex

8 days ago

Eating a billion tokens a day

AI_AdamZ retweeted

AI Adam @AI_AdamZ

8 days ago

@SemiAnalysis_ Shill me all your AI Jokes:)

AI_AdamZ retweeted

SemiAnalysis

@SemiAnalysis_

8 days ago

NVIDIA is proud to announce their partnership with Dwarkesh Patel.

AI_AdamZ retweeted

AI Adam @AI_AdamZ

8 days ago

@yminsky I like my btop:)

AI_AdamZ retweeted

Claude

@claudeai

12 days ago

Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.

AI_AdamZ retweeted

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

14 days ago

Language Models Need Sleep

"Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache."

"increasing sleep duration N for our models improves performance, with the largest gains on examples that require deeper reasoning."

AI Adam @AI_AdamZ

14 days ago

Amazing 👍

Muratcan Koylan

@koylanai

14 days ago

Gradient descent for SKILL.md files sounds interesting, maybe a bit complex but it's becoming a real part of agent harness.

SkillOpt is one of the first papers to treat markdown skill files as trainable parameters and provides a proper optimization framework for them.

A few things I learned that you should consider too.

1. The validation gate is the only thing that matters in a self-editing loop.

Held-out set, strict improvement, ties rejected. End-to-end, their best skills land with 1 to 4 accepted edits total. If your "self-improving agent" is accepting most of what it proposes, you're shipping slop.

2. Bounded edits are better than full rewrites. 4 to 8 edits per step is the sweet spot.

Remove the budget and performance collapses. This is the textual analog of learning rate, and it transfers to any LLM-as-author loop. If you're using an agent to refactor your docs, your prompts, or your skills, cap the diff size.

3. Compactness wins. Median final skill: ~920 tokens.

Skills do not need to be long. They need to be high-signal. Most skill files I see are bloated because length feels like effort. It isn't.

4. The harness is becoming less important; the skill is becoming more important.

A Codex-trained skill ported into Claude Code hit +59.7 points on SpreadsheetBench. Procedural knowledge is more general than the runtime that produced it.

5. Frozen model + trained context is the practical adaptation.

GPT-5.4-nano with a SkillOpt'd skill ≈ frontier behavior on procedural benchmarks. Cheaper, portable, inspectable, zero inference-time cost. This is the answer to "how do we adapt a frontier model for our domain" for almost everyone who isn't training their own models.

6. Verification is the bottleneck.

Every gate in this paper depends on an auto-grader. That works for benchmarks. It fails for writing, design, and strategy, exactly the open-ended work we want to automate. Whoever builds the verifier for open-ended tasks owns the next stage.
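As a rough illustration of points 1 and 2, here is a minimal Python sketch of a validation gate with an edit budget. It is not the paper's implementation: the `Edit` type, `gated_step`, `toy_score`, and the choice to reject (rather than truncate) over-budget proposals are all assumptions made for this example, and the grader is a toy stand-in for a real held-out evaluation.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: an edit is an (old, new) text substitution
# proposed by some author model. Not taken from the paper.
Edit = Tuple[str, str]

MAX_EDITS_PER_STEP = 8  # upper end of the 4-to-8 range quoted in the post


def apply_edits(skill: str, edits: List[Edit]) -> str:
    """Apply each (old, new) substitution once, in order."""
    for old, new in edits:
        skill = skill.replace(old, new, 1)
    return skill


def gated_step(
    skill: str, edits: List[Edit], score: Callable[[str], float]
) -> Tuple[str, bool]:
    """Return (skill_after_step, accepted).

    A proposal is accepted only if it fits the edit budget and strictly
    improves the held-out score; ties are rejected.
    """
    if len(edits) > MAX_EDITS_PER_STEP:
        return skill, False  # assumption: over-budget proposals are rejected outright
    candidate = apply_edits(skill, edits)
    if score(candidate) > score(skill):  # strict inequality, so ties fail
        return candidate, True
    return skill, False


def toy_score(skill: str) -> int:
    """Toy held-out grader (hypothetical): counts required keywords present."""
    return sum(kw in skill.lower() for kw in ("validate", "units"))
```

With this gate, an edit that adds a graded keyword is accepted, while a cosmetic rewrite that leaves the score unchanged is dropped, which is the "ties rejected" behavior described above.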
There are also two lessons I learned while shipping v2.3.0 of my Context Engineering Agent Skills repo, measured across composer-2, claude-opus-4-7, gpt-5.5, and gemini-3.1-pro via the @cursor_ai SDK:
- Description and body are two different surfaces. The router only sees the description. The agent sees the body once activated. They can quietly disagree, and only end-to-end task tests catch it.
- Aggregate accuracy is the wrong unit. When I rewrote three descriptions, the corpus average moved ~1pp. Individual skills moved 23–25pp. Per-skill effect size is where the action is.

Also, in Feb 2026 I shared a piece called Personal Brain OS arguing that the markdown file is a first-class substrate for agent state. SkillOpt is the optimizer-shaped version of that same argument: not "store memory in files" but "treat files as trainable parameters with proper optimization machinery around them." That's the move from static to measured.

The fast/slow split they describe already lives implicitly in the digital-brain-skill repo:
- voice-guide and tone-of-voice.md are slow-state (rarely touched)
- posts.jsonl and bookmarks.jsonl are fast-state

What SkillOpt adds that I didn't have is a protected section invariant, a structural guarantee that fast edits cannot overwrite slow lessons. Removing that mechanism cost them 22 points on SpreadsheetBench. Worth borrowing.

If you're building agents, SkillOpt: Executive Strategy for Self-Evolving Agent Skills is a good paper to read: https://t.co/ZS9SZXQ6Mv
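One way to picture the protected section invariant mentioned above is a check that rejects any edit changing text inside marked regions. The sketch below is only an assumption about how such a guarantee could look: the marker syntax and the function names `protected_blocks`, `preserves_protected`, and `apply_fast_edit` are hypothetical, and the post does not describe the paper's actual mechanism.

```python
import re
from typing import List

# Hypothetical marker syntax for protected ("slow") sections of a skill file.
_PROTECTED = re.compile(
    r"<!-- protected:start -->(.*?)<!-- protected:end -->", re.DOTALL
)


def protected_blocks(text: str) -> List[str]:
    """Return the contents of every protected section, in document order."""
    return _PROTECTED.findall(text)


def preserves_protected(before: str, after: str) -> bool:
    """True if the protected sections are identical before and after an edit."""
    return protected_blocks(before) == protected_blocks(after)


def apply_fast_edit(skill: str, old: str, new: str) -> str:
    """Apply one substitution, refusing it if a protected section would change."""
    candidate = skill.replace(old, new, 1)
    if not preserves_protected(skill, candidate):
        raise ValueError("edit touches a protected section")
    return candidate
```

Because the check compares the extracted sections rather than trusting the editor, deleting or moving a marker also counts as a violation in this sketch.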

AI Adam @AI_AdamZ

17 days ago

Hangzhou, the Silicon Valley of China @vllm_project

AI_AdamZ retweeted

Elon Musk

@elonmusk

20 days ago

Bravo @JeffBezos!

AI_AdamZ retweeted

Elon Musk

@elonmusk

25 days ago

Critique of the 𝕏 algorithm is welcome. There will be monthly updates of the latest algorithm to GitHub with release notes. As a reminder, you can always choose no algorithm via the Following tab.

AI_AdamZ retweeted

AI Adam @AI_AdamZ

24 days ago

@yminsky @dwarkesh_sp Firms that don't build LLMs themselves are no longer quantitative trading firms, for sure 💯

AI_AdamZ retweeted

Elon Musk

@elonmusk

26 days ago

@whyyoutouzhele My son is learning Mandarin
