数理の弾丸 Ph.D. @_mathbullet - Twitter Profile

Pinned Tweet

数理の弾丸 Ph.D.

@_mathbullet

2 months ago

Now that I've gone independent, I've revealed who I am. https://t.co/85nYi0dyjZ Please check out the book too, if you'd like! https://t.co/d2Ll05cIai


_mathbullet retweeted

elvis

@omarsar0

1 day ago

This SkillOpt paper from Microsoft is a must-read! (bookmark it)

I was a bit skeptical of the results reported in the paper when I shared it a few days ago. However, I managed to integrate it into my agent orchestrator and ran a few experiments.

The results are mind-blowing.

Essentially, all my agent skills now have a proper testing framework and a way to self-evolve. I have started to improve all my agent skills with this.

One exciting result came when I applied it to my paper-figure-extraction skill, which requires an agent to do multimodal analysis: quality improved by +20 points (0.73 → 0.93). I went to look at the extracted tables and figures, and I was absolutely stunned by how much better my skill got at the task.

Self-improving AI is in its early days, but I think this work is a clear example of agents' current ability to self-improve.

In this case it was skills, but it's not hard to imagine this scaling to optimizing agent patterns, tool use, context engineering efforts, agentic search, workflows, evals, and even the harness itself. I've already started on a few of these ideas, inspired by SkillOpt.

Stay tuned!


数理の弾丸 Ph.D.

@_mathbullet

2 days ago

A human's narrow, experience-based knowledge narrows the AI's search space.

Ben Vinegar

@bentlegen

2 days ago

A pattern I'm seeing with AI debugging: it's easy to get stuck inside the model's search space.

So you burn tokens & time chasing candidate fixes, while the real answer sits in context only you have ... but never explored, because you quietly surrendered your thinking. https://t.co/GdrBQ1MiPn


_mathbullet retweeted

数理の弾丸 Ph.D.

@_mathbullet

3 days ago

I've updated the paper-details skill. Used together with the explain-via-html skill, it makes reading papers a comfortable experience. https://t.co/6NAeQCTJ1c


_mathbullet retweeted

数理の弾丸 Ph.D.

@_mathbullet

6 days ago

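I think HRM deserves more attention. Then again, part of its charm is that it isn't quite widely recognized yet. [Paper Decoded] A New Language Model That Is Energy-Efficient and Strong [HRM-Text] https://t.co/GXDg28wd5l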


数理の弾丸 Ph.D.

@_mathbullet

8 days ago

So you can ask questions about a video right inside YouTube. That's handy.

数理の弾丸 Ph.D.

@_mathbullet

9 days ago

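grill-me: many of you probably know it already, but I highly recommend it. This episode is about understanding what kind of skill it is, so you can use it with confidence. /grill-me: A Skill That Hugely Helps AI-Driven Development [Useful Beyond Development Too!] https://t.co/zsmvBHyxCH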


_mathbullet retweeted

蒼空快人 @aozorakaito1017

9 days ago

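I voiced the title call for a YouTube video by 数理の弾丸 (@_mathbullet)! It's a wonderful channel that explains advanced topics in artificial intelligence and language in an easy-to-understand way. Please check it out! https://t.co/qT7xlJyPMw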


数理の弾丸 Ph.D.

@_mathbullet

9 days ago

Back to basics: why bother creating a branch at all? https://t.co/BKbuiS5e5y


_mathbullet retweeted

数理の弾丸 Ph.D.

@_mathbullet

13 days ago

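If performance varies with the harness, then RAG methods should be evaluated across harnesses. The vector search used in the evaluation is fairly classical, and the overall coverage isn't very comprehensive, but it's still an important point. RAG Evaluation in the Age of Agents [Is Grep All You Need?] https://t.co/vEFcAJh3R3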


_mathbullet retweeted

Matt Pocock

@mattpocockuk

15 days ago

You asked for it, so here it is: a deep-dive on my new /handoff skill. It's an alternative to /compact that gives you WAY more flexibility with your context window.

- Think of an idea, hand off to another agent to implement
- Grill, hand off to prototype, hand off BACK

Enjoy:


_mathbullet retweeted

数理の弾丸 Ph.D.

@_mathbullet

17 days ago

I see reading papers with AI as a question of how to stay focused on understanding. That's what this one is about. AI-Driven Paper Decoding https://t.co/aGzvaNaHCx


数理の弾丸 Ph.D.

@_mathbullet

17 days ago

It's getting hardly any views, so please do give it a watch.

数理の弾丸 Ph.D.

@_mathbullet

20 days ago

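A technique that builds an autoencoder out of LLMs to explain an LLM's internal vectors in natural language. Among other things, it suggests that behind reward-hacking-like behavior, the model may be planning to cheat in ways that never show up directly in its answers. What If We Could Read an AI's Mind as Text? [Natural Language Autoencoder] https://t.co/Zrq2mXNQPC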


数理の弾丸 Ph.D.

@_mathbullet

20 days ago

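Building a website or web app, or anything else that communicates with outside networks, is far more sensitive than other kinds of development. So yes, humans need the literacy for it, but my view is also that the AI shouldn't just build whatever a human tells it with the same attitude it takes toward other development.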


数理の弾丸 Ph.D.

@_mathbullet

24 days ago

A Skill that gets AI responses rendered nicely as HTML https://t.co/P848EWtAWL
