かいせ @kai_ds04 - Twitter Profile

かいせ @kai_ds04

10 days ago

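So MoE training is going well and it manages to pack in knowledge, but its reasoning is weak? If I remember right, dense models tend to reason better, so maybe that trade-off is why it falls behind the others.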

kabikabi

@jakevin7

10 days ago

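Here's something interesting. The DeepSeek V4 technical report ran a head-to-head evaluation of all the mainstream large models and concluded that Gemini 3.1 Pro has the strongest world knowledge of any of them. Not GPT, not Claude, but Gemini. Yet the common reaction from people who use Gemini is: is this thing actually any good? The problem isn't the model itself; it's that it is extremely reluctant to act. Ask it for the latest news and, although it has a search tool, it won't use it on its own initiative. Often you have to tell it explicitly to go and search before it does. It's like a widely read person who, when asked what happened recently, shrugs and says, "I haven't read today's paper." A model with the strongest world knowledge that can't be bothered to call its tools: that is the real reason Gemini feels awkward to use.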

かいせ @kai_ds04

12 days ago

I had Codex build skills for using the Kaggle CLI: one that runs only read-only commands for surveying, and one for actually doing Kaggle.
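
The read-only/write split described above can be sketched as a simple allowlist check. This is only an illustration: the tweet does not say how the Codex skills are implemented, and the `READ_ONLY` set and `is_read_only` helper are hypothetical names chosen here. The subcommands listed are ones the Kaggle CLI provides for listing and inspecting resources.

```python
import shlex

# Assumed allowlist of Kaggle CLI subcommands that only read from Kaggle.
# Which commands count as "survey only" is a judgment call; adjust as needed.
READ_ONLY = {
    ("competitions", "list"),
    ("competitions", "files"),
    ("competitions", "leaderboard"),
    ("datasets", "list"),
    ("datasets", "files"),
    ("kernels", "list"),
}


def is_read_only(command):
    """Return True if `command` is a kaggle invocation on the allowlist.

    The check looks only at the first three tokens, so the parsed argv
    should be executed without a shell to avoid command chaining.
    """
    parts = shlex.split(command)
    if len(parts) < 3 or parts[0] != "kaggle":
        return False
    return (parts[1], parts[2]) in READ_ONLY


print(is_read_only("kaggle competitions list"))
print(is_read_only("kaggle competitions submit -c titanic -f sub.csv -m first"))
```

A survey-only skill could refuse anything for which `is_read_only` returns False, while the full skill would skip the check.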

かいせ @kai_ds04

14 days ago

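Because nothing is encoded, I'd expect that what gets passed to the LLM is not semantically aligned and only the dimensionality matches. So it's a real mystery to me that training works without changing the LLM architecture.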

うえぞう@うな技研代表

@uezochan

14 days ago

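Gemma 4 12B: the explanation reads as if they dropped the multimodal encoders, so images go in as embeddings and audio is fed directly into the LLM, but I have no idea how that can possibly work on the audio side.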

かいせ @kai_ds04

14 days ago

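> "By going encoder-free, the LLM has to take over much of the processing and understanding that the Vision Encoder originally performed, and this is learned during training." That's what it says, but it bugs me that there is no mention at all of why that training works without changing the LLM-side architecture.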

かいせ @kai_ds04

14 days ago

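This is a blog post by someone at DeepMind. For both images and audio, it seems all they do is map the input to the LLM's dimensionality with positional information attached. I take "encoder-free" to mean that knowledge extraction and the semantic side are not aligned beforehand. A Visual Guide to Gemma 4 12B https://t.co/tMNdwnmh8a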
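
To make "map the input to the LLM's dimensionality with positional information attached" concrete, here is a toy sketch in plain Python. It is not Gemma's implementation: the patch size, `D_MODEL`, the single linear projection, and the additive learned position table are all assumptions made for illustration.

```python
import random


def patchify(image, patch):
    """Split an H x W grid into flattened patch vectors, in row-major order.

    Assumes H and W are divisible by `patch`.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def linear(vec, weight):
    """Multiply a length-n vector by an n x d matrix, giving a length-d vector."""
    return [sum(v * weight[i][j] for i, v in enumerate(vec))
            for j in range(len(weight[0]))]


def embed_patches(image, patch, weight, pos_table):
    """Project each patch to the model width and add its position vector."""
    tokens = []
    for idx, flat in enumerate(patchify(image, patch)):
        projected = linear(flat, weight)
        tokens.append([a + b for a, b in zip(projected, pos_table[idx])])
    return tokens


rng = random.Random(0)
D_MODEL = 8   # hypothetical LLM hidden size
PATCH = 2     # hypothetical patch edge length
image = [[rng.random() for _ in range(4)] for _ in range(4)]  # 4x4, one channel
weight = [[rng.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(PATCH * PATCH)]
pos_table = [[rng.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(4)]

tokens = embed_patches(image, PATCH, weight, pos_table)
print(len(tokens), len(tokens[0]))
```

Nothing here extracts semantics: the output only has the right shape and carries position, which is the situation the tweets find puzzling.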

かいせ @kai_ds04

19 days ago

Ah, so that was the direction after all. "DeepSeek shifting away from 'model first'? Entering code-generation AI with Claude Code as its benchmark" (36Kr Japan) #Yahooニュース https://t.co/DypG8G7toK

かいせ @kai_ds04

21 days ago

Ooh! I'd been dragging my feet and hadn't touched Gemini Embedding 2, but maybe I'll try it out while reading the paper! Intriguing tech 👀

Mojtaba Seyedhosseini

@mseyed

22 days ago

Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini 🚀
Today, we’re sharing the @GoogleDeepMind white paper for GE 2, our first native multimodal embedding model. Whether it’s text, audio, video, or image, GE 2 provides a unified representation of the input. https://t.co/B8NPRK2Gf4

かいせ @kai_ds04

21 days ago

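Learned a lot from this. I don't build agents myself, so my picture of the infrastructure side is fuzzy, but working out what your own responsibilities are and minimizing cost applies to everything, so I want to keep it in mind. Source: Zenn https://t.co/CnP4RSY7N4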

かいせ @kai_ds04

23 days ago

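My faculty's note account has published an interview-style write-up of my job-hunting experience! My values keep getting updated as I interact with many different people. I hope it serves as one reference for how I made my choices with values that are always in flux. https://t.co/unvbn8noXu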

かいせ @kai_ds04

26 days ago

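Today I watched a talk by Prof. Yanagisawa of Osaka University. It was fun, the kind of content that gets AI people excited. Meanwhile my seminar professor, who watched the same talk with me, was calmly sizing up its limits, which I found impressive as always.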

かいせ @kai_ds04

26 days ago

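At today's study session I realized I hadn't understood self-supervised learning at all. The "self" in self-supervised refers to the data. I never quite got why merely shifting the training tokens by one counts as self-supervision, but seeing it as the data producing its own labels made it click.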
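
The "shift the tokens by one" idea can be shown in a few lines. This is a minimal sketch of how next-token prediction targets are usually built; the token IDs are made up and `make_next_token_pairs` is a hypothetical helper name.

```python
def make_next_token_pairs(token_ids):
    """Build (inputs, targets) for next-token prediction.

    The target at position i is the token at position i + 1, so the
    labels come from the data itself rather than from human annotation.
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a training pair")
    inputs = token_ids[:-1]   # everything except the last token
    targets = token_ids[1:]   # the same sequence shifted left by one
    return inputs, targets


# Hypothetical token IDs standing in for a tokenized sentence.
tokens = [101, 7, 42, 13, 102]
inputs, targets = make_next_token_pairs(tokens)
for x, y in zip(inputs, targets):
    print(f"after token {x}, predict token {y}")
```

Every position in the sequence becomes a labeled example, which is why a raw text corpus is enough to train on.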

かいせ @kai_ds04

26 days ago

Wait, seriously?

Zephyr

@zephyr_z9

27 days ago

So OpenAI cut the intelligence of their normal models (low to no web search)
Instant/medium/high are pure trash now
Only Pro works now https://t.co/v1KPHOUiKD

かいせ @kai_ds04

26 days ago

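We're reading Probabilistic Machine Learning: An Introduction, Vol. I (確率的機械学習 入門編I) in my seminar and making almost no progress lol. Still on page 16! But the members dig properly into whatever questions come up, so it's worthwhile!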

かいせ @kai_ds04

26 days ago

Funny how Correlation Guy says out loud that he isn't claiming causation, yet tends to mix the two up himself lol

統計たん @stattan

26 days ago

Likelihood Guy is killing me lol

かいせ @kai_ds04

26 days ago

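Rather than opening my Mac whenever I find spare time while I'm out, deliberately leaving the Mac at home and driving Codex from my phone seems more productive lol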

ぬこぬこ / NUKO 🇯🇵

@nukonuko

27 days ago

To use Computer Use in the Codex App without unlocking your Mac, open the Codex App settings and turn on Computer use → Locked use https://t.co/4c0v2SrL3q

かいせ @kai_ds04

26 days ago

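@alexabelonix Does this research show that having an LLM assign the reward works better after all than assigning it deterministically, as in GRPO? I don't fully understand how it differs from PPO, so apologies for the basic question.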

かいせ @kai_ds04

27 days ago

Put simply, this is LLM-as-a-judge, right? My recollection is that the difficulty of that was the motivation for GRPO... or was it?

Avi Chawla

@_avichawla

28 days ago

Karpathy's prediction about RL is coming true now!
He called reward functions unreliable and argued that a single reward number is too low-dimensional to teach an agent what "good" means for complex tasks.
To solve this, agents need a knowledge-guided review as a higher-dimensional feedback channel.
Every major AI lab trains models with RL today (OpenAI, Anthropic, DeepSeek). And their key bottleneck has always been the reward functions.
GRPO by DeepSeek worked well for math and code because the environment gave a binary signal. But for real agent tasks, someone still has to hand-code the scoring function. That takes days and breaks every time the pipeline changes.
RULER (implemented in OpenPipe ART, 10k stars) addresses the exact problem Karpathy identified. The reward criteria are defined in plain English, and an LLM evaluates each trajectory against that description to provide feedback for training.
I trained a Qwen3 1.4B agent that plays 2048 using GRPO with this exact workflow. In this case, the agent saw the board, picked a direction, and RULER evaluated the outcome, all from this natural language definition.
You can see the full implementation on GitHub and try it yourself. Here's the ART Repo: https://t.co/fsoLXDK4Zu (don't forget to star it ⭐ )
Just like RLHF replaced manual rankings and GRPO replaced the critic model, natural language rewards are replacing hand-coded scoring functions. RL reward engineering is now prompt engineering.
I wrote a full walkthrough covering RL for LLM agents, from RLHF to GRPO to RULER, in the article below.
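
On the GRPO-versus-PPO question raised in this thread: the core difference is that GRPO drops the learned critic and instead scores each sampled response relative to the other responses for the same prompt. The sketch below shows only that normalization step, under assumptions: the reward values are invented, `group_relative_advantages` is a hypothetical name, and using the population standard deviation is a choice made here. Where the rewards come from (a rule-based checker or an LLM judge) is a separate decision that does not change this step.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so the
    group itself acts as the baseline that a critic provides in PPO.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Invented rewards for four sampled responses to a single prompt.
rule_based = [1.0, 0.0, 0.0, 1.0]      # e.g. a pass/fail answer checker
judge_scores = [0.9, 0.2, 0.4, 0.7]    # e.g. scores from an LLM judge

print(group_relative_advantages(rule_based))
print(group_relative_advantages(judge_scores))
```

Either reward source yields one scalar per response; the binary checker is simply cheaper and harder to game than a judge model.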

かいせ @kai_ds04

26 days ago

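Trying out Appshots, but the screenshot of Zoom failed, so the use cases seem limited. The times I urgently want to ask Codex something are things like difficult lectures, so it doesn't fit my use case. I'm also curious what context an Appshot actually includes.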

OpenAI Developers

@OpenAIDevs

27 days ago

It’s Codex Thursday, and yes, we have updates for you. First up: Appshots, a new way to bring the context of what you’re working on into Codex. On your Mac, press Command-Command to attach your app window to a Codex thread. Codex gets both a screenshot and text from the window, including content beyond what’s visible onscreen. Appshots are available across plans on Mac, with enterprise access coming soon.

かいせ @kai_ds04

27 days ago

Huh, so this was one that can't be used from the CLI? I'd been racking my brain over how to use it.

OpenAI

@OpenAI

27 days ago

3️⃣ Goal mode is now available in the Codex app, IDE extension, and CLI. Goal mode makes Codex more hands-off, letting you set a goal that it can work towards for hours or even days.

かいせ @kai_ds04

27 days ago

I'd been using Codex from my phone through Codex cloud, but thanks to these recent updates I can now operate the Codex on my desktop directly from my phone, which is exciting.

OpenAI

@OpenAI

27 days ago

Highlights from today’s Codex Thursday launches:
1️⃣ Codex can now securely use apps on your Mac from your phone, even when your Mac is locked and the screen is off. https://t.co/JUOss3M2Va https://t.co/CAEUe1aswm
