S.KATO @s_kat0 - Twitter Profile

Pinned Tweet

2 days ago

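This week at #JSAI2026, I have five presentations as lead author or co-author. 1. Data-Driven Prediction of Postprandial Blood Glucose Extrema Using a Temporal Fusion Transformer 2. LA-Bench 2025: A Dataset for Generating Executable Procedures from Experimental Instructions (lead author, presenting)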


s_kat0 retweeted

Haruka Ozaki (尾崎遼) @yuifu

about 11 hours ago

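At JSAI, LA-Bench 2025 will be presented tomorrow, 6/10 (Wed), in OS-8 "Generative AI and Research Automation Opening Up DX in Medicine, Dentistry, Pharmacy, and the Life Sciences," which starts at 9:00! [3F1-OS-8-03] LA-Bench 2025: A Dataset for Generating Executable Procedures from Experimental Instructions. Abstract: https://t.co/oXutZmk1w6 LA-Bench 2025 official site: https://t.co/8lxrpCXtaY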


S.KATO @s_kat0

about 15 hours ago

I'll be presenting tomorrow morning from 9:30 in Room F! Hope to see you there!

S.KATO @s_kat0

2 days ago

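2. LA-Bench 2025: A Dataset for Generating Executable Procedures from Experimental Instructions. For LA-Bench 2025, we built a dataset for evaluating how well LLMs generate executable procedures from experimental instructions. Comparing LLM-as-a-judge evaluation designs, we showed that combining multiple designs is important.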


S.KATO @s_kat0

about 17 hours ago

It's about a 10-minute walk from the venue to the station.



S.KATO @s_kat0

about 17 hours ago

The bus from the venue to the station was full (I couldn't get on).


S.KATO @s_kat0

about 21 hours ago

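Attending [2C4-KS-22] "Redesigning Research Dissemination in the Era of Generative AI and Preprints: Where Is Top-Conference Culture Headed?" Background and aims of the session: https://t.co/eUO8LQE1tP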


s_kat0 retweeted

Kento Kawaharazuka / 河原塚健人 @KKawaharazuka

1 day ago

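Attending #JSAI2026 from today! From 13:30 to 19:00 I'm an organizer of the OS "Physical AI in the Era of Foundation Models," so feel free to come talk to me anytime! At 14:45-15:00 I'm presenting on optimal robot design based on BBO, and at 16:30-17:00 I'm giving an invited talk summarizing recent robot foundation models. Come by!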



S.KATO @s_kat0

about 23 hours ago

A rubber duck from Overleaf, which I use all the time.


S.KATO @s_kat0

1 day ago

This morning I'm attending this session.

S.KATO @s_kat0

1 day ago

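Tomorrow morning there will be a report on LA-Bench 2025, the competition we ran last year with support from the Japanese Society for Artificial Intelligence (JSAI)!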


s_kat0 retweeted

Paavo

@PaavoParmas

1 day ago


New preprint📣
Typical reinforcement learning policy gradient algorithms target the mean reward E[R], while deployment often cares about other properties of the reward distribution: pass@k, max@k, tail risk like CVaR, robust metrics like medians, etc.
We introduce OrderGrad, a method that can flexibly optimize any of these targets via a one-line reward transformation. Everything else in your code, whether you use GRPO, PPO, or REINFORCE, can remain unchanged.

Arxiv: https://t.co/aOdAMI5J8S
Code: https://t.co/nDssyjRl5D

🥇🥈🥉OrderGrad is based on order-statistic estimation. Specifically, consider a batch of K sampled rewards and sort them:
R_(1:K) ≤ R_(2:K) ≤ … ≤ R_(K:K)
Now apply a weight a_i to the expected value at each rank and sum:
Sum_i a_i * E[R_(i:K)]

This makes it possible to define objectives that target different regions of the reward distribution. Notably, putting all of the weight on the top rank gives Max@K (equivalently Pass@K when rewards are binary), and our approach generalizes this to arbitrary ranks: you can target TopM@K, medians, CVaR, winsorized means, or any other weighting of your choice.
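A minimal Python sketch of the objective above, assuming a batch of N ≥ K i.i.d. rewards. The helper names (`max_weights`, `median_weights`, `lower_tail_weights`, `order_stat_estimate`) are hypothetical and are not taken from the OrderGrad code. The estimate averages Sum_i a_i * R_(i:K) over all K-subsets of the batch in closed form: the j-th smallest of N rewards is the i-th smallest in exactly C(j-1, i-1) * C(N-j, K-i) of the C(N, K) subsets.

```python
from math import comb


def max_weights(K):
    """All weight on the top rank: Max@K (Pass@K for 0/1 rewards)."""
    return [0.0] * (K - 1) + [1.0]


def median_weights(K):
    """Weight on the middle rank(s) of K sorted rewards."""
    a = [0.0] * K
    if K % 2 == 1:
        a[K // 2] = 1.0
    else:
        a[K // 2 - 1] = a[K // 2] = 0.5
    return a


def lower_tail_weights(K, m):
    """Average of the m lowest of K rewards: a CVaR-style tail target."""
    if not 1 <= m <= K:
        raise ValueError("need 1 <= m <= K")
    return [1.0 / m] * m + [0.0] * (K - m)


def order_stat_estimate(rewards, a):
    """Unbiased estimate of Sum_i a_i * E[R_(i:K)] from N >= K i.i.d. rewards.

    Equivalent to averaging Sum_i a_i * R_(i:K) over every K-subset of the
    batch, computed in closed form instead of by enumeration.
    """
    N, K = len(rewards), len(a)
    if K > N:
        raise ValueError("need at least K rewards")
    x = sorted(rewards)
    total = comb(N, K)
    est = 0.0
    for j in range(1, N + 1):          # rank of x[j-1] within the whole batch
        for i in range(1, K + 1):      # rank it would take inside a K-subset
            # subsets where x[j-1] is the i-th smallest: pick i-1 of the
            # j-1 smaller rewards and K-i of the N-j larger ones
            n_subsets = comb(j - 1, i - 1) * comb(N - j, K - i)
            est += a[i - 1] * n_subsets * x[j - 1] / total
    return est


print(order_stat_estimate([0.2, 0.9, 0.5, 0.1], max_weights(2)))
print(order_stat_estimate([1, 0, 0, 0], max_weights(2)))
```

With binary rewards and max weights this reduces to the familiar unbiased pass@k estimator 1 - C(N-c, K)/C(N, K): for the batch [1, 0, 0, 0] with K = 2 it returns 0.5 = 1 - 3/6.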

The order statistics connect back to the original distribution: the j-th order statistic corresponds roughly to the j/(K+1) quantile of the reward distribution (see the right figure). As K grows, the order statistics trace out the reward distribution's quantile function (the inverse CDF) ever more closely, so putting weights on the order statistics is essentially equivalent to weighting different regions of the reward distribution.
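A quick Monte Carlo check of the j/(K+1) correspondence, as a sketch. For rewards drawn uniformly on [0, 1] it is exact, because the j-th of K sorted uniforms follows a Beta(j, K - j + 1) distribution with mean j/(K+1). For a reward distribution with CDF F, R_(j:K) has the same law as F^{-1}(U_(j:K)), so E[R_(j:K)] is only approximately F^{-1}(j/(K+1)). The function name `mean_order_stats_uniform` is illustrative.

```python
import random


def mean_order_stats_uniform(K, trials, seed=0):
    """Monte Carlo estimate of E[U_(j:K)] for j = 1..K with U ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    sums = [0.0] * K
    for _ in range(trials):
        # sort K fresh uniform draws; position j holds the (j+1)-th smallest
        draws = sorted(rng.random() for _ in range(K))
        for j, value in enumerate(draws):
            sums[j] += value
    return [s / trials for s in sums]


K = 4
estimates = mean_order_stats_uniform(K, trials=20_000)
for j, est in enumerate(estimates, start=1):
    print(f"j={j}: simulated {est:.3f}, exact j/(K+1) = {j / (K + 1):.3f}")
```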

Our main contribution is an unbiased gradient estimator for the weighted order-statistic objective, where the batch size is N and the subset size used for ranking is K. Increasing K improves the approximation of the reward distribution but also increases variance (a classic bias-variance tradeoff). We give the estimator both in REINFORCE policy-gradient form and in reparameterized backpropagation form. Computation time is negligible (<1 ms).
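The preprint's estimator is not reproduced here. As a hedged illustration of how a weighted order-statistic objective can turn into a per-sample reward transformation, the sketch below uses one standard score-function construction of our own, which is not necessarily what OrderGrad does. Averaging the REINFORCE estimator over all K-subsets of an N-sample batch gives grad ≈ (1/N) Sum_n w_n * grad log pi(y_n), with w_n = K times the mean of Sum_i a_i * R_(i:K) over the K-subsets that contain sample n. It enumerates subsets, so it only suits small N, and it uses no baseline; `transformed_rewards` and `subset_value` are hypothetical names.

```python
from itertools import combinations


def subset_value(values, a):
    """Sum_i a_i * R_(i:K) for one subset of K rewards."""
    return sum(w * r for w, r in zip(a, sorted(values)))


def transformed_rewards(rewards, a):
    """Per-sample weights w_n for a score-function (REINFORCE) gradient.

    An unbiased gradient estimate of Sum_i a_i * E[R_(i:K)] is
    (1/N) * Sum_n w_n * grad log pi(y_n), where
    w_n = K * mean of subset_value over the K-subsets containing sample n.
    Enumerates all C(N, K) subsets, so keep N small.
    """
    N, K = len(rewards), len(a)
    if not 1 <= K <= N:
        raise ValueError("need 1 <= K <= N")
    totals = [0.0] * N
    counts = [0] * N
    for idx in combinations(range(N), K):
        h = subset_value([rewards[n] for n in idx], a)
        for n in idx:
            totals[n] += h
            counts[n] += 1
    return [K * t / c for t, c in zip(totals, counts)]


# Max@2 on a batch of four rewards: the best sample gets the largest weight.
rewards = [0.2, 0.9, 0.5, 0.1]
weights = transformed_rewards(rewards, [0.0, 1.0])
print(weights)
# In a policy-gradient loop these weights would replace the raw rewards,
# e.g. loss = -mean(w_n * log_prob(y_n)) under autograd (not shown here).
```

Subtracting a baseline that does not depend on sample n itself (for example a leave-one-out average) keeps this unbiased and usually lowers variance; the preprint's own estimator and variance reduction may differ.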

I still want to improve the preprint, so comments and suggestions are very welcome. The code is available, so please try it out! 🙏

Many thanks to my collaborators:
Paavo Parmas
Yongmin Kim
Kohsei Matsutani
Shota Takashiro
Soichiro Nishimori
Takeshi Kojima
Yusuke Iwasawa
Yutaka Matsuo



s_kat0 retweeted

u++ @upura0

1 day ago

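At #JSAI2026 on the morning of the 9th, we're holding "KS-21 Artificial Intelligence and Competitions"! https://t.co/WcRlVqjmXj Speakers include the organizers of "LA-Bench 2025: Experimental Procedure Generation AI Competition," selected under the society's competition-hosting support program, and prize winners of the "International Olympiad in Artificial Intelligence." There will also be a report on "JAPAN AI CUP."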


s_kat0 retweeted

佐藤竜馬 / Ryoma Sato

@joisino_

1 day ago

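My book 『検索システム』 (Search Systems) is now open for preorder 🎉 It covers everything about search systems, from inverted indexes to LLMs! It should be useful to a wide range of readers, from those who want to understand how search works to those who want to improve the accuracy of RAG and LLM applications. Please pick up a copy! Amazon: https://t.co/Y0fsgF47Pb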