Yukito Tajima

15 days ago

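We have released the MXFP4 version of GPT-OSS-Swallow v0.1. This is an additional release that lets GPT-OSS-Swallow run with less memory. It should make the models easier to use for anyone who previously found them hard to try because of hardware constraints. https://t.co/qwy0nTUB2q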

2

24

11

5

2K

TitaniumJely retweeted

3 months ago

I gave a talk on the development of Swallow LLM at GTC 2026! Thank you to everyone who came.

0

102

11

6

5K

Rapidly growing Vertical SaaS. VPoE at SOUHL Inc. ← LayerX ← Recruit ← Merpay ← Kauche. Tokyo Tech, Computer Science

3 months ago

Our work on Swallow LLM at Science Tokyo was featured in the keynote presentation at GTC 2026.

0

7

3

0

1K

TitaniumJely retweeted

Naoaki Okazaki @chokkanorg

3 months ago

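We have added NVIDIA-Nemotron-3-Super-120B-A12B to the Swallow LLM Leaderboard. On Japanese tasks it outperforms gpt-oss-120b and comes close to GPT-OSS Swallow 120B. It is particularly rich in academic and scientific knowledge, and its Japanese ability looks likely to improve with continual pre-training (CPT). NVIDIA kindly gave us early access. https://t.co/uI3REi97rh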

chokkanorg's tweet photo. https://t.co/1U3ur65xrA

0

181

48

60

33K

TitaniumJely retweeted

Masaki Kawamura @Masakichi333210

4 months ago

Our paper "PowerCLIP: Powerset Alignment for Contrastive Pre-Training" has been accepted to @CVPR 2026! 🎉 See you in Denver!

4

95

17

24

26K

4 months ago

@AiXsatoshi It's a tricky call, partly because the original model is in NVFP4. It isn't on the roadmap, but a straightforward AWQ quantization would be feasible, so I'll look into it.

1

2

0

119

Naoaki Okazaki @chokkanorg

4 months ago

We have released the Qwen3-Swallow and GPT-OSS-Swallow models. This time we are also providing 4-bit GPTQ/AWQ versions, so please give them a try.

4 months ago

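📢 We have released GPT-OSS Swallow and Qwen3 Swallow. With a complete overhaul of continual pre-training, SFT, and reinforcement learning, these open LLMs combine Japanese-language performance with reasoning ability and are available under the Apache 2.0 license. Qwen3 Swallow: https://t.co/tTRVGHnF4M GPT-OSS Swallow: https://t.co/L6a2zCjc7i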

13

1K

341

741

238K

1

44

12

8

15K

TitaniumJely retweeted

4 months ago

We've officially released Qwen3-Swallow and GPT-OSS-Swallow! 🚀 It’s quite an emotional moment for me, as we’ve been working hard on these models since the summer of 2025. For this release, I was responsible for the continual pre-training (CPT), SFT, and training data refinement across all models. We successfully enhanced the Japanese language capabilities while fully preserving the strong math and coding performance of the base models. I'll also be giving a talk about this at NVIDIA GTC 2026 in San Jose, CA! See you there! #SwallowLLM #GTC2026

2

46

11

2

8K

TitaniumJely retweeted

Masanari Oi @stjohn2007

4 months ago

I was involved in implementing the evaluation framework. (The evaluation itself was carried out by the evaluation team, led by @koshiro_sa110 🤞) Lately I've been working on RL 😎

0

23

9

2

6K

4 months ago

@alfredplpl It didn't make it in time for the release, but since there seems to be demand, we will consider a GGUF version as well.

2

20

3

1

2K

TitaniumJely retweeted

Koshiro Saito @koshiro_sa110

4 months ago

In the Swallow LLM Project, quantization had gone untouched for years, and this time Tajima-san took it on. The quantized models make things easier to use, so please give them a try! https://t.co/4kiawQYdr4

0

31

11

7

6K

TitaniumJely retweeted

4 months ago

We are thrilled to announce the release of GPT-OSS Swallow and Qwen3 Swallow 🎉 I was involved in evaluation, framework development, and mentoring as a student leader. Leaderboard: https://t.co/CxhlRA2EIO Swallow-Evaluation-Instruct: https://t.co/OI75Q40ro8

0

20

8

0

7K

TitaniumJely retweeted

Taishi Nakamura

@taishinakamura_

4 months ago

We have released the Qwen3-Swallow and GPT-OSS-Swallow models. I was in charge of RL training. We saw improvements on Japanese tasks in the reinforcement learning stage as well.

1

155

29

33

21K

TitaniumJely retweeted

Masaki Kawamura @Masakichi333210

4 months ago

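Awareness of the Swallow Project (Swallow LLM) is confined to a fairly small community, and I'd like to see it used more widely… I also hope that awareness of the space between LLM development and research will grow. (I'm trying to raise Swallow's profile a little by writing blog posts, but there's a limit to that…)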

0

64

8

2

13K

TitaniumJely retweeted

Daisuke Nohara @D_Nohara

4 months ago

New arXiv preprint! "On the Optimal Reasoning Length for RL-Trained Language Models" Two failure modes in RL-trained reasoning: long outputs increase dispersion, short outputs cause under-thinking. This tradeoff can be monotonic or non-monotonic depending on the model.

D_Nohara's tweet photo. https://t.co/dawLfuWZ13

2

52

8

29

9K

TitaniumJely retweeted

Taishi Nakamura

@taishinakamura_

4 months ago

Accepted as an ICLR 2026 Oral! 🎉 Interested in scaling MoE reasoning? Let's chat! https://t.co/BpXNv1ihpW See you in Brazil! 🇧🇷 #iclr #iclr2026

taishinakamura_'s tweet photo. https://t.co/6IrV31o9Tv

3

157

19

43

21K

TitaniumJely retweeted

4 months ago

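I wrote a survey article on the VLA papers submitted to ICLR. Partly for my own study, I put together a summary to get a grasp of the open problems and trends in VLA. The papers are organized by category, with brief summaries of each one's motivation and the key points of the proposed method. Feel free to read just the parts that interest you! https://t.co/I4athytumb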

3

297

64

163

20K

TitaniumJely retweeted