フクロウ@コーヒー底つき研究室

Verified account

@prof_LLM

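Thinking about how to use AI in medical research / Coffee lover / Academic journal editor / Exploring LLM-based writing training / Medical device development / Interested in language processing and sound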

Joined June 2025

569 Following

3.6K Followers

2.3K Posts

フクロウ@コーヒー底つき研究室

3 days ago

On Codex: I'm the person who has been saving two resets so I can use 5.6 to the fullest 🫠

0

4

0

0

1K

フクロウ@コーヒー底つき研究室

3 days ago

Looks like it's probably no longer 5.6. I'll wait for the announcement 😌

0

1

0

0

679

フクロウ@コーヒー底つき研究室

5 days ago

It's been one year since I created this account 😌 I never expected AI to advance this far in a single year. Thank you, AI!

0

6

0

0

449

フクロウ@コーヒー底つき研究室

5 days ago

Voice input is so convenient that I wander around every day in search of a quiet place.

6 days ago

Bro it’s June 2026. Stop hand editing your prompts. Hold down the dictation button and ramble for 10 minutes. Give the model every fragment, caveat, example, and vibe in your head. It is literally a large language model. If it’s superhuman at anything, it’s reconstructing latent intent from language.

233

8K

513

3K

1M

0

3

0

1

932

フクロウ@コーヒー底つき研究室

5 days ago

I tried Fugu, but it makes one mistake after another whenever it calls tools 🥲 I'll pass on it for now.

0

0

0

0

424

フクロウ@コーヒー底つき研究室

5 days ago

Please, no export controls~

5 days ago

GPT-5.5-Cyber is our most capable cyber model yet, designed for advanced, authorized defensive work: tracing vulnerable code, validating issues, developing patches, and preparing evidence for human review.

Photo: https://t.co/KcDoGGD2tx

61

2K

189

372

932K

0

1

0

0

591

フクロウ@コーヒー底つき研究室

5 days ago

@cursorvers Oh, this!! Thank you ♪

1

1

0

0

21

フクロウ@コーヒー底つき研究室

5 days ago

GLM is impressive.

5 days ago

We've kept hearing how GLM-5.2 beats Opus 4.8, and are skeptical of benchmarks - so we tested them on a real bug from the Cline repo. While both models fixed the issue, GLM was the winner in terms of cost and code quality: - GLM used twice as many tokens (GLM 1.1m vs Opus 660K) but cost half as much (GLM $0.41 vs Opus $0.81) - Opus finished quicker - 1.6 min and 12 tool calls vs GLM 4.7 min and 28 tool calls - GLM cleaned up dead code and verified the build compiled before completing. Opus didn't - it left type errors that passed tests but broke the production build. Both runs used the same Cline harness prompting and tools, so it seems GLM is RL trained to spend more tokens verifying its work before completing. Impressive work by the @Zai_org team!


221

8K

612

2K

873K

1

4

0

0

1K

フクロウ@コーヒー底つき研究室

5 days ago

@cursorvers It really is expensive 🥲 I was just about to try it out through OpenCode Go, so I was surprised lol

1

1

0

0

44

フクロウ@コーヒー底つき研究室

5 days ago

The AI absorbed all of the selection results 🥸

0

6

0

0

777

フクロウ@コーヒー底つき研究室

6 days ago

SPReAD was a total defeat! Everyone did their best!

0

6

0

0

938

フクロウ@コーヒー底つき研究室

6 days ago

@Shuntarooo3 Same here! Thank you! I'm going to move on to the next thing.

0

1

0

0

103

フクロウ@コーヒー底つき研究室

6 days ago

Are the SPReAD results still not out? I can't concentrate, so please hurry up 🥲

1

6

0

1

5K

フクロウ@コーヒー底つき研究室

6 days ago

@Shuntarooo3 It looks like it's the following! https://t.co/eZZrQcsZ4G

1

1

0

0

227

フクロウ@コーヒー底つき研究室

6 days ago

Far fewer than I expected 🫠 https://t.co/TvUvcEioml

0

5

0

0

1K

フクロウ@コーヒー底つき研究室

6 days ago

@Shuntarooo3 Not at all. I copy-pasted what a friend forwarded to me, so apologies if it's wrong 🙇🏻‍♂️ I'll double-check once accepted applicants start reporting on X lol

0

1

0

0

315

フクロウ@コーヒー底つき研究室

6 days ago

Apparently 960 is 5.6 Pro 😞

0

2

0

1

1K

フクロウ@コーヒー底つき研究室

6 days ago

Grr. It was still 5.5 Pro.

Photo: https://t.co/rjatHmD5at

1

6

0

1

2K
