はち

@CurveWeb

Works at an IT company. Likes dogs and coffee. HuggingFace → → Posts about synthetic data and agent systems.

Joined March 2021

869 Following

2K Followers

4.1K Posts

Pinned Tweet

はち

@CurveWeb

over 1 year ago

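Aiming to reproduce OpenAI o1, I built a library that improves LLM reasoning ability. It makes it easy to plug the MCTS algorithm into an LLM (one already trained on CoT data) for inference. Its usage is kept as close to Transformers as possible, so it should be relatively easy to try. https://t.co/8kt6g5b8I5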

509

304

59K
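For readers unfamiliar with MCTS, the selection step that decides which candidate reasoning step to expand next can be sketched as follows. This is a generic UCT rule, not the API of the library in the pinned post; the function names, the exploration constant `c=1.4`, and the example values are all assumptions made for illustration.

```python
import math


def uct_score(total_value, visits, parent_visits, c=1.4):
    """UCT score for one child node; unvisited children are tried first."""
    if visits == 0:
        return math.inf
    exploit = total_value / visits  # average reward seen so far
    explore = c * math.sqrt(math.log(parent_visits) / visits)  # bonus for rarely visited nodes
    return exploit + explore


def select_child(children, parent_visits):
    """children: list of (label, total_value, visits). Returns the best label."""
    return max(children, key=lambda ch: uct_score(ch[1], ch[2], parent_visits))[0]


# Hypothetical candidate reasoning steps with made-up statistics.
candidates = [("step A", 2.4, 3), ("step B", 0.9, 1), ("step C", 0.0, 0)]
best = select_child(candidates, parent_visits=4)
```

In an LLM setting, each child would typically be a candidate next step proposed by the model, and the reward would come from a value estimate or a verifier.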

はち

@CurveWeb

about 1 month ago

https://t.co/Vwz8b3CKlf

233

はち

@CurveWeb

about 1 month ago

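On X they call it SSA (Subquadratic Sparse Attention), but parts of the blog call it SSA (Subquadratic Selective Attention). Unless that is simply a mistake, I wonder whether they combine a retrieval mechanism like DeepSeek Sparse Attention with the forgetting mechanism of Selective Attention.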

Alexander Whedon

@alex_whedon

about 1 month ago

Introducing SubQ - a major breakthrough in LLM intelligence.

It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window, which is:
- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus

Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do.

That's nearly 1,000x less compute and a new way for LLMs to scale.

23K

19K

13M

はち

@CurveWeb

about 1 month ago

I bought an Even G2 to try it out.

846

CurveWeb retweeted

Sakana AI

@SakanaAILabs

about 1 month ago

We’re excited to introduce KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI, accepted at #ICASSP2026! 🐢

Blog https://t.co/arVz1TGpJJ
Paper https://t.co/0EwpyRXeCs

Can a speech AI think deeply without pausing to process? In real conversation, we don’t wait until we’ve fully worked out what we want to say; we start talking, and our thoughts catch up as the sentence unfolds.

Fast speech-to-speech models achieve this, but their reasoning tends to stay shallow. Cascaded pipelines that route through a knowledgeable LLM are smarter, but the added latency breaks the flow; they fall back to "think, then speak."

In our new paper, we propose a way to break this trade-off. We call it KAME (Turtle in Japanese). A speech-to-speech model handles the fast response loop and starts replying immediately. In parallel, a backend LLM runs asynchronously, generating response candidates that are continuously injected as "oracle" signals in real time. This shifts the AI paradigm from "think, then speak" to "speak while thinking."

The backend LLM is completely swappable. You can plug in GPT-4.1, Claude Opus, or Gemini 2.5 Flash depending on the task without changing the frontend. In our experiments, Claude tended to score higher on reasoning, while GPT did better on humanities questions.

Try the model yourself here: https://t.co/uDA0nvvjhS

738

144

439

291K

CurveWeb retweeted

Kye Gomez (swarms)

@KyeGomezB

about 2 months ago

Introducing OpenMythos

An open-source, first-principles theoretical reconstruction of Claude Mythos, implemented in PyTorch.

The architecture instantiates a looped transformer with a Mixture-of-Experts (MoE) routing mechanism, enabling iterative depth via weight sharing and conditional computation across experts.

My implementation explores the hypothesis that recursive application of a fixed parameterized block, coupled with sparse expert activation, can yield improved efficiency–performance tradeoffs and emergent multi-step reasoning.

Learn more ⬇️🧵

240

CurveWeb retweeted

Naoaki Okazaki @chokkanorg

4 months ago

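📢 We have released GPT-OSS Swallow and Qwen3 Swallow. With a complete overhaul of continual pre-training + SFT + reinforcement learning, these open LLMs combine Japanese-language performance with reasoning ability and are available under the Apache 2.0 license. Qwen3 Swallow: https://t.co/tTRVGHnF4M GPT-OSS Swallow: https://t.co/L6a2zCjc7i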

341

741

238K

CurveWeb retweeted

Aratako @Aratako_LM

4 months ago

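We have released MioTTS, a new lightweight TTS model developed from scratch, starting with the codec! Models are available in a range of sizes from 0.1B to 2.6B. The 0.1B model in particular is very small but synthesizes reasonably good speech. The demo, inference code, and codec are released as well. https://t.co/m3mw6riaHA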

590

129

410

96K

CurveWeb retweeted

ITmedia AI＋

@itm_aiplus

4 months ago

Japanese government solicits information on regulations hindering the real-world deployment of AI, to inform a review of the rules https://t.co/NUJMazNRxj

928

528

680K

CurveWeb retweeted

Haitham Bou Ammar

@hbouammar

4 months ago

We found that much of LLM “reasoning” doesn’t come from RL training; it comes from how you sample the model.

Building on power sampling (Karan & Du 2025), we show you can approximate global reasoning without MCMC, without training, and 10× faster.

🧠 Inference-time intelligence is real.
📝 Blog ↓
https://t.co/wVjPImCu8w

667

669

66K
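A toy illustration of why "how you sample" can matter, assuming that power sampling means drawing whole sequences with probability proportional to p(sequence)^α (the post itself does not spell this out). The sketch compares that sequence-level sharpening with sharpening each next-token distribution separately. The two-step token tree, its probabilities, and the function names are made up, and this is not the authors' MCMC-free approximation.

```python
def sequence_power(seq_probs, alpha):
    """Renormalize p(sequence)^alpha over whole sequences."""
    weights = {s: p ** alpha for s, p in seq_probs.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


def tokenwise_power(tree, alpha):
    """Sharpen each next-token distribution on its own, then multiply."""
    def sharpen(dist):
        w = {t: p ** alpha for t, p in dist.items()}
        z = sum(w.values())
        return {t: v / z for t, v in w.items()}

    out = {}
    for t1, p1 in sharpen(tree["first"]).items():
        for t2, p2 in sharpen(tree["second"][t1]).items():
            out[t1 + t2] = p1 * p2
    return out


# Hypothetical model: "A" is the likelier first token, but "Bz" is the likeliest full sequence.
tree = {
    "first": {"A": 0.6, "B": 0.4},
    "second": {"A": {"x": 0.5, "y": 0.5}, "B": {"z": 1.0}},
}
seq_probs = {
    t1 + t2: p1 * p2
    for t1, p1 in tree["first"].items()
    for t2, p2 in tree["second"][t1].items()
}
global_sharp = sequence_power(seq_probs, alpha=2.0)
local_sharp = tokenwise_power(tree, alpha=2.0)
```

In this toy case the sequence-level version puts more mass on "Bz" than the token-level version does, because it accounts for what follows the first token.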

CurveWeb retweeted

Mason Daugherty

@masondrxy

4 months ago

https://t.co/mJW69Yx4yQ

309

703

92K

CurveWeb retweeted

布留川英一 / Hidekazu Furukawa

@npaka123

4 months ago

Overview of Agentic Vision, a new feature in Gemini 3 Flash｜npaka @npaka123 https://t.co/vh0f3gjol7

148

105

53K

CurveWeb retweeted

alphaXiv

@askalphaxiv

4 months ago

Learning to Discover at Test Time

This paper TTT-Discover shows that by replacing best-of-N prompting with RL at test time on a continuous verifiable reward (via LoRA), it can learn from its own attempts and reliably push past the prior performance.

The “learn-while-solving” loop during problem-solving is capable of improving GPT-OSS-120B's mathematical bounds, has it write faster GPU kernels, and top scores programming competitions

"Assuming an average prompt length of 3000 tokens and 16000 sampling tokens on average, a training run with 50 steps and 512 rollouts costs around $500 on Tinker"

252

122

12K
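The quoted cost figure can be turned into a rough token budget. This is a back-of-the-envelope sketch that assumes "512 rollouts" means 512 rollouts per step and that every rollout uses the quoted average lengths. The blended per-token rate it derives is only implied by those numbers; it is not a published price.

```python
steps = 50
rollouts_per_step = 512  # assumption: 512 rollouts in each of the 50 steps
prompt_tokens = 3_000
sampled_tokens = 16_000
quoted_cost_usd = 500

total_rollouts = steps * rollouts_per_step
total_tokens = total_rollouts * (prompt_tokens + sampled_tokens)

# Blended rate implied by the quote; prompt, sampling, and training costs are lumped together.
usd_per_million_tokens = quoted_cost_usd / (total_tokens / 1_000_000)

print(f"{total_rollouts:,} rollouts, {total_tokens:,} tokens")
print(f"about ${usd_per_million_tokens:.2f} per million tokens")
```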

CurveWeb retweeted

isaac 🧩

@isaacbmiller1

5 months ago

The dspy.RLM module is now released 👀

Install DSPy 3.1.2 to try it. Usage is plug-and-play with your existing Signatures.

A little example of it helping @lateinteraction and I figure out some scattered backlogs: https://t.co/Avgx04sNJP

480

333

134K

CurveWeb retweeted

Anthropic

@AnthropicAI

5 months ago

New Anthropic Fellows research: the Assistant Axis.

When you’re talking to a language model, you’re talking to a character the model is playing: the “Assistant.” Who exactly is this Assistant? And what happens when this persona wears off? https://t.co/hDNGZX0pCK

318

579

はち

@CurveWeb

5 months ago

Claude seems to be acting up. It's slow, including the API.

505

CurveWeb retweeted

DailyPapers

@HuggingPapers

5 months ago

GlimpRouter

A training-free framework that uses the entropy of a single token to route reasoning steps between small and large language models, reducing latency by 25.9% while boosting accuracy by 10.7% on AIME25. https://t.co/ggis7XKyHA
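The routing idea can be sketched in a few lines. This is a minimal illustration, not the GlimpRouter implementation: which token is inspected, which model produces the distribution, the threshold of 1.0 nats, and the function names are all assumptions.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def route_step(first_token_probs, threshold=1.0):
    """Keep the step on the small model when it looks confident, else escalate."""
    return "small" if token_entropy(first_token_probs) < threshold else "large"


# Hypothetical next-token distributions at the start of two reasoning steps.
confident = [0.9, 0.05, 0.03, 0.02]
uncertain = [0.25, 0.25, 0.25, 0.25]

print(route_step(confident), route_step(uncertain))
```

Because only one token's distribution is inspected per step, the routing decision itself adds little overhead compared with generating the step.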

CurveWeb retweeted

Karan Dalal

@karansdalal

5 months ago

LLM memory is considered one of the hardest problems in AI.

All we have today are endless hacks and workarounds. But the root solution has always been right in front of us.

Next-token prediction is already an effective compressor. We don’t need a radical new architecture. The missing piece is to continue training the model at test-time, using context as training data.

Our full release of End-to-End Test-Time Training (TTT-E2E) with @NVIDIAAI, @AsteraInstitute, and @StanfordAILab is now available.

Blog: https://t.co/woCpiIrq0T
Arxiv: https://t.co/3VkFlS3wx3

This has been over a year in the making with @arnuvtandon and an incredible team.

321

574K

CurveWeb retweeted

Takuya Akiba

@iwiwi

5 months ago

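Our paper is out! It argues that RoPE may only be helping training along and may not be needed in the end. It is probably well known that NoPE (no positional embeddings) can in fact handle position, but in practice training does not go well if you use NoPE from the start. "DroPE", which drops RoPE partway through, gets the best of both.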

585

234

85K

CurveWeb retweeted

ところてん

@tokoroten

5 months ago

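I've published the slides from my talk at the recent Programming Symposium. It runs a genetic algorithm with an LLM to automatically generate adversarial prompts that make a model leak its system prompt. I'm back in the security field for the first time in about 10 years; I'd like to run proper experiments and take this to CSEC. https://t.co/LIIlwOX1sJ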

280

177

25K

CurveWeb retweeted

Sakana AI

@SakanaAILabs

5 months ago

Introducing DroPE: Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings https://t.co/brpejkosWR

We are releasing a new method called DroPE to extend the context length of pretrained LLMs without the massive compute costs usually associated with long-context fine-tuning.

The core insight of this work challenges a fundamental assumption in Transformer architecture. We discovered that explicit positional embeddings like RoPE are critical for training convergence but eventually become the primary bottleneck preventing models from generalizing to longer sequences.

Our solution is radically simple: We treat positional embeddings as a temporary training scaffold rather than a permanent architectural necessity.

Real-world workflows like reviewing massive code diffs or analyzing legal contracts require context windows that break standard pretrained models. While models without positional embeddings (NoPE) generalize better to these unseen lengths, they are notoriously unstable to train from scratch.

Here, we achieve the best of both worlds by using embeddings to ensure stability during pretraining and then dropping them to unlock length extrapolation during inference. Our approach unlocks seamless zero-shot context extension without any expensive long-context training.

We demonstrated this on a range of off-the-shelf open-source LLMs. In our tests, recalibrating any model with DroPE requires less than 1% of the original pretraining budget, yet it significantly outperforms established methods on challenging benchmarks like LongBench and RULER.

We have released the code and the full paper to encourage the community to rethink the role of positional encodings in modern LLMs.

Paper: https://t.co/Fp5IJS4LIC
Code: https://t.co/Wvea7tQ5Ay

259

471K
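To make "dropping" positional embeddings concrete, here is a minimal pure-Python sketch of rotary position embedding (RoPE) with a switch that turns it off. It only shows the mechanical difference between RoPE and NoPE attention scores. The recalibration training described in the post is not modeled, and the `enabled` flag, function names, and example vectors are assumptions rather than anything from the released code.

```python
import math


def apply_rope(vec, position, base=10000.0, enabled=True):
    """Rotate consecutive pairs of dimensions by a position-dependent angle.

    Assumes an even-length vector. With enabled=False the vector is returned
    unchanged, so the attention score carries no positional signal (NoPE).
    """
    if not enabled:
        return list(vec)
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    """Unnormalized attention score between a query and a key."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical query and key vectors.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# With RoPE, the score depends on the offset between positions (3 in both cases).
near = dot(apply_rope(q, 5), apply_rope(k, 2))
far = dot(apply_rope(q, 105), apply_rope(k, 102))

# With RoPE dropped, the score ignores positions entirely.
dropped = dot(apply_rope(q, 105, enabled=False), apply_rope(k, 2, enabled=False))
```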
