©️fatfreefly

@fatfreefly

Stranger in a Strange Land

Amphoe Bang Pa-in

Joined March 2007

1.3K Following

94 Followers

5.1K Posts

fatfreefly retweeted

Tencent Hy

@TencentHunyuan

about 16 hours ago

1. Most RL stacks are built for one modality. UniRL applies a single post-training loop (generate → score → advantage → update → sync) across model families. Model and algorithm are two independent axes, so your coverage is the model × algorithm product, not a fixed recipe menu.

2. One loop, every modality: text→image, text/image→video, vision-language, text-only LLM and VLM, the LLM→diffusion prompt-enhancer, and unified autoregressive+diffusion generation (Hunyuan-Image 3 and Bagel), a model class no single-purpose RL repo can even express.

3. Built to scale: pluggable rollout engines (train-side / SGLang / vLLM-Omni) behind one typed contract, FSDP2 sharding, and three deployment modes from a single config knob.

4. Two team-original algorithms headline the release:
FlowDPPO: policy optimization for flow/diffusion models with trust-region masks based on exact divergence (see our paper: Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models, https://t.co/7TPFT72RDa)
DRPO: LLM RL with a smooth, advantage-weighted quadratic regularizer (see our paper: Rethinking the Divergence Regularization in LLM RL, https://t.co/eXTnK1sCts)
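The generate → score → advantage → update → sync loop named in the post can be illustrated with a toy sketch. This is not UniRL's code or API: the three-armed bandit "model", the reward table, the group-normalized advantage, and the REINFORCE-style update are all assumptions chosen only to make the five stages concrete.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(rollout_logits, n, rng):
    """Rollout stage: sample n actions from the rollout copy of the policy."""
    probs = softmax(rollout_logits)
    return rng.choices(range(len(probs)), weights=probs, k=n)

def score(actions, reward_table):
    """Scoring stage: look up a reward for each sampled action (toy reward)."""
    return [reward_table[a] for a in actions]

def advantage(rewards):
    """Advantage stage: normalize rewards within the group (assumed choice)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero when all rewards match
    return [(r - mean) / std for r in rewards]

def update(train_logits, actions, advs, lr):
    """Update stage: one REINFORCE step on the trainer's logits."""
    probs = softmax(train_logits)
    new_logits = list(train_logits)
    for a, adv in zip(actions, advs):
        for i in range(len(new_logits)):
            # d log softmax(a) / d logit_i = 1[i == a] - probs[i]
            grad = (1.0 if i == a else 0.0) - probs[i]
            new_logits[i] += lr * adv * grad / len(actions)
    return new_logits

def sync(train_logits):
    """Sync stage: copy trainer weights into the rollout engine."""
    return list(train_logits)

def run(steps=200, group=16, lr=0.5, seed=0):
    rng = random.Random(seed)
    reward_table = [0.0, 0.2, 1.0]   # hypothetical rewards per action
    train_logits = [0.0, 0.0, 0.0]
    rollout_logits = sync(train_logits)
    for _ in range(steps):
        actions = generate(rollout_logits, group, rng)
        rewards = score(actions, reward_table)
        advs = advantage(rewards)
        train_logits = update(train_logits, actions, advs, lr)
        rollout_logits = sync(train_logits)
    return softmax(train_logits)
```

Because each stage is a separate function, swapping the advantage or update rule leaves the rest of the loop untouched, which is the "two independent axes" idea in miniature.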

fatfreefly retweeted

Xudong Han

@Xudong07452910

1 day ago

This may be the last paper humans write for AI to read.

I recently came across a paper co-signed by 37 authors from Stanford, CMU, Michigan, and other institutions: "The Last Human-Written Paper".

Its core claim is blunt: the paper format we have used for centuries may already be obsolete in the AI era.

The authors point out two "hidden taxes" that we have long overlooked:

One is the narrative tax. To tell a clean story, we delete the failed experiments, dead ends, and overturned hypotheses. The AI reads the "walkthrough" but never sees the genuinely valuable "record of pitfalls".

The other is the engineering tax. A paper's implementation details are usually enough to convince reviewers but not enough for an agent to reproduce the work directly. Many key tricks still live in the authors' heads, code comments, and Slack logs.

So the authors propose ARA, which turns a paper into a "research package" that an agent can read and execute: it does not just state the conclusions, it also packages how the ideas came about, how to run the code, where the chain of evidence lies, and which paths turned out to be dead ends.

What I find most interesting is that the paper is not discussing how AI can help people write papers. It asks:

When AI also becomes a reader and executor of papers, should papers still look the way they do today?

The core of future research output may no longer be "how much it reads like a paper", but whether AI can understand, reproduce, trace, and extend it.

Humans have written papers for centuries; next we may have to start writing research packages for agents to execute.

https://t.co/2wwkWV9Ox4

128

405

187K

fatfreefly retweeted

Data Science Dojo

@DataScienceDojo

1 day ago

🚨 Peter Steinberger, founder of Openclaw, says you shouldn't be prompting coding agents anymore.

Boris Cherny says he doesn't prompt Claude anymore. Instead, they both write loops.

Since then, the AI community has been asking the same question:
What exactly is a loop?

To help answer that, Matt Van Horn (co-founder of Zimride, which later became Lyft) shared a framework outlining the evolution of loops in Agentic AI, and how we've arrived at this new abstraction layer.

🔹 2022: ReAct Loop
🔹 2023: Self-Prompting Agents
🔹 2025: The Ralph Loop
🔹 2026: Productized Ralph
🔹 and the future: Multi-Agent Orchestration.

In multi-agent orchestration, loops become the primary unit of work: spawning, coordinating, and supervising other loops and agents.

The common theme across Peter's and Boris's comments is that the focus is shifting away from the prompt itself.

Whether you agree with this framework or not, it's a useful way to understand why some of the most experienced AI builders are talking less about prompt engineering and more about designing systems around loops.

Do you think loops are the next major abstraction for Agentic AI, or is this simply iterative prompting with a new name?

#agenticai #ailoops #claude #openclaw #aiengineering

fatfreefly retweeted

Probability and Statistics

@probnstat

1 day ago

Kernel Mean Embeddings are a powerful framework that represents probability distributions as elements of a reproducing kernel Hilbert space (RKHS). Instead of working directly with probability densities, a distribution P is mapped to a feature representation

μₚ = E[k(X, ·)]

where k is a kernel function. This allows complex distributions to be analyzed using geometric and functional-analytic tools.

In probability and statistics, kernel mean embeddings provide nonparametric methods for comparing distributions, hypothesis testing, density estimation, and causal inference. They form the basis of powerful techniques such as Maximum Mean Discrepancy (MMD), which is widely used for two-sample testing.

In machine learning, kernel mean embeddings enable learning directly on distributions rather than individual data points. They are used in domain adaptation, generative modeling, distribution regression, and uncertainty quantification. In deep learning, MMD and related kernel methods appear in generative adversarial learning, representation learning, and self-supervised learning. In reinforcement learning, kernel embeddings help model transition dynamics, value functions, and belief states in partially observed environments.

The deeper insight is that many learning problems involve distributions rather than individual observations. Kernel mean embeddings provide a mathematically elegant way to transform probability distributions into geometric objects that can be manipulated, compared, and learned efficiently.

Image: https://t.co/8OivktctlP
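To make the MMD idea concrete, here is a minimal sketch of the squared distance between two empirical kernel mean embeddings, ||μ_P − μ_Q||², expanded into kernel averages. The Gaussian (RBF) kernel, the bandwidth default, the restriction to one-dimensional samples, and the function names are assumptions for illustration; the estimate is the simple biased (V-statistic) form.

```python
import math

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of MMD^2 = E[k(X,X')] + E[k(Y,Y')] - 2 E[k(X,Y)]."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

# Identical samples embed to the same point, so the distance is zero;
# well-separated samples give a value close to 2 for this kernel.
same = mmd_squared([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
far = mmd_squared([0.0, 0.0], [10.0, 10.0])
```

A two-sample test would compare such a statistic against a null distribution, for example one obtained by permuting the pooled samples.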

232

144

fatfreefly retweeted

恒星

@vintcessun

1 day ago

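During reasoning, large models treat failed attempts as truth because they cannot distinguish a "temporary draft" from a "final conclusion". This is fatal for agent systems: once a hallucination is written into memory, every downstream step collapses. This paper uses counterfactual erasure reinforcement learning (CERL) to impose a hard requirement on the model: under the same prefix, it earns a reward only if it still answers correctly after its hidden thinking is erased. The model is forced to learn to "forget" intermediate steps it should not retain and to base its decisions only on persistent state. In experiments on multi-turn tasks such as math and tool calling, accuracy does not drop while the share of answers that depend on the thinking falls significantly. Extremely complex long-chain reasoning still needs to be verified, though. https://t.co/GN0bEGOQik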
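The reward rule described here can be sketched as a small function. This is an illustration of the idea only, not the paper's implementation: the `answer_fn` signature, the two toy policies, and the choice to require a correct answer both with and without the hidden thoughts are assumptions.

```python
from typing import Callable, Sequence

# Hypothetical signature: (prefix, persistent_state, hidden_thoughts) -> answer
AnswerFn = Callable[[str, dict, Sequence[str]], str]

def erasure_reward(answer_fn: AnswerFn, prefix: str, persistent_state: dict,
                   hidden_thoughts: Sequence[str], target: str) -> float:
    """Reward 1.0 only if the answer stays correct once thoughts are erased."""
    with_thoughts = answer_fn(prefix, persistent_state, hidden_thoughts)
    erased = answer_fn(prefix, persistent_state, ())  # counterfactual: no thoughts
    return 1.0 if with_thoughts == target and erased == target else 0.0

def state_based_policy(prefix, state, thoughts):
    """Toy policy that reads its answer from persistent state only."""
    return str(state.get("result", "unknown"))

def thought_dependent_policy(prefix, state, thoughts):
    """Toy policy that simply repeats its last hidden thought."""
    return thoughts[-1] if thoughts else "unknown"

thoughts = ["maybe 5", "4"]
robust = erasure_reward(state_based_policy, "2+2=?", {"result": 4}, thoughts, "4")
fragile = erasure_reward(thought_dependent_policy, "2+2=?", {"result": 4}, thoughts, "4")
```

Under this rule the policy that depends on its scratch thoughts earns nothing, while the one that commits its result to persistent state is rewarded.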

265

fatfreefly retweeted

Rohan Paul

@rohanpaul_ai

1 day ago

This paper proposes a new test to see whether AI agents truly get better as they gain experience and finds they mostly still confuse memory with learning.

It shows that simple full-context learning beats the more specialized memory systems, with Claude Sonnet 4.6 using plain context getting the best overall score.

That distinction matters because the next wave of AI is not supposed to answer isolated prompts.

It is supposed to live inside codebases, databases, markets, sensors, clinics, and workflows where yesterday’s mistake should make tomorrow’s action sharper.

The authors build CL-BENCH, a benchmark where an agent works through connected tasks in 6 domains, including coding, databases, forecasting, radio signals, poker, and disease studies.

Each task hides a pattern the agent can learn over time, like a database layout, a codebase structure, or an opponent’s strategy, so better performance should come from experience rather than pretraining.

They test frontier LLM systems with simple full-context memory, scratchpad notes, retrieval memory, playbook-style memory, and coding-agent setups.

The key finding is that current memory-heavy AI agents are not reliably better learners than just keeping the full conversation in context.

That means long-running AI agents still need better ways to remember useful lessons, forget stale ones, and adapt when the environment changes.

----

Link: arxiv.org/abs/2606.05661

Title: "Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments"

fatfreefly retweeted

alphaXiv

@askalphaxiv

2 days ago

"OPRD: On-Policy Representation Distillation"

On-policy distillation usually matches teacher and student only at the token probability level, throwing away the teacher’s hidden states.

This paper moves the loss before the LM head, aligning student and teacher representations on the student’s own rollouts.

This gives less noisy training, richer supervision, and near teacher-level math performance, with 1.44x faster training and up to 54% lower update memory.

196

149

18K

fatfreefly retweeted

Turing Post

@TheTuringPost

1 day ago

AutoScientists: a research lab made of agents

@Harvard researchers connected agents into a self-organizing scientific team without a boss agent standing in the middle.

All agents look at the same shared workspace: they share memory, explore multiple directions in parallel, critique each other, avoid repeated failures, and reorganize as evidence changes.

But the teams are not fixed. Agents can gather around a promising direction, like architecture, optimizer changes, or data augmentation, then abandon it if it stops working. Before they spend compute, they discuss proposals and critique each other.

AutoScientists also shows strong results:
- 74.4% mean leaderboard percentile on BioML-Bench
- 1.9× faster GPT training optimization
- +12.5% on ACE2–Spike, with the same method transferring to 217 ProteinGym assays for a +6.5% average gain

179

155

18K

fatfreefly retweeted

elvis

@omarsar0

4 days ago

// Continual Learning Bench //

One of the research areas attracting a lot of investment is continual learning.

While there are many efforts, there is very little progress in measuring it.

So the big question is, do dedicated memory systems actually make agents learn from experience?

Continual Learning Bench says not yet. Across six expert-validated domains with shared learnable structure, naive in-context learning outperforms systems purpose-built for memory management.

CL-Bench introduces a gain metric that isolates genuine learning from prior capability, then shows agents frequently overfit to immediate observations or fail to reuse knowledge across instances.

If a plain ICL baseline beats your memory architecture, the architecture is adding overhead rather than learning.

Paper: https://t.co/iFd5SZFe3O

Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

352

290

24K

fatfreefly retweeted

恒星

@vintcessun

5 days ago

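Someone has finally opened up the black box of LLM reasoning. Looking only at the final answer and the token count tells you nothing about whether a model is genuinely reasoning or just wandering in circles; models with identical accuracy can have radically different reasoning structures, and that is the blind spot in evaluation. The paper tackles the problem at its root: it builds LRM, a logic-puzzle benchmark, parses raw reasoning chains into directed acyclic graphs whose nodes are claims and whose edges are dependencies, and then defines a reasoning-efficiency metric that quantifies how concentrated the logical flow is. An analysis of DeepSeek-R1 and other models finds that, at the same accuracy, reasoning-graph complexity can differ severalfold, which makes it clear at a glance which model is more reliable. https://t.co/oD47nDoF3F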
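The graph view described here (claims as nodes, dependencies as edges) can be sketched in a few lines. The `used_fraction` score below is a hypothetical stand-in, not the paper's efficiency metric, which the post does not define: it simply measures what share of the claims the final conclusion actually depends on.

```python
def ancestors(graph, target):
    """Return every claim that `target` depends on, directly or transitively.

    `graph` maps each claim to the list of claims it depends on.
    """
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

def used_fraction(graph, conclusion):
    """Share of all claims that lie on a dependency path to the conclusion."""
    nodes = set(graph) | {dep for deps in graph.values() for dep in deps}
    useful = ancestors(graph, conclusion) | {conclusion}
    return len(useful) / len(nodes)

# Toy reasoning trace: "detour" is derived but never used by the answer.
trace = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c1"],
    "detour": ["c1"],
    "answer": ["c2", "c3"],
}
score = used_fraction(trace, "answer")  # 4 of 5 claims feed the answer
```

Two traces that reach the same answer can score very differently on a measure like this, which is the kind of structural difference that accuracy and token counts hide.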

417

487

24K

fatfreefly retweeted

Turing Post

@TheTuringPost

5 days ago

Must-read research of the week

▪️ ScientistOne
▪️ SkillOpt
▪️ MUSE-Autoskill
▪️ Do Language Models Need Sleep?
▪️ OmniRetrieval
▪️ Vector Policy Optimization
▪️ Gamma-World
▪️ OpenComputer
▪️ Personalize-then-Store
▪️ WorldKV
▪️ AutoResearchClaw
▪️ Qwen-VLA
▪️ CUA-Gym

Find the full list and the most important AI news of the week here: https://t.co/UC1IeiRL8y

169

138

11K

fatfreefly retweeted

Fengzhuo Zhang

@FengzhuoZhang

5 days ago

Why do DeepSeek and Kimi use Muon instead of Adam?

🚀 Reasons from a curvature perspective:

1⃣ Under a second-order approx., Muon incurs a much smaller curvature penalty than Adam while maintaining the same first-order decrease.

2⃣ This advantage does not come from a smaller update norm. Instead, it comes from Muon having lower Normalized Directional Sharpness (NDS).

3⃣ Muon’s NDS advantage becomes larger when the training data is more imbalanced.

Paper Link: https://t.co/nGy2FeF538

A thread 🧵
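For readers who want the two terms in point 1 spelled out, the standard second-order Taylor expansion of the loss along an update direction is shown below. The symbols (loss f, weights w, step size η, update direction u, gradient g, Hessian H) are notation assumed here, and the exact normalization the paper uses for NDS is not reproduced.

```latex
f(w - \eta u) \;\approx\; f(w)
  \;-\; \underbrace{\eta\, g^{\top} u}_{\text{first-order decrease}}
  \;+\; \underbrace{\tfrac{\eta^{2}}{2}\, u^{\top} H u}_{\text{curvature penalty}},
\qquad g = \nabla f(w), \quad H = \nabla^{2} f(w).
```

With the first-order decrease held equal, the optimizer whose direction u yields the smaller quadratic term loses less of that decrease to curvature.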

128

158K

fatfreefly retweeted

fly51fly @fly51fly

4 days ago

[CL] Latent Reasoning with Normalizing Flows G Tu, X Fu, S Yu, Y Tang… [University of Pennsylvania] (2026) https://t.co/6Rrslt8iX5

fatfreefly retweeted

Nathan Lambert

@natolambert

6 days ago

We have another 65 page frontier model report from Nvidia to read @eliebakouch @stochasticchasm and gang

688

388

54K

fatfreefly retweeted

Rohan Paul

@rohanpaul_ai

4 days ago

Anthropic’s new chemistry report has a genuinely wild result.

Claude Opus 4.7 is now competitive with dedicated NMR software, and the bigger story is that it can work the problem backwards, i.e. infer the molecule from the spectrum.

NMR software is the chemist’s expert tool for turning molecular structures into predicted lab spectra.

So Opus 4.7 is no longer just “helping chemists read data”: it can work backward from NMR data and propose the molecule’s structure, a task the report says existing mainstream tools generally leave to human chemists.

Note that Opus 4.7 is a general-purpose model with no chemistry-specific fine-tuning.

Claude Opus 4.7 made the smallest hydrogen prediction errors and nearly matched MestReNova on carbon, meaning it can predict NMR signals about as well as specialist chemistry tools.

So AI can now handle one of chemistry’s hidden bottlenecks: translating between a molecule, its spectral shadow, and the structure a chemist actually needs to trust.

248

131

25K

fatfreefly retweeted

Ahmad

@TheAhmadOsman

5 days ago

Everything You Need To Know About Inference Engines and Running LLMs Locally at Home

Explains why Inference Engines exist in the first place
- Prefill is not Decode
- VRAM is not bandwidth
- Fit is not speed
- KV Cache is the real memory problem
- Quantization only matters if the engine has good kernels for it
- Batching is not scheduling
- MoE and the routing problem
- How long context changes the serving problem
- Multi-GPU changes the interconnect problem
- Production: latency, p99s, backpressure, routing, metrics, and failure behavior

Then maps the Engines including:
- llama.cpp → portability king
- MLX / MLX-LM → Apple Silicon weapon
- ExLlamaV3 → multi-GPU consumer CUDA / local MoE
- vLLM → default open-source production server
- SGLang → long-context, MoE, routing, ugly workloads
- TensorRT-LLM → max NVIDIA performance
- NVIDIA Dynamo → fleet orchestration

The point of this article is not “use vLLM” or “use TensorRT-LLM” or “use llama.cpp”

But rather fully grasp how the Inference Engines are the traffic cop, memory manager, kernel dispatcher, scheduler, cache accountant, parallelism planner, API surface, and sometimes the deployment framework

Do not pick the engine first
- Pick the hardware
- Pick the workload
- Pick the serving model

Then the engine becomes obvious

Opensource / Local AI FTW
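The "KV Cache is the real memory problem" point is easy to check with arithmetic. The sketch below uses the usual sizing rule for a standard attention cache (one key and one value vector per layer, per KV head, per token); the model shape in the example is hypothetical and not taken from the article.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Size of a plain (unquantized, unpaged) KV cache in bytes.

    The leading 2 accounts for storing both keys and values.
    bytes_per_value=2 corresponds to fp16/bf16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)

# Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, fp16 cache.
one_request = kv_cache_bytes(32, 8, 128, seq_len=8192)
print(one_request / 2**30)        # 1.0 GiB for a single 8K-token request

sixteen_requests = kv_cache_bytes(32, 8, 128, seq_len=8192, batch_size=16)
print(sixteen_requests / 2**30)   # 16.0 GiB, growing linearly with batch
```

The cache grows linearly with both context length and concurrent requests, so a model whose weights fit in VRAM can still run out of memory under load.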

405

462

31K

fatfreefly retweeted

𝚐𝔪𝟾𝚡𝚡𝟾

@gm8xx8

4 days ago

ThoughtFold has a clean RLVR angle: correct long CoTs contain both useful reasoning and redundant exploration, but outcome rewards reinforce all of it. Instead of just rewarding shorter answers, it prunes correct chains, verifies what can be removed, then uses masked preference learning to penalize redundant steps and keep the reasoning path tighter.

Paper: https://t.co/vFIVDPR9bd

fatfreefly retweeted

Matt Dancho (Business Science)

@mdancho84

5 days ago

This 277-page PDF unlocks the secrets of Large Language Models. Here's what's inside: 🧵

789

189

917

35K

fatfreefly retweeted

Rosinality @rosinality

5 days ago

https://t.co/ZX2LNuvAHJ Preconditioning the weights.

104

fatfreefly retweeted

Matt Dancho (Business Science)

@mdancho84

7 days ago

This is huge.

A group of 50 AI researchers (ByteDance, Alibaba, Tencent + universities) just dropped a 303 page field guide on code models + coding agents.

And the takeaways are not what most people assume.

Here are the highlights I’m thinking about (as someone who lives in Python + agents):

358

555

23K
