Introducing LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M: two multilingual retrieval models built for ultra-fast and accurate search across 11 languages.
> End-to-end retrieval latency as low as 1.5ms with our enterprise stack! 🚀
> Consistently best-in-class multilingual and cross-lingual performance across Arabic, German, English, Spanish, French, Italian, Japanese, Korean, Norwegian, Portuguese, and Swedish.
🧵
Introducing the Open Knowledge Format (OKF), an open specification that formalizes the LLM-wiki pattern into a portable, interoperable format.
AI is only as smart as the context we give it. As we build more advanced, agentic AI systems, they need accurate metadata and context to be useful. But in most organizations, that context is locked inside fragmented data catalogs, isolated wikis, scattered code comments, or the minds of senior engineers. Every time a new AI agent is built, teams are forced to solve the exact same context-assembly problem from scratch.
To solve this, we've announced OKF, a vendor-neutral, open specification that formalizes the "LLM-wiki pattern" into a portable, interoperable format. It provides a standardized way to represent the enterprise knowledge that modern AI systems rely on.
— Just markdown: readable in any editor, renderable on GitHub, indexable by any search tool
— Just files: shippable as a tarball, hostable in any git repo, mountable on any filesystem
— Just YAML frontmatter: for the small set of structured fields that need to be queryable: type, title, description, resource, tags, and timestamp
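To make the "markdown + files + YAML frontmatter" idea concrete, here is a minimal sketch of reading one such file. The six field names come from the list above. The sample values, the `resource` URI, and the helper `split_frontmatter` are hypothetical and not part of the OKF spec. The parser handles only flat `key: value` pairs and inline lists; a real tool would likely use a full YAML library.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical document: field names are from the post, all values are invented.
SAMPLE_DOC = """---
type: table
title: Orders
description: One row per customer order.
resource: bigquery://example-project/sales/orders
tags: [sales, orders]
timestamp: 2025-01-01T00:00:00Z
---
# Orders

Each row is a single customer order.
"""

FieldValue = Union[str, List[str]]


def split_frontmatter(text: str) -> Tuple[Dict[str, FieldValue], str]:
    """Split a markdown file into (frontmatter fields, markdown body).

    Supports only flat `key: value` pairs and inline `[a, b]` lists,
    which is a small subset of YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: treat the whole file as markdown
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text  # unterminated frontmatter: treat as plain markdown

    fields: Dict[str, FieldValue] = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so URIs and timestamps stay intact.
        key, _, raw = line.partition(":")
        value = raw.strip()
        if value.startswith("[") and value.endswith("]"):
            fields[key.strip()] = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        else:
            fields[key.strip()] = value
    body = "\n".join(lines[end + 1:])
    return fields, body
```

Calling `split_frontmatter(SAMPLE_DOC)` returns the queryable fields as a dictionary and leaves the body as ordinary markdown, which is what lets the same file be rendered, searched, and filtered without special tooling.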
We’ve also shipped reference implementations to help you hit the ground running, including an enrichment agent for BigQuery, a static HTML visualizer, and live sample bundles on @github → https://t.co/ilhAMCrcTc
➕ Knowledge Catalog can now natively ingest OKF!
Stop reinventing data models and building bespoke integrations for every new AI tool. Here's more about how OKF works → https://t.co/FR4kJRsgEH
Ling & Ring 2.6 technical report is out, with two open-weight base models.
We co-design model + system across architecture, training, and agentic capability:
• 7:1 hybrid linear attention
• KPop for stable agentic RL: SWE-bench Verified 76.28%
• ~4× token efficiency
We’re sharing new research on a method for anticipating how models may behave in real-world use before release: simulating deployment with recent, de-identified user requests and studying candidate model responses. https://t.co/7RJzBfNniQ
Let’s talk about evals.
We’re always looking for better ways to measure and forecast model progress, especially as benchmarks get saturated or gamed.
@tejalpatwardhan, who leads our frontier evals team, spoke to @andrewmayne about why evals matter and what models need to be judged on next.
📣 Introducing the Qwen-Robot Suite — Qwen-RobotNav, Qwen-RobotManip, Qwen-RobotWorld, three foundation models, a full stack for embodied intelligence.
🧭 Qwen-RobotNav — the gateway to mobility.
• Unifies 5 navigation tasks in one model: instruction following, point-goal, object-goal, target tracking, autonomous driving
• Controllable observation protocol
• Tool interface for agentic systems
🤖 Qwen-RobotManip — the foundation of interaction.
• Unified state-action space across heterogeneous robots
• Camera-frame delta poses for coherent cross-embodiment training
• Pretrained on a 38,100+ hour open-source corpus
🌍 Qwen-RobotWorld — infinite worlds for physical agents.
• Single world model, 20+ embodiments
• Natural-language action interface
• Predicts physically grounded futures across manipulation, driving, and navigation
Each model is independently useful and can be composed with the others as physical-world tools. Together, they form the low-level toolkit for general-purpose agentic systems that don't just see the world, but act in it.
📷 Blog:
https://t.co/ytLcbYET26
📖 Report:
Qwen-RobotNav: https://t.co/uPmSwDYGxg
Qwen-RobotManip: https://t.co/GeyIzJSpU8
Qwen-RobotWorld: https://t.co/SXPH1qzDFy
Here is the technical report on SubQ 1.1 Small.
https://t.co/bu8AEc4lsk
This is the second iteration of our Subquadratic Sparse Attention (SSA) model, and the first that will be deployed with design partners, starting in the coming weeks.
The results are compelling and verified by @AppenResearch.
- Near-perfect long-context retrieval up to 12M tokens on the needle-in-a-haystack test, with up to nearly 1,000x attention compute reduction.
- A balance of long-context optimization and general reasoning ability, with strong performance retained across knowledge, coding, and non-coding enterprise agent benchmarks.
- At 1M tokens, SubQ 1.1 Small requires 64.5x less compute than dense attention and runs 56x faster than FlashAttention-2.
These results highlight a significant scaling advantage thanks to the efficiency gains from the SSA architecture.
We included some details and learnings from the development process which may be helpful to the community.
Comment with questions, I’ll try to respond!
⭐ VibeThinker-3B is released — a dense 3B model for frontier-level verifiable reasoning.
🚀 Reasoning: 94.3 on AIME’26, 76.4 on IMO-AnsBench, and 80.2 Pass@1 on LCB v6; with CLR, AIME’26 improves to 97.1 and IMO-AnsBench to 80.6.
💻 OOD Coding: On recent unseen LeetCode weekly contests, VibeThinker-3B passes 123/128 (96.1%) first-attempt Python submissions.
⚡ Efficiency: Only 3B parameters, yet reaching the performance range of much larger top-tier reasoning models.
🧠 Perspective: Small models are not just cheaper substitutes. In parameter-dense domains with clear verification signals, SLMs offer a path to frontier-level reasoning that complements traditional scaling laws.
Model: https://t.co/94A14zpqCV
GitHub: https://t.co/32so5P6C7L
Paper: https://t.co/UDd264RsZb
#AI #LLM #Reasoning #OpenSource #SmallModel
Introducing GLM-5.2: Frontier Intelligence, Open Weights
- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1
Tech Blog: https://t.co/LAsxUdN0JZ
Weights: https://t.co/g0A1C4UWx4
API: https://t.co/Kc3E22cbN7
Coding Plan: https://t.co/Nk8Y98HNhU
Chat: https://t.co/WCqWT0qCQb
🚀 New blog: The next generation of speculative decoding: DFlash and Spec V2
DFlash + Spec V2 hit >4.3X baseline throughput for LLM inference, now the default speculative decoding engine in SGLang! Together with @modal and https://t.co/ZXetBKIRym, our jointly-released DFlash drafter for Qwen 3.5 397B-A17B beats both baseline and native MTP in every setting we benchmarked:
1️⃣ >4.3X baseline & 1.5X native MTP throughput (concurrency 1, HumanEval, 8xB200)
2️⃣ Block diffusion drafter: a full token block in one forward pass
3️⃣ KV injection: target-model features fed into every draft layer’s KV cache for higher acceptance
4️⃣ Spec V2 overlap scheduler: +33% end-to-end
Read the code, deploy a DFlash server, and start experimenting!
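For readers new to the idea, the draft-then-verify loop that speculative decoding builds on can be sketched in a few lines. This is a generic greedy version under toy assumptions, not the DFlash drafter or the Spec V2 scheduler: `toy_target` and `toy_draft_block` are hypothetical stand-ins for real models, and verification is written as sequential calls where a real engine would check the whole block in one batched target forward pass.

```python
from typing import Callable, List

Token = int
# A "target model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[Token]], Token]
# A "block drafter" proposes k tokens at once for a given prefix.
BlockDrafter = Callable[[List[Token], int], List[Token]]


def speculative_decode(target: NextToken, draft_block: BlockDrafter,
                       prompt: List[Token], max_new_tokens: int,
                       block_size: int = 4) -> List[Token]:
    """Greedy speculative decoding: output matches plain greedy decoding of `target`."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        proposal = draft_block(out, block_size)
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)       # draft token confirmed by the target
            else:
                accepted.append(expected)  # first mismatch: keep the target's token, drop the rest
                break
        else:
            # Whole block accepted: the target contributes one extra token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + max_new_tokens]


def toy_target(prefix: List[Token]) -> Token:
    """Hypothetical target model: counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def toy_draft_block(prefix: List[Token], k: int) -> List[Token]:
    """Hypothetical drafter: agrees with the target except that it never predicts 5."""
    block, last = [], prefix[-1]
    for _ in range(k):
        nxt = (last + 1) % 10
        if nxt == 5:
            nxt = 0  # deliberate draft error, so the rejection path is exercised
        block.append(nxt)
        last = nxt
    return block
```

Every accepted block replaces several sequential target steps with one verification, so throughput depends on how often the drafter agrees with the target. That is why the post stresses acceptance rate.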
As hybrid models (Qwen 3.5 / Nemotron Ultra) run agents with massive context, Gated-DeltaNet / Mamba states become a bottleneck. A simple insight to make this 2x faster: load the states, compute, but don't store them. This recompute trick finally unlocks spec decoding for SSMs.
probably the best blog i have read for some time
viewing SFT, RL, and OPD as different ways of reshaping a model's distribution makes their tradeoffs super intuitive.
- SFT pulls toward a fixed external target
- RL moves along the reward gradient on on-policy samples
- OPD sits in between, using a teacher signal but on student-generated data, which is why it inherits RL's anti-forgetting properties even when the teacher itself was an overtrained SFT model.
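a rough way to write the three bullets above as objectives, for anyone who prefers symbols. the notation is assumed here and not taken from the post: \pi_\theta is the student, \pi_T the teacher, r the reward, \mathcal{D} the prompt/demonstration data, and the per-token reverse KL is just one common way to write OPD.

```latex
\begin{aligned}
\text{SFT:}\quad & \min_\theta \; -\,\mathbb{E}_{(x,y)\sim\mathcal{D}} \sum_t \log \pi_\theta(y_t \mid x, y_{<t}) \\
\text{RL:}\quad  & \max_\theta \; \mathbb{E}_{x\sim\mathcal{D},\; y\sim\pi_\theta(\cdot\mid x)} \big[\, r(x, y) \,\big] \\
\text{OPD:}\quad & \min_\theta \; \mathbb{E}_{x\sim\mathcal{D},\; y\sim\pi_\theta(\cdot\mid x)} \sum_t
  \mathrm{KL}\!\big( \pi_\theta(\cdot \mid x, y_{<t}) \,\big\|\, \pi_T(\cdot \mid x, y_{<t}) \big)
\end{aligned}
```

the thing to notice is where y comes from: a fixed dataset for SFT, the student's own samples for both RL and OPD.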
the post is heavily grounded in recent literature and uses the distributional perspective as a unifying bridge across all three paradigms. i really like the point it argues: the load-bearing ingredient is on-policy data, and OPD's convergence to RL-like outcomes is the strongest evidence for that
good stuff from microsoft: 4B model just to explore code bases, cutting token costs by 10-50% (!!!) while the performance of the big model stays the same :)
"From AGI to ASI": new paper from our team.
This report investigates how AI might develop beyond AGI. It describes theoretical limits, potential pathways, and potential bottlenecks.
https://t.co/x0ZEV2xhNw
🚀 Meet PRX Pixel.
Our new open-source 7B text-to-image model that generates images directly in pixel space.
After months of pretraining on hundreds of millions of images, supervised fine-tuning, and preference alignment, we're excited to share a first public preview.
The weights are already available, and we're currently working on integrating the model directly into Diffusers 🤗 to make the model even easier to use.
Test it yourself in the demo below. And as always, we'll be sharing the full story behind the model through a series of technical blog posts covering the entire training recipe.
Link in the comments 👇
Today we're releasing ZONOS2, our next-generation real-time TTS model with high-fidelity voice cloning.
ZONOS2 is the most expressive open-source TTS model, released under Apache 2.0 and available on Zyphra Cloud on @AMD. 🧵
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees.
The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance.
Access to all other Claude models is not affected.
We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible.
Read our full statement: https://t.co/bwn0sximKZ
What happens when multi-agent systems stop relying on a central “controller” agent? Can agents coordinate by sharing results directly with each other?
Introducing Decentralized Language Models (DeLM): we let agents coordinate asynchronously through a shared context. Agents claim tasks from a queue and write back compact, verified results as they finish, making progress visible to all workers without requiring a main agent to merge, filter, and rebroadcast it.
New paper with @azaliamirh!
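The coordination pattern described above (claim a task from a queue, write back a compact verified result, no main agent in the loop) can be sketched with threads standing in for agents. This is an illustrative toy under stated assumptions, not the paper's implementation: `SharedContext`, `worker`, `run`, and the `solve`/`verify` callables are hypothetical names introduced here.

```python
import queue
import threading
from typing import Callable, Dict, List

Solve = Callable[[str, Dict[str, str]], str]   # (task, results seen so far) -> result
Verify = Callable[[str, str], bool]            # (task, result) -> keep it?


class SharedContext:
    """Thread-safe store of verified results, readable by every worker."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._results: Dict[str, str] = {}

    def write(self, task_id: str, result: str) -> None:
        with self._lock:
            self._results[task_id] = result

    def snapshot(self) -> Dict[str, str]:
        with self._lock:
            return dict(self._results)  # copy, so readers never see partial updates


def worker(tasks: "queue.Queue[str]", ctx: SharedContext, solve: Solve, verify: Verify) -> None:
    while True:
        try:
            task_id = tasks.get_nowait()  # claim the next task; no controller assigns it
        except queue.Empty:
            return  # nothing left to claim
        result = solve(task_id, ctx.snapshot())  # may use other workers' progress
        if verify(task_id, result):
            ctx.write(task_id, result)  # only verified results become shared context
        tasks.task_done()


def run(task_ids: List[str], solve: Solve, verify: Verify, num_workers: int = 3) -> Dict[str, str]:
    tasks: "queue.Queue[str]" = queue.Queue()
    for task_id in task_ids:
        tasks.put(task_id)
    ctx = SharedContext()
    threads = [threading.Thread(target=worker, args=(tasks, ctx, solve, verify))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ctx.snapshot()
```

For example, `run(["a", "b", "bad"], lambda t, seen: t.upper(), lambda t, r: r != "BAD")` returns results for `"a"` and `"b"` only, since the third result fails verification and never reaches the shared context.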