Introducing Decentralized Language Models (DeLM)!
DeLM is a multi-agent framework that enables asynchronous, verified & reusable progress!
It makes agentic tasks more accurate and significantly cheaper. For example, it achieves 65.7% on SWE-bench Verified using Gemini 3-Flash, a ~10% jump over the best centralized alternatives at less than half the cost.
Great work led by @Mao_Yuzhen !
Design is full of codewords. Knowing them changes what you can ask for and what you can get back, whether you're working with devs or an AI.
“tint this neutral color”, “fix this widow”, “nudge it to the optical center”
I wrote them down: https://t.co/aFyd5avj9o
Karpathy found a way to reduce token consumption by up to 90% on repeat queries.
The problem: the LLM re-reads the same files over and over again, loses context between documents, and gives less accurate answers as a result.
The solution is called Wiki Layer: the LLM cleans, structures, and links all your data once, after which it never works with raw files again.
Three parts: `raw/` for originals, `wiki/` for a clean knowledge base in Markdown, and files with rules for the agent.
Result: up to 90% token savings on repeat queries, automatic links between documents, and a visual knowledge graph in Obsidian.
Everything stays on your local machine; nothing goes to the cloud.
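A minimal sketch of the "process once, then only read `wiki/`" idea, not the actual Wiki Layer tooling. The `clean` function is a stand-in for the LLM cleaning step, and `manifest.json`, `build_wiki`, and the `*.txt` filter are hypothetical names chosen for illustration. The rules files for the agent are left out.

```python
import hashlib
import json
from pathlib import Path


def clean(text: str) -> str:
    """Stand-in for the LLM cleaning step (hypothetical): strip trailing spaces and blank lines."""
    lines = [line.rstrip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line) + "\n"


def build_wiki(raw_dir: Path, wiki_dir: Path) -> list[str]:
    """Convert each raw file to Markdown once; skip files whose content hash is unchanged."""
    wiki_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = wiki_dir / "manifest.json"  # hypothetical bookkeeping file
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    processed = []
    for src in sorted(raw_dir.glob("*.txt")):
        text = src.read_text(encoding="utf-8")
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if manifest.get(src.name) == digest:
            continue  # already in the wiki, so no tokens are spent on it again
        page = f"# {src.stem}\n\n{clean(text)}"
        (wiki_dir / f"{src.stem}.md").write_text(page, encoding="utf-8")
        manifest[src.name] = digest
        processed.append(src.name)
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return processed
```

Calling `build_wiki(Path("raw"), Path("wiki"))` a second time with unchanged inputs returns an empty list, which is where the savings on repeat queries would come from in a setup like this.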
parakeet.cpp: native C++/ggml (@ggml_org) inference for @NVIDIAAIDev's Parakeet, one of the best speech-to-text models out there, from the @LocalAI_API team.
Every Parakeet model (TDT/CTC/RNNT/hybrid + cache-aware streaming), byte-for-byte identical output to NeMo, now running anywhere with no Python and even a bit faster, on CPU and GPU.
Quantized GGUF on @huggingface 🤗
Huge thanks to @ggerganov for ggml and to @NVIDIAAIDev for releasing Parakeet! 🧵
Every memory system for LLM agents evolves what it stores. None evolves how it retrieves.
🧬 EvolveMem is out, now shipping inside the SimpleMem v0.3.0 update. Powered by AutoResearch: the system researches its own retrieval, treating the full retrieval config as a structured action space and running a closed loop: evaluate ➜ diagnose ➜ propose ➜ validate ➜ repeat.
🔬 From a minimal baseline, 7 autonomous rounds produce a retrieval policy that beats the strongest published baseline by +25.7% on LoCoMo and +18.9% on MemBench.
🧬 It discovers entirely new retrieval dimensions not present in the original design, all integrated into the unified SimpleMem package.
📄 Paper: https://t.co/BWCXebWhG1
💻 Code: https://t.co/hhdgvVjblP
Led by @itsJiaqiLiu, @XinyeYee with contributions from @richardxp888, @ZhengBerkeley, @cihangxie
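The evaluate ➜ diagnose ➜ propose ➜ validate ➜ repeat loop can be pictured with a toy search over a retrieval config. This is a rough sketch, not EvolveMem's or SimpleMem's code: the knobs `top_k` and `hops`, the `evolve_config` function, and the toy scorer are all hypothetical, and the "diagnose" step is reduced to trying a one-step change on every knob.

```python
from typing import Callable, Dict

Config = Dict[str, int]


def evolve_config(config: Config, evaluate: Callable[[Config], float], rounds: int = 7) -> Config:
    """Greedy closed loop: keep a proposed change only if it validates as an improvement."""
    best, best_score = dict(config), evaluate(config)
    for _ in range(rounds):
        improved = False
        for key in list(best):
            for delta in (-1, 1):
                candidate = {**best, key: best[key] + delta}  # propose
                score = evaluate(candidate)  # validate
                if score > best_score:
                    best, best_score, improved = candidate, score, True
        if not improved:
            break  # no proposal helped, so stop early
    return best


def toy_score(cfg: Config) -> float:
    """Made-up benchmark whose optimum sits at top_k=5, hops=2."""
    return -((cfg["top_k"] - 5) ** 2) - (cfg["hops"] - 2) ** 2


print(evolve_config({"top_k": 1, "hops": 0}, toy_score))
```

In a real system the scorer would be a benchmark run and the proposals would come from the model itself; the sketch only shows the shape of the loop.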
Backprop strongly shapes the GPU hardware AI runs on today.
Learning algorithms without backprop open new opportunities for neuromorphic silicon, biologically grounded models, and heterogeneous compute.
Paper: https://t.co/rNFmIKzCXz
Blog: https://t.co/oSq2pN0brU
It's never made sense to me that RL collapses all reward signals to a single scalar. Today, we fix that!
Introducing Vector Policy Optimization: we train models to inherently optimize for the varied nature of a reward vector, creating diverse sets of answers ideal for test time search. Website and code coming soon!
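A small illustration of what collapsing to a scalar throws away. This is not Vector Policy Optimization (the post says code is still to come); it is only a sketch using a standard non-dominated filter, and the reward values are made up.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every axis and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(rewards: List[Sequence[float]]) -> List[int]:
    """Indices of reward vectors that no other vector dominates."""
    return [
        i
        for i, r in enumerate(rewards)
        if not any(dominates(other, r) for j, other in enumerate(rewards) if j != i)
    ]


# Made-up (correctness, brevity) rewards for four candidate answers.
rewards = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.2, 0.2)]
scalars = [sum(r) for r in rewards]  # the first three collapse to the same scalar
front = pareto_front(rewards)        # the vector view keeps them as distinct options
```

Summed into one number, the first three answers look identical; kept as vectors, they remain three different trade-offs that a test-time search could choose between.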
SCALING ISN’T EVERYTHING
Another tiny model breaking the rule.
- trained on less than 1/1000th of the data
- can be trained in a single day for <1000 USD
The human knowledge base can be compressed & retrieved much more tightly than LLMs do today.
🚨 BREAKING: NVIDIA proved back-propagation isn't the only way to build an AI.
Billion-parameter models were trained without a single gradient. No calculus, no exploding memory, no massive GPU clusters.
The method? A long-dismissed technique called Evolution Strategies.
NVIDIA and Oxford just made it scalable with EGGROLL, which replaces bloated mutation matrices with two tiny ones, enabling hundreds of thousands of parallel mutations at inference-level speed.
They're pretraining models from scratch using only simple integers. No backprop. No decimals.
We assumed the future of AI required endless precision hardware. Evolution had other plans.
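One way to read "two tiny ones" is as a low-rank mutation: instead of sampling a full m x n noise matrix, sample A (m x r) and B (n x r) and use A·Bᵀ. The sketch below assumes that reading and is not EGGROLL's implementation; all function names are hypothetical, and it uses floats for clarity rather than the integer arithmetic the post mentions. It shows that the mutated layer can be applied without ever building the full mutation.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def forward_full(w: Matrix, a: Matrix, b: Matrix, sigma: float, x: Vector) -> Vector:
    """Build the full m x n mutation A @ B^T, add it to W, then multiply by x."""
    bt = transpose(b)  # r x n
    mutated = [
        [w[i][j] + sigma * sum(a[i][k] * bt[k][j] for k in range(len(bt))) for j in range(len(x))]
        for i in range(len(w))
    ]
    return matvec(mutated, x)


def forward_lowrank(w: Matrix, a: Matrix, b: Matrix, sigma: float, x: Vector) -> Vector:
    """Same result without forming the m x n mutation: W x + sigma * A (B^T x)."""
    base = matvec(w, x)
    projected = matvec(transpose(b), x)  # r values
    correction = matvec(a, projected)    # m values
    return [base[i] + sigma * correction[i] for i in range(len(base))]
```

Each mutation then costs r·(m + n) random numbers instead of m·n, and the base product `W x` is the same for every member of the population, which is one plausible source of the "inference-level speed" claim.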