We’re excited to see MiniCPM5-1B being used in real NAS-based local AI systems.🥳🥳
A developer in our community built a full-stack setup combining on-device LLM inference with NAS and Agent capabilities:
⚡ Lightweight local deployment
MiniCPM5-1B runs on a QNAP-Qu605-N150-16G NAS, consuming <2GB of memory. It is deployed via Ollama and integrated into Cherry Studio as a local LLM provider.
🧩NAS + Agent integration via MCP
With NAS MCP, system capabilities like file management , shared folders , and semantic search are exposed to external agents. This enables Coding Agents / WorkBuddy-style workflows to securely access and retrieve local data within permission boundaries.
📚Local knowledge base+ RAG pipeline
Using Qsirch indexing, NAS files can be turned into a structured local knowledge base.
MiniCPM5-1B handles retrieval-based reasoning, enabling summarization, Q&A, and extended reasoning fully on-device.
This is a great case of how efficient small models are evolving beyond local inference into real system-level intelligence.
From NAS storage → Agent operations → knowledge reasoning
Everything works together in one loop!
📖 Original post: https://t.co/i3eCXmMTBG
I just open-sourced my /learn skill.
Learn anything with agents and HTML artifacts.
I have been learning about all kinds of topics with it.
Install the skill and interact with any agent to help you through any topic.
Ask it to generate visual and interactive artifacts and help you go deeper or generate knowledge checks (e.g., quizzes).
Upskilling myself on any topic is one of the most impactful ways I have been able to use AI agents.
If you are a DAIR Academy pro member, you can use it with our AI Builder.
Skill: https://t.co/5zqkHJuTmO
Try now: https://t.co/1e8RZKs4uX
A new open source model called Ornith 1.0 just dropped.
82.4 on SWE-bench Verified.
That beats Claude Opus 4.7.
Pinch of salt: this looks benchmaxed.
SWE-bench Verified is the most gameable coding benchmark in the world.
Running it through BridgeBench in 24 hours.
Real world tests will tell us if Ornith is a breakthrough or another benchmax special.
Multi-agents collaborations are among the most interesting agent behaviors right now!
We did an experiment the other day with 100+ agents (an open-collaborations for a week) collaborating to improve the inference speed of Gemma 4 in vLLM. Got a 5x final improvement in speed but what really stuck me was the interactions we observed on the message board
Integrity & self-policing:
- Social-engineering attempt: A human (FusionCow) asked agents to move to Telegram. An agent replied with an unprompted long post on "communication norms" refusing that, calling private side-channels "indistinguishable from collusion."
- Verification loophole flagged: an agent found a relaxed verification loophole pushing TPS with clean PPL (PPL is teacher-forced, blind to decode divergence) and flagged it for a ruling by the community. The community pinged the human organizer which ruled it invalid.
- Self-notice of overfitting risk: Some later improvements rested on pruning lm_head to a keep-set built from public PPL truth + public decode tokens. An agent noted this would lead to private-subset degradation and another built a keep-set explicitly covering eval prompts.
Emergent collaborations:
- Communal knowledge base: agents maintained shared lever-maps, playbooks, and triage tools so newcomers wouldn't repeat dead ends (stack-notes, playbook, int4-ceiling notes, MTP map, significance tool, policy simulator).
- Four-agent relay: an agent built an int4-lm_head checkpoint but had no quota to run it; another agent tried to run it but failed at load, yet another agent diagnosed the config bug (tie_word_embeddings + ignore-list ordering) and a fourth agent was able to re-run and get to 118 TPS, 2.68×. Build/run/diagnose/ship ended up being split across four independent agents.
- GPU-rich/GPU-poor division of labor: an agent was regularly compute-starved and switched to writing specs, byte-math, and acceptance analysis for other GPU-rich agents to execute. Some agents offered external Modal compute for another agent blocked DFlash training.
- Cross-agent kernel debugging: an agent debugged another agent run of of yet another agent fused drafter: found a Triton store/load aliasing race in _k_qnorm_rope, a second shape bug, then rewrote attention with flash-decoding split-KV. Fixes posted "take freely."
- Quota-pooling norm: Often agents would stage a candidate publicly for whoever has quota to run it. Agents will then usually credits the originator. This behavior emerged because of the 10-job/24h cap (e.g. pupa's package run by resystagent and fabulous-frenzy).
Discoveries & reversals:
- Agents would make many discoveries and reversal of them, giving them names like the following:
- 127 TPS "wall" was an artifact. a mathematical proof of the max possible speed became called in the community the "int4-Marlin floor" but a later agent called the proof circular (only varied the bandwidth term, never overhead). Finally another agent broke to 247 TPS via MTP speculative decoding on a vLLM nightly.
- "Smarter draft loses." An agent showed that a 2B drafter's ~1 GB/token read dominates even at perfect acceptance and a much smaller 256-hidden drafter wins at batch-1 because its weights are nearly free to read. Agent discussed how per-accepted-token cost ≈ draft bytes read / acceptance.
- "DFlash near-random acceptance": an agent remotly diagnosed the 2–5% acceptance rate of another agent as near-random, ruling out undertraining/vocab caps and pointing to a train/serve hidden-state mismatch (bf16 E4B extraction vs int4 serving).
- Much of the race was noise: one agent decide to run the #1 submission 4 times and found a σ≈1.16 TPS variation in single run. Another agent confirmed across 358 runs / 66 buckets: frontier deltas <~4 TPS are ties. Community adopted a significance norm.
So many interesting interactions in the interaction board: https://t.co/SxfA6LuqVk
You can explore also the lineage of inventions from the agents at: https://t.co/CyV45rjI9A
And the challenge it-self at https://t.co/Ct1gtmB508
And the organization behind the challenge at https://t.co/ujRlGcNSJM
New research from Meta.
Building synthetic training data has stayed a fixed pipeline that you hand-tune and then freeze.
Autodata casts an AI agent as a data scientist that builds training and evaluation data, with an implementation called Agentic Self-Instruct that extends classic Self-Instruct with agentic planning and tool use.
Think of it as meta-optimization, where the data scientist agent is itself trained to produce stronger data, so the pipeline keeps improving instead of staying static.
Across computer science research, legal reasoning, and reasoning over mathematical objects, it beats classical synthetic-data methods, and meta-optimizing the agent delivers an even larger uplift.
Paper: https://t.co/TgFN6EHZas
Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
meta releases Autodata: an agentic data scientist to create high quality synthetic data
basically its a loop. given a document (lets say a arxiv paper)
- there is a challenger LLM that reads the doc and writes a question + context + a grading rubric +answer
- two solver LLMs attempt to solve the question: a weak solver, a strong solver
- the judge LLM checks the rollouts and grades against rubric for both the solvers and decides if the given task is just right. Right means if the task is hard enough that weak model struggles but the strong model excels.
- if the task isn't right, it doesn't throw the task away instead provides feedback why it failed like too easy, bad rubric etc and the challenger LLM rewrites it from a new angle
- the loop continues n times (average was 6 in the paper). The survivors become GRPO training data with the same judge LLM as the verifier.
the feedback loop is the product. so rather than making the data harder its making it just right for the weaker model to hillclimb
Introducing LFM2.5-230M: our smallest model yet, built to run fast anywhere (CPUs, NPUs, and GPUs) to enable agentic tasks on phones, robots, home and network automation devices.
> 230M parameters, built on the LFM2 architecture
> Pre-trained on 19T tokens, with a 32K context extension
> Post-trained with distillation from LFM2.5-350M
> 213 tok/s decode speed on Galaxy S25 Ultra (CPU)
> 42 tok/s on a Raspberry Pi 5 (CPU)
> Competes with and often beats models more than twice its size on instruction following, data extraction, and tool use.
> use it for large-scale data extraction pipelines or lightweight on-device agentic workloads.
🧵
this is f*cking dangerous
someone just open sourced the entire "LOOP ENGINEERING" framework for free
build a hedge fund printing alpha 24/7 by feeding it into claude code with my article below
bookmark before someone takes it down
Microsoft just dropped a 17-page paper - "Less Context, Better Agents" - proving the thing nobody building agents wants to admit:
Your agent isn't failing because it needs more context. It's drowning in the context it already has.
They ran GPT-5 four ways on one 50-task benchmark (via MCP):
Full history → bloated, pricey, more errors
Trim to the last 5 tool calls → 79% done
Add light summarization → 91.6%, on a fraction of the tokens
Less context. Fewer tokens. Higher completion.
Everyone's racing to cram MORE into the window.
Microsoft just published the receipt that the opposite wins for long-running agents.
Rewired how I build agents this week. Paper ↓
There is an old laptop in your closet.
Gathering dust. Dead battery. Slow processor. You keep it because you feel guilty throwing it away.
That laptop can replace every cloud subscription you pay for.
- Netflix
- Google
- Dropbox
- 1Password
That is $42 a month. $504 a year. To rent things you used to own.
The old laptop in your closet could do all of it.
Now meet CasaOS.
A free and open source system that turns any old laptop, Raspberry Pi, or mini PC into your own personal cloud.
You run one command. In 30 minutes, the laptop becomes a server. You open it from your phone, your TV, your work computer, anywhere in the world.
Then you pick the apps you want from a built-in store. One click each.
- Jellyfin to replace Netflix. Stream every movie and show you own.
- Immich to replace Google Photos. Faces and search included.
- Nextcloud to replace Dropbox. Sync every file across every device.
- Vaultwarden to replace 1Password. All your passwords, your keys.
- Syncthing to keep files in sync across every device, no cloud.
- Home Assistant to control every smart device in your home.
- AdGuard to block ads on every device on your wifi.
Setting up a home server the old way took an entire weekend. Install Linux. Learn Docker. Write config files. Set up storage. Fix errors. Look up every app one by one.
CasaOS does all of that for you. No code. No config files. No Linux skills. You see icons on a screen. You click them.
34,116 stars on GitHub. Apache 2.0. Free forever.
Built by a small team starting September 2021. Runs on Raspberry Pi, Intel NUC, old laptops, and most home servers. Over 100,000 Docker apps can be installed.
A new Raspberry Pi costs $50. The old laptop in your closet costs $0. It already works. It is already in your house.
Netflix charges every month. CasaOS doesn't.
Google charges every month. CasaOS doesn't.
Dropbox charges every month. CasaOS doesn't.
1Password charges every month. CasaOS doesn't.
Here is the wild part.
The laptop you forgot about is more powerful than the web server that ran most websites in 2008.
It is sitting in a drawer. It costs you nothing. It already works.
One command. Thirty minutes. Five hundred dollars a year back in your pocket.
Your files. Your photos. Your movies. Your home.
Your closet just became a data center.
I’ve been tinkering with some new skills for my skills repo over the weekend, and it seems like there’s so much untapped potential for design engineering skills.
/emil-design-eng skill can already help you a ton, and it’s doing more than 100k installs already, but there’s more to come soon.
I plan to add more fine-tuned skills for specific use cases, but since AI produces non-deterministic results, there’s still quite a bit of testing to do as I want the answers to be precise and correct.
In the meantime, you can try the two existing skills:
https://t.co/BQ3QdOdPya
Apple just made Docker Desktop optional on Mac.
And it is completely free.
This is apple/container. 26.5k stars no Github.
You can now run Linux containers natively on your Mac without installing Docker Desktop, without a background daemon hogging your RAM, and without paying $21 a month per developer for a commercial license.
Here is what it does:
→ Runs Linux containers as lightweight VMs directly on Apple Silicon using macOS 26 virtualization
→ Fully OCI compatible. Pull any image from Docker Hub, GitHub Container Registry or anywhere else
→ Written in Swift and optimised specifically for Apple Silicon. Faster and lighter than anything Docker Desktop does on Mac
→ Standard container CLI syntax. If you know Docker commands you already know how to use this
→ Push images you build to any standard container registry and run them anywhere
Docker Desktop charges $21 per developer per month for commercial use. Apple's version costs nothing and ships as open source under Apache-2.0.
Microsoft made Docker Desktop optional on Windows with WSL Containers last month.
Apple just did the same on Mac.
Docker is not going anywhere. But the era of paying for a GUI wrapper around containers on your own machine is quietly ending.
Repo here: https://t.co/uFJ867sul6
You want a strong small LLM. Would you start small — or inherit from something bigger?
📄 New paper: Small LLMs: Pruning vs. Training from Scratch
We find that pruning is more than a better initialization: simply giving randomly initialized LLMs more training tokens is often not enough to catch up.