2-bit Gemma 4 12B GGUF, only 4.66 GB on disk, managed to cite 15 sites from a single prompt.
Try this locally on >6GB RAM via Unsloth Studio.
GitHub: https://t.co/aZWYAtakBP
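A quick sanity check on the "2-bit, 4.66 GB" figure above: a minimal sketch that converts file size into average bits per weight. It assumes "12B" means roughly 12e9 parameters and that GB means 10^9 bytes; neither is stated in the post, and the function name is made up for illustration.

```python
def effective_bits_per_weight(file_size_bytes: float, n_params: float) -> float:
    """Average number of bits stored per parameter in a model file."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return file_size_bytes * 8 / n_params


# Assumptions (not from the post): 12e9 parameters, 1 GB = 1e9 bytes.
bpw = effective_bits_per_weight(4.66e9, 12e9)
print(f"{bpw:.2f} bits per weight on average")
```

Under those assumptions the average comes out a little above 3 bits per weight rather than exactly 2. That would be consistent with a mixed-precision quant that keeps some tensors at higher precision, though the post does not break the file down.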
Gemma 4 12B can now run locally on just 8GB RAM via Dynamic GGUFs.
Google's new model, Gemma 4 12B Unified, supports image, audio and 256K context.
You can run and train the model via Unsloth Studio.
GGUF: https://t.co/8cL321pVDh
Guide: https://t.co/odRo9WjRpA
Today we’re introducing Gemma 4 12B — our latest open model that brings advanced agentic reasoning, vision and audio directly to your laptop.
It delivers performance nearing our larger Gemma models with a much smaller total memory footprint, while being small enough to run locally with just 16GB of VRAM. It’s open and accessible for everyone to use under a permissive Apache 2.0 license.
This is all made possible by our new, unified architecture that removes separate multimodal encoders. Here’s how we did it 🧵
4-bit Qwen3.6 MTP GGUF managed to search 70+ sites from a single prompt.
Try this locally on 20GB RAM via Unsloth Studio.
Unsloth now supports auto MTP + speculative decoding & auto-selects the best MTP settings for your device (Mac, CPU, GPU).
GitHub: https://t.co/aZWYAtakBP
Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. ⚡️
MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change.
Qwen3.6-27B MTP runs at 160 tokens/s. 35B-A3B reaches 240 t/s.
GGUFs: https://t.co/7gWhKnseZo
Guide: https://t.co/7qzk6ypWDQ
Ant Group just dropped Ring-2.6-1T 🔥 1T reasoning model, built for real-world agent workflows.
✨ MIT license
✨ 128K >> 256K context (YaRN)
✨ Async RL + IcePop training architecture
✨ Dual reasoning effort: "high" for fast agent loops, "xhigh" for deep reasoning = Better cost/performance tradeoff 👀
We released experimental MTP Qwen3.6 Unsloth GGUFs!
Qwen3.6 27B MTP now runs at 140 tokens/s. Qwen3.6 35B-A3B MTP gets 220 tokens/s generation on a single GPU.
Qwen3.6 27B and 35B-A3B have >1.4x speed-up over the original GGUFs without any change in accuracy.
Guide + GGUFs + Benchmarks: https://t.co/x9BYC3iXCL
In terms of average speedup, we see a 1.4x for dense models at draft tokens = 2 and for the MoE around 1.15 to 1.2x.
We do not recommend more than 2 draft tokens because the acceptance rate drops precipitously from 83% to 50% with 4 draft tokens, and the forward passes for MTP become less beneficial.
Use `--spec-type mtp --spec-draft-n-max 2`
Thanks to Aman for https://t.co/0WKkIC0kyW!
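To see why more draft tokens can hurt once acceptance drops, here is a rough sketch of the expected tokens produced per verification pass. It assumes the quoted acceptance rates apply per drafted token, independently, and that drafting stops at the first rejection. That is a common simplification, not necessarily how the 83% and 50% figures above were measured, and the function name is illustrative.

```python
def expected_tokens_per_step(acceptance_rate: float, draft_tokens: int) -> float:
    """Expected tokens emitted per verification pass.

    Simplifying assumption: each drafted token is accepted independently
    with the same probability, and everything after the first rejection
    is discarded. The verifying pass always yields one token itself.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if draft_tokens < 0:
        raise ValueError("draft_tokens must be non-negative")
    # 1 (verifier's own token) + a + a^2 + ... + a^n
    return sum(acceptance_rate ** k for k in range(draft_tokens + 1))


two_drafts = expected_tokens_per_step(0.83, 2)   # figures quoted in the post
four_drafts = expected_tokens_per_step(0.50, 4)
print(f"2 draft tokens @ 83%: {two_drafts:.2f} tokens/step")   # 2.52
print(f"4 draft tokens @ 50%: {four_drafts:.2f} tokens/step")  # 1.94
```

Under this model, 4 draft tokens at 50% acceptance yield fewer tokens per pass than 2 draft tokens at 83%, while also costing more draft forward passes. This is tokens per step, not wall-clock speedup; the measured 1.4x also reflects the cost of the MTP passes themselves.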
Trending repository of the day 📈
hermes-agent by nousresearch
The agent that grows with you
Last 24h: 2,065 ⭐
Total: 145,790 ⭐️
https://t.co/JzUlvlbLbr
🆕 Hugging Face 🤝 Hermes Agent 🔥
> we added Hermes Agent to local apps: run it locally with any compatible GGUF/MLX model
> shipped native traces support for Hermes Agent: visualize your Hermes traces directly on the Hub
Very soon most agents will run locally and we want to accelerate things as much as we can ⚔️
Excited to introduce Gemma 4 Multi-Token Prediction Drafters ⚡️ Accelerated inference right in your pockets
- Up to a 3x speedup
- Same quality guarantees
- Available in your favorite open-source tools
We're open-sourcing Asimov v1, a humanoid robot.
With Asimov v1, you can build it, train on it, and make it your own. It's the first step of building a humanoid labor force for the rest of us.
Asimov v1 is 1.2 m tall, 35 kg, with 25 actuated degrees of freedom. Structural parts machined in 7075 aluminium and 3D-printed in MJF PA12 nylon.
We're releasing the mechanical design and simulation files. Ready for locomotion policy training out of the box.
The BOM is open too. Source everything yourself, or order the DIY Kit. All components, ready to assemble. $499 deposit, $15,000 target price. Ships end of summer 2026.
GitHub: https://t.co/kjqkny2oqW
Manual: https://t.co/9tjkteOcxO
DIY Kit: https://t.co/tzvzNyXQfA
Most humanoid robots are controlled by the companies that build them. Asimov v1 is built for the rest of us. Build it, test it, and share your feedback with the community.
DeepSeek releases DeepSeek-V4. 🐋
- DeepSeek-V4-Pro: 1.6T params
- DeepSeek-V4-Flash: 284B params
DeepSeek-V4-Pro rivals Claude-Opus-4.6-Max, GPT-5.4-xHigh and Gemini-3.1-Pro-High.
They support 1M context length, thinking and set new records for Codeforces.
2-bit Qwen3.6-27B GGUF made 26 tool calls, triaged 15 GitHub issues and fixed, tested + reproed our repo’s 3 latest issues. 🔥
Try this locally in Unsloth Studio with just 12GB RAM. Studio also has a new look!
GitHub: https://t.co/aZWYAtakBP
Qwen3.6-27B can now run locally! 💜
Run on 18GB RAM via Unsloth Dynamic GGUFs.
Qwen3.6-27B surpasses Qwen3.5-397B-A17B on all major coding benchmarks.
GGUFs: https://t.co/ykKgwh2zI9
Guide: https://t.co/ITLNq20WJp