Before the week ends, let's acknowledge one of the most INSANE weeks ever for open AI, with 25+ notable open-weight drops across every modality:
🧠 LLMs
→ NVIDIA Nemotron 3 Ultra: 550B hybrid Mamba-MoE, only 55B active, 1M context, MMLU 89.1. NVFP4 variant claims ~5x throughput on Blackwell. First openly-weighted 550B hybrid Mamba-Transformer, closing the gap with frontier closed models.
→ Google Gemma 4 12B: fully open dense any-to-any (text/image/audio/video), 256k context, encoder-free, 140+ languages, AIME 2026 at 77.5. Shipped with a 23-checkpoint QAT wave (mobile ONNX + MLX). Most deployable model of the week.
→ StepFun Step-3.7-Flash: 198B sparse MoE VLM, ~11B active, SWE-Bench PRO 56.3. Apache 2.0.
→ Liquid AI LFM2.5-8B-A1B: edge MoE, just 1.5B active, 128k ctx, MATH500 88.8, MLX-ready. Best on-device option this week.
→ JetBrains Mellum2-12B-A2.5B-Thinking: their first open MoE, near-Qwen3-14B coding at 2.5B active. Apache 2.0.
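To put the "active parameters" figures above in perspective, here is a quick back-of-the-envelope sketch. It only uses the numbers quoted in the list; the totals for LFM2.5 and Mellum2 are read off the model names, which is an assumption rather than a published spec.

```python
# (total_params_B, active_params_B) as quoted in the list above.
# Assumption: totals for LFM2.5 and Mellum2 are inferred from the model names.
models = {
    "Nemotron 3 Ultra": (550.0, 55.0),
    "Step-3.7-Flash": (198.0, 11.0),
    "LFM2.5-8B-A1B": (8.0, 1.5),
    "Mellum2-12B-A2.5B": (12.0, 2.5),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the weights that take part in computing each token."""
    return active_b / total_b


fractions = {name: active_fraction(t, a) for name, (t, a) in models.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of weights active per token")
```

Keep in mind that a sparse MoE still needs all of its weights loaded; the active count mainly tells you about per-token compute, not memory.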
🎨 Image gen (the surprise of the week)
→ Ideogram 4: their FIRST-EVER open weights. 9.3B flow-matching DiT trained from scratch. #2 overall behind GPT Image 2, top open-weight model on Design Arena + LMArena. Strongest open checkpoint for text-rich images, full stop. It has taste. Still can't believe this is open weights.
🔊 Audio & Speech (a breakout week for open TTS, 4 labs shipped)
→ Boson Higgs Audio v3 4B: 102 languages, 21 emotions, singing/whispering/shouting, sub-second TTFA.
→ RedNote dots.tts: the only fully continuous (no codec) open TTS pipeline, Apache 2.0.
→ Google Magenta RealTime 2: real-time music gen, <200ms latency, text+audio+MIDI. multimodalart ported it to PyTorch within hours with live ZeroGPU demos.
→ NVIDIA Nemotron-3.5 ASR: 600M streaming, 17x more concurrent streams vs Parakeet RNNT 1.1B.
👁️ Vision & VLMs
→ PaddleOCR-VL-1.6: SOTA document parsing at 1B params, Apache 2.0.
→ Baidu NAVA: 6.3B joint audio-video gen, best-in-class A/V sync, Apache 2.0.
🎬 Video, 3D & World Models
→ NVIDIA Cosmos3-Super: 64B omnimodal world model coupling action trajectories with video+audio gen, for Physical AI.
→ JD JoyAI-Echo: up to 5-min multi-shot text-to-video on LTX-2.3.
→ ByteDance Bernini-R + VAST TripoSplat (single-image-to-3D Gaussian splats, MIT).
If you are serious about options trading, this 1-hour Yale lecture is non-negotiable.
This 60-minute lecture can teach you more about options trading than 99% of options trading courses.
Save this and watch it without distractions. 📌
The easiest way to find out which models you can run on your computer:
1) Install
npm install -g llm-checker
2) Detect your hardware
llm-checker hw-detect
3) Get recommendations by category
llm-checker recommend --category coding
Credits to @svpino for bringing this to my attention. I found it super useful and thought of sharing it with you all.
Here are some of the recommendations I got:
Anthropic pays $750,000+ a year for engineers who can build LLM architectures from scratch.
This 2-hour Stanford lecture gives you the exact pipeline LLM engineers get paid $750K/year for.
Data + architecture + scaling laws + post-training.
Bookmark it & watch today. Then read the article below.
R.I.P. GOOGLE FLIGHTS IN 2026.
R.I.P. BOOKING.COM IN 2026.
R.I.P. SKYSCANNER IN 2026.
$1,190 flight. I paid $159.
Use these 7 prompts before booking your next trip:
A single 𝗖𝗟𝗔𝗨𝗗𝗘.𝗺𝗱 file just hit 15K GitHub stars.
(derived from Karpathy's coding rules)
Andrej Karpathy observed that LLMs make the same predictable mistakes when writing code: over-engineering, ignoring existing patterns, and adding dependencies you never asked for.
If you've used AI coding assistants, you've hit all of these.
But here's the thing:
If the mistakes are predictable, you can prevent them with the right instructions.
That's exactly what this 𝗖𝗟𝗔𝗨𝗗𝗘.𝗺𝗱 does. You drop one markdown file into your repo, and it gives Claude Code a structured set of behavioral guidelines for your entire project.
This is a big deal.
- Built entirely around prompt engineering for AI coding assistants
- No framework, no complex tooling, just one .md file that shapes behavior
Developers are moving past "use AI to write code" and into "engineer the AI's behavior so the code is actually good."
The Claude Code ecosystem is growing fast, and the best tools in it aren't always software. Sometimes they're just well-crafted instructions.
100% open-source.
I've shared a link to the GitHub repo in the next tweet!
🚨 You need to see this.
@addyosmani from Google just dropped his new Agent Skills and it's incredible.
It brings 19 engineering skills + 7 commands to AI coding agents, all inspired by Google best practices 🤯
AI coding agents are powerful, but left alone, they take shortcuts.
They skip specs, tests, and security reviews, optimizing for "done" over "correct." Addy built this to fix that.
Each skill encodes the workflows and quality gates that senior engineers actually use: spec before code, test before merge, measure before optimize.
The full lifecycle is covered:
→ Define - refine ideas, write specs before a single line of code
→ Plan - decompose into small, verifiable tasks
→ Build - incremental implementation, context engineering, clean API design
→ Verify - TDD, browser testing with DevTools, systematic debugging
→ Review - code quality, security hardening, performance optimization
→ Ship - git workflow, CI/CD, ADRs, pre-launch checklists
It features 7 slash commands (/spec, /plan, /build, /test, /review, /code-simplify, /ship) that map to this lifecycle.
It works with:
✦ Claude Code
✦ Cursor
✦ Antigravity
✦ ... and any agent accepting Markdown.
It bakes Google-tier engineering culture (Shift Left, Chesterton's Fence, Hyrum's Law) directly into your agent's step-by-step workflow!
`npx skills add addyosmani/agent-skills`
Free and open-source.
Repo link in 🧵↓
BREAKING: 🚨 NVIDIA just released a quantized Gemma 4 31B on Hugging Face 🔥
NVFP4 compression = 4x smaller weights with frontier-level accuracy.
✅99.7% of baseline on GPQA
(75.46% vs 75.71%).
📈256K context window.
🧐Multimodal (text + images + video).
vLLM-ready + Blackwell optimized.
VRAM requirements:
⚡️Weights only: ~16–21 GB
🚀Everyday use: Runs on 24 GB GPUs
📈Full 256K context = 32 GB VRAM sweet spot (RTX 5090-class consumer GPUs)
This is the 31B-class frontier model you can actually run locally on a high-end rig.
Try it today👉 https://t.co/0E6wO3PZN4
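If you want to sanity-check the numbers above, here is a rough sketch of the arithmetic. It assumes NVFP4 stores each weight in 4 bits plus one 8-bit scale per block of 16 weights (about 4.5 effective bits); real checkpoints may keep some layers in higher precision, so treat this as an estimate, not the exact file size.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes), ignoring KV cache and activations."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


PARAMS_B = 31.0   # taken from the model name
BF16_BITS = 16.0  # baseline precision

# Assumption: 4-bit values plus one 8-bit scale shared by every 16 weights.
NVFP4_BITS = 4.0 + 8.0 / 16.0

bf16_gb = weight_gb(PARAMS_B, BF16_BITS)
nvfp4_gb = weight_gb(PARAMS_B, NVFP4_BITS)
compression = bf16_gb / nvfp4_gb

# GPQA scores quoted in the post: quantized vs baseline.
accuracy_retained = 75.46 / 75.71

print(f"BF16 weights:  {bf16_gb:.1f} GB")
print(f"NVFP4 weights: {nvfp4_gb:.1f} GB ({compression:.1f}x smaller)")
print(f"GPQA retained: {accuracy_retained:.1%}")
```

Under these assumptions the weights come out around 17 GB, inside the quoted ~16–21 GB range, and the reduction is closer to 3.6x than a flat 4x once block scales are counted. The rest of a 24 GB or 32 GB card goes to the KV cache, which grows with context length.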
This 2-hour Stanford lecture on AI careers will teach you more about winning in the AI race than every piece of AI content you have scrolled past this year.
Bookmark this & give it 2 hours, no matter what. It'll be the most productive thing you could do this weekend.
If you want to become a world-class software engineer, learn these 19 system design case studies:
1 How Stock Exchange Works:
↳ https://t.co/iFNSX9TM9O
2 How Payment System Works:
↳ https://t.co/ARiLxGR43G
3 How YouTube Works:
↳ https://t.co/kHk3g6jz6t
4 How Google Docs Works:
↳ https://t.co/W57IkAjXpT
5 How Kafka Works:
↳ https://t.co/8rOy9KgCMo
6 How Pastebin Works:
↳ https://t.co/8NSUNlYM7q
7 How WhatsApp Works:
↳ https://t.co/VScq8QwHMr
8 How Airbnb Works:
↳ https://t.co/Bi5SAjfv5S
9 How Spotify Works:
↳ https://t.co/BxrH3oHIFS
10 How Slack Works:
↳ https://t.co/eIo29uOQOJ
11 How Reddit Works:
↳ https://t.co/o6Pw2hhj3T
12 How Google Search Works:
↳ https://t.co/jwOaC4bhnv
13 How Real-Time Leaderboard Works:
↳ https://t.co/HEChNTOHWb
14 How Twitter Works:
↳ https://t.co/pF2RYmPaIG
15 How Uber Computes ETA:
↳ https://t.co/hw1hYJqQmj
16 How Amazon Lambda Works:
↳ https://t.co/lx0BjeSRZt
17 How Amazon S3 Works:
↳ https://t.co/iReWAEHwmj
18 How Do AirTags Work:
↳ https://t.co/upWcgsXwKh
19 How ChatGPT Works:
↳ https://t.co/5lCKxq2g4N
What else should make this list?
——
👋 PS - Want my System Design Playbook (for FREE)?
Join my newsletter with 200K+ software engineers right now:
→ https://t.co/ByOFTtOihX
———
💾 Save this for later & RT to help others become good at system design.
👤 Follow @systemdesignone + turn on notifications.
Microsoft did it again!
Building with AI agents almost never works on the first try.
A dev has to spend days tweaking prompts, adding examples, hoping it gets better.
This is exactly what Microsoft's Agent Lightning solves.
It's an open-source framework that trains ANY AI agent with reinforcement learning. Works with LangChain, AutoGen, CrewAI, OpenAI SDK, or plain Python.
Here's how it works:
> Your agent runs normally with whatever framework you're using. Just add a lightweight agl.emit() helper or let the tracer auto-collect everything.
> Agent Lightning captures every prompt, tool call, and reward. Stores them as structured events.
> You pick an algorithm (RL, prompt optimization, fine-tuning). It reads the events, learns patterns, and generates improved prompts or policy weights.
> The Trainer pushes updates back to your agent. Your agent gets better without you rewriting anything.
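The loop above can be sketched in a few lines of plain Python. To be clear, this is NOT Agent Lightning's API: `EventStore`, `emit`, and `best_prompt` are hypothetical names made up for illustration, and "pick the prompt variant with the highest mean reward" is a simple stand-in for the algorithm step, not reinforcement learning.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Event:
    prompt_id: str  # which prompt variant produced this run
    reward: float   # score assigned to the run afterwards


class EventStore:
    """Hypothetical stand-in for the structured event log."""

    def __init__(self) -> None:
        self.events: list[Event] = []

    def emit(self, prompt_id: str, reward: float) -> None:
        self.events.append(Event(prompt_id, reward))


def best_prompt(store: EventStore) -> str:
    """Toy 'algorithm' step: return the variant with the highest mean reward."""
    rewards_by_prompt: dict[str, list[float]] = defaultdict(list)
    for event in store.events:
        rewards_by_prompt[event.prompt_id].append(event.reward)
    return max(rewards_by_prompt, key=lambda p: mean(rewards_by_prompt[p]))


# The agent runs as usual and logs a reward per run.
store = EventStore()
for prompt_id, reward in [("v1", 0.2), ("v1", 0.4), ("v2", 0.9), ("v2", 0.7)]:
    store.emit(prompt_id, reward)

# The "trainer" pushes the winning variant back to the agent.
chosen = best_prompt(store)
print(f"Deploying prompt variant: {chosen}")
```

The point is the shape of the loop: the agent code stays untouched, and everything the optimizer needs comes from the recorded events.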
In fact, you can also optimize individual agents in a multi-agent system.
I have shared the link to the GitHub repo in the replies!
A senior Google engineer just dropped a 421-page doc called Agentic Design Patterns.
Every chapter is code-backed and covers the frontier of AI systems:
→ Prompt chaining, routing, memory
→ MCP & multi-agent coordination
→ Guardrails, reasoning, planning
This isn’t a blog post. It’s a curriculum. And it’s free.
Best YouTube Channels To Learn AI in 2026 (No BS)
1. Fundamentals – 3Blue1Brown
2. Deep Learning – Andrej Karpathy
3. AI Research – Yannic Kilcher
4. Practical AI – AssemblyAI
5. LLMs – AI Explained
6. ML Theory – StatQuest
7. Papers Simplified – Two Minute Papers
8. GenAI – Matthew Berman
9. AI Agents – Nicholas Renotte
10. Applied ML – Krish Naik
11. PyTorch – Aladdin Persson
12. Math for ML – Serrano Academy
13. Industry Insights – Lex Fridman
14. Real-world AI – DeepLearningAI
Best GitHub repos for Claude Code that will 10x your next project:
1. Superpowers
https://t.co/U5Y4BK9Lap
2. Awesome Claude Code
https://t.co/qcgoxU3Up2
3. GSD (Get Shit Done)
https://t.co/WfAhllWnTR
4. Claude Mem
https://t.co/XLQpwdnIWN
5. UI UX Pro Max
https://t.co/aQtGjMzKus
6. n8n-MCP
https://t.co/7le1aluZXH
7. Obsidian Skills
https://t.co/MUaoyUnasw
8. LightRAG
https://t.co/ye8z4UqaMc
9. Everything Claude Code
https://t.co/OAU9JE46Uz
Certainly one of the BEST channels for System Design:
https://t.co/6LJuJnIu3m
1. API Design
https://t.co/AUx3IzQkir
2. Sharding
https://t.co/sKObv3NN9H
3. Caching
https://t.co/zQr7bzsZm2
4. Concurrency
https://t.co/WcxW2fAyCZ
5. Data Modeling
https://t.co/zX2R66k7NK
6. Rate Limiter
https://t.co/zX2R66k7NK
7. DB Indexing
https://t.co/q35bD2BMiw
8. CAP Theorem
https://t.co/qAzsSi7Ej7
9. Kafka
https://t.co/PoAMbsnFOF
10. Redis
https://t.co/vbQqwgpZTw
11. System Design of Uber, WhatsApp, Bitly, etc.
https://t.co/mmUw8tamPk
You can now run ElevenLabs-level voice cloning completely offline 🤯
LuxTTS is a local TTS model that clones voices from 3 seconds of audio at insane speeds. It runs at 150x real-time without you ever having to pay a subscription.
- Works perfectly on both CPU and GPU
- Takes up just 1GB of VRAM
- Outputs crisp 48kHz audio instead of standard 24kHz
100% Open Source.
Introducing Unsloth Studio ✨
A new open-source web UI to train and run LLMs.
• Run models locally on Mac, Windows, Linux
• Train 500+ models 2x faster with 70% less VRAM
• Supports GGUF, vision, audio, embedding models
• Auto-create datasets from PDF, CSV, DOCX
• Self-healing tool calling and code execution
• Compare models side by side + export to GGUF
GitHub: https://t.co/2kXqhhvLsb
Blog and Guide: https://t.co/ENuTWal5AA
Available now on Hugging Face, NVIDIA, Docker and Colab.