Before the week ends, let's acknowledge one of the most INSANE weeks ever for open AI, with 25+ notable open-weight drops across every modality:
🧠 LLMs
→ NVIDIA Nemotron 3 Ultra: 550B hybrid Mamba-MoE, only 55B active, 1M context, MMLU 89.1. NVFP4 variant claims ~5x throughput on Blackwell. First openly-weighted 550B hybrid Mamba-Transformer, closing the gap with frontier closed models.
→ Google Gemma 4 12B: fully open dense any-to-any (text/image/audio/video), 256k context, encoder-free, 140+ languages, AIME 2026 at 77.5. Shipped with a 23-checkpoint QAT wave (mobile ONNX + MLX). Most deployable model of the week.
→ StepFun Step-3.7-Flash: 198B sparse MoE VLM, ~11B active, SWE-Bench PRO 56.3. Apache 2.0.
→ Liquid AI LFM2.5-8B-A1B: edge MoE, just 1.5B active, 128k ctx, MATH500 88.8, MLX-ready. Best on-device option this week.
→ JetBrains Mellum2-12B-A2.5B-Thinking: their first open MoE, near-Qwen3-14B coding at 2.5B active. Apache 2.0.
🎨 Image gen (the surprise of the week)
→ Ideogram 4: their FIRST-EVER open weights. 9.3B flow-matching DiT trained from scratch. #2 overall behind GPT Image 2, top open-weight model on Design Arena + LMArena. Strongest open checkpoint for text-rich images, full stop. It has taste. Still can't believe this is open weights.
🔊 Audio & Speech (a breakout week for open TTS, 4 labs shipped)
→ Boson Higgs Audio v3 4B: 102 languages, 21 emotions, singing/whispering/shouting, sub-second TTFA.
→ RedNote dots.tts: the only fully continuous (no codec) open TTS pipeline, Apache 2.0.
→ Google Magenta RealTime 2: real-time music gen, <200ms latency, text+audio+MIDI. multimodalart ported it to PyTorch within hours with live ZeroGPU demos.
→ NVIDIA Nemotron-3.5 ASR: 600M streaming, 17x more concurrent streams vs Parakeet RNNT 1.1B.
👁️ Vision & VLMs
→ PaddleOCR-VL-1.6: SOTA document parsing at 1B params, Apache 2.0.
→ Baidu NAVA: 6.3B joint audio-video gen, best-in-class A/V sync, Apache 2.0.
🎬 Video, 3D & World Models
→ NVIDIA Cosmos3-Super: 64B omnimodal world model coupling action trajectories with video+audio gen, for Physical AI.
→ JD JoyAI-Echo: up to 5-min multi-shot text-to-video on LTX-2.3.
→ ByteDance Bernini-R + VAST TripoSplat (single-image-to-3D Gaussian splats, MIT).
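For the MoE entries in the LLM list, the "active" figure is easier to compare as a share of total weights. A rough sketch using only the numbers quoted above; the totals for LFM2.5 (8B) and Mellum2 (12B) are assumptions read off the model names, and Step-3.7-Flash's ~11B is approximate:

```python
# Total vs. active parameter counts in billions, as quoted in the list above.
# Assumption: the 8B and 12B totals are inferred from the model names.
MODELS = {
    "Nemotron 3 Ultra": (550.0, 55.0),
    "Step-3.7-Flash": (198.0, 11.0),
    "LFM2.5-8B-A1B": (8.0, 1.5),
    "Mellum2-12B-A2.5B-Thinking": (12.0, 2.5),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the weights used in one token's forward pass."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active_b / total_b


def summarize(models: dict) -> dict:
    """Map each model name to its active share, in percent."""
    return {
        name: 100.0 * active_fraction(total, active)
        for name, (total, active) in models.items()
    }


if __name__ == "__main__":
    for name, pct in summarize(MODELS).items():
        print(f"{name}: {pct:.1f}% of weights active per token")
```

On these figures Nemotron 3 Ultra activates 10% of its weights per token, and Step-3.7-Flash is the sparsest of the four.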
Today we’re introducing Gemma 4 12B — our latest open model that brings advanced agentic reasoning, vision and audio directly to your laptop.
It delivers performance nearing our larger Gemma models with a much smaller total memory footprint, while being small enough to run locally with just 16GB of VRAM. It’s open and accessible for everyone to use under a permissive Apache 2.0 license.
This is all made possible by our new, unified architecture that removes separate multimodal encoders. Here’s how we did it 🧵
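A back-of-the-envelope check on the 16GB claim, as a sketch only: the post does not say which precision the figure assumes, so the bytes-per-parameter table and the 2 GB allowance for KV cache and activations below are illustrative assumptions, not published numbers.

```python
# Assumed storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Weight-only footprint in decimal GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def fits(params_billion: float, precision: str,
         vram_gb: float = 16.0, overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed runtime overhead fit in vram_gb."""
    return weight_gb(params_billion, precision) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        size = weight_gb(12.0, precision)
        verdict = "fits" if fits(12.0, precision) else "does not fit"
        print(f"12B @ {precision}: {size:.0f} GB of weights, {verdict} in 16 GB")
```

Under these assumptions a 12B model needs 24 GB for bf16 weights alone, so the 16GB figure most likely refers to a quantized checkpoint.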
The laptop hasn't changed in 30 years. NVIDIA just changed it.
RTX Spark is their first PC chip ever.
- RTX 5070 level GPU
- 128GB unified memory
- 1 petaflop of local AI
- thin, light, barely throttles unplugged
Your AI agent lives on the machine. 24/7. No cloud.
This is step one of the agentic AI PC, and everyone else is about to copy it.
NVIDIA announced RTX Spark, its new ARM-based processor.
- The chip includes a GPU on par with the RTX 5070.
- It runs modern games at 1440p and 100 FPS.
- Even though the laptop runs Windows, performance doesn't drop when you unplug it.
- Battery life is long.
- It targets desktop PCs too, not just laptops.
- It was demoed on stage with 007 First Light and Forza Horizon 6.
- AI compute is also high.
- It launches in fall 2026.
Codex CLI 0.132.0 is out.
Highlights:
- Python SDK gains first-class authentication: API key login, ChatGPT browser and device-code flows, account inspection, and logout.
- Python turn APIs now accept a plain string as input; handle-based runs return a richer TurnResult with collected items, timing, and usage data.
- `codex exec resume` now accepts `--output-schema` to enforce structured JSON output while keeping session context.
- TUI startup is faster: terminal capability probes are now batched instead of running serially before the first interactive frame.
Complete details in thread ↓
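To make the `--output-schema` highlight concrete, here is a minimal sketch of the kind of JSON Schema one might enforce, plus a tiny stdlib-only conformance check. The schema fields (`summary`, `files_changed`), the file name, and the assumption that the flag takes a path to a schema file are all illustrative and not taken from the release notes.

```python
import json
from pathlib import Path

# Hypothetical schema for illustration; field names are not from the release notes.
SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "files_changed": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "files_changed"],
    "additionalProperties": False,
}

_JSON_TYPES = {"string": str, "array": list, "object": dict, "boolean": bool}


def write_schema(directory: Path) -> Path:
    """Write the schema to disk (assumed to be what the flag would point at)."""
    path = directory / "turn_schema.json"
    path.write_text(json.dumps(SCHEMA, indent=2))
    return path


def conforms(payload: str, schema: dict) -> bool:
    """Check a small JSON Schema subset: required keys, top-level types, array items."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    props = schema.get("properties", {})
    if any(key not in data for key in schema.get("required", [])):
        return False
    if schema.get("additionalProperties") is False and any(k not in props for k in data):
        return False
    for key, value in data.items():
        spec = props.get(key, {})
        expected = _JSON_TYPES.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            return False
        item_type = _JSON_TYPES.get(spec.get("items", {}).get("type"))
        if isinstance(value, list) and item_type:
            if not all(isinstance(item, item_type) for item in value):
                return False
    return True
```

A full validator would use a dedicated JSON Schema library; the hand-rolled check here only keeps the sketch dependency-free.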
Want to (officially) use Codex at work?
Send this post to your CTO to bring your team to Codex. Eligible enterprise customers who switch in the next 30 days get 2 free months of Codex usage for new users.
He built it over a weekend. For himself. Today every developer on the planet uses it.
Meet Abhinav Asthana, the IIT kid who built a $5.6B company from a Chrome extension.
> Born in Kanpur. IIT Bombay. Computer Science.
> Interned at Yahoo Bangalore in 2010.
> One frustration kept following him.
> Testing APIs was a nightmare. No clean tool existed.
> So he built one.
> In 2012 - A Chrome extension. Built on weekends. Named it Postman. Put it on the Chrome Web Store and forgot about it.
> Downloads started trickling in.
> Then flooding.
> His manager had no idea.
> The "side project" was quietly becoming the most downloaded developer tool in the Chrome store.
> Half a million developers. Zero marketing spend.
> In 2014 — quit his job.
> Team of 3. Office in Bangalore.
> Before Bangalore startups were cool.
> Growth strategy? None.
> Just a tool so good — developers told other developers.
> In 2016 → 1M users.
> In 2019 → 7M users. $500M valuation.
> In 2021 → 30M users. $5.6B valuation.
> No cold outreach. No sales team. No ads.
> Pure product-led growth — before anyone called it that.
> Today 98% of Fortune 500 companies use Postman.
> Every API in the world — tested, built, documented — runs through something he built over a weekend in 2012.
From Kanpur to the global developer stack.
Absolute legend.
The database behind Twitter, GitHub, Snapchat, Airbnb, Pinterest, Instagram.
In 2009 one Italian wrote it alone, on a MacBook Air. 🤯
Meet Salvatore Sanfilippo 🇮🇹
> Italian programmer. Born 1977 in southern Italy. Goes by "antirez" online.
> Left university at 17. Self-taught coder.
> 1998 ~ invented "idle scan" ~ a stealth network scanning technique now built into nmap.
> 2009 ~ started building Redis alone, on a MacBook Air 11, to fix his own startup's database problem.
> Redis became the in-memory database powering Twitter, GitHub, Stack Overflow, Snapchat, Airbnb, Instagram.
> One of the most-used databases on Earth ~ built by one self-taught coder.
> Maintained it alone as Benevolent Dictator for Life for 11 years.
> Also built Kilo (a full text editor in under 1000 lines of C), Linenoise, Dump1090, Disque, Jim Tcl ~ all open source.
> June 2020 ~ walked away at the peak. "My hands will be free," he wrote.
> Spent two years writing a science fiction novel about artificial intelligence.
> The novel described prompt engineering ~ before ChatGPT existed. 🚀
> December 2024 ~ the internet called him back. He returned to Redis.
> Built the new Vector Sets data structure for AI similarity search.
> 27k+ followers on GitHub. Active on BlueSky. Avoids Twitter.
> Lives in Catania, Italy. Codes from home. Calls himself "the Robin Hood of open source."
He built it alone. Walked away at the peak.
Came back when AI needed a new way to think.
No fame. No equity. Just code, novels, and home.
Open source GOAT. 🐐
This guy literally shows you how to master 97% of Codex in UNDER AN HOUR 🤯
Nate just published a killer article + a full 1-hour video detailing his exact workflow, zero skipped steps!
Truly one of the most generous builders out there.
Link to the YT video in 🧵↓
Claude Platform on AWS is now generally available through your AWS account.
Claude Platform on AWS gives you access to Anthropic's native platform experience through your existing AWS account. It complements Claude models on Amazon Bedrock, so you can access Claude through the approach that fits your needs.
With Claude Platform on AWS, customers can:
◽ Access the full set of Claude API and console features, with same-day availability for all new releases and betas
◽ Use AWS Identity and Access Management (IAM) access control and CloudTrail audit logging
◽ Consolidate billing within their existing AWS account
https://t.co/WBiwtpZAdY
Google has killed the GPU mafia 🤯
VS Code now connects directly to Google Colab.
→ You get a free T4 GPU inside your editor.
→ Your local files. Their compute.
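Once the editor is attached to a remote runtime, a quick sanity check is to ask the runtime which GPU it sees. A minimal sketch, assuming `nvidia-smi` is on the runtime's PATH; it returns None on a machine without an NVIDIA driver:

```python
import shutil
import subprocess
from typing import Optional


def gpu_name() -> Optional[str]:
    """Return the first GPU name reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver tools on this machine
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None  # tool present but the query failed or timed out
    names = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return names[0] if names else None


if __name__ == "__main__":
    print(gpu_name() or "no NVIDIA GPU visible")
```

Run inside the connected runtime, this should name the attached GPU; run locally on a machine without one, it prints the fallback message.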
The man who killed the $10,000 GPU myth. He did it alone, from Bulgaria, with one C file. 🤯
Meet Georgi Gerganov.
> Bulgarian developer. Nobody had heard of him.
> In March 2023, Meta’s LLaMA model leaked online
> Within days he wrote a single C file
> Called it llama.cpp
> It ran a full AI model on a MacBook. No GPU. No cloud.
> The entire AI industry said you needed $10,000 GPUs to run LLMs 🔥
> He proved you didn’t. On a laptop. Alone.
> Also built whisper.cpp ~ same thing for voice AI
> His code is the foundation of Ollama, LM Studio, and GPT4All
> 107,000+ GitHub stars. Fastest open-source AI project to hit 100K ever. 🚀
> In 2026 Hugging Face hired his entire team
> Still ships code. Still open source. Still free.
Every time you run AI locally, you’re running his work.
Absolute Legend 🐐
Astro 6.3 is here! Experimental advanced routing lets you take full control of your request pipeline. Bring your own framework like Hono, compose handlers, and control exactly the order they run in.
https://t.co/LzE2TDm2Pn
We’ve agreed to a partnership with @SpaceX that will substantially increase our compute capacity.
This, along with our other recent compute deals, means that we’ve been able to increase our usage limits for Claude Code and the Claude API.