Open-sourced Dirgha CLI: 17K+ downloads.
Building @DirghaAI, a sovereign agentic OS and decentralized work/compute marketplace.
PS.📚 @MithilaReview 🕊️
Lord Rama crossed the ocean by building a bridge stone by stone, with an army of contributors, each doing what they were capable of.
Dirgha is following the same model: we're bringing together people and organizations to build fully indigenous, agentic, sovereign AI computers.
Introducing a research system that enables passive heart rate monitoring (PHRM) during everyday smartphone use. Using the front-facing camera, it meets industry accuracy standards for heart rate measurement across all skin tones.
Check out the blog to learn more: https://t.co/O4F4Uh8gN4
I think the challenge is that everyone can now build apps
But
1) almost nobody has distribution (like an audience), or
2) the money to pay for distribution (ads or UGC), or
3) the creative genius to get distribution for free (classically called guerrilla marketing)
Before the week ends, let's acknowledge one of the most INSANE weeks ever for open AI, with 25+ notable open-weight drops across every modality:
🧠 LLMs
→ NVIDIA Nemotron 3 Ultra: 550B hybrid Mamba-MoE, only 55B active, 1M context, MMLU 89.1. NVFP4 variant claims ~5x throughput on Blackwell. First openly-weighted 550B hybrid Mamba-Transformer, closing the gap with frontier closed models.
→ Google Gemma 4 12B: fully open dense any-to-any (text/image/audio/video), 256k context, encoder-free, 140+ languages, AIME 2026 at 77.5. Shipped with a 23-checkpoint QAT wave (mobile ONNX + MLX). Most deployable model of the week.
→ StepFun Step-3.7-Flash: 198B sparse MoE VLM, ~11B active, SWE-Bench PRO 56.3. Apache 2.0.
→ Liquid AI LFM2.5-8B-A1B: edge MoE, just 1.5B active, 128k ctx, MATH500 88.8, MLX-ready. Best on-device option this week.
→ JetBrains Mellum2-12B-A2.5B-Thinking: their first open MoE, near-Qwen3-14B coding at 2.5B active. Apache 2.0.
🎨 Image gen (the surprise of the week)
→ Ideogram 4: their FIRST-EVER open weights. 9.3B flow-matching DiT trained from scratch. #2 overall behind GPT Image 2, top open-weight model on Design Arena + LMArena. Strongest open checkpoint for text-rich images, full stop. It has taste. Still can't believe this is open weights.
🔊 Audio & Speech (a breakout week for open TTS, 4 labs shipped)
→ Boson Higgs Audio v3 4B: 102 languages, 21 emotions, singing/whispering/shouting, sub-second TTFA.
→ RedNote dots.tts: the only fully continuous (no codec) open TTS pipeline, Apache 2.0.
→ Google Magenta RealTime 2: real-time music gen, <200ms latency, text+audio+MIDI. multimodalart ported it to PyTorch within hours with live ZeroGPU demos.
→ NVIDIA Nemotron-3.5 ASR: 600M streaming, 17x more concurrent streams vs Parakeet RNNT 1.1B.
👁️ Vision & VLMs
→ PaddleOCR-VL-1.6: SOTA document parsing at 1B params, Apache 2.0.
→ Baidu NAVA: 6.3B joint audio-video gen, best-in-class A/V sync, Apache 2.0.
🎬 Video, 3D & World Models
→ NVIDIA Cosmos3-Super: 64B omnimodal world model coupling action trajectories with video+audio gen, for Physical AI.
→ JD JoyAI-Echo: up to 5-min multi-shot text-to-video on LTX-2.3.
→ ByteDance Bernini-R + VAST TripoSplat (single-image-to-3D Gaussian splats, MIT).
Sovereign AI at population scale isn’t theory anymore, it’s shipping.
Sarvam AI is building a full-stack, “Made in India” AI platform that:
🧠 Trains 100B+ parameter MoE models efficiently across 4,096+ NVIDIA H100 GPUs
⚡ Delivers millisecond-level, multilingual voice inference for Aadhaar
📞 Powers automated KYC, sales, and support across telephony and WhatsApp for brands like Tata Capital and Infosys
By integrating ASR, LLMs, and TTS into low-latency voice agentic workflows, Sarvam is bringing natural, real-time AI interactions to 1.4B Indians in their own languages. 🇮🇳
This feels like a blueprint for how countries can build sovereign AI without compromising on performance or safety.
Every country needs to define their agent/robot framework already.
Non-human corporations will solve real, human problems at unprecedented speed and scale.
Now we know why Peter Thiel packed his bags for Argentina.
Milei just submitted his AI legislative framework to Congress, where he proposes:
- zero regulation on AI development,
- a brand-new "non-human corporation" category for AI/robot-operated entities with limited liability,
- a low-tax regime with flexible governance rules.
The Dutch East India Company gave the world the limited liability company in 1602. Milei wants Argentina to do the same for autonomous AI agents in 2026.
Competition with China might be the best thing to happen to America since the Cold War.
We’ve been leading the world for so long, but we got a bit complacent. Competition breeds excellence.
Congratulations @OilIndiaLimited !
An ocean of energy opportunities reinforced in the Andaman Sea!
Very happy to report the presence of natural gas in Sri Vijayapuram-3, an exploratory well drilled by Oil India Ltd. 15 km off the east coast of the Andaman Islands at a water depth of 355 meters.
Initial production testing of the well at a depth of more than 1,900 meters in the Eocene formation has established the presence of natural gas through continuous flaring.
Oil India is carrying out gas sampling to assess the composition & calorific value of gas and to carry out isotope studies to understand the genesis of the gas.
Under the Samudra Manthan Mission (National Deep Water Exploration Mission) announced by Hon’ble PM @narendramodi Ji on Independence Day 2025, a large number of deepwater and ultra-deepwater exploration wells are planned in our offshore basins to fully exploit our hydrocarbon reserves.
The presence of hydrocarbons has now been reported in 2 (two) of the 3 (three) exploratory wells drilled by OIL in the current exploratory campaign in the Andaman Basin.
This presence of natural gas will help us in taking forward our exploration ambitions in coordination with global deepwater exploration experts like @petrobras, @TotalEnergies, @bp_india, @Shell, @exxonmobil and will be a significant milestone in our journey through Amrit Kaal!
@PMOIndia @PIB_India @PetroleumMin @mygovindia
#OilIndia #EnergySecurity #Exploration #NaturalGas #Andaman
Gemma 4 12B can now run locally on just 8GB RAM via Dynamic GGUFs.
Google's new model, Gemma 4 12B Unified, supports image, audio, and a 256K context.
You can run and train the model via Unsloth Studio.
GGUF: https://t.co/8cL321pVDh
Guide: https://t.co/odRo9WjRpA
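A quick back-of-the-envelope check on the 8GB claim, as a rough sketch only. It assumes decimal gigabytes and that the weights alone must fit in RAM, ignoring the KV cache and runtime overhead, so the real budget per weight is lower. The helper name `max_bits_per_weight` is hypothetical, and nothing here describes how the Dynamic GGUFs are actually quantized.

```python
def max_bits_per_weight(ram_gb: float, params_billion: float) -> float:
    """Upper bound on the average bits per weight if the weights alone fill the RAM.

    Assumes decimal GB (1e9 bytes) and ignores KV cache, activations, and OS overhead.
    """
    ram_bits = ram_gb * 1e9 * 8
    return ram_bits / (params_billion * 1e9)


# 12B parameters in 8 GB of RAM
bpw = max_bits_per_weight(ram_gb=8, params_billion=12)
print(f"{bpw:.2f} bits per weight")  # prints "5.33 bits per weight"
```

So under these assumptions the model has to average well under 16-bit weights, which is consistent with a quantized format being needed at this memory size.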
Today, we’re excited to introduce Miso One, the most emotive voice model in the world.
Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency.
We’ve open-sourced the model weights, with API access coming soon.
Hear how Miso One sounds in the thread below.
Interested in learning how to run RL at scale? Here are the best resources to read…
Research on Scaling RL
1. The Art of Scaling RL compute for LLMs: https://t.co/PGjI6Gwgv0
2. Scaling Behaviors of LLM RL Post-Training: https://t.co/2u2saB3C0h
3. Optimally Scaling Sampling Compute for LLM RL: https://t.co/rUSdUvJyNH
4. Scaling up RL: https://t.co/O8vV6z8ymx
5. ProRL V2 - Prolonged Training Validates RL Scaling Laws: https://t.co/vu72juvRW4
6. Polaris - A Recipe for Scaling RL with Reasoning Models: https://t.co/rMibSAeJbg
RL Frameworks
1. Hybrid Flow (early outline of the verl framework): https://t.co/GnWXx131uD
a. More up-to-date info can be found here: https://t.co/j801HcJmPP
2. AReal - Large-Scale Async RL: https://t.co/qhOvsQK09N
3. PipelineRL - Fast On-Policy RL: https://t.co/iRM7KzySXe
4. AsyncFlow - Async Streaming RL: https://t.co/YwmzFtiU2q
RL for Agents
1. DeepSWE - Open Coding Agent Trained w/ RL: https://t.co/GHQHcmtE6F
2. AutoForge - Environment Synthesis for Agentic RL: https://t.co/mr3WDIL5vq
3. Agent-R1 - Training Agents w/ End-to-End RL: https://t.co/xpfQJGgzEv
4. AgentRL - Scaling RL for Multi-Turn, Multi-Task Agents: https://t.co/7fbVl0RWXG
5. The Landscape of Agentic RL: https://t.co/OMnSV4rgdW
6. Training SWE Agents with RL: https://t.co/YqMqySbyXS
Case Studies & Tech Reports
1. Kimi tech reports:
a. Kimi K2 - Open Agentic Intelligence: https://t.co/aAw17SXrIw
b. Kimi End-to-end Agentic RL: https://t.co/ProBpOPIiI
c. Kimi K1.5 - Scaling RL for LLMs: https://t.co/kRGOxY9Jvp
2. Composer series from Cursor:
a. Composer 2: https://t.co/K0v8rNCE6Z
b. Composer 2.5: https://t.co/D9PYimfOMU
3. Olmo 3 (also has open code / data): https://t.co/khetJFvp6N
4. MiniMax tech reports:
a. MiniMax-M2: https://t.co/HApb0OB80S
b. MiniMax-M1: https://t.co/mZj9UQsrnC
5. Nemotron 3 (NVIDIA): https://t.co/lCpE1GzxSi
MAI-Thinking-1 is a great paper because it shows the reasoning traces of a proper model research pipeline, not just the final answer. There are many details, but I am interested in how they improve DAPO by dynamically adjusting the upper bound of the clip ratio in the loss.
Traditionally in PPO/GRPO, the upper and lower bounds of the clip ratio are the same, controlled by a single _eps scalar. The insight in DAPO is that high-entropy tokens naturally have higher logprob ratios, yet they are penalized by a strict upper bound. The DAPO paper simply raises the upper bound a bit to counter that.
The new insight in MAI-Thinking-1 is that since what matters is not over-penalizing high-entropy tokens, we should instead update the upper bound every step based on the historical entropy pattern. So they initialize a bump to the base upper bound (inverse of lower bump) at 0, calculate the entropy per step, and use that to adjust the bump so that we get a running estimate of the optimal upper bound based on the entropy pattern.
I like this technique because it has a classical feel to it, reminiscent of the running scale and shift factors in batch normalization during training. Mathematical details like this are abundant in the paper, so go check it out!
Paper: https://t.co/Y8paQuOcEK
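To make the clip-ratio idea concrete, here is a minimal sketch in plain Python. The `clipped_surrogate` function is the standard PPO-style clipped objective with separate lower and upper bounds, as in DAPO's wider upper bound. The `RunningUpperBound` class is purely hypothetical: its name, the `target_entropy`, `gain`, `momentum`, and `max_bump` parameters, and the update rule itself are assumptions made for illustration, not the rule from the MAI-Thinking-1 paper.

```python
import math


def clipped_surrogate(ratio, advantage, eps_low, eps_high):
    """PPO-style clipped objective for one token, with separate lower/upper clip bounds."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class RunningUpperBound:
    """Hypothetical controller: a bump on the base upper bound, driven by a running entropy estimate.

    Assumed rule (not from the paper): widen the upper bound when the running
    entropy falls below a target, narrow it when entropy is above the target.
    """

    def __init__(self, base_eps=0.2, target_entropy=0.5, gain=0.1,
                 momentum=0.9, max_bump=0.2):
        self.base_eps = base_eps
        self.target_entropy = target_entropy
        self.gain = gain
        self.momentum = momentum
        self.max_bump = max_bump
        self.bump = 0.0          # the bump starts at 0, as described in the post
        self.ema_entropy = None  # running estimate of mean token entropy

    def update(self, batch_entropy):
        """Fold one step's mean entropy into the running estimate and adjust the bump."""
        if self.ema_entropy is None:
            self.ema_entropy = batch_entropy
        else:
            self.ema_entropy = (self.momentum * self.ema_entropy
                                + (1.0 - self.momentum) * batch_entropy)
        self.bump += self.gain * (self.target_entropy - self.ema_entropy)
        self.bump = min(max(self.bump, 0.0), self.max_bump)
        return self.eps_high

    @property
    def eps_high(self):
        return self.base_eps + self.bump


# Symmetric clipping caps a ratio of 1.25 at 1.2; a wider upper bound lets it through.
symmetric = clipped_surrogate(1.25, 1.0, eps_low=0.2, eps_high=0.2)
wider = clipped_surrogate(1.25, 1.0, eps_low=0.2, eps_high=0.28)

controller = RunningUpperBound()
eps_after_low_entropy = controller.update(batch_entropy=0.1)
```

The running average plays the same role as the running statistics in batch normalization: each step nudges the estimate rather than replacing it, so a single noisy batch cannot swing the clip bound.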
Welp, that happened faster than I predicted. Thought it would be end of 2027, then early 2027, but agentic traffic is growing so fast that bots have now passed human traffic online for the first time in the Internet's history. https://t.co/2zX5bHdhsa
Peter Thiel on the type of company more startup founders should build
Thiel first emphasizes his belief that when starting a company, you should always ask:
“Can this company become a monopoly?”
He then lists three of the most common types of monopolies:
1. Super fast distribution on a very thin product (e.g. Twitter)
2. A technological advantage that is continually built upon with iterative improvement and compounds over time (e.g. SaaS software)
3. A truly brilliant breakthrough (e.g. Bitcoin)
But he argues that there’s a different monopoly category that’s continually overlooked:
“A different modality for innovation that we do very little of and we don’t even recognize as an important category is what I would describe as ‘Complex Coordination,’ where you take a lot of different pieces and the challenge is to coordinate them into something new.”
Thiel continues:
“This is the thing that’s maybe 180 degrees antithetical to the Lean Startup ethos. It’s complicated. You have to put all the pieces together in just the right way. I think this is on some level what really drove Apple as an innovative company in the last decade… What was new about the iPhone? There was no single component that was new. It was just that you put all of these things together in just the right way… and once you built it, it was actually super hard for people to replicate. You had an advantage for many years.”
He points to Tesla and SpaceX as more recent examples.
“There’s no component to the Tesla that’s actually that new. It’s just that you put all of the pieces together. You re-engineered the whole distributor network. It was this complex coordination that made it work. There’s like this lost art of accounting where you figure out how much things cost and add them all together. And Elon has discovered this lost art of accounting which no other people practice.”
It’s time to move from renting intelligence to truly controlling your AI. Microsoft Frontier Tuning lets you take our models and make them uniquely your own, turning them from capable generalists to completely custom partners.
It starts with reinforcement learning environments (RLEs) that allow our models to learn directly from your workflows. Think of them as training gyms for AI. Here the agent learns your very specific processes, your standards, your way of working. It goes from off-the-shelf to hyper-adapted to exactly what you and your teams need. Those adaptations drive efficiency and performance, and your unique models can keep continually learning in your RLEs. This changes the nature of AI – and it changes the impact.
For example, within Microsoft we use our RLEs combined with our MAI models to climb towards the best agentic use cases for Excel. Our MAI tuned model is on par with GPT-5.4 on public and private benchmarks, while being up to 10X more efficient.
Only you control your agents made with Frontier Tuning. You keep the benefits of your hard-earned know-how, data and institutional knowledge. With us, the RLEs and the models you build in them become your moat.
This is distinct. It’s a new era. An era of AI that you control, on your terms. I think it’ll be a good one. More on the blog: https://t.co/v65eop5aHS
one of the quotes i find most inspiring on a hard day:
"Whatever your hand finds to do, do it with all your might, for in the realm of the dead, where you are going, there is neither working nor planning nor knowledge nor wisdom"
Ecclesiastes 9:10