Jishnu Venugopal @jishvenugopal - Twitter Profile

@rasbt

about 13 hours ago

Crazy model! It actually uses the old Qwen2.5-Coder-3B stack and got really great performance with their post-training stack.
Need to use it over the next few days to see if the vibes of VibeCoder actually check out in practice. But impressive first impression!

Based on the tech report, some of the important pieces of their post-training stack:

1. High-signal synthetic data (math problems with credible solutions, code with tests)

2. Multiple reasoning paths for each answer

3. Filtering, filtering, filtering

4. 2-stage SFT (start with broad training, then train on hard long-reasoning samples)

5. Use target (pass@k) accuracy over validation loss for checkpoint selection

6. MGPO (MaxEnt-Guided Policy Optimization) for RLVR: basically a GRPO-style RL method with an extra weighting that favors examples that are neither too easy nor too hard for the current policy

7. Single 64k long-context RL (they found that the usual progressive context expansion hurt this model because early truncation damaged long-thinking behavior)

8. Training data order: they do Math RL, then Code RL, then STEM RL, in this particular order, which they found helped overall

9. After optimizing for accuracy, they add a stage that rewards shorter correct trajectories; basically making the model more efficient without accuracy degradation
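
Points 5 and 6 can be made concrete with a small sketch. The `pass_at_k` function is the standard unbiased pass@k estimator; `maxent_weight` is only an assumed stand-in for the MGPO weighting (a scaled binary entropy of the pass rate, which peaks at 0.5), not the formula from the tech report. The checkpoint names and counts in the usage note are hypothetical.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, of which c passed."""
    if n - c < k:
        # Fewer than k failures: every size-k subset contains a pass.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def maxent_weight(pass_rate: float) -> float:
    """Assumed weighting: binary entropy of the pass rate, scaled to [0, 1].

    Prompts the policy always solves (1.0) or never solves (0.0) get
    weight 0; prompts solved about half the time get weight close to 1.
    """
    p = pass_rate
    if p <= 0.0 or p >= 1.0:
        return 0.0
    entropy = -(p * math.log(p) + (1 - p) * math.log(1 - p))
    return entropy / math.log(2)


def select_checkpoint(results: dict[str, list[tuple[int, int]]], k: int) -> str:
    """Pick the checkpoint with the highest mean pass@k.

    results maps a checkpoint name to one (n, c) pair per problem.
    """
    def mean_pass(name: str) -> float:
        pairs = results[name]
        return sum(pass_at_k(n, c, k) for n, c in pairs) / len(pairs)

    return max(results, key=mean_pass)
```

For example, `select_checkpoint({"step_100": [(8, 2), (8, 0)], "step_200": [(8, 4), (8, 1)]}, k=1)` picks `"step_200"`, regardless of what the validation loss of either checkpoint looks like.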

29

860

107

667

71K

jishvenugopal retweeted

Sydney Runkle

@sydneyrunkle

about 20 hours ago

https://t.co/0rSDtJGB9T

24

977

153

2K

151K

jishvenugopal retweeted

Z.ai @Zai_org

about 19 hours ago

Introducing GLM-5.2: Frontier Intelligence, Open Weights

- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1

Tech Blog: https://t.co/LAsxUdN0JZ
Weights: https://t.co/g0A1C4UWx4
API: https://t.co/Kc3E22cbN7
Coding Plan: https://t.co/Nk8Y98HNhU
Chat: https://t.co/WCqWT0qCQb

395

8K

1K

2K

2M

jishvenugopal retweeted

Alexander Knigge

@AlexanderKnigge

2 days ago

THEY RELEASED THE PAPER! This is big. Mistral AI just released a full paper on exactly how they pulled off the creation of Le Chaton Fat.

Is this Mistral's Deepseek moment? https://t.co/hTkkK0tTwy

69

2K

105

705

372K

jishvenugopal retweeted

Tensordyne

@TensordyneInc

2 days ago

https://t.co/s5e3TQ6E9Z

7

105

23

74

65K

jishvenugopal retweeted

shadcn

@shadcn

4 days ago

In light of what happened, I'm doubling down on skills like /improve. A frontier model got pulled. If it happened once, it's gonna happen again. Fable today. 4.9 tomorrow or maybe gpt 6 one day. So, treat intelligence as borrowed. Drain intelligence when it's available. Build a catalog of plans today. Then implement later with a cheaper, open source, or a model you control. Build the backlog now. https://t.co/rqHw0fPv4G

141

7K

410

7K

306K

jishvenugopal retweeted

OpenRouter

@OpenRouter

4 days ago

Introducing the Fusion API, the smartest compound model in the market.

Fusion achieves Fable-level intelligence at half the price.

How it works 👇 https://t.co/OTUQAdTQjU

702

15K

2K

13K

6M

Jishnu Venugopal @jishvenugopal

4 days ago

Been dreaming of this for 20+ years 🤩🤩🤩

Google Earth

@googleearth

5 days ago

Prepare for takeoff. ✈️ Flight simulator is now available globally on web to all users. https://t.co/hQP0No142P We've recently added many of our most powerful professional desktop features to web. Elevation profiles, new import types, but there's always been one other feature you've been asking us to add to the web version of Google Earth, just for fun... Where will you fly? Share your best maneuvers, views, and flyovers with us!

464

32K

4K

21K

10M

0

1

20

jishvenugopal retweeted

Simplifying AI

@simplifyinAI

5 days ago

Tencent just open-sourced Hy-Memory.

A memory plugin that gives AI agents real long-term memory using a 6-layer framework with dual reasoning.

→ System1: fast pattern matching for instant recall
→ System2: deep reasoning for complex memory retrieval
→ 35% reduction in token usage
→ 70% less memory bloat over time

Most agents forget everything between sessions. This fixes that. Works for long-running collaborative AI agents that need persistent context.

100% Open Source.

19

679

88

969

65K

jishvenugopal retweeted

Elaina

@Elaina43114880

5 days ago

MiniMax just updated the M3 license! Non-commercial use is now fully free. For commercial use: - Individuals or companies under $20M annual revenue only need to email them at [email protected] and add a “Build with MiniMax” label. - Only larger companies need to contact them for a full commercial license. This is actually pretty generous.

1

226

18

70

21K

jishvenugopal retweeted

shadcn

@shadcn

7 days ago

You have Claude Fable for only a few days. Here's how to make the most of it.

Introducing /improve: use your most capable model to audit your codebase and write plans for cheaper models to execute later.

Studies your code, figures out bugs, perf, tech debt, missing tests, what to build and writes plans any agent can run.

179

6K

382

9K

764K

Jishnu Venugopal @jishvenugopal

6 days ago

😂😂😂

Tiffany Fong

@TiffanyFong

7 days ago

the new world order

490

52K

3K

4K

4M

0

16

jishvenugopal retweeted

Pankaj

@the2ndfloorguy

7 days ago

i hooked my whoop to my work calendar to find which coworker gives me the most stress 🚨

thanks to fable, I reverse engineered whoop to pull per minute heart rate. nd matched spikes with cal events and attendees

I now have a leaderboard and I think about it daily.

few info masked for obvious reasons ;)
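
The matching step described in this tweet (per-minute heart rate joined against calendar events and attendees) might look roughly like the sketch below. Everything here is an assumption for illustration: the function name, the data shapes, and the rule that a "spike" is a reading at least `spike_margin` bpm above the mean. It does not touch any Whoop or calendar API.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def stress_leaderboard(heart_rate, events, spike_margin=15):
    """Rank attendees by the share of shared meeting minutes that were spikes.

    heart_rate: list of (datetime, bpm), one reading per minute.
    events: list of (start, end, attendees) with datetime bounds.
    A spike is a reading at least spike_margin bpm above the overall mean.
    """
    baseline = mean(bpm for _, bpm in heart_rate)
    spikes = defaultdict(int)
    minutes = defaultdict(int)
    for ts, bpm in heart_rate:
        for start, end, attendees in events:
            if start <= ts < end:
                for person in attendees:
                    minutes[person] += 1
                    if bpm >= baseline + spike_margin:
                        spikes[person] += 1
    rates = {person: spikes[person] / minutes[person] for person in minutes}
    # Highest spike rate first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

Normalizing by minutes spent together keeps someone who is simply in every meeting from topping the list by volume alone.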

1K

45K

3K

15K

11M

jishvenugopal retweeted

Parth Asawa

@pgasawa

about 1 month ago

Today, we’re releasing Continual Learning Bench 1.0: the first, realistic benchmark for measuring how AI systems can improve in online settings.

Benchmarks today assume models are stateless. Each example is independent, and once a system finishes a task, it moves on as if nothing happened.

But deployed AI systems should learn from experience. We tested 10+ frontier systems against novel, expert-validated tasks and find there’s still plenty of headroom for learning. (1/n)

42

1K

167

1K

835K

jishvenugopal retweeted

How To Prompt

@HowToPrompt__

8 days ago

Stanford + Meta just dropped the paper that flips everything about AI agents.

It's called "Code as Agent Harness."

Right now, we treat large language models as text generators. When they need to solve a complex problem, they rely on a "chain of thought."

But natural language is slippery. It's vague. It loses context. When an agent hallucinates in English, it just keeps talking.

So they introduced a framework that changes the entire architecture of autonomy: "Code as Agent Harness."

They stopped asking the AI to reason in words, and forced it to reason in code.

Code isn't just the final output anymore. It is the memory. It is the environment. It is the boundary.

Instead of writing a paragraph about how to solve a problem, the agent writes a script, executes it, and reads the output.

Tests become its senses. Execution logs become its memory. Sandboxes become its physics.

If an agent makes a mistake in English, it apologizes and hallucinates again.

If an agent makes a mistake in code, the compiler throws an error. The trace tells it exactly what broke. The system forces it to fix it.

This is where prompt engineering dies, and systems engineering takes over.

The paper proves that reliability doesn't come from a smarter base model. It comes from the "harness" wrapped around it:

- The model proposes.
- The harness executes.
- The environment returns feedback.
- The verifier checks.
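
The four-step loop above (propose, execute, feedback, verify) can be sketched in a few lines. This is a toy under stated assumptions, not the paper's framework: `scripted_model` is a hypothetical stand-in for an LLM that simply replays canned attempts, the verifier hard-codes a single test for a function named `add`, and `exec` runs without any sandbox, which a real harness would need.

```python
from typing import Callable, Optional


def run_candidate(source: str) -> tuple[Optional[dict], Optional[str]]:
    """Harness: execute the proposed source and return (namespace, error)."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # no sandbox here; illustration only
        return namespace, None
    except Exception as exc:
        return None, f"{type(exc).__name__}: {exc}"


def verify(namespace: dict) -> Optional[str]:
    """Verifier: run one test against the candidate; None means it passed."""
    try:
        assert namespace["add"](2, 3) == 5
        return None
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


def harness_loop(propose: Callable[[Optional[str]], str], max_steps: int = 5):
    """Propose, execute, feed errors back, and stop once the verifier passes."""
    feedback: Optional[str] = None
    log: list[str] = []
    for _ in range(max_steps):
        source = propose(feedback)               # the model proposes
        namespace, error = run_candidate(source)  # the harness executes
        if error is None:
            error = verify(namespace)             # the verifier checks
        log.append(error or "ok")                 # execution log as memory
        if error is None:
            return source, log
        feedback = error                          # the environment's feedback
    return None, log


def scripted_model() -> Callable[[Optional[str]], str]:
    """Hypothetical stand-in for a model: replays three canned attempts."""
    attempts = iter([
        "def add(a, b) return a + b",            # syntax error
        "def add(a, b):\n    return a - b",      # runs, but fails the test
        "def add(a, b):\n    return a + b",      # passes
    ])
    return lambda feedback: next(attempts)
```

Calling `harness_loop(scripted_model())` walks through a syntax error, then a failed assertion, then a passing attempt; a real model would condition its next proposal on the `feedback` string instead of ignoring it.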

61

1K

191

1K

75K

jishvenugopal retweeted

Chris

@ChrissGPT

8 days ago

the new world order

704

24K

2K

3K

11M

Jishnu Venugopal @jishvenugopal

8 days ago

@jikkujose CFBR

0

20

jishvenugopal retweeted

Nex

@NexEcosystem

13 days ago

📢 Nex-N2 is here!
A family of agentic models that doesn't just think, it acts!
Coding, search, tool use. All fused into a single agentic reasoning loop.

- Adaptive Thinking, auto-scales reasoning depth per step. Saves ~20% tokens, zero performance loss.
- Coherent Thinking, one thinking paradigm across search, coding, and tool use. No more fragile mode-switching.

🏆 Result: Tier-1 open-source performance on SWE-bench, Terminal-Bench, GDPval, and more, tracking GPT-5.5 and Opus 4.7.

🎉 Open-weight. Try it now.
🔗 https://t.co/7oLSfyOCxB
📦 https://t.co/c2CGhXWaz6
https://t.co/KJYXZIpk8M
https://t.co/vcjdZ9cuB6

48

708

96

567

325K

jishvenugopal retweeted

0xSero

@0xSero

11 days ago

Most important software in my stack:

#1 Tailscale
- SSH into all your devices securely
- Host websites for use in your home network
- Easy high quality VPN

#2 VibeProxy
- Connect all your AI subscriptions
- Single API which will connect to any agent

#3 Codex App
- Fire

91

2K

66

2K

127K

jishvenugopal retweeted

0xSero

@0xSero

10 days ago

Here’s GLM-5.1 in Codex app

Do you want to use all your AI subs, local models, and Claude in Codex app?

- compaction working
- codex plugins working
- cache optimised
- simple setup

Thanks to onlyterp for making this possible https://t.co/Qr9VgZajHK

19

302

25

295

25K
