sarvesh

Microsoft introduces MAI-Thinking-1. It's a 1T@35B-parameter model pre-trained on 30T tokens, with a maximum context length of 256k tokens, using 8192 GB200 GPUs. Based on benchmarks, it seems to be around GLM-5 level. Microsoft also released a comprehensive 109-page tech report: https://t.co/okbkJV8C8w…

about 15 hours ago

Google DeepMind has released Gemma 4 12B, a unified, encoder-free multimodal model built for running agentic AI locally on laptops. 🔥

- 12B-parameter model that runs on laptops with 16GB of memory
- Encoder-free architecture for native image and audio processing
- Performance close to the larger 26B MoE model
- Native audio support with raw audio token processing
- Multi-Token Prediction for lower latency
- Open-sourced under Apache 2.0
- Available to try in LM Studio, Ollama, the Google AI Edge Gallery app, the Google AI Edge Eloquent app, and the LiteRT-LM CLI
- New Gemma Skills Repository for agentic workflows

sarvesh

@savy_builder

about 15 hours ago

A quick guide to AI agents

sarvesh

@savy_builder

about 15 hours ago

When your AI coding product’s API budget runs out, the answer should not be:

“Sorry, service unavailable.”

The answer should be:

“Switch to a cheaper model path.”

This is where open-source coding models become very useful.

A smart coding product should not use GPT/Claude/Gemini for every single task.

It should route work by difficulty.

For example:

Simple code explanation → open-source coding model
Generate test cases → open-source coding model
Find obvious bugs → open-source coding model
Hint generation → open-source coding model
Boilerplate / refactor suggestions → open-source coding model

But for harder tasks:

Repo-level reasoning → frontier model
Complex debugging → frontier model
System design critique → frontier model
Final interview scoring → frontier model
Deep personalized feedback → frontier model

The winning architecture is not “one model for everything.”

It is model routing.
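
A minimal sketch of the routing idea above, assuming each request arrives already labeled with a task type. The task identifiers, the `ModelPath` enum, and the `route` function are hypothetical names made up for illustration; the mapping itself mirrors the two lists in the post. Defaulting unknown tasks to the cheaper path is an assumption, not something the post specifies.

```python
from enum import Enum


class ModelPath(Enum):
    OPEN_SOURCE = "open_source"
    FRONTIER = "frontier"


# Task labels mirror the two lists in the post; the identifiers are hypothetical.
ROUTES = {
    "code_explanation": ModelPath.OPEN_SOURCE,
    "test_generation": ModelPath.OPEN_SOURCE,
    "obvious_bug_finding": ModelPath.OPEN_SOURCE,
    "hint_generation": ModelPath.OPEN_SOURCE,
    "boilerplate_refactor": ModelPath.OPEN_SOURCE,
    "repo_level_reasoning": ModelPath.FRONTIER,
    "complex_debugging": ModelPath.FRONTIER,
    "system_design_critique": ModelPath.FRONTIER,
    "final_interview_scoring": ModelPath.FRONTIER,
    "deep_personalized_feedback": ModelPath.FRONTIER,
}


def route(task_type: str) -> ModelPath:
    """Return the model path for a task type.

    Unknown task types fall back to the cheaper path (an assumption).
    """
    return ROUTES.get(task_type, ModelPath.OPEN_SOURCE)


print(route("hint_generation").value)
print(route("complex_debugging").value)
```

In a real product the task label would probably come from a classifier or from the UI action that triggered the request, not from a hand-written string.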

You can imagine the stack like this:

Free tier:
Open-source model only.

Paid tier:
Open-source model for normal tasks + frontier model for hard tasks.

Pro tier:
Frontier model used more aggressively for deeper reasoning, better feedback, and personalized coaching.
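
The tier stack can be sketched as a small policy function. This is one possible reading, not a definitive design: the `is_hard` and `wants_depth` flags and the function name `pick_model` are hypothetical, and "used more aggressively" is interpreted here as "the Pro tier also uses the frontier model when deeper feedback is requested".

```python
def pick_model(tier: str, is_hard: bool, wants_depth: bool = False) -> str:
    """Map a subscription tier and task traits to a model path.

    Tier names follow the post; `is_hard` and `wants_depth` are
    hypothetical flags supplied by the caller.
    """
    if tier == "free":
        # Free tier: open-source model only.
        return "open_source"
    if tier == "paid":
        # Paid tier: frontier model only for hard tasks.
        return "frontier" if is_hard else "open_source"
    if tier == "pro":
        # Assumption: "more aggressively" means the frontier model is also
        # used when deeper feedback or coaching is requested.
        return "frontier" if (is_hard or wants_depth) else "open_source"
    raise ValueError(f"unknown tier: {tier!r}")


print(pick_model("paid", is_hard=False))
print(pick_model("pro", is_hard=False, wants_depth=True))
```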

This matters a lot for AI coding tools, interview-prep products, code-review agents, and developer assistants.

Margins can get destroyed very quickly if every user action hits an expensive frontier API.

But if 70–85% of requests can be handled by Qwen Coder, DeepSeek Coder, StarCoder-style models, or other open-weight models, the product becomes much more scalable.
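
To make the margin argument concrete, here is a rough blended-cost calculation. The per-request prices are invented purely for illustration (real prices depend on the provider, token counts, and hosting costs), and `blended_cost` is a hypothetical helper.

```python
def blended_cost(open_share: float, open_cost: float, frontier_cost: float) -> float:
    """Average cost per request when `open_share` of traffic uses the open-weight model."""
    if not 0.0 <= open_share <= 1.0:
        raise ValueError("open_share must be between 0 and 1")
    return open_share * open_cost + (1.0 - open_share) * frontier_cost


# Hypothetical per-request prices, chosen only to make the arithmetic visible.
OPEN_COST = 0.002
FRONTIER_COST = 0.02

for share in (0.0, 0.70, 0.85):
    cost = blended_cost(share, OPEN_COST, FRONTIER_COST)
    print(f"open share {share:.0%}: ${cost:.4f} per request")
```

With these made-up prices, the average cost per request falls from $0.0200 (all frontier) to about $0.0074 at a 70% open share and about $0.0047 at 85%, a reduction of roughly 63% to 77%.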

The UX should degrade gracefully:

Premium mode:
Best model, best reasoning, best feedback.

Budget mode:
Open-source model, still useful, slightly less deep.

Fallback mode:
Small model + static analysis + cached examples + templates.

This is how you avoid the “AI app dies when credits run out” problem.
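
One way to sketch this graceful degradation is an ordered chain of handlers, where each mode is tried in turn until one succeeds. Everything named here (`ModelUnavailable`, `answer_with_fallback`, and the three stub handlers) is hypothetical; real handlers would call actual model APIs, static analyzers, or a cache.

```python
from typing import Callable, List, Tuple


class ModelUnavailable(Exception):
    """Raised by a handler when its model cannot serve the request (e.g. budget exhausted)."""


Handler = Callable[[str], str]


def answer_with_fallback(prompt: str, chain: List[Tuple[str, Handler]]) -> Tuple[str, str]:
    """Try each (mode, handler) pair in order and return the first successful answer."""
    for mode, handler in chain:
        try:
            return mode, handler(prompt)
        except ModelUnavailable:
            continue  # degrade to the next, cheaper mode
    raise ModelUnavailable("no mode could serve the request")


# Stub handlers standing in for real model calls (all hypothetical).
def premium(prompt: str) -> str:
    raise ModelUnavailable("frontier API budget exhausted")


def budget(prompt: str) -> str:
    return f"[open-source model] answer to: {prompt}"


def fallback(prompt: str) -> str:
    return "[template] Here is a cached example that may help."


CHAIN = [("premium", premium), ("budget", budget), ("fallback", fallback)]
mode, text = answer_with_fallback("explain this loop", CHAIN)
print(mode, "->", text)
```

Returning the mode alongside the answer lets the UI tell the user which quality level they are getting instead of failing silently.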

The real moat is not just using an LLM.

The real moat is:

- routing
- evaluation
- fallback design
- cost controls
- product-specific rubrics
- data flywheel
- knowing when quality actually matters

For a coding interview product, I would use frontier models for the live interviewer and final scoring, but open-source models for code analysis, hints, test cases, complexity checks, and first-pass feedback.

That gives you good quality without destroying gross margin.

AI products that survive will not be the ones that blindly call the most expensive model every time.

They will be the ones that know exactly when to spend money on intelligence — and when not to.

sarvesh

@savy_builder

about 16 hours ago

@CodeByPoonam Gemini is at 900M MAU... why not mention this?

sarvesh

@savy_builder

about 16 hours ago

ALPHABET GEMINI APP MAU 900M+ VS 400M MAU Y/Y

sarvesh

@savy_builder

about 16 hours ago

When your AI coding product’s API budget runs out, the answer should not be:

“Sorry, service unavailable.”

The answer should be:

“Switch to a cheaper model path.”

For example:

Simple code explanation → open-source coding model
Generate test cases → open-source coding model
Find obvious bugs → open-source coding model
Hint generation → open-source coding model
Boilerplate / refactor suggestions → open-source coding model

You can imagine the stack like this:

Free tier:
Open-source model only.

Paid tier:
Open-source model for normal tasks + frontier model for hard tasks.

Pro tier:
Frontier model used more aggressively for deeper reasoning, better feedback, and personalized coaching.

sarvesh

@savy_builder

about 16 hours ago

Source: https://t.co/TZoy9mcCHi https://t.co/EJxQ1ZLNYO

sarvesh

@savy_builder

about 19 hours ago

@TMTLongShort This makes total sense for companies. There is so much litter.

sarvesh

@savy_builder

about 19 hours ago

@juliafedorin SF always

sarvesh

@savy_builder

about 19 hours ago

@futreaII The number of agents is whatever. The interesting part is the orchestration and memory so they can actually run without you babysitting every step. Most people are still manually prompting. That’s where the real difference is.

sarvesh

@savy_builder

about 19 hours ago

@tibo_maker This matches what I’m seeing too. Agents are scary good at clear input-output tasks but still collapse when taste or context is needed. The real product work right now seems to be designing the exact moment to hand control back to a human.

sarvesh

@savy_builder

about 19 hours ago

This is a solid signal. The shift from chatbots to real agent loops (research → code → evaluate → iterate) is happening faster than most people expected. I’m seeing the same thing on the side — narrow agents that actually close the loop are way more useful than general ones right now.
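
A closed agent loop of the kind described (research → code → evaluate → iterate) can be sketched as below. This is a generic skeleton under stated assumptions, not any particular framework's API: `run_agent_loop` and its three callables are hypothetical, and the stubs at the bottom stand in for real model and test-runner calls.

```python
from typing import Callable, Optional, Tuple


def run_agent_loop(
    goal: str,
    research: Callable[[str], str],
    write_code: Callable[[str, str, str], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    max_iters: int = 3,
) -> Tuple[Optional[str], int]:
    """Research once, then alternate between writing and evaluating.

    Returns (candidate, attempts) on success, or (None, max_iters) if no
    candidate passes evaluation within the iteration budget.
    """
    notes = research(goal)
    feedback = ""
    for attempt in range(1, max_iters + 1):
        candidate = write_code(goal, notes, feedback)
        passed, feedback = evaluate(candidate)
        if passed:
            return candidate, attempt
    return None, max_iters


# Stubs for illustration only: the second draft passes evaluation.
def stub_research(goal: str) -> str:
    return f"notes about {goal}"


def stub_write(goal: str, notes: str, feedback: str) -> str:
    return "draft v2" if feedback else "draft v1"


def stub_evaluate(candidate: str) -> Tuple[bool, str]:
    return candidate == "draft v2", "handle the empty-input case"


result, attempts = run_agent_loop("parse a CSV", stub_research, stub_write, stub_evaluate)
print(result, attempts)
```

The iteration cap is what keeps a narrow agent from looping forever; when it returns `None`, that is a natural point to hand control back to a human.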

sarvesh

@savy_builder

about 19 hours ago

@0xMortyx Yeah the gap isn’t skill anymore, it’s how fast you can turn ideas into working loops. I’m testing this exact thing on the side — one narrow agent for X research + writing. Still early but already feels like having a very junior (but always available) teammate.

sarvesh

@savy_builder

about 19 hours ago

@0xnyxen Yeah this is the real 2026 vibe. I’m in the same boat — DeepMind during the day, shipping small agent stuff on the side at night. No hype, just caffeine and trying to make one narrow thing actually work. Respect for keeping multiple things alive.

sarvesh

@savy_builder

about 19 hours ago

Having a good AI job and trying to build something on the side is weirdly hard. Not because of the tech. The tech is actually decent now. The hard part is energy + focus after already thinking about AI models and agents the whole day.

sarvesh

@savy_builder

about 19 hours ago

Been at DeepMind for a bit now. Also trying to build stuff on the side as a solo founder.

Here’s what I’ve noticed about AI agents in 2026:

Most people are still building agents that look impressive in a demo and then die the moment you give them actual work.

The ones that actually help are stupidly narrow. Like one agent that just does good X research + writes threads in your voice. Or one that handles lead enrichment + first reply drafting. Nothing fancy. Just one painful thing done well.

I think the solo founder advantage right now is this: You can actually use your own agent every day. You feel the pain when it sucks. So you fix it fast.

Big teams are still arguing about architecture while solo people are shipping and improving daily.

Not saying one is better. Just saying the game has changed.
