Marktechpost AI @MarkTechPost - Twitter Profile

Marktechpost AI

@Marktechpost

about 13 hours ago

https://t.co/1LYTi5RJWn

Google Gemma

@googlegemma

about 16 hours ago

Meet Gemma 4 12B!

A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.

Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇 https://t.co/gf4FZv0WZb


Marktechpost AI

@Marktechpost

about 13 hours ago

https://t.co/CB8MUrrhqs


Marktechpost AI

@Marktechpost

about 13 hours ago

Google DeepMind Releases Gemma 4 12B: An Encoder-Free Multimodal Model with Native audio that runs on a 16 GB laptop

Here's what's actually interesting:

1. No separate vision or audio encoders
Every prior mid-sized Gemma ran frozen encoders before the LLM. The 12B feeds raw inputs straight into the backbone.
→ Vision encoder: 550M → a 35M embedder
→ Audio conformer layers: 12 → 0

2. How the vision path works
→ 48×48 pixel patches
→ One matrix multiplication into the LLM hidden dimension
→ No attention; a factorized X/Y position lookup
That's the entire pipeline.

3. How the audio path works
→ Raw 16 kHz audio, sliced into 40 ms frames
→ Projected into the same space as text tokens
→ RoPE handles the temporal sequence
It's the first mid-sized Gemma with native audio. Video too.

4. Why developers should care
One unified weight space. Fine-tuning with LoRA updates vision, audio, and text in a single pass. No co-tuning frozen encoders.

5. It runs on a laptop
→ 16 GB VRAM or unified memory
→ Performance nearing the 26B MoE, at under half the memory
→ Apache 2.0 license
→ Works with llama.cpp, MLX, vLLM, Ollama, LM Studio, Transformers, Unsloth

The trade-off is honest: the LLM backbone now handles all multimodal processing itself, and Google published no full benchmark tables at launch.

Full analysis: https://t.co/GsTQD5EOd6

Model weights: https://t.co/SajkzWtHK9

Technical details: https://t.co/V4V1fsk9nj

@GoogleDeepMind @GoogleAI @googleaidevs @GoogleResearch
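
The vision and audio paths described in the post can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not Gemma's actual code: the hidden size, grid limit, random weights, and the function names `embed_image` and `frame_audio` are hypothetical, and reading "factorized X/Y position lookup" as "add one row embedding and one column embedding" is my interpretation.

```python
import numpy as np

PATCH = 48       # patch side in pixels (from the post)
HIDDEN = 64      # hypothetical hidden size, kept tiny for the demo
MAX_GRID = 32    # hypothetical maximum number of patches per axis

rng = np.random.default_rng(0)
W_patch = rng.normal(0, 0.02, (PATCH * PATCH * 3, HIDDEN))  # the single projection
pos_x = rng.normal(0, 0.02, (MAX_GRID, HIDDEN))             # column lookup table
pos_y = rng.normal(0, 0.02, (MAX_GRID, HIDDEN))             # row lookup table


def embed_image(img):
    """Turn an (H, W, 3) image into patch tokens; H and W must be multiples of PATCH."""
    h, w, c = img.shape
    gy, gx = h // PATCH, w // PATCH
    # Cut the image into non-overlapping patches and flatten each one.
    patches = img.reshape(gy, PATCH, gx, PATCH, c).transpose(0, 2, 1, 3, 4)
    flat = patches.reshape(gy * gx, PATCH * PATCH * c)
    tokens = flat @ W_patch                       # one matrix multiplication
    rows, cols = np.divmod(np.arange(gy * gx), gx)
    # Assumed factorization: position = row embedding + column embedding.
    return tokens + pos_y[rows] + pos_x[cols]


def frame_audio(wave, sample_rate=16_000, frame_ms=40):
    """Slice raw audio into fixed-length frames, dropping any trailing remainder."""
    n = sample_rate * frame_ms // 1000            # 640 samples per 40 ms frame
    usable = len(wave) // n * n
    return wave[:usable].reshape(-1, n)
```

With these numbers, a 96×144 image becomes 6 patch tokens, and one second of 16 kHz audio becomes 25 frames of 640 samples each. In a full model each audio frame would then be projected into the token space, as the post describes.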


Marktechpost AI

@Marktechpost

about 13 hours ago

@googleaidevs https://t.co/qW6JMt56Ti


Marktechpost retweeted

Marktechpost AI

@Marktechpost

about 23 hours ago

This is a really big update from Nous Research: Hermes Desktop is now in public preview.

If you’ve followed AI agents at all, I’d recommend reading about this one. It’s not just another chat wrapper; it’s a native, cross-platform front end for Hermes Agent (v0.15.2) running on macOS, Windows, and Linux.

What stood out to me:
- No terminal required for core use, while CLI/TUI parity remains intact
- Streaming tool output and live tool activity, a big quality-of-life win for debugging autonomous agent behavior
- Right-hand preview pane for web pages, files, and tool results
- Built-in voice I/O and file browser
- Session continuity across CLI and desktop with no duplicated state

Under the hood, Hermes does more than chat. It:
- generates reusable skills after completing complex tasks
- maintains persistent memory with cross-session recall
- supports scheduled jobs via natural language
- spawns isolated subagents for parallel work

Read more here: https://t.co/HiKwthgvCv

@NousResearch #AI #AutonomousAgents #OpenSource #NousResearch #HermesAgent


Marktechpost retweeted

Marktechpost AI

@Marktechpost

about 24 hours ago

This is super cool! Just checked NVIDIA Cosmos 3 this week. Here's what's actually interesting if you build physical AI.

It's an open family of omnimodal world models. One model does physical reasoning, world generation, and action generation. Earlier Cosmos releases needed separate models for each of those.

1) Two towers, one transformer
→ Reasoner tower: an autoregressive VLM that reads video, images, and text
→ Generator tower: a diffusion path for physics-aware video and actions
→ Information flows one way, reasoner → generator

2) Pick a size for your hardware
→ Cosmos3-Nano: 16B total (dense 8B, Qwen3-VL 8B), runs on workstation GPUs like the RTX PRO 6000
→ Cosmos3-Super: 64B total (dense 32B, Qwen3-VL 32B), targets Hopper and Blackwell datacenters
→ A 4B Edge model is planned for a later release

3) What it generates
→ In: text, image, video, action arrays
→ Out: image, video, synchronized sound, action states, text
→ 256p/480p/720p, 5–300 frames (default 189 ≈ 7.9s at 24 FPS), stereo AAC at 48 kHz

4) The benchmark claims
→ Open-source SOTA on R-Bench; leads PAI-Bench, Physics-IQ, and RoboLab
→ Top open-source on Artificial Analysis text-to-image and image-to-video
→ New HUE eval scores videos with yes/no fact checks across 4 dimensions and 7 domains

5) Fully open
→ Checkpoints, six SDG datasets, and training recipes (SFT + action post-training)
→ Action modes: forward dynamics, inverse dynamics, policy generation
→ Released under OpenMDW-1.1

6) Deployment path is there
→ NIM microservices: Reasoner NIM now, Generator NIM later
→ BF16/FP8/NVFP4 quantization, up to 2x speedup with NVFP4
→ vLLM serving plus Efficient Video Sampling (EVS)

Full analysis: https://t.co/7iKDuHn36T

Model Weights: https://t.co/LO7FNe8x6F

GitHub Repo: https://t.co/ncQyNc9PXC

@NVIDIAAI @NVIDIARobotics
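
The frame counts in point 3 convert to clip lengths with simple division. A tiny sketch of that arithmetic follows; the helper name `clip_seconds` is hypothetical and not part of any Cosmos API.

```python
def clip_seconds(frames, fps=24):
    """Duration in seconds of a clip with the given frame count."""
    return frames / fps


default_clip = clip_seconds(189)   # 7.875 s, which the post rounds to 7.9 s
shortest_clip = clip_seconds(5)    # lower end of the 5–300 frame range
longest_clip = clip_seconds(300)   # 12.5 s at 24 FPS
```

So at 24 FPS the supported range spans from about a fifth of a second to 12.5 seconds of video.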


Marktechpost AI

@Marktechpost

about 23 hours ago

@NousResearch https://t.co/95YDMtDLLy


Marktechpost AI

@Marktechpost

about 24 hours ago

@NVIDIAAI https://t.co/aTFqkwR4Nz


Marktechpost AI

@Marktechpost

about 24 hours ago

@NVIDIARobotics @UnitreeRobotics @SharpaRobotics https://t.co/aTFqkwR4Nz


Marktechpost AI

@Marktechpost

about 24 hours ago

@liu_mingyu https://t.co/aTFqkwR4Nz


Marktechpost retweeted

Marktechpost AI

@Marktechpost

1 day ago

TinyFish just open-sourced BigSet, a multi-agent system that builds structured datasets from a single plain-English sentence.

You type: "YC companies that are currently hiring engineers, with their funding stage, location, and number of open roles."

That's the input. That's it.

Here's what actually happens under the hood:

1. Schema Inference (Claude Sonnet via OpenRouter)
- Infers column names, data types, and primary keys before any web access

2. Orchestrator Agent (Qwen via OpenRouter)
- Runs broad discovery via TinyFish Search to identify which entities exist and where to find them

3. Sub-Agent Fan-Out
- One isolated sub-agent per entity, running in parallel
- Each agent is capped at 6 tool calls: fetch, search, insert, done
- Dataset ID is baked into a JS closure invisible to the LLM, so prompt injection can't redirect writes

4. Export
- Primary key deduplication across all agents
- Source attribution per row
- Download as CSV or XLSX

The refresh part is what makes it useful long-term. Set it to 30 min, 6 hours, daily, or weekly, and the agents re-run automatically. Your dataset stays current without re-running anything manually.

I have personally tested BigSet and covered the full setup walkthrough, from clone to first dataset, including all env vars, make commands, and the security architecture.

Here is the full analysis: https://t.co/lJMVFngeuL

GitHub: https://t.co/8dL7kQdsyc

@Tiny_Fish #ai #aiagent #dataset
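
The three safety mechanics in steps 3 and 4 (a closure-bound dataset ID, a per-agent tool-call cap, and primary-key deduplication) can be illustrated with a rough sketch. BigSet does this in JavaScript; the Python below is only an analogue, and every name in it (`make_insert_tool`, `ToolBudget`, `dedupe`, the in-memory `store`) is hypothetical rather than taken from the BigSet codebase.

```python
def make_insert_tool(dataset_id, store):
    """Return an insert tool bound to one dataset.

    The model only ever supplies `row`; dataset_id lives in the closure,
    so nothing in the model's output can point the write somewhere else.
    """
    def insert_row(row):
        store.setdefault(dataset_id, []).append(dict(row))
        return "ok"
    return insert_row


class ToolBudget:
    """Refuse further tool calls once a sub-agent has used its allowance."""

    def __init__(self, limit=6):
        self.limit = limit
        self.used = 0

    def call(self, tool, *args):
        if self.used >= self.limit:
            raise RuntimeError("tool-call budget exhausted")
        self.used += 1
        return tool(*args)


def dedupe(rows, key):
    """Keep the first row seen for each primary-key value."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], row)
    return list(seen.values())
```

A sub-agent would receive only the `insert_row` callable wrapped in a `ToolBudget`, while the orchestrator would run `dedupe` over the combined rows before export.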


Marktechpost retweeted

Marktechpost AI

@Marktechpost

2 days ago

JetBrains just open-sourced Mellum2. Here's what's actually interesting about it.

It's a 12B Mixture-of-Experts model, but only 2.5B parameters are active per token. The whole design is built around being a fast component inside larger systems, not a frontier model replacement.

JetBrains calls this a "focal model" philosophy. The idea: not every step in an AI pipeline needs your biggest model. Routing, summarization, validation: these are high-frequency and latency-sensitive. A small specialized model handles them efficiently while the frontier model does the heavy lifting.

1. The architecture
→ 12B total parameters, 2.5B active per token (64 experts, 8 activated)
→ Per-token compute equals a 2.5B dense model
→ Multi-Token Prediction head doubles as a built-in draft model for speculative decoding
→ 131,072 token context window

2. The training
→ ~10.6 trillion tokens across a three-phase curriculum
→ Muon optimizer under FP8 hybrid precision
→ Context extended to 128K via layer-selective YaRN
→ Post-trained with SFT then RLVR

3. The release
→ Apache 2.0 license: commercial use, fine-tuning, self-hosting all permitted
→ Six checkpoints: base, SFT, and RL-tuned Instruct and Thinking variants
→ vLLM support with tool-calling

Benchmarks: Mellum2 posts a strong EvalPlus (78.4) and competitive BFCL v3 (66.3) against models up to 14B. It trails larger comparisons on LiveCodeBench v6 and GPQA Diamond. That tradeoff is the point: this is a model for component roles, not a general-purpose leaderboard chase.

I covered the full architecture, benchmark tables, and deployment details on Marktechpost: https://t.co/TY2QcCFxYM

Model Weights: https://t.co/pvxT8s5AMD

Technical details: https://t.co/pvxT8s5AMD

@jetbrains @nv_pavlichenko #opensource #ai #llms
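
The "64 experts, 8 activated" line is what keeps per-token compute low: a router scores all experts but only the top few run. A generic top-k routing sketch under stated assumptions is shown below. It is not Mellum2's implementation: the softmax over the selected logits, the linear "experts", and the names `route` and `moe_forward` are illustrative choices.

```python
import numpy as np

N_EXPERTS = 64   # total experts (from the post)
TOP_K = 8        # experts activated per token (from the post)


def route(token, router_w):
    """Pick the TOP_K highest-scoring experts and normalize their weights."""
    logits = token @ router_w                      # one score per expert
    top = np.argsort(logits)[-TOP_K:]              # indices of the best TOP_K
    weights = np.exp(logits[top] - logits[top].max())
    return top, weights / weights.sum()            # softmax over the selected experts


def moe_forward(token, router_w, experts):
    """Weighted sum of the selected experts' outputs; the other 56 never run."""
    top, weights = route(token, router_w)
    # Each expert here is a plain (d, d) matrix, a stand-in for a real FFN.
    return sum(w * (token @ experts[i]) for i, w in zip(top, weights))
```

Only 8 of the 64 expert matrices are touched per token, which is the mechanism behind the post's "per-token compute equals a 2.5B dense model" claim.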



Marktechpost retweeted

TinyFish

@Tiny_Fish

1 day ago

What if you and your agent had all the data that always stays fresh? Structured, on demand, never stale. Introducing BigSet. Describe the data you need in plain English → get a structured dataset built from the live web, that refreshes regularly. It's live and open-source.


Marktechpost retweeted

Marktechpost AI

@Marktechpost

2 days ago

MiniMax just released MiniMax M3, and the architecture change alone is worth paying attention to.

The most important element in it is MSA (MiniMax Sparse Attention). At 1 million tokens of context, M3's per-token compute is 1/20th of the previous generation. That's more than 9× faster prefill and more than 15× faster decoding at that context length. This is a meaningful infrastructure shift for devs running full-codebase agents or long-document pipelines.

Here's what's actually interesting about MiniMax M3:

1. Native multimodality from step 0
→ Text, image, and video trained together from the start, not added post-training
→ Training data scaled to the order of 100 trillion tokens using interleaved formats
→ Supports image input, video input, and desktop computer operation

2. Coding benchmarks
→ 59.0% on SWE-Bench Pro (surpasses GPT-5.5 and Gemini 3.1 Pro)
→ 66.0% on Terminal-Bench 2.1
→ 74.2% on MCP Atlas
→ 70.06% on OSWorld-Verified for computer use

3. Long-horizon autonomous iteration
→ M3 optimized an FP8 GEMM kernel on NVIDIA Hopper GPUs over 24 hours
→ 147 benchmark submissions, 1,959 tool calls, zero human intervention
→ Improved Hopper FP8 peak utilization from 7.6% to 71.3%, a 9.4× speedup

4. Access
→ API is live today at https://t.co/lrrwMPgq6B
→ Open weights and technical report committed within 10 days
→ Token Plan starts at $20/month (~1.7B M3 tokens)

One thing to watch closely: on PostTrainBench (the task of autonomously training models from scratch), M3 scored 0.37, below Opus 4.7 (0.42) and GPT-5.5 (0.39). Worth keeping in context when evaluating M3 for ML research automation specifically.

I covered the full technical breakdown: https://t.co/yxLeIRjK6T

Details: https://t.co/ephFkY2Ec5

@MiniMax_AI
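
Two of the figures above are easy to sanity-check with a few lines of arithmetic. The helpers below are hypothetical names for illustration, and the token count is the post's approximate "~1.7B", so the resulting price is a rough figure, not an official rate.

```python
def speedup(before_pct, after_pct):
    """Ratio of utilization after vs. before optimization."""
    return after_pct / before_pct


def usd_per_million_tokens(monthly_usd, monthly_tokens):
    """Effective price per one million tokens on a flat monthly plan."""
    return monthly_usd / monthly_tokens * 1_000_000


kernel_gain = round(speedup(7.6, 71.3), 1)                     # 9.4, matching the post
plan_price = round(usd_per_million_tokens(20, 1.7e9), 4)       # about 0.0118 USD
```

The utilization ratio reproduces the quoted 9.4× figure, and the entry Token Plan works out to a little over one cent per million tokens if the full allowance is used.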
