Taka Shinagawa

@blueviggen

Zen mind with millions of new & old ideas one by one

Wild West

Joined August 2015

5.3K Following

275 Followers

3.5K Posts

blueviggen retweeted

Google Gemma

@googlegemma

16 days ago

Meet DiffusionGemma! An experimental open model that explores a fast approach to text generation, released under an Apache 2.0 license. Moving beyond sequential, token-by-token processes to generate entire blocks of text simultaneously. Here’s what’s new with DiffusionGemma: 👇

166

810

956K

blueviggen retweeted

Andrej Karpathy

@karpathy

16 days ago

This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time. I feel a lot of things changing as working software increasingly comes out on a tap. Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!

26K

blueviggen retweeted

vLLM

@vllm_project

16 days ago

Today we're excited to introduce vime — a simple, stable, and efficient RL framework for LLM post-training in the vLLM ecosystem. Built on slime's proven training design and powered by vLLM inference, vime brings another strong option to the growing vLLM post-training ecosystem. Our goal isn't a one-size-fits-all framework. We want users with different needs to find the right vLLM-ecosystem choice for their workflows—whether that's vime, NeMo RL, OpenRLHF, verl, or others. More choice. More interoperability. More innovation. Learn more: https://t.co/c3yfEewsWj #LLM #RLHF #PostTraining #vLLM

512

249

43K

blueviggen retweeted

Google Gemma

@googlegemma

23 days ago

Meet Gemma 4 12B!

A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.

Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇 https://t.co/gf4FZv0WZb

402

12K

blueviggen retweeted

about 1 month ago

KV cache shouldn't disappear every time vLLM restarts. With @novita_labs, we're sharing PegaFlow — a production-grade external KV cache service that plugs into vLLM through the external KV connector interface. PegaFlow runs as a standalone Rust daemon owning the host KV pool, SSD cache, and RDMA resources. vLLM workers attach via CUDA IPC + gRPC, and cache survives engine crashes, upgrades, and model switches. In production-oriented evaluations: 🚀 2.15× faster vLLM startup with a pre-warmed 500 GiB host pool 📈 56% higher throughput for 8 Qwen3-8B instances sharing one cache ⚡ 72% higher throughput for DeepSeek-V3.2 MLA TP8 (logical KV stored once, not per rank) 🌐 194 GB/s average remote-read throughput across nodes Three-level hierarchy: pinned DRAM, remote DRAM over RDMA, local SSD on io_uring. Integrates through the existing `kv_transfer_config` path — no vLLM source changes. 📖 https://t.co/rf2VmevP7J

287

155

30K

blueviggen retweeted

Omar Sanseviero

@osanseviero

about 1 month ago

Everything AI released at Google I/O 2026 - Gemini Omni Flash - Gemini 3.5 Flash (and in GA) - Antigravity 2.0 - Managed Agents in the Gemini API - AI Studio app in pre-order - New SynthID partnerships - AI Studio: native Android support, Workspace Integrations, and export to AGY - Antigravity SDK and CLI - Gemini Spark - New Google AI Ultra subscription And stay tuned, so much more to come!

284

12K

blueviggen retweeted

GPU MODE

@GPU_MODE

about 2 months ago

NVIDIA cuDNN team tomorrow at noon

113

blueviggen retweeted

Omar Sanseviero

@osanseviero

about 2 months ago

Gemma 4 was released just a few weeks ago. Since then, it has been downloaded over 50 million times and there are almost 1500 community-built models based on it. Exciting times ahead!

494

39K

blueviggen retweeted

Poolside

@poolsideai

about 2 months ago

Today we’re releasing Laguna XS.2, Poolside’s first open-weight model.
It’s a 33B total / 3B active MoE model built for agentic coding and long-horizon tasks.
Trained fully in-house on our own stack. Runs on a single GPU. Released under Apache 2.0.
Links 👇
Weights: https://t.co/HSo8L2gM64
API: https://t.co/DMJtNFrace
Blog: https://t.co/BXEjQxtQoV

807

141

377

275K

blueviggen retweeted

vLLM

@vllm_project

2 months ago

🎉 Day-0 support for @deepseek_ai V4 Pro and Flash on vLLM — a new generation of DeepSeek model, purpose-built for tasks up to 1M tokens. Alongside the release, we're publishing a first-principles walkthrough of the new long-context attention and how we implemented it in vLLM.

The new attention mechanism, in four moves:
• Shared K/V + inverse RoPE → 2× memory savings
• c4a / c128a KV compression → 4×–128× savings
• DeepSeek Sparse Attention over compressed tokens
• Short sliding window for locality across compression boundaries

At 1M context, per-layer KV state is ~8.7× smaller than a DeepSeek V3.2-style 61-layer stack (9.62 GiB vs 83.9 GiB, bf16). fp8 attention cache + fp4 indexer cache shrink it further.

vLLM side:
• Unified hybrid KV cache — single logical block size (256 native positions) across all compression rates; compressor state folded into the SWA KV cache spec so prefix caching, disagg prefill, CUDA graphs and MTP reuse the same abstraction
• Three page-size buckets for the full 5-way cache stack → no cross-kind fragmentation
• Fused kernels: compressor + RMSNorm + RoPE + cache insert (1.4–3×), inverse RoPE + fp8 quant (2–3×), Q-norm + KV RoPE + K insert (10–20×)
• Multi-stream overlap of indexer vs main-KV compression vs SWA insertion

Disaggregated serving is supported out of the box and strongly recommended for best performance.

Follow our recipes site for verified commands for @nvidia Blackwell (B200, B300, GB200, GB300) and Hopper (H100/H200/H20) systems.

Thanks to the @deepseek_ai team for open-sourcing DeepSeek V4, and to @inferact for landing day-0 support 🤝

📝 Blog: https://t.co/Eh7vk6xVJy
📖 Recipes: https://t.co/jlWuzYyZeX
🤗 https://t.co/IA9qAysqJk
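As a quick sanity check on the figures quoted in the post, the "~8.7×" claim follows directly from the two bf16 sizes given. The sketch below only re-derives that ratio; the variable names are illustrative and nothing here comes from vLLM or DeepSeek code.

```python
# Sizes quoted in the post for KV state at 1M context, in GiB (bf16).
V32_STYLE_GIB = 83.9   # DeepSeek V3.2-style 61-layer stack
V4_GIB = 9.62          # DeepSeek V4

# Ratio of the two quoted sizes.
kv_ratio = V32_STYLE_GIB / V4_GIB

# Absolute memory saved according to the same two numbers.
kv_saved_gib = V32_STYLE_GIB - V4_GIB

print(f"KV state is {kv_ratio:.2f}x smaller, saving {kv_saved_gib:.2f} GiB")
```

The ratio rounds to 8.7, which matches the post. The further fp8 and fp4 savings are not quantified in the post, so they are left out here.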

572

140

125K

blueviggen retweeted

DeepSeek

@deepseek_ai

2 months ago

DeepSeek-V4-Pro

🔹 Enhanced Agentic Capabilities: Open-source SOTA in Agentic Coding benchmarks.
🔹 Rich World Knowledge: Leads all current open models, trailing only Gemini-3.1-Pro.
🔹 World-Class Reasoning: Beats all current open models in Math/STEM/Coding, rivaling top closed-source models.

2/n

321

296

563K

blueviggen retweeted

DeepSeek

@deepseek_ai

2 months ago

🚀 DeepSeek-V4 Preview is officially live & open-sourced! Welcome to the era of cost-effective 1M context length.

🔹 DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.
🔹 DeepSeek-V4-Flash: 284B total / 13B active params. Your fast, efficient, and economical choice.

Try it now at https://t.co/GCdiMzk1Dl via Expert Mode / Instant Mode. API is updated & available today!

📄 Tech Report: https://t.co/drlDrxkYtp
🤗 Open Weights: https://t.co/T13Y8i7SDM

1/n
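To put the "total / active" parameter counts in perspective, the short sketch below computes what share of each model's parameters is active. It uses only the numbers quoted in the post; reading "active" as the parameters used per token is the usual mixture-of-experts meaning and is an assumption here, as is the helper name.

```python
# (total parameters, active parameters) as quoted in the post.
MODELS = {
    "DeepSeek-V4-Pro": (1.6e12, 49e9),
    "DeepSeek-V4-Flash": (284e9, 13e9),
}


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters that is active."""
    return active_params / total_params


fractions = {name: active_fraction(*sizes) for name, sizes in MODELS.items()}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of parameters active")
```

On these figures, roughly 3% of V4-Pro and under 5% of V4-Flash is active.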

46K

10K

10M

blueviggen retweeted

ollama

@ollama

2 months ago

Qwen 3.6 27B model is available on Ollama! Use it with all the integrations in Ollama or chat with the model. Chat with the model: ollama run qwen3.6:27b OpenClaw: ollama launch openclaw --model qwen3.6:27b Claude Code: ollama launch claude --model qwen3.6:27b More 👇👇👇
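Besides the CLI commands above, a locally running Ollama server can also be called over its HTTP API. The sketch below is a minimal, unofficial example that assumes the server is listening on Ollama's default local address and that the `qwen3.6:27b` tag from the post has already been pulled. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "qwen3.6:27b") -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen3.6:27b") -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `generate("Say hello in one sentence.")` returns the model's reply as a string. Building the request separately from sending it keeps the payload easy to inspect without a network call.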

118

393

168K

blueviggen retweeted

👩‍💻 Paige Bailey

@DynamicWebPaige

2 months ago · San Francisco

Stunning views and just as jaw-dropping of presentations at the @cerebral_valley Gemma 4 Launch Party tonight!

@vllm_project @UnslothAI @Ollama @huggingface @cactuscompute @apple @nvidia @pytorch and more all represented, this is such a beautiful open-source community. 🥹❤️ https://t.co/lAkYzbP2lM

16K

blueviggen retweeted

Claude

@claudeai

2 months ago

Introducing Claude Opus 4.7, our most capable Opus model yet.

It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.

You can hand off your hardest work with less supervision. https://t.co/PtlRdpQcG5

81K

10K

12K

14M

blueviggen retweeted

Omar Sanseviero

@osanseviero

3 months ago

Next week we're doing a special Gemma meetup in San Francisco 💎 If you have built something with Gemma 4 and would like to showcase it in person, send me a DM!

blueviggen retweeted

Qwen

@Alibaba_Qwen

3 months ago

（1/8）🚀 Introducing Qwen3.6-Plus: Towards Real-World Agents! 🤖

Today, we’re thrilled to drop a major milestone in our journey toward native multimodal agents.

Here is what makes Qwen3.6-Plus a game-changer：
💻 Next-level Agentic Coding: Smarter, faster execution.
👁️ Enhanced Multimodal Vision: Sharper perception & reasoning.
🏆 Top-tier Performance: Maintaining leading general capabilities.
📚 1M Context Window: Available by default via our API.

Built on your invaluable feedback from the Qwen3.5 era, we’re laying a rock-solid foundation for real-world devs. Get ready to experience truly transformative ✨ Vibe Coding ✨.

Huge thanks to our community! Go try it out and show us what you can build. 👇

Chat: https://t.co/V7RmqMaVNZ
API: https://t.co/937Qkc9AMy
Blog: https://t.co/P0rJSxERND

🔔Noted：More Qwen3.6 models to come and be open-sourced! Stay tuned~ 👀#Qwen #AI #AgenticCoding #VibeCoding #Agents

240

654

blueviggen retweeted

Unsloth AI

@UnslothAI

3 months ago

We collaborated with @NVIDIA to teach you about Reinforcement Learning and RL environments.

Learn:
• Why RL environments matter + how to build them
• When RL is better than SFT
• GRPO and RL best practices
• How verifiable rewards and RLVR work

Blog: https://t.co/Jng3urMPyw https://t.co/CmEj1S3QAe
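For readers new to the terms in that list, here is a rough sketch of two of the ideas: a verifiable reward (one that can be checked programmatically) and the group-relative advantage used in GRPO, where each sampled completion is scored against the mean and standard deviation of its own group. This is not taken from the linked blog. The function names, the exact-match rule, and the toy completions are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def exact_match_reward(completion: str, reference: str) -> float:
    """Verifiable reward: 1.0 if the last line of the completion equals the reference."""
    lines = completion.strip().splitlines()
    final_line = lines[-1].strip() if lines else ""
    return 1.0 if final_line == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each reward by the mean and std of its group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All samples scored the same, so none is better than the others.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


# Toy group: four sampled completions for one prompt whose answer is "42".
completions = ["Let me think.\n42", "41", "The answer is\n42", "I am not sure"]
rewards = [exact_match_reward(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

Correct completions end up with positive advantages and incorrect ones with negative advantages, with no learned value model involved. Real training loops add the policy-gradient update, clipping, and usually a KL penalty, all omitted here.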

244

89K

blueviggen retweeted

NVIDIA HPC Developer

@NVIDIAHPCDev

4 months ago

🎉 CUDA 13.2 just dropped, and GPU programming just got simpler. This release expands CUDA Tile support to Ampere and Ada GPUs while delivering a stronger CUDA Python stack for cluster-scale workloads. What's new: ✅ Install cuTile Python directly from PyPI: pip install cuda-tile ✅ Enhanced CUDA Python profiling and debugging across Numba-CUDA flows and Nsight tools ✅ Modern CUDA C++ and refreshed math libraries optimized for AI and HPC kernels Ready to accelerate your workflows? 📝 Read the technical deep dive: https://t.co/pE5UcJZqXU

820

174

54K

blueviggen retweeted

Cloudflare

@Cloudflare

4 months ago

We rebuilt Next.js in a week. No, really. The team ported the framework to run natively on Workers to prove what’s possible with edge-first architecture. Dive into the technical hurdles we solved to eliminate Node.js dependencies. https://t.co/GqYBiZ5Qum

172

522
