🇬🇧 London, June 10.
@vllm_project & @_llm_d_ Inference Meetup, hosted by Red Hat AI, @nvidia, and @SteliaAI.
Talks on vLLM updates, speculative decoding, llm-d in production, AI safety, and more.
Plus food, drinks, and the people building this stuff. https://t.co/QC4d1yKPbc
Congrats to the @googlegemma team on the Gemma 4 12B launch 🎉 Day-0 support on vLLM is ready to go.
It's an encoder-free unified multimodal model — text, image, audio, and video all project straight into the LLM's embedding space, no separate vision or audio towers. 256K context, built-in thinking, native tool calling.
Reasoning + tool parsers (`gemma4`), vision, and audio all served through the OpenAI-compatible API.
🔗 Recipe: https://t.co/MGJcoQkwzz
This one has been in the works for a while. @cedricclyburn teaching LLM inference, compression, and benchmarking with @vllm_project -- free course with @DeepLearningAI. Proud of this one.
New short course: Fast & Efficient LLM Inference with vLLM, built in partnership with @RedHat and taught by @cedricclyburn.
Learn to quantize an open-source LLM, serve it with vLLM, and benchmark your deployment across speed, cost, and accuracy.
Free to enroll: https://t.co/czVwJBnLZ6
🇬🇧 London, June 10.
@vllm_project & @_llm_d_ Inference Meetup hosted by Red Hat AI, @nvidia, and @SteliaAI at Sustainable Ventures, County Hall.
On the agenda: vLLM project update, speculative decoding, llm-d in production, and AI safety evaluation.
https://t.co/QC4d1yKPbc
Love seeing the work @RedHat_AI and @vllm_project are doing to make Laguna XS.2 easier to run.
Red Hat AI trained a DFlash speculator: a 0.6B drafter that predicts 8 tokens per pass, with Laguna verifying the output.
So builders get faster generation without changing output quality.
With vLLM support and FP8/NVFP4/INT4 checkpoints through LLM Compressor, it’s also easier to tune for different latency, memory, and hardware constraints.
Grateful for the team building the infra that makes open models easier to use, serve, and improve!
🇹🇷 Istanbul, 17 June.
@vllm_project & @_llm_d_ meetup hosted by Red Hat AI, @nvidia, and BeyondGuard at İTÜ Taşkışla.
On the agenda: vLLM project update, distributed inference, speculative decoding, securing vLLM in production, live demos, and more.
https://t.co/1ZidCjwPdS
🇬🇧 London, June 10.
@vllm_project & @_llm_d_ Inference Meetup hosted by Red Hat AI, @nvidia, and @SteliaAI at Sustainable Ventures, County Hall.
On the agenda: vLLM project update, speculative decoding, llm-d in production, and AI safety evaluation.
https://t.co/QC4d1yKPbc
Red Hat and @NVIDIA are integrating NVIDIA OpenShell into the full-stack @RedHat_AI platform.
The work brings oversight and policy to the infrastructure level, while contributing to the open source OpenShell project to standardize how agents are governed on enterprise platforms.
Learn more: https://t.co/NiGNNWDIWF
We’ve open sourced all aspects of our stack for training SOTA speculators like DFlash, and keep publishing our own checkpoints validating it for everyone to benefit and learn from. Check out the latest ones for Laguna!
Laguna XS.2 from @poolsideai is a 33B MoE built for agentic coding.
Red Hat AI trained a DFlash speculator for it: 0.6B drafter, 8 tokens per pass, no quality loss.
FP8, NVFP4, and INT4 checkpoints via LLM Compressor.
Models in comments. Speedup with @vllm_project:
Speculators v0.5.0 just dropped with 3 big updates:
- DFlash training support. Draft all tokens in one pass via block diffusion
- Unified online/offline training powered by @vllm_project's hidden states extraction system
- Docs & tutorials overhaul for faster onboarding
https://t.co/tCMZIrgQf2
Can you run Gen AI workloads without GPUs with @vllm_project?
Join vLLM Office Hours today at 2PM ET to learn what’s new in vLLM v0.21.0 from @mgoin_, followed by a deep dive from Intel on running AI use cases on Intel Xeon CPUs.
Get a 🗓️ invite: https://t.co/X8hAHYR3rl
Red Hat supports any model, on any accelerator, in any cloud. Now, we’re adding "any agent" to that list with @RedHat_AI 3.4. Control your AI journey with a foundation built for scale and security: https://t.co/SFALA9WpV7 #RHSummit