AI/ML Advocate @googlecloud | Vertex AI dude | Research, Open Models, Ray & TPU | Instructor @DeepLearningAI | Startup Advisor @ycombinator x Google Cloud
Presenting at the Next '26 Developer Keynote is one of those moments I'll remember for a long time.
Thank you to everyone who played a part.
Till the next Next!
6/10開催のAnthropic主催イベントCode w/ Claudeでは、 @ivnardini と私で"Building with Claude on GoogleCloud"を担当します。すでにオンサイトは満席ですが、オンライン視聴可能なのでぜひ。
https://t.co/bIWi3Xmvee
With v2.1.158, Anthropic shipped Auto mode in Claude Code with Google Cloud
You can now run commands in Claude Code using Claude models on Google Cloud without stopping for permission prompts every time
https://t.co/iDHnBLoqko
Next Friday we are running a hands-on Claude Code on Google Cloud workshop together with the @AnthropicAI team in SF
Half day, Guided labs, and Live Q&A
Link
https://t.co/Lgi7B11HuB
I looked into Keras Kinetic recently
Keras Kinetic is a framework that lets you run Keras and JAX workloads on Cloud TPUs by writing a training function and adding a decorator
Personally, it is one of the easiest ways I’ve seen to run a first TPU job so far
Here is a great blog post on fine tuning Gemma to speak Gen-Z slang using Kinetic
Blog
https://t.co/lqZu5MCI6U
I spent some time testing elastic training capabilities on MaxText recently.
MaxText is Google’s open-source JAX library for the full LLM lifecycle scaling from one host to hundreds of TPU chips.
Pre-train with train method, run SFT/DPO/GRPO in the same package, and serve via vLLM.
It supports several models including Gemma, DeepSeek, Qwen, Kimi and more.
Docs
https://t.co/ppOa6xUMu9
Tutorial coming soon.
Ray Serve now supports multi-host TPU slice deployments with gang scheduling.
Before, TPU slices required manual host counts and bundle replication, with no guarantee of a single co-located slice.
Now, Ray Serve uses Ray Core’s SlicePlacementGroup to pin deployments to one co-located TPU slice, matching Ray Train.
Code
https://t.co/YUSuQ27ZGe
Anthropic released the public beta of Cowork on Third-Party Providers (3P)
Claude Desktop with Cowork and Code can now run using your own Google Cloud endpoint, billed as token consumption to your GCP project.
Docs
https://t.co/D5upoOpDzb
vLLM v0.19.1 shipped a bunch of optimizations and fixes for Gemma 4
> Gemma 4 MoE quantization support
> Eagle3 speculative decoding for faster inference
> Streaming and tool-call bug fixes for production applications
Vertex AI Agent Engine Memory Bank just landed two features I’ve been looking for.
You can now push events yourself and decide when memories get generated.
Before, agent memory was passive. You knew conversations were flowing in, but you didn’t know when extraction happened.
Now you have
> ingest events method lets you push raw turns in per user (and force_flush if you want it now)
> generation trigger config sets idle-duration, fixed-interval, and event-count rules
Code
https://t.co/I5vRgZR8Y7
Claude Code adds 1-hour prompt cache support for Vertex AI.
Following interactions are now cheaper for long-running agentic coding sessions.
Under the hood, it is the ttl field on cache_control field:
{"cache_control": {"type": "ephemeral", "ttl": "1h"}}
Documentation
https://t.co/v89XgWhaLZ