mlx-vlm now working for me w/ @googlegemma gemma-4-12b-it for my text + vision use cases🚀
@ollama was text only for some reason on my mac (need vision features), but possible oversight on my part 🤷
@Google @googlegemma "The 12B model is fine, but the only 12B builds Ollama publishes are MLX, and all MLX gemma 4 tags are text-only"
going to look for another approach
@Google @googlegemma Curious how I can ask questions / get feedback from the Google team on how I'm using @googlegemma?
Want to make sure the use case is ideally aligned 🤘
Our new unified architecture allows Gemma 4 12B to process multimodal inputs natively. Here's how ⬇️
Traditional models rely on separate encoders for images and audio. This adds latency and increases memory usage. So we streamlined this:
👁️ Vision: We took a novel approach to replace the encoder with a lightweight embedding module, letting the LLM backbone take over visual processing.
🎙️ Audio: We removed the encoder entirely, projecting the raw audio signal directly into the same space as text tokens.
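A toy sketch of the idea in the post above, not Gemma's actual implementation: image patches and raw audio frames are each pushed through a single linear projection into the same embedding space as text tokens, then concatenated into one sequence for the backbone. Every name and size here (D_MODEL, make_projection, the 2x2 patch size, the 4-sample audio frame) is an assumption made up for illustration.

```python
import random

D_MODEL = 8  # hypothetical embedding width, kept tiny for illustration


def make_projection(in_dim, out_dim, seed=0):
    """Random in_dim x out_dim weight matrix standing in for a learned projection."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]


def project(vec, weights):
    """Multiply a flat input vector by the weight matrix (vec @ weights)."""
    out_dim = len(weights[0])
    return [sum(v * weights[i][j] for i, v in enumerate(vec)) for j in range(out_dim)]


def image_to_patches(image, patch):
    """Split an HxW grayscale image (list of rows) into flattened patch x patch tiles."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            patches.append(
                [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            )
    return patches


def audio_to_frames(samples, frame_len):
    """Chop a raw waveform into non-overlapping fixed-length frames."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


# Made-up inputs: a 4x4 image, 12 audio samples, 3 placeholder text embeddings.
image = [[(r * 4 + c) / 16 for c in range(4)] for r in range(4)]
audio = [((i % 5) - 2) / 2 for i in range(12)]
text_embeddings = [[0.0] * D_MODEL for _ in range(3)]

w_img = make_projection(4, D_MODEL, seed=1)  # 2x2 patch -> D_MODEL
w_aud = make_projection(4, D_MODEL, seed=2)  # 4-sample frame -> D_MODEL

image_tokens = [project(p, w_img) for p in image_to_patches(image, 2)]
audio_tokens = [project(f, w_aud) for f in audio_to_frames(audio, 4)]

# One shared sequence: every element is a D_MODEL-wide vector, whatever its modality.
sequence = text_embeddings + image_tokens + audio_tokens
```

The point of the sketch is only that no separate encoder stack sits between the raw input and the backbone; a real model would learn the projection weights and use far larger dimensions.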
I got to spend all day today with Jensen in Taiwan: talking with thousands of engineers and eating street food at a night market. Jensen is received as a rockstar in Taiwan, like it's the Beatles in the '60s. It's mind-blowing and fun to watch. But most importantly, through all the interactions and all my conversations with him, he remained the same humble, kind, thoughtful, funny guy he always was, even as a kid who went to these same night markets many years ago.
Btw, we tried a crazy amount of different street food. It's legit some of the most delicious food I've ever had. I can't wait to share video of it, including a ton of our conversations and hangout. When I can pause for a moment from all the travel to edit the video, I'll post it.
Can't wait to continue talking to Jensen and engineers at Computex this week, and exploring more of Taiwan, and of course roaming the night markets for some more delicious street food.
Days like these, even more than usual, I feel like the luckiest kid in the world.
Love you all! ❤️
@steren @GoogleAIStudio 😆 did the same with my daughter. She wanted hiking goals to race with Frodo for steps and the apps out there were broken / half baked.
She now has it connected to her iPhone steps and can customize it to her heart's content ❤️
@lucasmaes_ I'm curious about your thoughts on a choice here
PyAV: one dependency, ships prebuilt arm64 wheels (bundled ffmpeg), does encode and decode.
decord: decode-only and historically painful to build on Apple Silicon (needs the eva-decord fork)
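For context on the PyAV option, a minimal sketch of sampling a few evenly spaced frames from a video, assuming PyAV is installed (pip install av). The helper names evenly_spaced_indices and sample_frames are made up for this example; av.open, container.decode(video=0) and frame.to_ndarray are PyAV calls.

```python
def evenly_spaced_indices(total_frames, num_samples):
    """Pick up to num_samples frame indices spread evenly across the video."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


def sample_frames(path, num_samples=8):
    """Decode a video with PyAV and return the sampled frames as RGB arrays."""
    import av  # imported lazily so the index helper works without PyAV

    with av.open(path) as container:
        # stream.frames can be 0 when the container does not report a count;
        # in that case this sketch returns an empty list rather than guessing.
        total = container.streams.video[0].frames
        wanted = set(evenly_spaced_indices(total, num_samples))
        frames = []
        for index, frame in enumerate(container.decode(video=0)):
            if len(frames) == len(wanted):
                break
            if index in wanted:
                frames.append(frame.to_ndarray(format="rgb24"))
        return frames
```

This decodes sequentially and keeps only the wanted frames, which is simple but not the fastest route for long videos; seeking would be the next thing to try.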
@joshwoodward @GeminiApp Pretty impressed with @GeminiApp 3.5 flash so far. Still using other models based on use case but token value / performance is solid 👍