@mtschannen 🚀Gemma4 12B🚀
We made it great by training a simpler model.
No vision or audio encoders. Easier said than done.
Running exploratory experiments to a final model is always interesting. Joint work with @mtschannen @AndreasPSteiner @confusezius @kmisiunas and the whole Gemma team.
Check out our detailed report about *Jet* 🌊 - a simple, transformer-based normalizing flow architecture without bells and whistles.
Jet is an important part of JetFormer's engine ⚙️ As a standalone model it is very tame and behaves predictably (e.g. when scaling it up).
Making new simple things requires attention to detail, from numeric precision to unexpected bugs deep in the stack. But now there is a precedent which includes paper, numbers and code.
Hope it helps people go hammer some nails🔨
With some delay, JetFormer's *prequel* paper is finally out on arXiv: a radically simple ViT-based normalizing flow (NF) model that achieves SOTA results in its class.
Jet is one of the key components of JetFormer, deserving a standalone report. Let's unpack: 🧵⬇️
Welcome PaliGemma 2! 🤗
Google released PaliGemma 2, the best vision language model family, which comes in various sizes (3B, 10B, 28B), is based on Gemma 2 and SigLIP, and comes with transformers support on day 0 🎁
Saying this model is amazing would be an understatement, keep reading ✨
🚀🚀PaliGemma 2 is our updated and improved PaliGemma release using the Gemma 2 models and providing new pre-trained checkpoints for the full cross product of {224px,448px,896px} resolutions and {3B,10B,28B} model sizes.
1/7
@YugeTen @__kolesnikov__ We already knew we would like it. But we didn't know how :) The NF comes with two properties: it is invertible and it has a computable logdet. Together they prevent the model from cheating by mapping all latents to a trivial point and then getting a perfect loss on the AR model of that trivial output.
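For reference, the argument above rests on the standard change-of-variables identity for normalizing flows. This is only a sketch: the symbols x (the input), f (the invertible flow) and p_z (the density over latents, here the part modeled by the AR transformer) are notation assumed for illustration, not taken from the thread.

```latex
\log p(x) = \log p_z\big(f(x)\big) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|
```

If f squeezed all inputs toward a single point, the Jacobian determinant would go to zero and the logdet term would go to minus infinity, so the total likelihood would get worse rather than better. Invertibility additionally rules out an exactly constant map.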
Did you try to get an auto-regressive transformer to operate in a continuous latent space which is not fixed ahead of time but learned end to end from scratch?
Enter JetFormer: https://t.co/NaQzHGvezm -- joint work in a dream team: @mtschannen and @__kolesnikov__
Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)?
We have been pondering this during summer and developed a new model: JetFormer 🌊🤖
https://t.co/ngvPzZvUYW
A thread 👇
1/
Feels great to start adding diversity to the available pre-trained visual representations. Especially when it has considerable impact for problems with a smaller number of examples available or hard to collect.
We've looked into representation learning for #RemoteSensing with different datasets and fine-tuning using in-domain data. See paper with datasets and models included 🔋: https://t.co/TheaeAssWm with @ASusanoPinto, @XiaohuaZhai and @neilhoulsby.
We’re pleased to release the Visual Task Adaptation Benchmark (VTAB), a diverse, realistic, and challenging protocol to measure progress towards universal visual representations. Learn all about it below. https://t.co/PbORwSFPAg
Amazing article showing how accessible #DeepLearning is becoming. Model trained with transfer learning and "#TensorFlow For Poets" codelab + #tfhub. Converted to #TFLite and now deployed on International Space Station 🚀 - TensorFlow Lite is Going to Space - https://t.co/Jb8ykVxtRN
The BigGAN generators from our paper https://t.co/QUYlE9IBsE are now available on TF Hub (https://t.co/GHM9pIgQPw). Try the Colab demo at: https://t.co/Ynyb9T9AAD
We are launching a new web experience for TensorFlow Hub! Check out https://t.co/T8COqipES0 and explore our modules, including some new additions like the FasterRCNN for object detection.
Learn more on the post ↓ https://t.co/Et5NjpoW8X