People go mental when someone turns off a TV showing a football match, but when they print 80% of all money in history and wipe away most of their wealth they do nothing.
If you are using a domain name for personal email, a hobby, or some kind of other personal or family use, you NEED to move it out of @GoDaddy immediately:
we discovered a great use case for bitchat at a conference: panel moderator on the cypherpunk stage used it to collect questions from the audience. it worked great.
no registration, no accounts, no qr codes, just pure mesh q&a
antes de la pandemia ya teníamos algo en el mapa, en su forma primitiva, pero no había la motivación por "la falta de la banca tradicional" https://t.co/pH7zykyAK2
el efecto inevitable de la automatización sobre la dimensión del cubículo o cómo sobran las labores burocráticas cuando vas a migrar al UBI o todos ya firmaron, mejor evítate broncas para que te paguen antes y me llega más de bono
https://t.co/iXy7m8nf5x
@DrNickA maybe it has a lot to do with training and invaluable weights, id is many years ahead of implementation and h̶i̶g̶h̶ entropy is beaten by a couple of minutes and the coins in your pocket
https://t.co/eIXyw45Kga
@AnnaRRose@worldnetwork iridiology works with a naked eye, reading an iris with hi-res cameras and fine tuned LLMs should deliver much more than a proof of humanity.
https://t.co/YJ33pW8xeL
https://t.co/OxgsUvwUO1
Today, we share the tech report for SmolVLM: Redefining small and efficient multimodal models.
🔥 Explaining how to design a tiny 256M VLM that uses less than 1GB of RAM and outperforms our 80B models from 18 months ago!
Here are the coolest insights from our experiments:
✨ Longer context = Big wins: Increasing the context length from 2K to 16K gave our tiny VLMs a 60% performance boost!
✨ Smaller is smarter with SigLIP: Surprise! Smaller LLMs didn't benefit from the usual large SigLIP (400M). Instead, we use the 80M base SigLIP that performs equally well at just 20% of the original size!
✨ Pixel shuffling magic: Aggressively pixel shuffling helped our compact VLMs "see" better, achieving the same performance with sequences 16x shorter!
✨ Learned positional tokens FTW: For compact models, learned positional tokens significantly outperform raw text tokens, enhancing efficiency and accuracy.
✨ System prompts and special tokens are key: Introducing system prompts and dedicated media intro/outro tokens significantly boosted our compact VLM’s performance—especially for video tasks.
✨ Less CoT, more efficiency: Turns out, too much Chain-of-Thought (CoT) data actually hurts performance in small models. They dumb
✨ Longer videos, better results: Increasing video length during training enhanced performance on both video and image tasks.
🌟 State-of-the-Art Performance, SmolVLM comes in three powerful yet compact sizes—256M, 500M, and 2.2B parameters—each setting new SOTA benchmarks for their hardware constraints in image and video understanding.
📱 Real-world Efficiency: We've created an app using SmolVLM on an iPhone 15 and got real-time inference directly from its camera!
🌐 Browser-based Inference? Yep! We get lightning-fast inference speeds of 40-80 tokens per second directly in a web browser. No tricks, just compact, efficient models!
If you’re into efficient multimodal models, you’ll love this one.