Daniel Blasko @blskdan - Twitter Profile

blskdan retweeted

@ARealityEvent

about 1 month ago

📢 @evanspiegel (CEO, @Snap) is headlining #AWE2026

Join us June 16 for his keynote: "Making Computing More Human."

Plus, explore the latest in AI smartglasses, physical AI, and robotics.

⏳ Save $400. Early Bird ends May 7
🎫 https://t.co/K9bNbWoUlx

#AWE2026 #XR #ISpatial

10

47

13

7

38K

blskdan retweeted

Snap Inc.

@Snap

2 months ago

Today, we announced a multi-year strategic agreement with @Qualcomm to power future generations of @Spectacles with Snapdragon XR platforms. https://t.co/RMQJi16kDd

20

145

42

17

28K

blskdan retweeted

Google AI Developers

@googleaidevs

6 months ago

Announcing FunctionGemma, a specialized version of our Gemma 3 270M model that’s fine-tuned for function calling ⚙️ The new release brings bespoke function calling to the edge, and is designed as a strong base for further training into custom, fast, private, local agents that translate natural language into executable API actions. https://t.co/nkfZAKgBMm

30

1K

170

426

181K

blskdan retweeted

Snap Inc.

@Snap

7 months ago

Snapchat + @perplexity_ai 🤝 Starting in early 2026, you’ll be able to ask questions, explore new ideas, and get credible answers right inside chat. AI that feels more personal, social, and fun! https://t.co/5J10blvzhR

77

1K

118

124

493K

blskdan retweeted

@PavloMolchanov

about 1 year ago

Not all visual tokens are important. We present new work on efficient token selection driven by the text prompt in VLMs. We train a vision encoder in a CLIP-like setting with local/global contrastive loss. Once trained, the model can output a heatmap of interest given a text prompt. As a result, we achieve up to 4k x 4k working resolution in an efficient way.

In the paper (https://t.co/xTkmn1CIEz), we present:
• PS3: a method to scale CLIP-like models to 4k and above;
• PS3-SigLIP-SO400M and PS3-C-RADIO-v2-L ViT models;
• 4KPro: a benchmark for high-resolution VQA on 4k resolution for VLMs;
• VILA-HD: extending VLM to 4k resolution with 1.9-3x better efficiency than using all patches, with an accuracy gain of 3.2%.

1

338

52

183

26K

Daniel Blasko @blskdan

over 1 year ago

@AniC_dev Really cool stuff! Any way now or in the future to export the logs or a natural language/structured summary of them to use as context e.g. for further questioning?

1

0

90

blskdan retweeted

Alessio

@alessiograncini

over 1 year ago

@Spectacles Office Hours @SnapAR

0

5

2

0

704

blskdan retweeted

Xiaohua Zhai @XiaohuaZhai

over 1 year ago

Introducing SigLIP2: now trained with additional captioning and self-supervised losses!

Stronger everywhere:
- multilingual
- cls. / ret.
- localization
- ocr
- captioning / vqa

Try it out, backward compatible!

Models: https://t.co/3hOdqcy9QD

Paper: https://t.co/Tp4D8Syld8

10

367

54

150

60K

blskdan retweeted

Andrew Curran

@AndrewCurran_

over 1 year ago

They did say they had built their own system, it's two models working together. S2 is a 7B VLM, S1 is an 80M(!) transformer.

6

117

13

21

8K

blskdan retweeted

merve

@mervenoyann

over 1 year ago

we just dropped SmolVLM2: world's smollest video models in 256M, 500M and 2.2B ⏯️🤗

we also release the following 🔥
> an iPhone app (runs on 500M model in MLX)
> integration with VLC for segmentation of descriptions (2.2B)
> a highlights extractor (2.2B)

21

739

121

385

37K

blskdan retweeted

Simo Ryu

@cloneofsimo

over 1 year ago

So if you are a typical ML researcher, you have had this question for eternity:

"I want a small, powerful model: should we train a large model and distill? Or should we train a small model from scratch?"

This new Apple paper's conclusion:
It's complicated but maybe yes, depending on your budget.

1/n

10

1K

103

1K

123K

Daniel Blasko @blskdan

over 1 year ago

Neat approach to more flexible and steerable token-based image-generation! Seems to lead to noteworthy instruction- and task-level zero-shot capabilities https://t.co/kDtWCHDMkf

0

1

0

109

blskdan retweeted

Peter Tong

@TongPetersb

over 1 year ago

This project really changed how I think about multimodal models and LLMs. I used to believe that multimodal (visual) prediction required significant changes to the model and heavy pretraining, like Chameleon. But surprisingly, the opposite is true! In large autoregressive models, visual understanding and generation are closely linked and can be instruction-tuned directly from LLMs. The LLM is there waiting for vision. [1/7] Website: https://t.co/d35d5ZNGOf

9

468

91

450

130K

blskdan retweeted

Tim Brooks

@_tim_brooks

over 1 year ago

Gemini 2.0 Flash has native image outputs! Congrats to the awesome team that built it. I find the example at 1:15 super cool: to change the car's color and add beach gear, the model generates two images step-by-step using visual chain of thought. https://t.co/zLl14JUEG1

24

425

81

66

158K

blskdan retweeted

Andreas Steiner @AndreasPSteiner

over 1 year ago

🚀🚀PaliGemma 2 is our updated and improved PaliGemma release using the Gemma 2 models and providing new pre-trained checkpoints for the full cross product of {224px,448px,896px} resolutions and {3B,10B,28B} model sizes. 1/7

4

262

52

100

62K

blskdan retweeted

Tali Dekel @talidekel

over 1 year ago

Understanding the inner workings of foundation models is key for unlocking their full potential. While the research community has explored this for LLMs, CLIP, and text-to-image models, it's time to turn our focus to VLMs. Let's dive in! 🌟 https://t.co/NtTrkZ6iWh

0

150

23

73

15K

blskdan retweeted

Justin Johnson

@jcjohnss

over 1 year ago

Today we're sharing our first research update @theworldlabs -- a generative model of 3D worlds! I'm super proud of what the team has achieved so far, and can't wait to see what comes next. Lifting GenAI to 3D will change the way we make media, from movies to games and more!

19

389

27

74

57K

blskdan retweeted

Avi

@AviSchiffmann

over 1 year ago

Truffle’s aesthetics are peak. Design that transcends utility and becomes ubiquitous furniture. Your goal should be to make movies that don’t feature your work look anachronistic. @iamgingertrash is on that path. An inspiration ⭐️⭐️⭐️⭐️⭐️

7

127

1

25

22K

blskdan retweeted

Pavankumar Vasu @PavankumarVasu

over 1 year ago

📢 Presenting our app for real-time zero-shot image classification using MobileCLIP! Fully open-source—code & models available for everyone to explore. Check it out here: https://t.co/hg08zPJSZB with - David Koski, Travis Trotto, Megan Maher Welsh & Hugues Thomas

0

26

11

7K

blskdan retweeted

Alaa El-Nouby @alaa_nouby

over 1 year ago

Does autoregressive pre-training work for vision? 🤔 Delighted to share AIMv2, a family of strong, scalable, and open vision encoders that excel at multimodal understanding, recognition, and grounding. https://t.co/LkkkSDWpJh (🧵)

4

154

27

48

27K
