DOJO @Dojo__0 - Twitter Profile

Dojo__0 retweeted

9 days ago

NEO-ov: Native one-vision model that learns pixel-word correspondence end-to-end—no external encoders, no adapters. Multi-image & video understanding without the modular Frankenstein seams. arXiv:2605.28820

0

4

2

1

491

Dojo__0 retweeted

Sophont

@SophontAI

25 days ago

We're excited to release Medmarks v1.0 + a technical report! This is an update to our Medmarks benchmark suite, the largest open-source automated suite for evaluating the medical capabilities of LLMs. We added 10 benchmarks (20→30) and 15 models (46→61) to the leaderboard!

3

93

25

43

41K

Dojo__0 retweeted

Odyssey @odysseyml

about 1 month ago

Why We Must Build World Models

15

352

40

167

23K

DOJO @Dojo__0

about 1 month ago

@initlayers Hmm, well, we use interruptible instances, so we were always prepared for that. Also, what services do you use?

0

7

DOJO @Dojo__0

about 1 month ago

@gabriberton There was also CrossMAE, which decided to just reconstruct the masked tokens.

0

139

Dojo__0 retweeted

Gabriele Berton

@gabriberton

about 1 month ago

A team of cracked @GoogleDeepMind colleagues just released Vision Banana.
A brief thread about Vision Banana, what it means for the future of AI, and the future of image understanding 🧵

3

69

6

31

6K

Dojo__0 retweeted

Massimiliano Viola ✈️ CVPR @massiviola01

about 2 months ago

TIPS and TIPSv2 🚨
Last week, @GoogleDeepMind released TIPSv2, their latest suite of image-text encoders with spatial awareness.
You already know the deal here: fancy feature maps, SOTA across many tasks and benchmarks, and open source for all of us to use.
But what makes this line of work special?

1

125

17

81

7K

Dojo__0 retweeted

Kevis-Kokitsi Maninis @kmaninis

about 1 year ago

📢📢 We released checkpoints and Pytorch/Jax code for TIPS: https://t.co/0JUIRML8gr Paper updated with distilled models, and more: https://t.co/zebYMD0VFz #ICLR2025

2

15

3

2K

Dojo__0 retweeted

Alpha Defense™🇮🇳

@alpha_defense

2 months ago

Losses are usually part of combat, and it is fine to lose a few jets in pursuit of goals on the battlefield. But what makes it super embarrassing for United States is the loudmouth president who claimed that “we have absolute control over the battlefield and that the Iranians are nowhere, with a complete loss of their air defence radar network, air force, and navy.”

47

2K

221

55

72K

Dojo__0 retweeted

William Yijiang Li

@Williamiumli

2 months ago

🚨 New paper alert!!
🎥 Video VLMs are strong at high-level semantics and long-range temporal understanding.
🧠 JEPA is almost the opposite: better at dense, high-frequency dynamics, local physical consistency, and fast corrective control, but less suited for rich semantic reasoning and long-horizon reasoning.
We try to get the best of both:
🧩 A VLM as a cortex-like reasoner for semantics and long-horizon planning
⚡ A JEPA branch as a cerebellum-like controller for fine-grained dynamics, physical consistency, and rapid corrections
Proudly, we present ThinkJEPA: a VLM-guided latent world model that FiLM-fuses the pyramid repr of VLMs, encoding long-horizon semantic reasoning, into the JEPA repr for fine-grained, physically consistent dynamics prediction.
🔗 Project: https://t.co/quro6Pf8un
📄 Paper: https://t.co/yO5rv3ZJT7

7

359

67

236

18K
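
For readers unfamiliar with the "FiLM-fuse" step mentioned in the tweet above, here is a minimal sketch of generic feature-wise linear modulation (FiLM): a conditioning vector yields a per-channel scale and shift that are applied to another feature sequence. This is not ThinkJEPA's implementation; the function names (`linear`, `film`), the shapes, and the use of a single linear map to produce the scale and shift are all assumptions made for illustration.

```python
from typing import List, Sequence


def linear(vec: Sequence[float],
           weight: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Hypothetical helper: y = W @ vec + bias, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def film(features: Sequence[Sequence[float]],
         gamma: Sequence[float],
         beta: Sequence[float]) -> List[List[float]]:
    """Apply FiLM: out[t][c] = gamma[c] * features[t][c] + beta[c]."""
    if len(gamma) != len(beta):
        raise ValueError("gamma and beta must have the same length")
    out = []
    for token in features:
        if len(token) != len(gamma):
            raise ValueError("channel count must match gamma/beta")
        out.append([g * x + b for x, g, b in zip(token, gamma, beta)])
    return out


# Assumed setup: `cond` stands in for a conditioning (e.g. VLM-side) feature,
# and `tokens` for the features being modulated (e.g. JEPA-side tokens).
cond = [1.0, 0.0]
gamma = linear(cond, [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])   # per-channel scale
beta = linear(cond, [[0.5, 0.0], [0.0, 0.5]], [0.0, 0.0])    # per-channel shift
tokens = [[1.0, 2.0], [3.0, 4.0]]
modulated = film(tokens, gamma, beta)
```

The scale and shift are shared across all tokens and differ only per channel, which is what makes the modulation "feature-wise".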

DOJO @Dojo__0

2 months ago

@Williamiumli I went through this paper yesterday, it was great. Although it'd be amazing to see whether replacing the VLM with a JEPA pre-trained on sparse frames for long-horizon reasoning would work too.

1

0

58

Dojo__0 retweeted

Quentin Le Lidec @quentinlldc

2 months ago

🚀 LeWorldModel datasets & checkpoints are now available on Hugging Face! https://t.co/aiBkDTsNyX You can plug them directly into stable-worldmodel (https://t.co/2eQB7Q0l9i), the engine behind LeWorldModel, to instantly load, run, and start building on top of our models.

9

302

33

192

55K

Dojo__0 retweeted

DailyPapers

@HuggingPapers

2 months ago

Meta just released the Efficient Universal Perception Encoder on Hugging Face.
A vision backbone for edge devices that unifies image understanding, vision-language modeling, and dense prediction via multi-teacher distillation.

8

223

29

166

24K

DOJO @Dojo__0

2 months ago

@massiviola01 I have been thinking a lot about CLIP vs. DINO features. Obviously CLIP features are much more linguistically aligned, and thus they don't capture patch-level features, because image-caption supervision is global. https://t.co/MCXpHMlYQd This is something I read recently; it's a little old.

0

1

0

41

DOJO @Dojo__0

3 months ago

@lucasmaes_ It's so good, bro! Going to play around with the models.

0

1

0

253

Dojo__0 retweeted

Lucas Maes

@lucasmaes_

3 months ago

JEPA are finally easy to train end-to-end without any tricks! Excited to introduce LeWorldModel: a stable, end-to-end JEPA that learns world models directly from pixels, no heuristics. 15M params, 1 GPU, and full planning <1 second. 📑: https://t.co/cpTzgvbTS0

109

4K

560

4K

951K

Dojo__0 retweeted

Tanishq Kumar

@tanishqkumar07

3 months ago

I've been working on a new LLM inference algorithm. It's called Speculative Speculative Decoding (SSD) and it's up to 2x faster than the strongest inference engines in the world. Collab w/ @tri_dao @avnermay. Details in thread.

135

4K

454

3K

612K

DOJO @Dojo__0

3 months ago

@initlayers If you're interested in local attention, do read these papers: one is "Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles", and the other, I believe, was "Window Attention is Bugged".

1

2

0

16

DOJO @Dojo__0

3 months ago

@initlayers HRNet is based on multi-scale feature extraction, while DANet is based on attention over the channel and spatial features.

1

0

8
