Evan Kim @evnkimm - Twitter Profile

Pinned Tweet

3 months ago

How do you train compute-optimal novel view synthesis models? In our CVPR ’26 paper Scaling View Synthesis Transformers, we uncover key design choices through scaling and careful ablations--and along the way train a new SoTA with 3x less compute. (1/n) https://t.co/4PTutG84pE

13

171

18

88

35K

evnkimm retweeted

Harish Krishnakumar

@harishkrik

9 days ago

Today’s vision benchmarks suggest VLMs are nearing saturation, but real-world visual understanding is far from solved. Introducing WorldBench: 2,000 hand-written, human-verified VQA questions focused on visual diversity and designed to be challenging for frontier models. Gemini-3.1-Pro leads with just 64.0% accuracy. (1/10)

4

79

18

38

37K

evnkimm retweeted

Tanvir Bhathal

@BhathalTanvir0

19 days ago

Super excited to release OpenJarvis! Start using it for the most functional, complete, secure, and ipw-efficient personal agent experience!

77

186

40

60

58K

evnkimm retweeted

Danial Hosseintabar

@danialgorithm

3 months ago

What if every image in your training set is corrupted, masked, blurred, or compressed, and you don’t have any clean data points? This is often the case in areas like MRI scans, satellite images, and many real-world datasets. Can you still train a diffusion model and recover the clean distribution? Yes, as long as the corruption channel is known and invertible at the distribution level. We introduce DiffEM, a framework to do this with diffusion models. 🧵 (1/n)

2

126

18

112

13K

evnkimm retweeted

Matthew Noto

@matthewnoto73

3 months ago

Dreamverse is our AI video engine that generates a video faster than you can watch it. 30s of 1080p video in 4.5 seconds. One GPU. Real-time editing. This is vibe-directing. https://t.co/HEfd8cupc6

12

42

14

3

7K

evnkimm retweeted

Matthew Noto

@matthewnoto73

3 months ago

⚡️We built a new real-time inference stack in FastVideo and have the fastest 1080p TI2AV (text + image to audio and video) pipeline ever. Create a 5 s 1080p video with audio in ~4.55 s on a single GPU! High-quality video generation must be fast to be truly interactive. The only limit in creative workflows should be your imagination. If you have the need for speed (and quality), make video generation go blurrr (for free) at https://t.co/U9LLNQuISI and create whatever you can imagine…

10

48

17

13

8K

Evan Kim @evnkimm

3 months ago

@yuseungleee thanks Yuseung!!

0

1

0

257

Evan Kim @evnkimm

3 months ago

@AdamZweiger thanks adam!

0

360

evnkimm retweeted

Zhenjun Zhao @zhenjun_zhao

4 months ago

Scaling View Synthesis Transformers
Evan Kim, Hyunwoo Ryu, Thomas W. Mitchel, @vincesitzmann
tl;dr: encoder-decoder + effective batch size -> scaling good!
https://t.co/udskkkzYOb https://t.co/QwuTY6hgaf

0

83

14

44

5K

evnkimm retweeted

Vincent Sitzmann

@vincesitzmann

3 months ago

Evan is an undergraduate researcher in my group, and within less than a year put together a really cool paper on the scaling laws of novel view synthesis - surprisingly, he found an encoder-decoder model that actually scales *better* than a decoder-only LVSM model!

2

189

6

114

22K

Evan Kim @evnkimm

3 months ago

Nice reminder that it's about the inductive biases which scale, not zero inductive biases :)

Takeru Miyato

@takeru_miyato

3 months ago

Glad to see GTA mentioned here — nice to see camera-relative attention actually working well at scale.

1

12

0

3

4K

0

12

2

1

2K

Evan Kim @evnkimm

3 months ago

This is work with @RyuHyunwoooo @twmitchel and @vincesitzmann ! (n/n) 📄 Paper: https://t.co/CujoTH0lwD 💻 Code: https://t.co/hoatPlf5En 🎥 Website: https://t.co/2YpziNdZ7r

1

25

2

4

1K

Evan Kim @evnkimm

3 months ago

In sum: cross attention works, treat B*Vt as your batch size, fixed-size encodings don’t scale, and PRoPE/GTA work great. Our unidirectional finding also nicely mirrors the scalability of causal attention in language modeling! (7/n) https://t.co/m11o3yRuQ7

1

17

1

2

1K
