Will Harvey @willarvey - Twitter Profile

2 months ago

🎥🪄 What should happen when you remove an object from a video?

Example 1: A domino chain is falling → remove the middle blocks → the last block should remain standing
Example 2: Two cars are about to crash → remove one car → the other should drive away 🚙

Current video object removal models fail at these dynamic scenarios. We introduce VOID: a model that removes objects and updates the scene as if they were never there.

🏆 Preferred 64.8% of the time vs Runway Aleph, Gen-Omnimatte, ProPainter, and more.

🌐 Project page: https://t.co/PBAWjuwUea
💻 GitHub: https://t.co/nYTv4miPSt
🤗 Demo: https://t.co/9DZpYCBUeN
📄 arXiv: https://t.co/UymkQC6Yku

w/ @willarvey @ZhuoningYuan @ChengTim0708 and collaborators at @NetflixResearch and @INSAITinstitute

willarvey retweeted

Christian Weilbach @wh1lo

over 2 years ago

It is @NeurIPS time again! I am excited to present our trans-dimensional jump diffusion work with @AndrewC_ML @willarvey @ValentinDeBort1 @tom_rainforth and @ArnaudDoucet1 ! Come over on Thursday 2nd poster session, https://t.co/sHFojNAZXp. https://t.co/MR1SnLV6k2 #NeurIPS2023

Will Harvey @willarvey

about 3 years ago

This was a lot of fun to work on! And works well with test-time guidance: we can train on varying-length RoboDesk videos and then, at test-time, fix the first and last frames and automatically figure out how far apart they are - i.e. how long the robot needs to move between them! https://t.co/DKw9yD2aXy

Andrew Campbell @AndrewC_ML

about 3 years ago

How can we apply diffusion models to data with varying dimensionality? We use jump diffusions to simultaneously generate the size and state values for varying size data e.g. molecules https://t.co/99SvKR0NZs w/ @willarvey @wh1lo @ValentinDeBort1 @tom_rainforth @ArnaudDoucet1

willarvey retweeted

Sander Dieleman @sedielem

about 4 years ago

This paper is a goldmine for anyone training diffusion models, carefully picking apart theory and practice and showing which choices really matter. I was quite excited to see the authors of the StyleGAN series of papers tackle this topic, and boy do they deliver! https://t.co/Qnw0EU7D6i

Will Harvey @willarvey

about 4 years ago

@sirbayes @frankdonaldwood @sama @demishassabis @ylecun We know :) We cite Video Diffusion Models heavily in the paper (https://t.co/HYAJs6UKT7) but focus on long-term coherence, jointly generating frames up to 1000 timesteps apart (instead of 64 like the Google work). Anyone at Google looking into scaling that model to longer videos?

Will Harvey @willarvey

about 4 years ago

@jekbradbury @frankdonaldwood Definitely sounds interesting, will be in touch!

Will Harvey @willarvey

about 4 years ago

Thanks for the shout out @frankdonaldwood - the videos still have occasional glitches but are much better after scaling from training on 1 GPU to 4 GPUs. Simply scaling further might be the right direction to take

Frank Wood @frankdonaldwood

about 4 years ago

I think, much more than large language models, this work might be the first glimpse of what the foundation model for vision-based planning for embodied real-world AGI might look like. @sama, @demishassabis, @ylecun who is going to scale this first? https://t.co/jzkoU8l6Tx

Will Harvey @willarvey

about 4 years ago

@tejasdkulkarni @frankdonaldwood @sama @demishassabis @ylecun Maybe we can improve object/landmark permanence by conditioning frames on e.g. the corresponding camera position similar to GQN. But I sense that pixel-level models with lots of compute are likely to win out over anything much more structured than that

Will Harvey @willarvey

about 4 years ago

@b11tz @frankdonaldwood @sama @demishassabis @ylecun in the order of 1 GPU-week - almost nothing compared to most of the recent video models I've seen

Will Harvey @willarvey

about 4 years ago

@frankdonaldwood @saeidnaderip @VadenMasrani @NandoDF @sirbayes @sama @ylecun Haha well at the very least let's see if we can get some vision-based planning working before my "wasted summer" begins 😅

Will Harvey @willarvey

about 4 years ago

@adam_golinski Thanks @adam_golinski !

willarvey retweeted

AK @_akhaliq

about 4 years ago

Flexible Diffusion Modeling of Long Videos
abs: https://t.co/Cx1BUqA7zM

demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length https://t.co/nNzhlnwvT8

Will Harvey @willarvey

over 5 years ago

Our results suggest a possible future application of such high-fidelity image completion tools: they could be used to select maximally informative sequences of small field of view x-ray scans. https://t.co/slUVOJSweH

Will Harvey @willarvey

over 5 years ago

Excited to announce our work (https://t.co/2MYYfJ6Qyp) with hierarchical variational autoencoders - we found that they're ideal for making into realistic image completion models (with @saeidnaderip and @frankdonaldwood) https://t.co/JK7qr1Zd1b
