Hangjie Yuan @Jacob9972 - Twitter Profile

Pinned Tweet

over 2 years ago

Thanks to @_akhaliq for featuring our work! InstructVideo addresses key challenges in video generation by integrating human feedback into video diffusion models. Excited to see how InstructVideo advances AI-driven video creation! 🚀 #AI #VideoGeneration #InstructVideo

AK

@_akhaliq

over 2 years ago

Alibaba announces InstructVideo: Instructing Video Diffusion Models with Human Feedback paper page: https://t.co/6mg9XheORk Diffusion models have emerged as the de facto paradigm for video generation. However, their reliance on web-scale data of varied quality often yields results that are visually unappealing and misaligned with the textual prompts. To tackle this problem, we propose InstructVideo to instruct text-to-video diffusion models with human feedback by reward fine-tuning. InstructVideo has two key ingredients: 1) To ameliorate the cost of reward fine-tuning induced by generating through the full DDIM sampling chain, we recast reward fine-tuning as editing. By leveraging the diffusion process to corrupt a sampled video, InstructVideo requires only partial inference of the DDIM sampling chain, reducing fine-tuning cost while improving fine-tuning efficiency. 2) To mitigate the absence of a dedicated video reward model for human preferences, we repurpose established image reward models, e.g., HPSv2. To this end, we propose Segmental Video Reward, a mechanism to provide reward signals based on segmental sparse sampling, and Temporally Attenuated Reward, a method that mitigates temporal modeling degradation during fine-tuning. Extensive experiments, both qualitative and quantitative, validate the practicality and efficacy of using image reward models in InstructVideo, significantly enhancing the visual quality of generated videos without compromising generalization capabilities.

2

173

37

72

34K

0

12

1

2K

Hangjie Yuan @Jacob9972

7 months ago

🌟UniLumos — a unified framework for image & video relighting with physics-plausible feedback! UniLumos learns lighting consistency in static & dynamic scenes — much faster and more physically grounded⚡ 💻 Code: https://t.co/IMNO4BVkzb Also in ComfyUI-WanVideo! #neurips2025

1

3

2

0

168

Hangjie Yuan @Jacob9972

11 months ago

@_akhaliq Feel free to discuss with me. 🙋🏼

0

2

0

149

Jacob9972 retweeted

AK

@_akhaliq

11 months ago

Lumos-1 On Autoregressive Video Generation from a Unified Model Perspective

3

104

17

59

17K

Jacob9972 retweeted

Haonan Qiu @qhnmoon

over 1 year ago

Unleash the resolution of your SDXL without cost. 🚀FreeScale🚀, a tuning-free method for higher-resolution visual generation, unlocking the 8k image generation! #FreeScale #SDXL - Project: https://t.co/cdkjJU77J0 - Code: https://t.co/vnyE3zOvP0 - Paper: https://t.co/naKY9gmiho

3

85

18

38

18K

Jacob9972 retweeted

Michael Kirchhof @mkirchhof_

over 1 year ago

Throughout my PhD, I've found one basic trick to read papers in less than 30 minutes but with maximum utility. It boils down to consuming actively, not passively: 🧵 1/5

25

1K

150

1K

188K

Hangjie Yuan @Jacob9972

over 1 year ago

@DrZhenghaoChen @Uni_Newcastle Congrats 🎉

1

0

55

Jacob9972 retweeted

Xin Eric Wang

@xwang_lk

almost 2 years ago

After I joined the industry, I realize more and more how fragile and infeasible building your business on proprietary LLMs is. An 86% open-weight model >> an 89% proprietary API. Open source is the future!

0

114

10

25

25K

Hangjie Yuan @Jacob9972

almost 2 years ago

I will be presenting InstructVideo on June 19th from 17:15 to 18:45 at Arch 4A-E (poster 162). Feel free to reach out! I am more than happy to have discussions on this. 🥳

AK

@_akhaliq

over 2 years ago

Alibaba announces InstructVideo: Instructing Video Diffusion Models with Human Feedback paper page: https://t.co/6mg9XheORk

2

173

37

72

34K

0

4

1

1K

Hangjie Yuan @Jacob9972

over 2 years ago

@SamuelAlbanie @Cambridge_Eng @Oxford_VGG Congrats!

0

50

Jacob9972 retweeted

OpenAI

@OpenAI

over 2 years ago

Introducing Sora, our text-to-video model. Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/YYpOAcrXQ3 Prompt: “Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.”

9K

130K

30K

34K

98M

Jacob9972 retweeted

Ziwei Liu

@liuziwei7

over 2 years ago

🎯Align GenAI with Human Preference🎯 #InstructVideo instructs video diffusion models with human feedback by reward fine-tuning, enhancing the video generation quality/aesthetics - Project: https://t.co/1vILMrZEkI - Paper: https://t.co/vYO9wnMIf5 - Code: https://t.co/UcDH4OmTu9

0

83

17

18

9K

Hangjie Yuan @Jacob9972

over 2 years ago

Check out our new work.

AK

@_akhaliq

over 2 years ago

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion paper page: https://t.co/GjisYPlDr0 Customized generation using diffusion models has made impressive progress in image generation, but remains unsatisfactory in the challenging video generation task, as it requires the controllability of both subjects and motions. To that end, we present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject and a few videos of target motion. DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model. The subject learning aims to accurately capture the fine appearance of the subject from provided images, which is achieved by combining textual inversion and fine-tuning of our carefully designed identity adapter. In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern. Combining these two lightweight and efficient adapters allows for flexible customization of any subject with any motion. Extensive experimental results demonstrate the superior performance of our DreamVideo over the state-of-the-art methods for customized video generation.

1

100

21

41

15K

0

88

Hangjie Yuan @Jacob9972

over 2 years ago

Our new work.

AK

@_akhaliq

over 2 years ago

I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models paper page: https://t.co/zj3ip5Y8My Video synthesis has recently made remarkable strides benefiting from the rapid development of diffusion models. However, it still encounters challenges in terms of semantic accuracy, clarity and spatio-temporal continuity. They primarily arise from the scarcity of well-aligned text-video data and the complex inherent structure of videos, making it difficult for the model to simultaneously ensure semantic and qualitative excellence. In this report, we propose a cascaded I2VGen-XL approach that enhances model performance by decoupling these two factors and ensures the alignment of the input data by utilizing static images as a form of crucial guidance. I2VGen-XL consists of two stages: i) the base stage guarantees coherent semantics and preserves content from input images by using two hierarchical encoders, and ii) the refinement stage enhances the video's details by incorporating an additional brief text and improves the resolution to 1280×720. To improve the diversity, we collect around 35 million single-shot text-video pairs and 6 billion text-image pairs to optimize the model. By this means, I2VGen-XL can simultaneously enhance the semantic accuracy, continuity of details and clarity of generated videos. Through extensive experiments, we have investigated the underlying principles of I2VGen-XL and compared it with current top methods, which can demonstrate its effectiveness on diverse data.

3

354

85

126

77K

0

72

Hangjie Yuan @Jacob9972

over 2 years ago

It’s indeed awesome to meet @SamuelAlbanie in person and present the poster together. Can’t wait to conduct more impactful research together.

0

2

0

90

Jacob9972 retweeted

Azade Farshad @azadef

over 2 years ago

Dear #ICCV2023 attendees, my laptop was stolen from my backpack on Monday (02.10) from room S01 and another laptop was stolen from S06. Both laptops were taken from 2 to 3 pm. If you have any photos/videos from these rooms at those times, I appreciate if you share them with me.

6

107

37

2

40K

Jacob9972 retweeted