Hyojun Go @gohyojun3 - Twitter Profile

Pinned Tweet

25 days ago

Our recent finding on diffusion alignment: a pixel-space reward model can be easily transferred to score noisy diffusion latents directly, at small finetuning cost, via stitching. This makes both training-time and inference-time alignment faster and better. Meet StitchVM👇 1/

Photo: https://t.co/fufZc0MLGr

3

29

6

10

7K

gohyojun3 retweeted

Grigory Bartosh @GrigoryBartosh

25 days ago

🚀 Excited to share my @GoogleDeepMind student researcher project: Dual-Rate Diffusion✨ ⚡ A simple construction that speeds up both regular diffusion and distilled models by interleaving a heavy context encoder with a light conditional denoiser. 🧵👇

Photo: https://t.co/gIcIDFK2me

6

192

29

135

17K

gohyojun3 retweeted

Hyungjin Chung @hyungjin_chung

25 days ago

For alignment you need V, but it is hard to compute. Most methods try to approximate it with 1) Tweedie, which is biased, or 2) MC roll-outs, which are slow and have high variance. Training V has often been neglected because it is hard. We beg to differ. StitchVM enables this! Led by @gohyojun3 👇

0

27

7

17

5K

Hyojun Go @gohyojun3

25 days ago

For more, check out our work 👇 @hyungjin_chung, @prunetruong, Goutam Bhat, @ZhaochongAn, @zixiangzhao_, @DNarnhofer, @fedassa, @SergeBelongie, Konrad Schindler

Paper: https://t.co/qeVubJM2Om
Page: https://t.co/ZCzaYLQRIl

0

3

0

2

304

Hyojun Go @gohyojun3

25 days ago

Result 4️⃣: Training-time alignment with DRaFT & DiffusionNFT. No need for full rollouts. Just stop denoising at an intermediate step and use StitchVM's prediction as the reward signal. This gives much faster convergence. 7/

Photo: https://t.co/ojOWPzD5EK

1

0

363

gohyojun3 retweeted

Yuanwen Yue @YueYuanwen

6 months ago

Want a lighter yet stronger Point Transformer? Meet LitePT ✨

LitePT is a lightweight, high-performance 3D point cloud architecture for a wide range of point cloud processing tasks. Our smallest variant, LitePT-S, features 3.6× fewer parameters, 2× faster runtime, and 2× lower memory footprint than PTv3, while already matching or outperforming it across a range of benchmarks.

💻Code: https://t.co/WtMSKJHfRB
🌐Project page: https://t.co/FEQbeOtHUB
📰Paper: https://t.co/8cQcS4Nvtt

with Damien Robert, @jianyuan_wang, Sunghwan Hong, Jan Dirk Wegner, Christian Rupprecht, and Konrad Schindler

3

126

26

62

12K

gohyojun3 retweeted

Michael Niemeyer @Mi_Niemeyer

8 months ago

Combining video diffusion and 3D feedforward models by simply stitching them together in latent space - very cool idea! Make sure to check out this novel work from my colleagues at Google and ETH!

0

64

7

39

8K

gohyojun3 retweeted

Dominik Narnhofer @DNarnhofer

8 months ago

Want to leverage the power of SOTA 3D models like VGGT & Video LDMs for 3D generation? Now you can! 🚀 Introducing VIST3A — we stitch pretrained video generators to 3D foundation models and align them via reward finetuning. 📄 https://t.co/MctMyuDev4 🌐 https://t.co/XQMW4mfjWI

0

14

3

7

9K

gohyojun3 retweeted

Prune Truong @prunetruong

8 months ago

🎺Meet VIST3A — Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator. ➡️ Paper: https://t.co/sFqbbUiGOO ➡️ Website: https://t.co/QWMLwXyVcB Collaboration between ETH & Google with Hyojun Go, @DNarnhofer, Goutam Bhat, @fedassa, and Konrad Schindler.

2

88

11

41

17K

gohyojun3 retweeted

Hyungjin Chung @hyungjin_chung

9 months ago

Even SOTA VideoLLMs see videos at 1 fps, and you CANNOT perceive fine-grained motion 💃 at this frequency 🥲 📣 Presenting Video Parallel Scaling (VPS), an inference-time strategy that lets VideoLLMs see more frames by scaling compute along the parallel axis 🤩

Photo: https://t.co/MKDBOWTCFL

1

38

11

6

3K

gohyojun3 retweeted

Hyungjin Chung @hyungjin_chung

12 months ago

Excited to share that 3 papers are accepted to #ICCV2025 at EverEx 🎉 📌 SteerX: https://t.co/t9BQouecfI 📌 VideoRFSplat: https://t.co/rSrlGiZvCv 📌 CapeLLM: https://t.co/Sh75vXsQID See you in Hawaii 🌴 👇 link to some threads

1

83

7

8

5K

gohyojun3 retweeted

Hyungjin Chung @hyungjin_chung

about 1 year ago

🚨Introducing VideoRFSplat📽️, a feed-forward text-to-3DGS generative model with high-quality scene-level results without post-optimization (e.g. SDS) Led by collaborators at EverEx AI - @gohyojun3, @bypark___, @namhyelin99, Byung-Hoon https://t.co/jfzlfejO8A A 🧵 👇 1/n

2

45

8

11

4K

gohyojun3 retweeted

Zhenjun Zhao @zhenjun_zhao

about 1 year ago

SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering @bypark___, @gohyojun3, @namhyelin99, Byung-Hoon Kim, @hyungjin_chung, Changick Kim tl;dr: MV-DUSt3R+ and MonST3R->geometric reward functions->geometric consistency https://t.co/vJSExQFGBV

Photo: https://t.co/c9Cmytf3wZ

0

49

9

11

3K

gohyojun3 retweeted

Hyungjin Chung @hyungjin_chung

about 1 year ago

3D consistent videos are hard to generate 🙁 What if we could steer them to be consistent during generation? Introducing SteerX🛞, a plug-and-play sampling method that works with *any* video diffusion to make videos physically plausible🤩 w/ @bypark___ @gohyojun3 @namhyelin99

Photo: https://t.co/njhQK8Sy41

2

104

18

35

7K

gohyojun3 retweeted

MrNeRF @janusch_patas

over 1 year ago

SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis

TL;DR: SplatFlow is a unified framework that combines a latent-space multi-view generator and a Gaussian Splatting Decoder to enable efficient 3D generation, editing, and inpainting directly from text prompts.

Abstract (excerpt): SplatFlow comprises two main components: a multi-view rectified flow (RF) model and a Gaussian Splatting Decoder (GSDecoder). The multi-view RF model operates in latent space, generating multi-view images, depths, and camera poses simultaneously, conditioned on text prompts, thus addressing challenges like diverse scene scales and complex camera trajectories in real-world settings. Then, the GSDecoder efficiently translates these latent outputs into 3DGS representations through a feed-forward 3DGS method. Leveraging training-free inversion and inpainting techniques, SplatFlow enables seamless 3DGS editing and supports a broad range of 3D tasks, including object editing, novel view synthesis, and camera pose estimation, within a unified framework without requiring additional complex pipelines. We validate SplatFlow's capabilities on the MVImgNet and DL3DV-7K datasets, demonstrating its versatility and effectiveness in various 3D generation, editing, and inpainting-based tasks.

1

77

7

60

5K
