Yujia Chen

@IssacCyj

AI @Google

Joined May 2023

37 Following

63 Followers

34 Posts

Yujia Chen @IssacCyj

6 months ago

Insert a video into a video with motion and identity awareness. Proud of this work! Split-then-Merge is a cool step forward for video composition. Great teamwork Ozgur!

Özgür Kara

@ozgurkara99

6 months ago

🎥 Introducing Split-then-Merge: A new video composition framework! This approach enables the composition of any foreground video with any background video. Unlike conventional methods that rely on annotated datasets or handcrafted rules, Split-then-Merge (StM) splits a large unlabeled corpus of videos into dynamic foreground and background layers, then merges them to learn how dynamic subjects interact with diverse scenes. Work done in collaboration with team members at @Google: Du Tran (@dutran), Yujia Chen (@IssacCyj), Prof. Ming-Hsuan Yang (@MingHsuanYang), and Vincent Chu, and my advisor at UIUC (@siebelschool), Prof. James M. Rehg (@RehgJim). I will be attending NeurIPS in San Diego and would be happy to chat more! 🔗 Project Webpage: https://t.co/D5UZ4BDi0N 📄 Paper: https://t.co/L97S4QpU9m

136

15K

447

IssacCyj retweeted

Nataniel Ruiz

@natanielruizg

6 months ago

today we are releasing new research at Google. we tackle the previously unsolved task of editing motion in an existing video. it's called MotionV2V. with it you can move objects in videos, move the camera, and other unprecedented edits in user-provided video

179

18K

Yujia Chen @IssacCyj

8 months ago

Great work!

Litu Rout @litu_rout_

8 months ago

Continuous diffusion had a good run—now it’s time for Discrete diffusion! Introducing Anchored Posterior Sampling (APS) APS outperforms discrete and continuous baselines in terms of performance & scaling on inverse problems, stylization, and text-guided editing.

422

295

40K

194

Yujia Chen @IssacCyj

about 1 year ago

Wow

Min Choi

@minchoi

about 1 year ago

Veo 3 is pretty wild. People just dropped some new insane videos 100% AI 1. What if Jurassic Park was real?

848

24K

11K

14M

Yujia Chen @IssacCyj

about 1 year ago

@natanielruizg It is. And the connection sometimes is unstable.

Yujia Chen @IssacCyj

about 1 year ago

This is crazy

Jeremy Nguyen ✍🏼 🚢

@JeremyNguyenPhD

about 1 year ago

ChatGPT's new Image Generation dropped less than 24 hours ago Here are 15 great examples of what you can do now, some limitations—and a hidden trick to get instant access if you're still waiting! 1. Life-like photos:

103

509

109

Yujia Chen @IssacCyj

about 1 year ago

Like the idea

Zhengzhong Tu

@_vztu

about 1 year ago

📍 𝗖𝗮𝗻 𝗔𝗜 𝗡𝗮𝘃𝗶𝗴𝗮𝘁𝗲 𝗠𝗮𝗽𝘀 𝗟𝗶𝗸𝗲 𝗛𝘂𝗺𝗮𝗻𝘀 𝗗𝗼? 𝗜𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝗠𝗮𝗽𝗕𝗲𝗻𝗰𝗵! 🗺️🤖

𝘙𝘦𝘢𝘥𝘪𝘯𝘨 𝘮𝘢𝘱𝘴, like Google Maps and Theme Park Maps, is second nature for humans. It is a highly challenging task that requires visual understanding, spatial reasoning, and long-horizon planning. We're curious - 𝗖𝗮𝗻 𝗟𝗮𝗿𝗴𝗲 𝗩𝗶𝘀𝗶𝗼𝗻-𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗠𝗼𝗱𝗲𝗹𝘀 (𝗟𝗩𝗟𝗠𝘀) 𝗱𝗼 𝗶𝘁 𝘁𝗼𝗼? 🤔

We’re excited to share 𝗠𝗮𝗽𝗕𝗲𝗻𝗰𝗵, the first-ever dataset and benchmark specifically designed for evaluating how well LVLMs perform on pixel-based map navigation tasks! 🚀

🔑 𝗪𝗵𝘆 𝗠𝗮𝗽𝗕𝗲𝗻𝗰𝗵 𝗶𝘀 𝗮 𝗚𝗮𝗺𝗲-𝗖𝗵𝗮𝗻𝗴𝗲𝗿:
• 📌 1600+ Complex Pathfinding Queries from 100 uniquely challenging map scenarios (urban areas, theme parks, universities, malls, and more).
• 📌 Introduces Map Space Scene Graph (MSSG): a novel data structure for mapping visual landmarks and spatial relationships to structured navigation tasks.
• 📌 Evaluates state-of-the-art LVLMs like GPT-4o, Llama-3.2, and Qwen-2-VL under zero-shot and Chain-of-Thought (CoT) reasoning methods, revealing key insights into their spatial reasoning and navigation abilities.

🚩 𝗞𝗲𝘆 𝗜𝗻𝘀𝗶𝗴𝗵𝘁𝘀:
• Despite their impressive capabilities, current LVLMs struggle significantly with spatial reasoning and structured decision-making.
• CoT prompting boosts spatial reasoning performance but sometimes introduces redundant details.

👀 𝗖𝗵𝗲𝗰𝗸 𝗼𝘂𝘁 𝗼𝘂𝗿 𝗳𝗶𝗻𝗱𝗶𝗻𝗴𝘀, 𝗱𝗮𝘁𝗮𝘀𝗲𝘁, 𝗮𝗻𝗱 𝗰𝗼𝗱𝗲 𝗵𝗲𝗿𝗲:
🔗 Arxiv: https://t.co/41aeScvzrb

Huge thanks to our incredible collaborators for making this happen, from @TAMU, @UCBerkeley, @mbzuai, @UMich, and @UCRiverside! 🎉

Let’s continue to bridge the gap between human intuition and AI navigation! 🗺️💡

621

120

404

59K

113

IssacCyj retweeted

Jia-Bin Huang

@jbhuang0604

over 1 year ago

Some papers rejected due to "incremental novelty" 🫠 We as a community should emphasize less on being novel and more on being simple, interesting, and useful.

419

147

32K

IssacCyj retweeted

Andrej Karpathy

@karpathy

over 1 year ago

This is interesting as a first large diffusion-based LLM. Most of the LLMs you've been seeing are ~clones as far as the core modeling approach goes. They're all trained "autoregressively", i.e. predicting tokens from left to right. Diffusion is different - it doesn't go left to right, but all at once. You start with noise and gradually denoise into a token stream. Most of the image / video generation AI tools actually work this way and use Diffusion, not Autoregression. It's only text (and sometimes audio!) that have resisted. So it's been a bit of a mystery to me and many others why, for some reason, text prefers Autoregression, but images/videos prefer Diffusion. This turns out to be a fairly deep rabbit hole that has to do with the distribution of information and noise and our own perception of them, in these domains. If you look close enough, a lot of interesting connections emerge between the two as well. All that to say that this model has the potential to be different, and possibly showcase new, unique psychology, or new strengths and weaknesses. I encourage people to try it out!

373

11K

943K

IssacCyj retweeted

Mike Bespalov

@bbssppllvv

over 1 year ago

It’s live! After some final tweaks ASCII converter is officially ready. Turn any image into ASCII art instantly https://t.co/2NOhtUGK2N

196

730

661K

IssacCyj retweeted

Yuchen Jin

@Yuchenj_UW

over 1 year ago

o3-mini might be the best LLM for real-world physics. Prompt: "write a python script of a ball bouncing inside a tesseract"

120

234

919

IssacCyj retweeted

Google DeepMind @GoogleDeepMind

over 1 year ago

Today, we’re announcing Veo 2: our state-of-the-art video generation model which produces realistic, high-quality clips from text or image prompts. 🎥

We’re also releasing an improved version of our text-to-image model, Imagen 3 - available to use in ImageFX through @LabsDotGoogle. https://t.co/zMJQwON4Gx

263

Yujia Chen @IssacCyj

over 1 year ago

@natanielruizg Congratulations! Well deserved!

115

Yujia Chen @IssacCyj

over 1 year ago

And now video games!

Nataniel Ruiz

@natanielruizg

over 1 year ago

I'm sharing something unique we've been making at Google (w/ UNC). We are releasing our work on a new class of interactive experiences that we call generative infinite games, essentially video games where the game mechanics and graphics are fully subsumed by generative models 🧵

121

276

378K

548

Yujia Chen @IssacCyj

over 1 year ago

Now you can RF-Inversion your personalized GIF in any way you want! 🔥 https://t.co/edpKogonbi

779

IssacCyj retweeted

A.I.Warper

@AIWarper

over 1 year ago

Using @logtdx's implementation of RF-Inversion by @Google and @litu_rout_ and @natanielruizg

I think there may be a method here for consistent stylized animation frames. If we could somehow just align these grids it would be very powerful

Grid in the second tweet

Yujia Chen @IssacCyj

over 1 year ago

Thanks for the superrr quick reproduction!

logtd

@logtdx

over 1 year ago

I'll be posting more of my implementations and experiments on here from now on

For now, implementation of RF-Inversion for unsampling and editing images using Flux

https://t.co/jee4jRNBSQ https://t.co/f4aAX6yd4R

462

Yujia Chen @IssacCyj

over 1 year ago

Nataniel Ruiz

@natanielruizg

over 1 year ago

RF Inversion reimplemented in <24 hours with some super nice results - I love this community https://t.co/1GB8eTYQ8j

271

216

24K

Yujia Chen @IssacCyj

over 1 year ago

Look how much we can do with such simple yet efficient techniques! Sometimes you just need a clean theory with solid proofs! Great work team!

Litu Rout @litu_rout_

over 1 year ago

Diffusion-based image editing and personalization methods are expensive 💰 due to training, latent optimization or prompt-tuning 🤷‍♂️.

Introducing RF-Inversion 🎯, the first efficient zero-shot inversion and editing framework for Flux 🚀 without training, optimization or prompt-tuning 🧵⬇️ https://t.co/Cd4VyuR9pT

688

105

593

94K

725

IssacCyj retweeted

@_akhaliq

over 1 year ago

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

paper page: https://t.co/RkoczeyOQr

We present Open-MAGVIT2, a family of auto-regressive image generation models ranging from 300M to 1.5B. The Open-MAGVIT2 project produces an open-source replication of Google's MAGVIT-v2 tokenizer, a tokenizer with a super-large codebook (i.e., 2^18 codes), and achieves the state-of-the-art reconstruction performance (1.17 rFID) on ImageNet 256 × 256. Furthermore, we explore its application in plain auto-regressive models and validate scalability properties. To assist auto-regressive models in predicting with a super-large vocabulary, we factorize it into two sub-vocabularies of different sizes by asymmetric token factorization, and further introduce "next sub-token prediction" to enhance sub-token interaction for better generation quality. We release all models and code to foster innovation and creativity in the field of auto-regressive visual generation.

260

102

41K
