Jiaqi Wang @wjqdev - Twitter Profile

wjqdev retweeted

4 months ago

DeepGen 1.0: A lightweight 5B unified multimodal model that outperforms 80B+ giants like HunyuanImage by 28% on WISE and Qwen-Image-Edit by 37% on UniREditBench, proving scale isn't everything https://t.co/gZPSbCDyN7

1

113

15

88

7K

wjqdev retweeted

AK

@_akhaliq

10 months ago

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

7

71

17

35

13K

Jiaqi Wang

@wjqdev

10 months ago

🌟Project page: https://t.co/0UGOWyrdGY 📖Paper: https://t.co/VEDcnUI56t

0

1

0

153

Jiaqi Wang

@wjqdev

10 months ago

🚀 Pref-GRPO: A pairwise preference-based GRPO method that tackles reward hacking for T2I models 🎨 UniGenBench: A unified benchmark providing comprehensive, fine-grained evaluation for T2I models across 27 dimensions & 20 scenarios 🤗Leaderboard: https://t.co/bTUshFNQeI

1

3

1

0

307

wjqdev retweeted

DailyPapers

@HuggingPapers

10 months ago

Explore Pref-GRPO for stable T2I RL & UniGenBench, a comprehensive T2I benchmark, on Hugging Face! Paper: https://t.co/c28ghJcn7u Model: https://t.co/3MrTGNAdzm Leaderboard: https://t.co/5vHIMAZrQB

0

3

0

868

Jiaqi Wang

@wjqdev

10 months ago

Thanks for tweeting our work🍻

AK

@_akhaliq

10 months ago

CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning https://t.co/mVN0yXRJs0

2

23

7

9K

0

1

0

235

wjqdev retweeted

DailyPapers

@HuggingPapers

10 months ago

SEAgent autonomously learns through experiential feedback, evolving from specialists to generalists. Key components include a World State Model and Curriculum Generator. Read the paper: https://t.co/jSzI9IHt8P Try the model: https://t.co/Ftr1KKEK8i

0

5

1

2

854

wjqdev retweeted

DailyPapers

@HuggingPapers

about 1 year ago

Nvidia's got something new. UnifiedReward-Think is here: a multimodal CoT reward model for both visual understanding and generation https://t.co/k3z5LARosv

2

170

40

135

22K

wjqdev retweeted

Zhibing Li @ZhibingLi_6626

over 1 year ago

🎉 Excited to introduce IDArb! 🎉 Our method can predict plausible and 𝗰𝗼𝗻𝘀𝗶𝘀𝘁𝗲𝗻𝘁 geometry and PBR material for 𝗮𝗻𝘆 𝗻𝘂𝗺𝗯𝗲𝗿📷 of input images under 𝘃𝗮𝗿𝘆𝗶𝗻𝗴 𝗶𝗹𝗹𝘂𝗺𝗶𝗻𝗮𝘁𝗶𝗼𝗻𝘀☀️ ! Webpage: https://t.co/GvfyvbEq25

2

74

24

37

11K

Jiaqi Wang

@wjqdev

over 1 year ago

@JunMa_11 Thx! We are preparing the fine-tuning code and hope to release it within the next two weeks.

1

0

123

Jiaqi Wang

@wjqdev

over 1 year ago

🚀 We’re excited to announce the release of InternLM-XComposer2.5-OmniLive (IXC2.5-OL), a comprehensive multimodal system designed for long-term streaming video and audio interactions. This fully open-sourced project delivers functionality similar to Gemini 2.0 Live Streaming and OpenAI Her, with standout features including: 🎥 Chat with Streaming Video & Audio 💾 Long-Term Memory for recalling past video experiences 🏆 Competitive Performance across various video and audio perception benchmarks 📄 Paper: https://t.co/oiBfU74lR4 💻 Code: https://t.co/hP881kVnlM 📦 Models: https://t.co/t7hGEuj9Qx ✨ Immerse yourself in multimodal interaction and create your own app today! #Gemini2 #OpenAI #ChatGPTAdvancedVoice

3

123

32

61

14K

Jiaqi Wang

@wjqdev

over 1 year ago

Thanks so much for tweeting our work!

AK

@_akhaliq

over 1 year ago

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

2

144

39

71

44K

0

20

1

2

12K

Jiaqi Wang

@wjqdev

over 1 year ago

Many thanks for tweeting 👏

Adina Yakup

@AdinaYakup

over 1 year ago

InternLM-XComposer-2.5-OmniLive🔥 a specialized generalist multimodal system for streaming video and audio interactions by @intern_lm. Model: https://t.co/dSqEg2mnK8 ✨ Apache 2.0, but a form is required for a commercial license

1

19

6

4

1K

0

4

0

585

wjqdev retweeted

Ziwei Liu

@liuziwei7

over 1 year ago

😻Fine-Grained Visual Attributes for GenAI😻 #NeurIPS2024 🍎FiVA🍊 is a fine-grained visual attributes dataset and a framework that decouples different visual attributes for GenAI - Project: https://t.co/hhSlc7PFQm - Code: https://t.co/Ggji0AluDN - Data: https://t.co/LgRjvcShl1

0

132

34

38

8K

Jiaqi Wang

@wjqdev

over 1 year ago

@GeekTrailAI Thx a lot!

0

60

Jiaqi Wang

@wjqdev

over 1 year ago

We have released SAM2Long, a training-free enhancement to SAM 2 for long-term video segmentation 🔥 Less error accumulation facing occlusion/reappearance. ⚡️ A training-free memory tree for dynamic segmentation paths, boosting resilience efficiently. 🤯 Significant improvements over SAM2 across 24 head-to-head comparisons on SA-V and LVOS. Technical Report: https://t.co/jI0WbJDSHr Github: https://t.co/nxc1WoMVoO Homepage: https://t.co/zhx7tQuG2R #AIML #VideoSegmentation #SAM2Long #ComputerVision

1

168

40

97

16K

Jiaqi Wang

@wjqdev

over 1 year ago

@KyeGomezB SAM2Long is a training-free work; the models are identical to SAM2 & SAM2.1. See https://t.co/MWiDHbEaD9 for details.

0

1

0

92

wjqdev retweeted

Yunxin Li

@LyxTg

almost 2 years ago

🚀Check out VideoVista, our comprehensive video-LMMs evaluation benchmark! We've assessed 33 Video-LMMs across 27 tasks. Highlights include the latest GPT-4o-Mini, ranked third, and InternLM-XComposer-2.5, the top-performing open-source model. More: https://t.co/Ey0MIzXIlT https://t.co/3HB9joLAem

3

26

13

6

5K

wjqdev retweeted

Yubo Ma @mayubo2333

almost 2 years ago

Large Vision-Language Models (LVLMs) perform well at understanding single-page documents, as on DocVQA and ChartQA. One open question remains🧐: Can LVLMs handle long documents well? We introduce MMLongBench-Doc! 🌐 Project Page: https://t.co/rdGoATeCW3 🧵(1/7)

2

65

16

41

14K

Jiaqi Wang

@wjqdev
