Ian Liu 劉以恆

@duckiesfloat

MS @BrownBiostats | Data Science Graduate & Classical Chinese Dancer @FTCNorthern | 🇺🇸🇹🇼

Providence, RI

Joined April 2021

1.6K Following

70 Followers

960 Posts

duckiesfloat retweeted

Lucas Beyer (bl16)

@giffmana

about 24 hours ago

You may have recently heard claims that video generation models are "dumb" about physics, and only "world models" (V-JEPA, specifically) have a valid internal model of physics.

This turns out to be false. In a recent paper, researchers show that a LINEAR probe on diffusion video-generation models predicts various "physics" quantities very well, significantly better than V-JEPA or VideoMAE (and plain VAE just sucks).

This is noteworthy, because a *linear* probe being this accurate shows that the model has a pretty explicit internal representation of the physics!
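A linear probe is a linear regression (or linear classifier) trained on a model's frozen internal features to read out some quantity. The NumPy sketch below is illustrative only. Random vectors stand in for video-model activations, and the targets are made up. The paper's features, physics targets and probe settings are not described here. The sketch shows why a high linear-probe score is informative. A quantity stored linearly in the features is recovered almost perfectly. A quantity the features fully determine only through a product of two coordinates is not.

```python
import numpy as np

def add_bias(feats):
    # Append a constant column so the probe can learn an offset.
    return np.hstack([feats, np.ones((feats.shape[0], 1))])

def fit_linear_probe(feats, targets, l2=1e-2):
    """Closed-form ridge regression from frozen features to a scalar target."""
    X = add_bias(feats)
    return np.linalg.solve(X.T @ X + l2 * np.eye(X.shape[1]), X.T @ targets)

def probe_r2(feats, targets, w):
    """Coefficient of determination (R^2) of the probe's predictions."""
    pred = add_bias(feats) @ w
    ss_res = np.sum((targets - pred) ** 2)
    ss_tot = np.sum((targets - targets.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n, d = 500, 64
feats = rng.normal(size=(n, d))   # hypothetical stand-in for frozen activations, one row per clip
w_true = rng.normal(size=d)

targets = {
    # Encoded linearly: a linear readout exists.
    "linear": feats @ w_true + 0.1 * rng.normal(size=n),
    # Fully determined by the features, but only through a product of two of them.
    "nonlinear": feats[:, 0] * feats[:, 1],
}

train, test = slice(0, 400), slice(400, n)
results = {}
for name, y in targets.items():
    w = fit_linear_probe(feats[train], y[train])
    results[name] = probe_r2(feats[test], y[test], w)
print({k: round(float(v), 3) for k, v in results.items()})
```

The first R² should come out close to 1 and the second near zero or below. The information is present in the features in both cases, but only the first is linearly readable. That is the distinction the tweet leans on.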

duckiesfloat retweeted

Yi Ma

@YiMaTweets

1 day ago

I’m getting increasingly annoyed by young people complaining that they cannot do AI-related research unless they join big industrial labs… well, here is my reply: academia is supposed to work on ideas that money cannot buy!

duckiesfloat retweeted

Phoenix Yin

@Phoenixyin13

about 9 hours ago

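I'm someone who loves rereading the things I once loved to read: novels, papers, comics, picture books.

Today, again with nothing else to do, I found at home a paper I had read ten years ago. It left a deep impression on me, and I wanted to deconstruct it again from my own perspective.

Ten years ago, reading this paper, I thought its author, Douglas Detterman, was a cold-blooded fatalist.
He used cold data to smash every educator's rice bowl.
Among the variables that determine academic performance, schools and teachers account for a meager 10%. The remaining 90% comes down entirely to the students' own innate traits, that is, their intelligence.

At the time, the frenzy of educational "involution" was at its height. Everyone was fighting tooth and nail over that 10%, over star teachers and homes in the right school districts, and no one wanted to hear such a blunt truth.

Ten years later, sitting at home in Columbus, surrounded by teaching tools reshaped by AI, I reread the article and suddenly felt a scalp-tingling sense of fate.

Generative AI and personalized tutors are now everywhere. It used to be that what separated bad schools from good ones was the teaching staff.

Now AI puts world-class knowledge and teaching methods within everyone's reach. When everyone can have a perfect, patient, 24-hour AI master teacher, differences in teacher quality are flattened almost entirely.
And the result?
What determines the gap between students falls back to the purest and most brutal starting point: the students' own cognitive ability, focus, and ability to ask questions. These are exactly the 90% of traits Detterman emphasized.

AI has not eliminated the gap; it has amplified the Matthew effect of intelligence.

At the end of the paper, Detterman says that education has not fundamentally changed in thousands of years because we keep trying to change what is easiest to change, such as schools and teachers. Meanwhile we avoid what is hardest to change: individual differences among students.

Pity the teachers crushed by metrics, and the young people the system has forced to grow up too fast. Acknowledging the fundamental differences between people is not about giving up. It is about no longer using a single mold to cut out completely different souls.

Finishing it a second time, staring at a document with a blinking cursor, all I am left with is a sigh.
There is nothing new under the sun. It's just that those of us who refused to believe it back then have finally, ten years later, run into the wall of reality.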

duckiesfloat retweeted

Nav Toor

@heynavtoor

about 19 hours ago

You have noticed it. ChatGPT feels dumber than it used to. Your prompts that worked six months ago produce worse results now. The writing sounds flatter. The ideas sound safer. The internet itself feels like it is shrinking. Every article reads the same. Every email sounds the same. Every answer sounds like it was written by the same voice.

You thought it was you. It is not you.

Researchers at Oxford and Cambridge published a paper in Nature proving what is happening. They call it Model Collapse.

Here is the mechanism in one sentence. AI trained on AI-generated data gets dumber every generation until it forgets what real human data looked like.

The internet is filling with AI-generated content. Blog posts. Articles. Reviews. Comments. Social media. AI companies scrape the internet to train the next generation of models. Which means the next generation of AI is being trained on the output of the current generation.

Each cycle loses information. Not randomly. It loses the rarest, most unusual, most creative parts first. The researchers call these the "tails of the distribution." The weird ideas. The unexpected perspectives. The things that made the internet feel human. Those disappear first.

What remains is the average. The safe. The expected. The bland.

Then the next generation trains on that. And loses more. And the next generation trains on that. And loses more. The researchers proved this is not a slow decline. Major degradation happens within just a few iterations. Even when some of the original human data is preserved.

They tested it on large language models. On image generators. On statistical models. The pattern was the same every time. The output converges toward a narrow, flattened version of reality that looks nothing like the original data.

The lead researcher put it plainly. "Large language models are like fire. A useful tool. But one that pollutes the environment."

The pollution is invisible. You cannot see which sentence on the internet was written by a human and which was written by AI. Neither can the AI that is about to train on it. And once the tails are gone, they do not come back. The damage is irreversible.

This is not a prediction anymore. It is a diagnosis.

The internet you grew up on was built by humans writing things no algorithm would have written. Strange, personal, imperfect, alive. That internet is being diluted. One generation of AI at a time. And the models trained on what remains are learning a smaller and smaller version of the world.

Model Collapse is not a technical problem. It is a cultural one. The thing that made the internet worth reading is the thing that disappears first.
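A toy loop can illustrate the "statistical models" case. This minimal NumPy sketch makes its own assumptions: a one-dimensional Gaussian, 10 samples per generation, maximum-likelihood refits and no original data mixed back in. It does not reproduce the paper's language-model or image-model experiments. Each generation fits a Gaussian only to samples drawn from the previous generation's fit. Finite-sample refitting makes the estimated spread drift downward. The share of output that still reaches as far out as the original tails drifts down with it.

```python
import math
import numpy as np

def tail_mass(sigma, threshold=2.0):
    """Share of N(mu, sigma^2) lying more than `threshold` original std devs from its own mean."""
    if sigma <= 0.0:
        return 0.0
    return math.erfc(threshold / (sigma * math.sqrt(2.0)))

def simulate_collapse(generations=200, n=10, seed=0):
    """Fit a Gaussian, sample from the fit, refit on those samples only, and repeat."""
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the original "human" distribution
    history = [(mu, sigma)]
    for _ in range(generations):
        samples = rng.normal(mu, sigma, size=n)                  # output of the current model
        mu, sigma = float(samples.mean()), float(samples.std())  # maximum-likelihood refit
        history.append((mu, sigma))
    return history

history = simulate_collapse()
for gen in (0, 10, 50, 200):
    mu, sigma = history[gen]
    print(f"gen {gen:3d}: sigma = {sigma:.3g}, "
          f"tail mass beyond 2 original SDs = {tail_mass(sigma):.3g}")
```

Generation 0 has about 4.6% of its mass beyond two standard deviations. By the last generation the spread, and with it the tail mass, has shrunk toward zero. The exact numbers depend on the seed and on the sample size per generation.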

duckiesfloat retweeted

Phoenix Yin

@Phoenixyin13

about 11 hours ago

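These are some important lessons about research, probably mixed with a lot of my own thoughts from doing research.

1. Pick your own problems; don't take over someone else's. Most people's problems are handed to them by an advisor, by arXiv trending, or by whatever hot direction scrolls past in a group chat. If you just pick one up and run, others have long since thought the conclusions through, and the competition is brutal. Strong researchers do the opposite. First get clear on what you genuinely want to exist, then work backward to the experiments. If the goal is genuine enough, originality will grow out of it.

2. Taste is a muscle you train, not a talent you're born with. Before every experiment, guess the result. Cover a paper's results, look only at the method, and guess the numbers. Every month, bet on which of the new releases will still be standing two years from now. After guessing, check the answers; when you're wrong, correct yourself. After a few hundred cycles, the model in your head will be more accurate than other people's.

3. Overhaul your inputs; don't live on second-hand information. With arXiv hot lists and group-chat-filtered content, everyone reaches the same conclusions at the same time, so the value of those conclusions is questionable. Read older work: MoE from 1991, backprop from 1986, Sutton's Bitter Lesson, Shannon's 1952 talk on breaking a problem into smaller pieces and rebuilding it. These may be more useful than a survey ten times their length. Read the original papers; those are the classics. Don't read the threads: the bodies are buried in the appendix and the limitations section.

4. Whatever is in your head must be written down. Ideas in your head always feel perfect; the moment you write them down, the flaws show. You find untested assumptions, steps you can't push through, and two claims that contradict each other. As Feynman put it, the first principle is that you must not fool yourself, and you are the easiest person to fool. Log failed experiments too: hypothesis, setup, expectation, result, updated belief. Rereading last month's log is a good habit.

5. Research speed = the speed at which you discover you're wrong. Tooling is research itself: one command to run an experiment, one command to plot, reproducibility from the config alone. The Karpathy idea I value most is to first overfit a single batch, kill the bugs, and only then scale up. Engineering and research merged long ago; people who can't build a pipeline never get their hypotheses tested.

6. Don't just watch the loss curve; stare at the weird things in the outputs. Failure cases, transcripts and the tails of the distribution are worth far more than the second decimal place of accuracy. Andrew Ng has been teaching this for over a decade: pull a hundred failure samples, sort them into piles, and attack the biggest pile first. If you have never read a benchmark's transcripts, you don't really understand what it measures.

7. Deliberately wander across several subfields and run disposable ideas. Your first direction is mostly an accident of timing; don't treat it as a lifelong commitment. Try things widely, and find the place where your particular quirks give you an asymmetric advantage. Run a cheap version of each idea first; most die early. Ablate until you know which component is actually carrying the result; don't be fooled by the title.

8. Generosity and an open door are the behaviors with the highest compound returns. People who keep their doors closed are more productive in the short run. People who keep them open do the work that truly matters, because interruptions carry information about what the world actually needs. Reproduce other people's results, open-source your tools, explain hard things clearly: the side returns on these come fast and big. Put half-finished ideas out there too. Getting pushback to your face is much cheaper than sitting on an idea for three months and then discovering it's garbage. The seemingly trivial daily margins add up: what you read, what you write down, how fast you iterate, whom you argue with. Over a few years they become what others see as luck and talent. The earlier you start, the bigger the advantage. Wishing everyone smooth research!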
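Point 5's "overfit a single batch first" check can be sketched in a few lines of NumPy. This is a sketch under assumptions: a linear least-squares model stands in for a real network, and `overfit_one_batch`, `correct_grad` and `buggy_grad` are hypothetical names. With 8 examples and 32 weights an exact fit exists. A correct training loop should therefore drive the loss on that one batch to essentially zero. A bug such as a flipped gradient sign shows up within a few steps.

```python
import numpy as np

def mse_loss(w, X, y):
    return 0.5 * np.mean((X @ w - y) ** 2)

def correct_grad(w, X, y):
    # Gradient of 0.5 * mean((Xw - y)^2) with respect to w.
    return X.T @ (X @ w - y) / len(y)

def buggy_grad(w, X, y):
    # Deliberate bug for the demo: flipped sign, so each step climbs the loss.
    return -correct_grad(w, X, y)

def overfit_one_batch(grad_fn, steps=2000, lr=0.05, tol=1e-6, seed=0):
    """Train on one fixed tiny batch; a correct pipeline should drive the loss to ~0."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(8, 32))  # 8 examples, 32 weights: an exact fit exists
    y = rng.normal(size=8)
    w = np.zeros(32)
    start = loss = mse_loss(w, X, y)
    for _ in range(steps):
        w = w - lr * grad_fn(w, X, y)
        loss = mse_loss(w, X, y)
        if not np.isfinite(loss) or loss > 1e6 * start:
            return False, loss  # diverged: almost certainly a bug
    return loss < tol, loss

for fn in (correct_grad, buggy_grad):
    passed, loss = overfit_one_batch(fn)
    print(f"{fn.__name__}: passed={passed}, final loss={loss:.2e}")
```

In a real project the same check applies to the actual model and data loader. If the loss on one repeated batch won't go to near zero, fix that before scaling up.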

duckiesfloat retweeted

Claude

@claudeai

2 days ago

Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision.

The longer and more complex the task, the larger Fable 5’s lead over our other models. https://t.co/DxgSu0KUxh

duckiesfloat retweeted

Srinath Sridhar @drsrinathsridha

7 days ago

Super proud of my students and collaborators, especially @FuWanjia and @Hongyu_Lii for the Outstanding Paper Award at the Sense of Space workshop @CVPR for UniTac: https://t.co/2hvrWNmbcB

duckiesfloat retweeted

Fei-Fei Li

@drfeifei

8 days ago

https://t.co/Kt50ttQRMJ

duckiesfloat retweeted

Fei-Fei Li

@drfeifei

14 days ago

It’s a real honor to receive an honorary doctorate of science from @BrownUniversity. 😍

duckiesfloat retweeted

Keshigeyan Chandrasegaran

@keshigeyan

13 days ago

1/ Introducing GPIC: a Giant Permissive Image Corpus and benchmark for visual generation!

🚀100M VLM-captioned image-text pairs for training
📊1M image-text pairs for benchmarking
🖼️~28 trillion pixels
🤗Centrally Hosted
✅Fully permissive for research + commercial use

Dataset, benchmark and models🧵👇

Co-led with @KyleSargentAI
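A quick back-of-the-envelope check on the quoted scale: dividing ~28 trillion pixels by the number of images gives the average image size. This assumes the pixel total covers all 101M pairs (training plus benchmark), which the thread does not state. The result is only an average, not a fixed resolution.

```python
import math

total_pixels = 28e12                  # "~28 trillion pixels" from the announcement
num_images = 100_000_000 + 1_000_000  # training + benchmark pairs (assumed both counted)

pixels_per_image = total_pixels / num_images
side = math.sqrt(pixels_per_image)    # side of a square image with that many pixels

print(f"about {pixels_per_image:,.0f} pixels per image, roughly {side:.0f} x {side:.0f}")
```

This works out to roughly 277,000 pixels per image, about a 527 x 527 square on average. Counting only the 100M training pairs, it is about 529 x 529.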

duckiesfloat retweeted

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

9 days ago

VISReg: Variance-Invariance-Sketching Regularization for JEPA training

"We propose VISReg (Variance-Invariance-Sketching Regularization), a novel method that prevents embeddings from collapse while learning good representations."

"By decoupling scale and shape, VISReg combines VICReg’s flexibility with the distributional rigor of sketching methods, providing robust gradients even under collapse"

"Pre-trained on ImageNet-22K, it matches DINOv2’s OOD performance despite the latter using 10× more data (LVD-142M)."

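The thread does not give VISReg's exact objective. This sketch therefore shows only the VICReg component that VISReg builds on: the invariance, variance and covariance terms. It uses VICReg's published hinge target γ = 1 and ε = 1e-4, omits the loss weights and has no sketching component. It makes "collapse" concrete. If every input maps to the same embedding, the invariance term sits at zero, and only the variance term registers the failure.

```python
import numpy as np

def invariance_term(z_a, z_b):
    # Mean squared distance between the embeddings of two views of each input.
    return float(np.mean(np.sum((z_a - z_b) ** 2, axis=1)))

def variance_term(z, gamma=1.0, eps=1e-4):
    # Hinge on each dimension's std: penalize dimensions whose spread falls below gamma.
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    return float(np.mean(np.maximum(0.0, gamma - std)))

def covariance_term(z):
    # Squared off-diagonal covariances, summed and divided by the embedding dimension.
    n, d = z.shape
    zc = z - z.mean(axis=0)
    cov = zc.T @ zc / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    return float(np.sum(off_diag ** 2) / d)

rng = np.random.default_rng(0)
n, d = 256, 16
base = 2.0 * rng.normal(size=(n, d))
healthy = (base + 0.1 * rng.normal(size=(n, d)), base + 0.1 * rng.normal(size=(n, d)))
collapsed = (np.ones((n, d)), np.ones((n, d)))  # every input mapped to the same point

for name, (z_a, z_b) in [("healthy", healthy), ("collapsed", collapsed)]:
    print(f"{name:9s} invariance={invariance_term(z_a, z_b):.3f} "
          f"variance={variance_term(z_a):.3f} covariance={covariance_term(z_a):.3f}")
```

For the collapsed case the variance term is 1 - sqrt(1e-4) = 0.99 per dimension, while the invariance and covariance terms are exactly zero. This is why a variance-style regularizer is needed to rule out the trivial solution.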
duckiesfloat retweeted

Michael Tschannen @mtschannen

8 days ago

For the past few years my research focus was on unifying models and training paradigms across modalities. Today I'm excited that we're releasing our latest model aligned with this theme:

Gemma 4 12B, a dense encoder-free model which processes raw text, image, and audio inputs!

1/ https://t.co/4J2JKCtzU5

duckiesfloat retweeted

DailyPapers

@HuggingPapers

10 days ago

ByteDance Seed removes the VAE bottleneck from unified multimodal models

Their technique, Representation Forcing, lets decoders predict visual representations before pixels so generation and understanding share one end-to-end space. https://t.co/qKymGu1KN3

duckiesfloat retweeted

Phoenix Yin

@Phoenixyin13

8 days ago

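In research, striking from a higher dimension, at the level of paradigm, is far easier than grinding away by brute force where you stand. You don't need to be a top expert in two fields at once. You only need to take the hardcore, highly mature tools that field A has already used to death. Then transplant them precisely into field B, a chaotic swamp that has only just emerged and where everyone is still scratching their heads. People in field B have never seen such a handy hammer, and people in field A haven't noticed this new nail.

Take the case I know best, AI x psychology, as an example.

Step 1: Take stock of your hardcore hammers. The hammers of a mature field A must have two properties: extremely rigorous logic, and an accepted mathematical or behavioral-experiment paradigm. One example is the behavioral paradigms and strict controls of experimental psychology, such as the quantification of metacognition and models of attitudes and persuasion. These frameworks have been through countless behavioral experiments and are extremely rigorous in their statistics and their control of variables. Another is the low-level representation tools for multimodal vision-text work. Examples are the mathematical models and representation techniques that map images and text into a shared feature space. Their methods of feature extraction and vector alignment are already highly standardized in machine learning.

Step 2: Lock onto a new nail that is hot and full of blind spots. The nail in frontier field B usually shows up in whatever area is currently hottest and best funded, but where everyone is still stumbling around inside a black box. Examples are large models, generative AI, AI ethics, and human-computer interaction. A common pain point in these frontier fields is the lack of hard evaluation standards and low-level explanatory tools. How exactly should the stability of large models over long texts or long generated sequences be evaluated quantitatively? Many current evaluation sets stay on the surface. Human consciousness has limited bandwidth and perception has limits, and both now face an explosion of multimodal stimuli. Traditional tools often can't quantify either precisely because their dimensionality is too low.

Step 3: Fuse them at the pixel level; this is your hardcore idea. Once you have the hammer and the nail, the moment of arbitrage arrives, and this is where your ideas will come from.

Step 4: Use AI to squeeze out the idea's odds of survival (practical denoising). Once cross-field transfer gives you a preliminary idea, don't rush to write a paper or run experiments. First put it through an extreme stress test with a large model. My favorite denoising prompt framework:

"I am trying to transfer [specific mature methodology / experimental paradigm] from field A to [specific frontier problem] in field B. Acting as the pickiest possible reviewer, attack me along these three hard dimensions:
01 What irreconcilable conflicts exist between the underlying assumptions of the two fields? (Guard against fatal logical flaws.)
02 What noise or variables specific to field B would completely destroy the rigorous experimental controls of field A?
03 Would this transfer degenerate into a 'toy application' with no real academic value? If so, how should it be corrected so that it carries hard, quantitative scientific meaning?"

This way, the AI forces you to reduce every vague, grandiose concept to measurable variables, controllable experiments, and comparable metrics. Now that you've read this systematic breakdown, take the most familiar, most solid hardcore tool you have. Try using it to crack open a frontier black box you've been following!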
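The reviewer prompt above is a fill-in-the-blanks template, so it can be kept as a small function and reused for each candidate A-to-B pairing. This is a minimal Python sketch. `build_reviewer_prompt`, its parameters and the example arguments are hypothetical; the examples are taken from those in the post. The wording is an English rendering of the template.

```python
def build_reviewer_prompt(field_a, method_a, field_b, problem_b):
    """Fill the three-question 'pickiest reviewer' template for one A -> B transfer."""
    return (
        f"I am trying to transfer {method_a} from {field_a} to {problem_b} in {field_b}. "
        "Acting as the pickiest possible reviewer, attack me along these three dimensions:\n"
        "01 What irreconcilable conflicts exist between the underlying assumptions "
        f"of {field_a} and {field_b}? (Guard against fatal logical flaws.)\n"
        f"02 What noise or variables specific to {field_b} would completely destroy "
        f"the rigorous experimental controls of {field_a}?\n"
        "03 Would this transfer degenerate into a 'toy application' with no real academic "
        "value? If so, how should it be corrected so that it carries hard, quantitative "
        "scientific meaning?"
    )

prompt = build_reviewer_prompt(
    field_a="experimental psychology",
    method_a="metacognition quantification paradigms",
    field_b="large language models",
    problem_b="evaluating stability in long-sequence generation",
)
print(prompt)
```

Keeping the template in one place means every pairing gets the same three attacks, so the critiques can be compared across candidate ideas.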

duckiesfloat retweeted

Phoenix Yin

@Phoenixyin13

8 days ago

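Ask a large model point-blank for an innovative research idea and what you get is guaranteed to be 100% filler. A large model's pretrained weights are a statistical average of past human knowledge. Chat with it aimlessly and all it can give you are the most statistically probable, most mediocre clichés. You may want hardcore ideas that are practical and free of platitudes. For that, treat AI as a top-tier assistant with unlimited bandwidth but a severe lack of context.

Say you want to do interdisciplinary research. Don't ask, "What new directions are there in combining AI and psychology?" A question that broad and empty gets you no information. Instead, load its context with the full text of the 20 most hardcore recent papers from top conferences or journals, or with the latest experimental datasets. Use the model's web search or long-context RAG to do this. Then have the AI do one thing: find conflicts. Prompt: "Compare the methodology of paper A with the conclusions of paper B. Where, at the boundaries of their experiments, do they irreconcilably contradict each other in how they explain a particular cognitive representation or perceptual limit?" Good ideas are always born on the fault line where two known tracks violently collide. If your basic literature review isn't thorough, no amount of chatting with AI will build more than castles in the air.

Good ideas also call for a programmer's mindset. Strong programmers spend a lot of time on the architecture and the design docs before writing any code. Research ideas are the same: before you start, have the large model help you lock the constraints down solid.
01 What are the core variables? Can they be quantified? If they can't be grounded in data or high-dimensional feature vectors, kill the idea immediately.
02 What is the evaluation metric? Use AI to attack your initial experimental design relentlessly. Have it play the harshest reviewer and point out the three most fatal logical holes in your experimental framework, for example from the standpoint of experimental control and statistics.
An idea that gets shot full of holes while it's still incubating is far better than one you discover is garbage after months of experiments. If this post gets good feedback, next time I'll share my methodology.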
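Constraints 01 and 02 act as a kill switch that can be run before any experiment. The Python sketch below implements that gate. The `Variable` and `Idea` structures, `kill_reasons` and the example ideas are hypothetical. The check enforces only the two rules stated in the post: every core variable needs a stated measurement, and there must be an evaluation metric.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Variable:
    name: str
    measurement: Optional[str] = None  # how it is measured; None = not yet quantifiable

@dataclass
class Idea:
    title: str
    core_variables: List[Variable] = field(default_factory=list)
    metric: Optional[str] = None       # how success will be evaluated

def kill_reasons(idea: Idea) -> List[str]:
    """Reasons to drop the idea now; an empty list means it passes this gate."""
    reasons = []
    if not idea.core_variables:
        reasons.append("no core variables named")
    for var in idea.core_variables:
        if not var.measurement:
            reasons.append(f"core variable '{var.name}' is not quantifiable")
    if not idea.metric:
        reasons.append("no evaluation metric")
    return reasons

vague = Idea("AI meets psychology", [Variable("model understanding")])
concrete = Idea(
    "Confidence calibration of an LLM",
    [Variable("stated confidence", "self-reported probability per answer"),
     Variable("correctness", "exact match against gold answers")],
    metric="expected calibration error",
)
print(kill_reasons(vague))
print(kill_reasons(concrete))
```

The vague idea fails on both rules. The concrete one passes, which only means it is ready for the harsher reviewer-style attack in 02, not that it is a good idea.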

duckiesfloat retweeted

Kai Xu @itskaixu

9 days ago

Image editing models can put you on the Moon, but can they precisely move a circle right by 50 pixels? 📐 Introducing 🎨PaintBench: a foundational eval of visual editing operations with only one right answer. The highest-performing model (@NanoBanana 2) reaches only 17.1%.
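An edit like "move a circle right by 50 pixels" has a fully determined correct output. It can therefore be scored by exact comparison rather than by human or model judgment. The NumPy sketch below illustrates only that idea. The canvas, the drawing code, the all-or-nothing metric and the function names are assumptions, not PaintBench's actual protocol.

```python
import numpy as np

def draw_circle(h, w, cx, cy, r):
    """Binary canvas (h x w) with a filled circle centered at (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return ((xs - cx) ** 2 + (ys - cy) ** 2 <= r ** 2).astype(np.uint8)

def shift_right(img, dx):
    """Move content right by dx pixels, filling the vacated columns with background (0)."""
    out = np.zeros_like(img)
    out[:, dx:] = img[:, : img.shape[1] - dx]
    return out

def exact_match(candidate, reference):
    # One right answer: any differing pixel (or shape) counts as a failure.
    return bool(np.array_equal(candidate, reference))

source = draw_circle(128, 256, cx=60, cy=64, r=20)
reference = shift_right(source, 50)                   # the only correct edit
close_miss = shift_right(source, 49)                  # off by one pixel
redrawn = draw_circle(128, 256, cx=110, cy=64, r=20)  # same result, produced differently

print(exact_match(reference, reference),
      exact_match(close_miss, reference),
      exact_match(redrawn, reference))
```

An off-by-one shift fails. Redrawing the circle at the correct position passes, because only the resulting pixels are compared, not how they were produced.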

duckiesfloat retweeted

Sihyun Yu

@sihyun_yu

8 days ago

Can MLLMs actually track what's happening in a video? Introducing VSTAT 🎯, our new benchmark for visual state tracking. The tasks are simple: count cups, read typed words, count page flips. Humans solve them easily. MLLMs don't. https://t.co/dgqhqeVuSv 🧵 [1/11]

duckiesfloat retweeted

Mark Manson

@Markmanson

10 days ago

You don’t find your purpose. You build it, brick by brick, mistake by mistake.

duckiesfloat retweeted

Reads with Ravi

@readswithravi

10 days ago

I’m in love with this sentence: “The degree to which a person can grow is directly proportional to the amount of truth he can accept about himself without running away.”

duckiesfloat retweeted

Christopher Potts

@ChrisGPotts

10 days ago

We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Huang and @EkdeepL, traces this to a data-induced competition for resources (neurons), using formal analysis, idealized tasks, and real pretraining. https://t.co/vqRUUe6whP
