zhenheng tang

@ZhenhengT

CS PhD, Machine learning and MLsys. Homepage: Scholar:

Hong Kong

Joined November 2018

655 Following

171 Followers

248 Posts

ZhenhengT retweeted

Mr Panda

@PandaTalk8

7 months ago

Yann LeCun in 1989, showing a demo of a convolutional neural network on his computer.

142

439

376K

ZhenhengT retweeted

张小珺 Xiaojun Zhang

@zhang_benita

7 months ago

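"I think for many people, especially the very best talent, the ones being poached, what they truly care about is not necessarily money. What they truly care about is that, if this transformation happens, they want to be in the driver's seat. I think a person with a sense of mission will not tolerate being on the wrong ship; they must be on the right ship." (A conversation with Jie Tan (谭捷) of DeepMind) https://t.co/z6jc8qmNsa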

348

125

98K

ZhenhengT retweeted

Delip Rao e/σ

@deliprao

7 months ago

RL optimized LLM learning new skills

460

723

532K

ZhenhengT retweeted

Dimitris Papailiopoulos

@DimitrisPapail

8 months ago

- datasets (e.g. openthoughts, dclm)
- model studies (e.g. ICL studies, model inversion, etc.)
- RL/post-training/self-improvement, even in small-scale settings (even small arithmetic/maze examples)
- prompt optimization (e.g. GEPA)
- efficiency/architectures (e.g. flash attention/mamba/SSMs, etc.)
- adversarial attacks/robustness, etc.
The point is you can run a ton of meaningful experiments with <10k on lambda. So much has happened on the open source/academic side of things.

ZhenhengT retweeted

Orange AI

@oran_ge

8 months ago

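Yesterday I was talking with friends about the wave of layoffs in the US, and one thing is becoming increasingly clear to all of us: the biggest threat AGI poses to humanity is not safety alignment at all, but the plain old problem of employment. Before AGI is truly achieved, and even without it being fully achieved, there will not be many jobs left for humans. Why? It comes down to the ROI orientation of companies. When a company needs GPUs more than it needs employees, shareholders and the CEO will lay off employees to buy GPUs. Human behavior here is quite passive and can hardly be changed by willpower. An invisible hand is pushing humanity toward that future. And why do humans need employment? Beyond money, it also has to do with human nature. Most people cannot set goals for themselves, and in a state of idleness they sink further and further. Work gives them a goal, a loop, and a reward mechanism in the form of a monthly salary. It is in fact an effective social mechanism that has emerged over thousands of years of human civilization. So employment is not only an economic need but also a psychological one, and UBI cannot provide this reward or satisfy this need. The way out is for people to find their own "brain gym", where they can get this kind of reward even when they do not need to work. That calls for a product that goes along with human nature without leading to mental decay. This is also the new opportunity for the content industry in the AGI era.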

550

295

198K

ZhenhengT retweeted

马东锡 NLP

@dongxi_nlp

8 months ago

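"Fake Reasoning Bias, Let Me Think"
A single "let me think" is enough to fool large models.
The authors propose Fake Reasoning Bias: models mistake expressions that merely look like reasoning for a signal of high quality, even when that reasoning contains no logic.
Inserting "let me think" between two options leaves the semantics completely unchanged but makes the text look more like thinking.
This tiny cue is enough to steer large language models, especially those with explicit reasoning mechanisms, away from the correct answer.
Towards Evaluating Fake Reasoning Bias in Language Models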

ZhenhengT retweeted

Tsinghua University

@Tsinghua_Uni

9 months ago

Prof. Chen Ning Yang, a world-renowned physicist, Nobel Laureate in Physics, Academician of the Chinese Academy of Sciences, Professor at Tsinghua University, and Honorary Director of the Institute for Advanced Study at Tsinghua University, passed away in Beijing due to illness at the age of 103. His life stands as a timeless chapter in human history—one that shines not only for China but for the global community of thinkers and innovators. His legacy will live on forever.

209

720

185

433K

ZhenhengT retweeted

宝玉

@dotey

9 months ago

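Terence Tao: I have a tentative view: the systems, incentives, and technological developments of today's society have slightly enhanced the abilities of individuals, greatly strengthened the power of large organizations, and severely squeezed the room in which small organizations can survive. Across the ecosystem of human society, small organizations play an ever less important role; they are either gradually marginalized or absorbed or replaced by large ones.
This unbalanced social structure brings people material comfort (though that comfort is not fairly distributed) and gives them a limited sense of agency, but at the level of individual psychology it has serious consequences. People begin to feel lonely, alienated, and without a sense of belonging, and develop a deep sense of powerlessness and pessimism about the future. Most people do not believe they can influence the future or solve major challenges unless, through fierce or even brutal competition, they become extremely wealthy or influential and thereby attain the kind of social standing that only a small or even large organization could otherwise have.
Larger organizations, meanwhile, fill some of the void left by the disappearance of small communities by offering people synthetic social or emotional products. Yet in authenticity and intimacy, the gap between these products and the real thing is as large as the gap between highly processed "junk food" and truly healthy food, because large organizations are inherently cold and impersonal, which is especially evident in the era of advanced algorithms and artificial intelligence (AI). Worse, if these technologies are left to develop freely, they tend to further intensify the negative trends above.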

334

909

326K

zhenheng tang

@ZhenhengT

9 months ago

When will academia obtain enough GPU resources...?

Dimitris Papailiopoulos

@DimitrisPapail

9 months ago

Prediction: In ~3 years academia will be the most desirable place to do fundamental AI research.
Contributing factors:
- small models improve/become significantly more impactful
- open weights community broadens its reach
- GPUs continue to get faster & cheaper
- meaningful post-training/RL experiments become more and more tractable
- raw capabilities of large models plateau (100% acc is actually a wall) => "foundation models" become commodity => product matters more
There will obviously be incredibly important problems at the frontier of a gazillion parameters, of models launching 100k agents, and training incredibly complex systems with one million GPUs. But there will be so many more and incredibly important problems at the hands of a community that is free to ask any questions they like, and benefits directly from sharing with everyone else.

466

191

58K

ZhenhengT retweeted

fin

@fi56622380

9 months ago

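@frxiaobei "AI applications should not compete on who is smarter, but on who better understands humans and the moment when humans don't want to think." Strongly agree; this is a new paradigm for the intent economy https://t.co/Nn4oZrITQJ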

ZhenhengT retweeted

凡人小北

@frxiaobei

9 months ago

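OpenAI's latest research report, "How People Use ChatGPT", reveals a fact that many technical people are reluctant to admit. We talk every day about the future of AI, model capabilities, and agent collaboration, but what ordinary people actually use over and over are the least glamorous, least technical, yet most labor-saving small mental tasks. Many startups are aiming to rebuild the operating system with AI, but what the report shows are mostly prompts like these:
"I can't be bothered to write this, polish it for me"
"I roughly understand this, but can you explain it quickly?"
"My brain is stuck, give me a few ideas first and I'll revise them"
These are lightweight cognitive needs that could hardly be smaller, yet come up countless times a day. They don't seem worth much; yet each time, people genuinely want to pay something in exchange for time, attention, and the relief of not having to think, so these have become some of ChatGPT's most frequent use cases.
The report has a particularly key data point: writing, practical advice, and information seeking together account for the bulk of user conversations. Note: not image generation, coding, or multimodal exploration, but literally "help me think of something", "help me write something", and "tell me how to do this", extremely plain yet extremely high-frequency mental assistance.
More interesting still, a large share of people use these three categories at work, especially those with higher education, higher income, and mentally intensive daily work. In other words, this is a low-cost output strategy for a large population of high-cognition users: they use AI to save mental effort not because they cannot do the task, but simply because they do not want to, or do not want to work so hard at it.
I realized a fundamental shift in judgment: AI applications should not compete on who is smarter, but on who better understands humans and the moment when humans don't want to think.
Many technical people have a big misconception. They think everyone wants a GPT that can answer every question, when what people want more is a small tool that spares them the first five minutes of thinking. They think users want a fully intelligent end-to-end workflow, when what users need more is a cognitive assistant for "my brain has stalled, give me a leg up first". They think everyone should build a super agent, when in reality the products that survive often solve just one problem, such as "can't be bothered to write".
Because of this, my perspective on the question "what AI application can make money" has completely changed. Stop asking whether you can still build a content platform, a vertical model, or a SaaS system. Ask yourself instead: can I find a very specific, very niche, yet very common moment of human laziness, and then design around it a lightweight decision path, prompt templates, and good UI output, so that users get an editable draft as fast as possible at the moment they least want to think?
Once this lazy action is triggered frequently, it naturally becomes habitual AI muscle memory, and the application we build turns from a tool into an extension of the brain.
How, then, should the commercial value of an AI product be defined? For one class of products, perhaps it lies not in whether they can imitate a human expert, but in whether they can take over the actions users could clearly do but simply don't want to do. The real market should not only stare at the ceiling of intelligence; look down, and there are plenty of opportunities on the floor of laziness too.
Within the grand narrative of the AI revolution, then, we should pursue not only making people stronger; making people's lives lighter should also come into view. People are awed by power, but they also pay for ease. Think about your own daily life, then look around the market: an AI application worth building need not be stunning, but it must be able to be lazy on someone's behalf, once.
So, do you want to build an AI product that saves users one act of thinking? Can you use prompts, memory, data, and a bit of thoughtfulness to help people steal one more second of laziness? If so, it may be easier to sell and retain users than an agent that can do ten things. There are many more opportunities like this.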

952

187

790

220K

ZhenhengT retweeted

Suhas Kotha

@kothasuhas

10 months ago

Though none of the individual interventions we consider are new and are instead inspired by classical statistics + data-constrained ML, they show that algorithmic improvements are critical to greater data efficiency in a compute-rich future. We believe that correctly characterizing the asymptotes of scaling recipes will help design “general methods that leverage computation” for the future, in line with the Bitter Lesson

ZhenhengT retweeted

Suhas Kotha

@kothasuhas

10 months ago

Finally, we test our gains on downstream tasks, finding a 9% improvement on standard benchmarks at our scale. Moreover, when applying our interventions to math data from OctoThinker, we achieve 17.5x data efficiency. https://t.co/M5hMkfzEY8

ZhenhengT retweeted

Percy Liang

@percyliang

10 months ago

-2016 (classic era): focus on data efficiency
2017-2025 (pretraining era): focus on compute efficiency
2026-: focus on data efficiency (again)
The standard Transformer paradigm is optimized for compute efficiency. As we look at data efficiency, we'll see very different design decisions, which will be exciting!

613

376

104K

zhenheng tang

@ZhenhengT

9 months ago

Yes, I advocate for more research on small-scale models.

Dimitris Papailiopoulos

@DimitrisPapail

10 months ago

Small models as the new frontier and why this may be academia's LLM moment
Academia should reject the nihilism of "scale is all you need", i.e., that meaningful research requires frontier-scale compute. This mindset hurts basic research and what we can contribute to machine learning in practice.
Many interesting questions about architectures, data, and training methods do show signal and can be tested at the O(100M) to O(1B) parameter scale within a reasonable budget. There seems to exist no fundamental reason why these insights wouldn't transfer and hold up to 14B, 32B, or even larger models. Yes, there will be trends and observations that break at the trillion-parameter scale, but my conjecture is that this will be irrelevant for the majority of models people will actually deploy locally in the future.
The economics of post-training (SFT/RL) are finally favorable for academia. Post-training a 7B model fits on a single H100 GPU, which costs roughly $3/hour on cloud providers. You can train on 100M+ tokens for under $100.
Why care about mid/post-training? That's where a lot of interesting problems are! Reasoning, tool use, specialization, etc.: these are settings where you see meaningful performance improvements and skills learnt within millions of trained tokens, not the billions that are typical for pretraining.
More importantly, the 4B-32B parameter range will likely dominate local deployment in the not so distant future. These models fit on reasonable hardware (a beefy laptop), as inference requires enough RAM to fit the model, but you can run them without GPUs for single-batch inference calls. Also, these models, at that scale, are getting seriously good at tasks like coding, math, tutoring, computer use, etc.
So here is my conjecture: local models at the <100B scale will eventually generate more tokens/day than API-hosted frontier models.
This may be academia's moment! The open-weights ecosystem provides a path to real impact without million-dollar GPU clusters at this scale.
Our research can directly study, understand, and improve the 99% of models that will run locally, not the 1% that require data centers. This is finally both possible and meaningful. Don't be discouraged by scale maximalism!!

335

215

66K

ZhenhengT retweeted

François Chollet

@fchollet

10 months ago

The most important skill for a researcher is not technical ability. It's taste. The ability to identify interesting and tractable problems, and recognize important ideas when they show up. This can't be taught directly. It's cultivated through curiosity and broad reading.

514

988

211K

ZhenhengT retweeted