Jianchuanman @jelloli - Twitter Profile

jelloli retweeted

2 days ago

Vision-language AI models have a gaze. And you can steer it! 👀 Redirect just 9% of a model’s attention heads to any region in an image, and the VLM will start describing that region mid-generation. We call them Gaze Heads! Try the demo: https://t.co/y5jlb0iBI8 🧵👇

11

493

91

350

45K

jelloli retweeted

开发者Hailey

@IndieDevHailey

2 days ago

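🔥 More than 1.8 billion simulations run worldwide! PhET, a free tool, turns science into play. PhET Interactive Simulations, used by students, parents, and teachers around the world, turns dry textbook material into fun interactive games. Key highlights:
- Produced by the University of Colorado; founded by Nobel Prize-winning physicist Carl Wieman
- Completely free, ad-free, available in Chinese, and opens instantly in the browser (offline versions can also be downloaded)
- 160+ realistic interactive experiments covering physics, chemistry, biology, math, and earth science
- Suitable for every level from elementary school to university; concepts click quickly and interest soars
The most addictive simulations:
- Circuit construction (build your own circuit and enjoy the rush when the bulb lights up)
- Build an atom (drag electrons and watch the energy levels; the quantum world makes sense at once)
- Gravity and orbits (adjust planetary motion; it feels like a space blockbuster)
- Acid-base reactions, wave interference, and more
Learning science stops being painful for kids, and all that is left is: "Mom, five more minutes!" A go-to tool for teachers in class and for parents studying alongside their kids.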

21

174

43

212

17K

jelloli retweeted

Science girl

@sciencegirl

8 days ago

How to make a Pikachu bookmark

129

20K

4K

18K

2M

jelloli retweeted

HomeMadeGarbage

@H0meMadeGarbage

8 days ago

Here you go, assorted data https://t.co/dgexgiJXCG

0

24

2

21

2K

jelloli retweeted

9 days ago

https://t.co/ZhOgyq7Vgn

186

12K

2K

30K

5M

jelloli retweeted

Addy Osmani

@addyosmani

10 days ago

https://t.co/hIe0UX7z6T

327

8K

1K

18K

2M

jelloli retweeted

Ilir Aliu

@IlirAliu_

9 days ago

ETH Zurich just open-sourced their entire 2026 robot learning course. Not a MOOC. The actual course. Slides, lecture recordings, coding assignments, GitHub repo. The curriculum goes from imitation learning and RL all the way to Vision-Language-Action models and foundation models for robotics. Guest lectures from the co-founder of Physical Intelligence. The creator of Diffusion Policy. Pieter Abbeel. Dieter Fox. 12 weeks. Free. No signup. If you want to understand where robot intelligence is actually heading… this is the reading list the field is using right now. 📍[https://t.co/eKsIjILi60] —— Weekly robotics and AI insights. Subscribe free: https://t.co/9Nm01QUcw3

21

2K

312

3K

117K

jelloli retweeted

Viking

@vikingmute

11 days ago

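Sharing an article: "How LLMs Actually Work" https://t.co/apHhTvjdiB I think it was ranked first on Hacker News a few days ago. There are many similar articles, but this one's accessible explanations and intuitive examples make it well suited to readers who have some programming experience but have not studied Transformers in depth. The analogies are apt too; it was clearly written by a real person and has no AI flavor. I have recently fallen back in love with writing and have written two technical articles, with more to come. One principle of mine: written by a human, never with AI. Writing is a pleasure: organizing logic and expressing a viewpoint. Don't let AI take that pleasure away.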

23

648

134

1K

83K

jelloli retweeted

Alok

@analogalok

11 days ago

Run Gemma 4 26b MTP on 8 GB VRAM GPUs at 25+ tokens/second. Flags included!

The local LLM space is moving at terminal velocity. Only 3 days ago Google released Gemma 4 26b a4b QAT quants: more efficient than before, running on 8 GB VRAM at 20 tok/sec. And now, just a few hours ago, mainline llama.cpp merged a massive update and we just shattered our own record. Decode throughput went up 25-40% on the same 8 GB VRAM setup! Before MTP: 20 tps -> After MTP: 28 tps!

llama.cpp just officially merged PR #23398 ("add Gemma4 MTP"), bringing native Multi-Token Prediction (MTP) support to Gemma 4 models. By running speculative drafting on the same 8 GB VRAM RTX 4060 setup, my decode throughput on a 64k context instantly leaped to a blistering 25–27 tokens/sec; that's a 25-30% increase with the same hardware.

Here is the architectural catch you need to know: unlike the Qwen 3.5 and 3.6 series, which bake the MTP heads directly into the base GGUF, the Gemma 4 MTP head is not built in. You must download a separate, specialized MTP drafter GGUF (the assistant model) to act as the speculator. (I've dropped the download link in the replies.)

Copy and try the exact flags: -m gemma-4-26B-A4B-it-qat-UD-Q4_K_XL.gguf --spec-type draft-mtp --spec-draft-n-max 6 --spec-draft-p-min 0.7 --spec-draft-model gemma-4-26b-A4B-it-assistant-Q4_0.gguf -c 64000 -v

n-max 4 and p-min 0.7 is also worth checking out. Benchmark on your setup and workflow.

If you have a single 8 GB VRAM NVIDIA RTX 4060, 3060, 3070, 2080, 2070, grab the MTP drafter GGUF link in the comments and try it yourself. Check it out even if you have a smaller or a larger GPU, such as a single RTX 3090, 4090, 3060, 2060. MTP works for all Gemma 4 sizes such as Gemma 4 12b, Gemma 4 31b, etc., but remember to grab the correct MTP draft assistant models respectively.

What are you benchmarking today?
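The post quotes several before/after throughput figures, so a quick way to sanity-check the percentages is to compute them directly. The sketch below is illustrative only: the helper name `percent_increase` is hypothetical, and the 20 tok/sec baseline and the 25, 27, and 28 tok/sec results are simply the numbers quoted in the post, not independent measurements.

```python
def percent_increase(before_tps: float, after_tps: float) -> float:
    """Relative throughput gain, in percent, going from before_tps to after_tps."""
    return (after_tps - before_tps) / before_tps * 100.0


# Pre-MTP decode speed quoted in the post (tokens per second).
BASELINE_TPS = 20.0

# Post-MTP figures quoted in the post.
for after_tps in (25.0, 27.0, 28.0):
    gain = percent_increase(BASELINE_TPS, after_tps)
    print(f"{BASELINE_TPS:.0f} -> {after_tps:.0f} tok/sec: +{gain:.0f}%")
```

Against a 20 tok/sec baseline, 25, 27, and 28 tok/sec work out to gains of 25%, 35%, and 40% respectively, which is a useful reference when comparing your own benchmark runs.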

31

338

47

518

195K

jelloli retweeted

Cell 细胞

@cellinlab

10 days ago

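Loop Engineering is replacing the practice of writing prompts for an agent yourself. Its core idea: you no longer prompt the agent directly; instead, you design a system, and that system prompts the agent.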
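One way to picture the idea in this post (a system, rather than a person, writing the prompts) is a small control loop. This is a minimal sketch under stated assumptions, not a description of any particular framework: `run_loop`, `toy_agent`, and the prompt wording are all hypothetical, and the agent is a stand-in function instead of a real model call.

```python
from typing import Callable


def run_loop(task: str,
             agent: Callable[[str], str],
             check: Callable[[str], bool],
             max_rounds: int = 3) -> tuple[str, int]:
    """Drive an agent with system-written prompts until check passes."""
    prompt = f"Task: {task}"
    answer = ""
    for round_no in range(1, max_rounds + 1):
        answer = agent(prompt)
        if check(answer):
            return answer, round_no
        # The system, not a person, composes the follow-up prompt.
        prompt = (f"Task: {task}\n"
                  f"Previous attempt: {answer}\n"
                  "It failed the check; try again.")
    return answer, max_rounds


def toy_agent(prompt: str) -> str:
    """Hypothetical stand-in agent: succeeds only after seeing feedback."""
    return "42" if "Previous attempt" in prompt else "unsure"


result = run_loop("answer with a number", toy_agent, str.isdigit)
print(result)
```

The human's work moves into designing `check` and the feedback template; the loop itself issues every prompt the agent sees.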

44

295

70

411

126K

jelloli retweeted

Christoph Nakazawa

@cnakazawa

11 days ago

@dok2001 https://t.co/ABnaXqL45x

1

122

6

253

44K

jelloli retweeted

Roland.W

@rwayne

15 days ago

Eye-opening.

2

100

20

140

39K

jelloli retweeted

The Math Flow

@TheMathFlow

15 days ago

The Hidden Geometry of Trigonometric Functions and the Unit Circle.

11

1K

285

732

53K

jelloli retweeted

WangNextDoor

@WangNextDoor2

16 days ago

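Twelve long-term good habits can bring exponential growth in life. From reviewing your progress and deep focus to managing money and self-discipline, each one is well founded and practical. You don't need to adopt them all at once: start with the one small thing you resist most right now, then expand your boundaries step by step. Small changes sustained over the long run add up day by day, opening a gap in life outcomes and yielding a compounding transformation.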

2

119

25

128

6K

jelloli retweeted

Jackywine

@Jackywine

16 days ago

Don't miss this Anthropic blog post today, on running an AI-native organization: https://t.co/ciu2Y3511j By the way, does anyone know of a better translation model?

7

254

48

408

41K

jelloli retweeted

总裁简报 CEO Briefing

@CEOBriefing

16 days ago

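$AAPL founder Steve Jobs gave his employees a master class on stock options in 1983. Jobs, just 28 at the time, not only broke down the underlying logic of stock options in plain terms, but also shared his classic management philosophy on "why come to work at Apple": people come to Apple first of all not for money or stock. Only a passion for the product and a vision of changing the world can carry employees through nights of high-intensity development work…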

66

250

51

244

59K

jelloli retweeted

Max For AI

@MaxForAI

16 days ago

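Yesterday ByteDance Seed open-sourced a very interesting checkpoint ⬇️ TaskMem

It is trained on top of Qwen3-VL-30B-A3B. The goal is not to answer questions directly, but to have a multimodal agent learn to generate more useful long-term memory from video/environment streams. The emphasis is on teaching the agent to judge "what is worth remembering" in a continuous video/environment stream, rather than treating memory as a simple summary, a RAG store, or a clipboard.

The corresponding paper is "Task-Focused Memorization for Multimodal Agents", by Tao Zou, Yichen He, Tian Qiu, Yuan Lin, and Hang Li, from ByteDance Seed and Fudan.

The core method in the paper is two-stage training.

Stage one learns "how to remember". RL is used to train the memory-generation policy so that it produces episodic memory that is accurate, non-repetitive, stably formatted, and sufficiently informative. The paper trains with GSPO, with rewards covering format, thinking length, quality, and richness. One interesting detail: they specifically added a richness reward, because optimizing quality alone lets the model exploit a loophole by generating memories that are very short but look correct. Models, you know: once they find an exam loophole, they cheat faster than college students.

Stage two learns "what to remember". After deployment, a very lightweight adapter is trained on the tasks/questions that recently appeared in the environment, shifting the model's memory focus toward information more likely to be needed later. The paper says this adapter has only 2048 trainable parameters, the main model is frozen, and it is optimized with DPO; it is more like a "task-direction memory bias vector".

The experimental design is interesting: they convert VideoMME, EgoLife, and EgoTempo into streaming tasks. The agent first watches the video stream and generates memories; the questions appear afterwards, and when answering it cannot look at the original video again, only at the generated memories. This setup is closer to a real agent than ordinary video QA, because in a real environment you cannot rewind the recording every time either, much as I would like to.

On results, TaskMem's accuracy on the three benchmarks is VideoMME 67.9, EgoLife 45.4, and EgoTempo 27.6. Compared with the base Qwen3-VL-30B-A3B at 61.6, 38.4, and 22.3, the gains are 6.3, 7.0, and 5.3 percentage points respectively. It surpasses the GPT-5.2 entry in the table on VideoMME and EgoLife; on EgoTempo its accuracy is below GPT-5.2, but its precision is higher.

This direction is instructive for personal AI, embodied agents, screenshot memory, and video understanding. For example, when a user has many screenshots, the difficulty is not just retrieving them, but whether the system can know in advance which screenshots, which details, and which context will be useful later.

Link: https://t.co/66g84awIMQ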
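The percentage-point gains quoted in this post can be re-derived from the two sets of accuracy figures it gives. The snippet below is only an arithmetic check; the dictionary and function names are hypothetical, and the scores are copied from the post rather than taken from the paper itself.

```python
# Accuracy figures as quoted in the post (percent).
taskmem_scores = {"VideoMME": 67.9, "EgoLife": 45.4, "EgoTempo": 27.6}
base_scores = {"VideoMME": 61.6, "EgoLife": 38.4, "EgoTempo": 22.3}


def point_gains(new: dict[str, float], old: dict[str, float]) -> dict[str, float]:
    """Percentage-point difference per benchmark, rounded to one decimal."""
    return {name: round(new[name] - old[name], 1) for name in new}


gains = point_gains(taskmem_scores, base_scores)
for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```

The rounding step matters because binary floating point does not represent values like 67.9 exactly, so the raw differences carry tiny errors in the last digits.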

4

68

11

59

6K

Jianchuanman @jelloli

18 days ago

Check out All Things Fair #hoopladigital https://t.co/8gUXtUxNC1

0

3

jelloli retweeted

⚡AI Search⚡

@aisearchio

23 days ago

NVIDIA's LocateAnything is a new vision model for grounding and detection. Very performant and accurate!
> 10x faster than Qwen3-VL
> 138M queries + 785M boxes
> GUI, OCR, docs, dense detection
> Free & open source
https://t.co/UvkH8l0QRb

33

2K

253

2K

120K

jelloli retweeted

Amto

@XAMTO_AI

23 days ago

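Another heavyweight open-source release has landed in audio: MOSS-Audio is here. It comes in two sizes, 4B and 8B, each with Instruct and Thinking variants to choose from. The most impressive part is that it packs six capabilities into one model:
1️⃣ Speech recognition (ASR)
2️⃣ Speaker diarization: who is speaking, cleanly separated
3️⃣ Emotion recognition: it can tell whether you are happy or irritated
4️⃣ Environmental sound analysis: rain, traffic, and keyboard sounds can all be recognized
5️⃣ Music understanding: not just identifying the song title, but actually understanding the structure
6️⃣ Timestamped ASR: precise to when each character was spoken
On timestamped ASR, it leaves Gemini 2.5 Pro far behind: not a narrow win, a rout. Audio processing used to require stitching together a pile of models; now one model handles it all, and it is open source. For subtitles, podcasts, customer-service quality inspection, and music annotation, deployment costs drop sharply. The OpenMOSS team released it quietly, and the industry is shaken. Get it directly from HuggingFace. 🔗 https://t.co/Iy4LUYpvD2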

45

753

129

1K

45K
