bim

@bimsarap1

Veni, vidi, vici. Ex-Applied Scientist intern @amazon. Computer Science PhD student @asu. Gen AI, Image Editing, Multimodal Reasoning, Autonomous Driving.

Tempe, AZ

Joined March 2021

331 Following

94 Followers

158 Posts

Pinned Tweet

11 months ago

✨ Excited to share that our paper "RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing on Referring Expressions" has been accepted at ICCV 2025! 🌺📍See you in Hawaii! 🧵 Here's what we did, why it matters, and what we’re releasing ⬇️ (1/8)

1

3

2

0

438

bimsarap1 retweeted

AI Dance @AI_Whisper_X

about 1 month ago

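An interesting piece of research.

Closed-source labs are tight-lipped about model size, but they cannot hide what their models "know", and what a model knows turns out to be an indicator of its parameter count.
The core logic: reasoning ability can be compressed into a small model through distillation, but factual knowledge cannot. How many obscure facts a model remembers is tied directly to its parameter count.

Zhihu blogger Li Bojie (李博杰) wrote a short paper on this and built a dataset called IKP (Incompressible Knowledge Probes): 1,400 questions across 7 rarity tiers, run on 188 models from 27 vendors, scoring factual accuracy only.

On the 89 open-source models with published parameter counts, the fit of accuracy against log(parameter count) has R² = 0.917, essentially a straight line. Projecting the closed-source models onto that line gives size estimates:

GPT-5.5 ≈ 9T
Claude Opus 4.7 ≈ 4T
GPT-5.4 ≈ 2.2T
Claude Sonnet 4.6 ≈ 1.7T
Gemini 2.5 Pro ≈ 1.2T
(90% confidence interval: 0.3x to 3x the estimated size)

Two other findings are also counterintuitive:
First, citation count and h-index do not predict whether a researcher is recognized by frontier models. Two people with similar citation counts can get completely different answers from a model. It remembers influential work, not paper counts.
Second, factual capacity is not compressed over time. Across 96 open-source models spanning 3 years, the IKP time coefficient is statistically zero (p<10⁻¹⁵), directly rejecting the +0.0117/month decay predicted by the Densing Law. Benchmarks are saturating, but factual capacity keeps expanding with parameter count.

Source: Zhihu blogger Li Bojie
Contact for removal in case of infringement
https://t.co/Bt5CiMGc5M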
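The projection step described in this post (fit accuracy against log parameter count on models whose size is public, then invert the line for a model whose size is not) can be sketched roughly as below. This is only an illustration of the idea, not the paper's code: the calibration points, the function names `fit_line` and `estimate_params`, and the accuracy value being inverted are all made-up placeholders, and the real study may fit or weight its data differently.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_params(accuracy, slope, intercept):
    """Invert accuracy = intercept + slope * log10(params) for params."""
    return 10 ** ((accuracy - intercept) / slope)

# HYPOTHETICAL calibration points (parameter count, factual accuracy).
# These numbers are invented for illustration and are not from the paper.
open_models = [(1e9, 0.10), (1e10, 0.25), (1e11, 0.40), (1e12, 0.55)]
log_params = [math.log10(p) for p, _ in open_models]
accuracies = [a for _, a in open_models]

slope, intercept = fit_line(log_params, accuracies)

# HYPOTHETICAL accuracy measured for a model of undisclosed size.
estimated = estimate_params(0.475, slope, intercept)
print(f"estimated parameter count: {estimated:.3g}")
```

Because the fit is linear in log10(parameters), a small error in measured accuracy turns into a multiplicative error in the size estimate, which is consistent with the post quoting its interval as a multiple of the estimate rather than as a plus-or-minus range.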

52

1K

158

759

205K

2 months ago

Needed to generate ~50 videos. Veo3's daily limit is 10 for Tier 1. 🤦‍♂️ Started using Grok: cheaper than Veo3 (even cheaper with batch processing), faster than Veo3, supports 480p and varying clip lengths, and apparently gives better quality.

0

0

0

0

90

bimsarap1 retweeted

Products @Products

5 months ago

Power up with Lenovo ThinkPad T480: i7 processor, 16GB RAM, 256GB SSD, Windows 11 Pro. Refurbished excellence!

0

119

15

73

1M

5 months ago

@readswithravi Yuval Noah Harari

0

0

0

0

7

bimsarap1 retweeted

M U M Ali Sabry

5 months ago

We are not without our flaws, yet Sri Lanka has much to be proud of. Our universal, free healthcare system places us among the world’s best, one of the many achievements of our nation over the past 76 years. https://t.co/zOeHWQzhS7

17

252

61

27

45K

7 months ago

Spent the weekend going through the concepts, and I have to say the quality of each post and the links between posts are beyond amazing. 🔥

7 months ago

If you’re an "ML Engineer" and you think “Transformer” just means stacking encoder-decoder blocks and calling it a day, you’re missing the actual mechanism that makes modern AI work.

Concept 16: The Transformer Is a Math Engine, Not a “Model Architecture”

"Most people can implement a Transformer pipeline. Very few can explain why the Transformer works."

Let’s break it down properly.

1. The core idea: Transformers = vector field manipulation
Every layer of a Transformer applies three mathematical operations:
   1. Projection
   2. Attention as weighted integration
   3. Update via residual fields
The Transformer is basically a learned vector field processor. Not a sequence model. Not an architecture choice. It could be called a mathematical engine that maps token representations through a series of controlled linear and nonlinear transformations.

2. Attention is not magic, it is a quadratic form. (Take a minute while reading this part.)
Self-attention computes:
Attention(Q, K, V) = softmax(QKᵀ / √d) V
This means:
• QKᵀ is a similarity matrix
• softmax turns similarities into probability weights
• multiplying by V computes a weighted expectation over token values
Attention = learnable, data-dependent kernel smoothing. It is a kernel machine inside your neural network.

3. Multi-head attention = multiple kernels in parallel
Each head learns a different geometry of similarity. One head may focus on local patterns, another on long-range dependencies, another on syntax, another on semantics. When people say “transformers understand context,” this is what they mean: each head builds a different function approximator.

4. Residual connections are the true backbone
Forget attention. Residuals are the reason Transformers train at all, if you look at it properly.
xₜ₊₁ = xₜ + f(xₜ)
This means every layer learns a correction to the current representation. Gradient flow stays stable. Representations evolve smoothly. Without residuals, Transformers collapse.

5. LayerNorm = curvature control
Norms scale the Jacobian of each layer. This keeps the singular values of the mapping from blowing up or collapsing. LayerNorm is not cosmetic. It is what ensures the model doesn’t EXPLODE internally.

6. Feedforward layers = feature expansion and contraction
The FFN block:
FFN(x) = W₂ σ(W₁ x)
expands dimension, applies a nonlinearity, then compresses. This lets the model create new features that attention alone cannot express. It acts as a learned universal approximator inside each layer.

7. Why people can code Transformers but not explain them
Because coding a transformer is wiring blocks together. Understanding a transformer requires knowing:
• attention as kernel regression
• softmax as a probability normalizer
• LayerNorm as Jacobian control
• residuals as stable integration
• FFN as feature synthesis
• positional encodings as geometric priors
• multi-head structure as an ensemble of learned kernels
That most people never go beyond surface-level implementation is something I realised when I caught myself trying to work on advanced papers without actually understanding the fundamentals.

TL;DR
The Transformer works because its math is designed to stabilize gradients, amplify structure, and integrate information the way a continuous dynamical system would. It is not “just an architecture.” It is the most efficient numerical method we’ve found for learning functions over sequences, graphs, and basically anything with structure.
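A minimal sketch of two of the formulas quoted in this post, written in plain Python so it runs without extra libraries. It computes softmax(QKᵀ / √d) V row by row, so that each output is visibly a weighted average of the value rows, and then applies the residual update x + f(x). The toy input `X`, the use of identity projections (Q = K = V = X), and the helper names are assumptions made for illustration, not anything taken from the post or from a particular library.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d)) V for lists of row vectors."""
    d = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)  # non-negative weights that sum to 1
        # Weighted average of the value rows: the "kernel smoothing" view.
        out = [sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

def residual_update(X, F):
    """Apply x_next = x + f(x) row by row, where F already holds f(x)."""
    return [[x + f for x, f in zip(x_row, f_row)] for x_row, f_row in zip(X, F)]

# Toy input: three tokens with two features each (made-up values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Assumption: identity projections, so Q = K = V = X.
attended, weights = attention(X, X, X)
X_next = residual_update(X, attended)

for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row is one token's attention distribution over all tokens. Since the weights are non-negative and sum to 1, every attended vector lies inside the range spanned by the value rows, which is the sense in which the post calls attention a weighted expectation.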

25

801

72

1K

90K

0

1

0

0

24

7 months ago

@ScottEdwar33859 @karpathy Get the answer script from Gemini, run it in the challenge interface, and paste the terminal output as the next input to Gemini. There was no help or guidance from my side. Both Gemini and GPT-5 Pro received the same initial prompt and challenge files.

0

0

0

0

22

7 months ago

@karpathy Another main difference is the compute time. Altogether Gemini took like 3 mins and GPT-5 Pro spent close to 2 hours. Even when I provided the correct script from Gemini, GPT-5 Pro said that solution is wrong.

1

0

0

0

54

7 months ago

@karpathy Tried a software vulnerability challenge from one of the graduate courses (a stack buffer overflow challenge), and Gemini 3 Pro was able to complete it in around 3-4 iterations. GPT-5 Pro couldn't solve it even after 10 iterations.

1

3

0

0

574

7 months ago

GDM really cooked this time

7 months ago

Gemini 3 leak - some crazy improvements on math, screen understanding, and simpleqa.. somehow beaten by sonnet on swebench but winning on terminalbench

Lower context length than 2.5 pro too 🫣

https://t.co/qQah8GPDIV https://t.co/ymQ2P94l3r

31

516

33

100

195K

0

1

0

0

133

8 months ago

We’re presenting our RefEdit paper at ICCV 2025 — Booth #71! Come chat with us about challenging cases in image editing and see how RefEdit pushes the limits of image editing. #iccv25

0

1

0

0

88

8 months ago

Video overviews in @NotebookLM is a top tier feature 🔥

0

0

0

0

36

9 months ago

Looks like the reasoning era of LLMs is coming to vision https://t.co/KV93hnyZXg

0

0

0

0

29

9 months ago

Watching #RWC2025, I now see why rugby is called ‘the game played in heaven’. #foreverrugbyfan

Rugby World Cup

9 months ago

Slick, sharp, sensational 🙌 A brilliant @WomenBoks set-piece ends with Ayanda Malinga flying over the line 🇿🇦 #RWC2025 | #ITAvRSA

3

449

117

9

38K

0

0

0

0

61

9 months ago

500 days! https://t.co/Ndbi6oZbV6

0

1

0

0

23

bimsarap1 retweeted

'YZ' Yezhou Yang (杨叶舟) @prof_yz

10 months ago

Our @ApgAsu crew @SCAI_ASU kicked off the Fall 2025 semester with a board game night 🎲✨

When the timing is right, in the right context, make the right move. Just like in board games, in research, and in life 🤠 https://t.co/GK0Iea7T5p

1

26

4

0

1K

10 months ago

Finished it last week. Pantheon has this decade's best season finale, hands down. 🔥

10 months ago

pantheon is such a good show!

866

8K

463

1K

1M

0

0

0

0

40

11 months ago

Huge thanks to my amazing co-authors: @patelmaitreya, Shivam Singh, @prof_yz, and @cbaral — couldn’t have done this without you! We would also like to thank the @SCAI_ASU, ASU Research Computing, and @cr8dlcloud for generous support w.r.t. GPUs. Onward! 🚀

0

2

0

1

160

11 months ago

✨ Excited to share that our paper "RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing on Referring Expressions" has been accepted at ICCV 2025! 🌺📍See you in Hawaii! 🧵 Here's what we did, why it matters, and what we’re releasing ⬇️ (1/8)

1

3

2

0

438

11 months ago

🎁 We’re releasing: 💾 RefEdit-Bench 🧠 RefEdit model checkpoints 🛠️ Our full synthetic data generation pipeline 🌐 Online demo: https://t.co/cSf4ViDv0k 📄 Paper: https://t.co/GHS4FiLh8y

1

0

0

0

62
