Samira Abnar @samira_abnar - Twitter Profile

Pinned Tweet

over 1 year ago

🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute? We explored this through the lens of MoEs:

https://t.co/0TXu6RMGDx

4

283

65

193

48K

samira_abnar retweeted

Hadjar

@xeegeex

6 days ago

As an Iranian woman, I stand beside the women of Afghanistan, just as they stood and still stand beside us. Woman, Life, Freedom #هرات (Herat)

31

609

44

3

16K

samira_abnar retweeted

Hadjar

@xeegeex

2 months ago

Want to send Trump a message before he escalates the war with Iran? Buy gas today. High oil prices are his kryptonite. Do it now. 🛢️ #PumpUpThePressure #NoWarCrimeslnIran

0

23

10

1

694

samira_abnar retweeted

Hadjar

@xeegeex

2 months ago

Just speechless. How would Americans feel if Iran bombed the Mayo Clinic?! #StopWarOnIran

0

53

16

0

1K

samira_abnar retweeted

Mohammad Ali Shabani

@mashabani

2 months ago

Pay close attention: rest assured that Israel's spies in Iran are not US-educated. They pretend to be revolutionaries, and even in wartime they give themselves away by stoking sedition.

42

319

36

24

27K

samira_abnar retweeted

Ehsan Movahedian

@ultra_ehsan

2 months ago

The impassioned tears of Dr. Rahimi, project manager of the largest bridge in the Middle East, in Karaj: "This project was for the people and was public. It was not used for military purposes, and none of the lies the media told apply to it. This is an engineering site. What upsets me is that I have broken my promise to the people. But they will take to the grave their wish that this bridge go unused."

1

1K

164

109

57K

samira_abnar retweeted

Hadjar

@xeegeex

2 months ago

In the end this made me cry. Please, just let us come to Iran and live and work in peace. Let us put our expertise to use healing our own country's wounds. We are neither sellouts nor traitors; whatever disagreement we have with you comes from love for Iran. How many of my friends must we watch return to work for Iran, only for you to destroy their jobs, homes, and lives on espionage charges and displace them again? One for bringing in the HPV vaccine, one for bringing in an international theater instructor, one for publishing a book ... What kind of fate is this? I hate living in a country that is bombing my parents! Do you understand what that means? #نه_به_جنگ (No to war)

5

143

19

10

10K

samira_abnar retweeted

شفق @MareLontano

3 months ago

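The Tehran Province Natural Resources and Watershed Management Organization, which was literally pulverized in today's bombing. Were they building missiles and atomic bombs here too, fellow countryman? Why is it that wherever in the world a story is published about the destruction in Iran, some Iranian has written "IRGC members were hiding here / they must have been building missiles here"? Stop this shamelessness, baseness, and vileness.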

133

997

234

64

49K

samira_abnar retweeted

Hadjar

@xeegeex

3 months ago

For years I watched children of Gaza suffer. Helplessly all my poor heart could do was to study their faces, and every word, promising them I'd remember them forever. And that's how I know what Israel is planning for my home country and our children now.

4

44

15

2

2K

samira_abnar retweeted

Miguel Angel Bautista

@itsbautistam

9 months ago

So I understand that was unexpected for a lot of people, @Apple MLR has released a protein folding model! https://t.co/n6qpEvwByS.

Here’s a summary of what SimpleFold is and what it represents:
- What is SimpleFold? A generative model that essentially treats protein folding almost exactly as if it were a text-to-image or text-to-3D problem.
- What are we sharing? A research paper and a codebase under an MIT license https://t.co/JnehdmQilR (looking forward to people contributing to it!). We are also releasing pre-trained checkpoints of different sizes so that researchers can best trade off performance for efficiency.
- Why protein folding? We are doing this work largely because protein folding is an excellent benchmark for structured data generation and multi-modality. Protein folding is a very interesting problem from a generative modeling perspective and we do research on generative modeling :)
- Why is it interesting? IMO SimpleFold is interesting because I believe in finding recipes (architectures, training objectives, etc.) that generalize across the board to many different data modalities. Let’s say you are an ML expert in text-to-image or text-to-3D, now you can apply your latest and greatest architectural blocks or efficient samplers to protein folding with SimpleFold. I believe this is a net benefit for ML research and science in general.

Now getting more into the technical details:
- Our architecture is very simple (hence the name), just a stack of transformer blocks with time-step conditioning. This is important because it makes the model efficient at inference time. You can run SimpleFold directly on your Mac and get results quickly without data ever leaving your laptop.
- SimpleFold is not necessarily a model that “rejects” inductive biases, it just doesn’t enforce them directly on the architecture. For example, we apply rotation augmentation to all the protein structures during training. This makes the model “softly” invariant to this symmetry.
- There were some concerns online about data leakage from AFESM and that driving performance of SimpleFold or making it overfit. We filtered AFESM data so that the CASP14 sequences are not seen during training. As a matter of fact we distilled structures from AF2/ESMFold models, which have the same cutoff date as SimpleFold for PDB data. Both AF2 and ESMFold train on self-distilled datasets, we just train SimpleFold on a bigger set of distilled data.

I want to thank my awesome team of collaborators, they are all rockstars. That’s all, for now :)
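
The rotation augmentation mentioned in the tweet above can be illustrated with a short sketch. This is not SimpleFold's code: the function names `random_rotation`, `rotate`, and `augment` are hypothetical, and drawing the rotation from a uniformly random unit quaternion is an assumed choice made only to keep the example dependency-free. The idea is that each training structure is shown to the model in a random orientation, so the model is nudged toward treating all orientations alike rather than having that symmetry built into the architecture.

```python
import math
import random


def random_rotation(rng=random):
    """Return a 3x3 rotation matrix built from a random unit quaternion.

    Normalizing four Gaussian samples gives a uniformly distributed unit
    quaternion, which corresponds to a uniformly random 3D rotation.
    """
    w, x, y, z = (rng.gauss(0.0, 1.0) for _ in range(4))
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def rotate(coords, R):
    """Apply rotation matrix R to every (x, y, z) point in coords."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) for i in range(3))
        for p in coords
    ]


def augment(coords, rng=random):
    """Return the same structure in a freshly sampled random orientation."""
    return rotate(coords, random_rotation(rng))


if __name__ == "__main__":
    # Toy "structure": four made-up atom positions (not real protein data).
    structure = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.0, 2.0)]
    print(augment(structure, random.Random(0)))
```

Because a rotation preserves all pairwise distances, the augmented structure is the same shape as the input, only oriented differently.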

7

310

40

169

50K

samira_abnar retweeted

Alaa El-Nouby @alaa_nouby

9 months ago

Last year at @Apple MLR, we published a number of interesting papers like AIM, AIMv2, and Scaling laws for: Sparsity, Native Multimodal Models, Data mixing. Today the team has open-sourced the training codebase we used for conducting this research! https://t.co/WNvOWMkgm3

4

438

55

236

29K

samira_abnar retweeted

Mustafa Shukor @MustafaShukor1

11 months ago

We propose new scaling laws that predict the optimal data mixture for pretraining LLMs, native multimodal models, and large vision encoders! Only small-scale experiments are needed, and we can then extrapolate to large-scale ones. These laws allow 1/n 🧵

https://t.co/ISSAo9Ymp2

6

264

45

214

31K

samira_abnar retweeted

Awni Hannun

@awnihannun

about 1 year ago

We have two awesome new videos on MLX at #WWDC25 this year.
- Learn all about MLX.
- Learn all about running LLMs locally with MLX.

@angeloskath, @shshnkp, myself, and others worked super hard to make these. Check them out and hope you find them useful!

25

431

52

190

90K

samira_abnar retweeted

Vimal Thilak🦉🐒

@AggieInCA

about 1 year ago

Check out this post that has information about research from Apple that will be presented at ICLR 2025 in 🇸🇬 this week. I will be at ICLR and will be presenting some of our work (led by @samira_abnar) at SLLM @sparseLLMs workshop. Happy to chat about JEPAs as well!

0

19

5

1

3K

samira_abnar retweeted

Pau Rodríguez @prlz77

about 1 year ago

Our work on fine-grained control of LLMs and diffusion models via Activation Transport will be presented @iclr_conf as spotlight✨Check out our new blog post https://t.co/dAJQtcETNX

1

40

9

3K

samira_abnar retweeted

Mustafa Shukor @MustafaShukor1

about 1 year ago

We release a large-scale study to answer the following:
- Is late fusion inherently better than early fusion for multimodal models?
- How do native multimodal models scale compared to LLMs?
- How sparsity (MoEs) can play a detrimental role in handling heterogeneous modalities? 🧵

https://t.co/677ZM4kHbm

10

459

80

385

86K

samira_abnar retweeted

Enrico Fini @DonkeyShot21

about 1 year ago

Training and scaling large multimodal models from scratch? This is the thread for you. In this new paper, we provide an extensive study with hundreds of runs, fitting scaling laws for early/late fusion models, MoEs, and exploring different data mixtures. Tons of cool findings.

3

91

11

36

7K

samira_abnar retweeted

Yuyang Wang

@YuyangW95

over 1 year ago

We’re looking for an intern at Apple MLR 🍎 ASAP. Join us if you're interested in building a universal diffusion/flow-matching model at scale!

1

68

9

37

15K

samira_abnar retweeted

Enrico Fini @DonkeyShot21

over 1 year ago

We release AIMv2, the second iteration of the AIM family of large autoregressive vision encoders. This time we bring multimodality into the game 🔥

Paper: https://t.co/YpU6T8Pr9p
Repo: https://t.co/g1LO5rE5Y0
Model Gallery: https://t.co/j3jZ8TEtf5

https://t.co/P4RE5LRrDt

6

171

36

52

29K

Samira Abnar @samira_abnar

over 1 year ago

@RamonDarioIT @DrewSteinman Our cost function is based only on FLOPs; we need to incorporate these factors (memory and communication costs) into the cost function, based on hardware specifications and some other details such as sharding…

0

1

0

37

Samira Abnar @samira_abnar

over 1 year ago

🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute? We explored this through the lens of MoEs:

4

283

65

193

48K

Samira Abnar @samira_abnar

over 1 year ago

@RamonDarioIT @DrewSteinman One thing to note is that, besides stability-related issues as we increase sparsity, memory and communication costs are not negligible in practice. At larger scales in particular, these would become bottlenecks, and hardware constraints would determine the efficient sparsity level.

1

2

0

39
