Mattia Verasani @matrazor - Twitter Profile

MatRazor retweeted

wh

@nrehiew_

2 days ago

For the visual learners

7

602

53

641

40K

MatRazor retweeted

Rosinality @rosinality

3 days ago

https://t.co/60FGO6TzbB Depth attention. This uses a weighted sum of values as a value input for attention.

1

80

8

57

5K

MatRazor retweeted

Sasha Rush

@srush_nlp

3 days ago

On-Policy Distillation is the most active new research direction being explored in RL for LLMs. Had the chance to discuss how it works with Dwarkesh and why it fits so nicely into large-scale pipelines.

21

1K

127

1K

133K

MatRazor retweeted

Nando de Freitas

@NandoDF

5 days ago

Very proud of my team for achieving this important milestone. They are very talented. Within a year, they transformed the AI capabilities of a large corporation.

10

175

14

12

15K

MatRazor retweeted

elie

@eliebakouch

5 days ago

WOW microsoft new "MAI Thinking 1" model comes with a 109 page tech report that looks REALLY detailed, this is amazing

24

985

121

684

197K

MatRazor retweeted

Jeff Dean

@JeffDean

5 days ago

Thanks for a great @twominutepapers conversation, Károly!

10

121

15

63

29K

MatRazor retweeted

Edward Z. Yang @ezyang

6 days ago

New devlog post from yours truly: When does fragmentation occur in the CUDA caching allocator? https://t.co/ocAdv4mjy2 -- this post is LLM authored but I heavily prompted/edited, and Natalia also helped fact check.

8

136

14

86

11K

MatRazor retweeted

antirez @antirez

9 days ago

Sorry if I post this again, but: this is the best post about AI slop you are going to read in the next 2 years.

25

784

47

1K

184K

MatRazor retweeted

Clive Chan

@itsclivetime

9 days ago

>hundreds of parallel subagents gonna hit your quota in like 3 seconds

8

124

3

12

20K

MatRazor retweeted

Gabriele Berton

@gabriberton

10 days ago

Gemini Embedding 2 is out!

2

54

7

22

8K

MatRazor retweeted

NVIDIA AI

@NVIDIAAI

10 days ago

Introducing Dynamo Snapshot, our approach for fast startup for inference workloads on Kubernetes, which reduces startup time from minutes to under 5 seconds.

In production inference deployments demand fluctuates over time. Cold-starting inference workloads can take minutes, leaving idle GPUs that generate no tokens and serve no requests.

Snapshot leverages GMS to enable concurrent weight restoration over a high-speed interconnect, while using Linux native AIO and parallel memfd restoration to accelerate CRIU restore performance.

23

362

54

148

61K

MatRazor retweeted

PyTorch

@PyTorch

12 days ago

Model Optimization and Post-Training Quantization

Model quantization is an effective method to reduce VRAM usage and improve inference performance on consumer devices. By lowering computational and memory requirements while preserving model quality, quantization helps AI models run more efficiently in resource-constrained environments.

This post walks through how to use NVIDIA Model Optimizer to quantize a CLIP model in FP8 format with the post-training quantization (PTQ) method, including an example workflow exporting a PyTorch checkpoint.

Read the complete blog post: https://t.co/yXK4uIusyZ

2

141

22

91

12K

MatRazor retweeted

Rosinality @rosinality

12 days ago

https://t.co/WPxi1kgW3A Meta's experience on multi-datacenter training. They have used a PP schedule called Doraemon PP which allows integration with ZeRO-2/3.

0

55

6

38

4K

MatRazor retweeted

Raytar

@Raytar

13 days ago

he tested 5760 architectures at Google for a full year. the winner was the original Transformer from 2017. Hyung Won Chung told that story at MIT with a small smile. then went to OpenAI and trained o1. 1 hour. free. by one of the few people on earth who actually moves the frontier. meanwhile your feed is full of guys writing architecture threads who have never trained a model anyone uses. he just told MIT that 99% of AI research is theater. your AI worldview was built by men who read his papers. badly. now you can read him directly. you will rewatch this. save it now.

11

1K

91

2K

106K

MatRazor retweeted

Rosinality @rosinality

13 days ago

https://t.co/RYOB9y2BLp Could it be useful to distill from a smaller model? I think, beyond distillation, we could get some signal from the loss difference across the scales.

4

133

27

134

11K

MatRazor retweeted

zhyncs

@zhyncs42

13 days ago

Correctness is critical for LLM inference engines. Recently, I found TRT-LLM’s work on Hypothesis Testing Methodology to be extremely professional. https://t.co/Qr1CLCIQ06

4

234

22

182

14K

MatRazor retweeted

Greg Brockman

@gdb

14 days ago

self improvement prompt for codex

114

4K

352

6K

491K

MatRazor retweeted

Saining Xie

@sainingxie

16 days ago

check out RAEv2 led by Jas. through extensive exps, we found some really intriguing behaviors showing why strong representation encoders are key for pixel decoders. spoiler: it’s not about hillclimbing fid; new metrics like ep@fid-k/fdr^k show there’s a lot more left to explore!

4

337

32

198

53K

MatRazor retweeted

Gabriele Berton

@gabriberton

15 days ago

Apply here to join the frontier of computer vision!

2

192

11

109

35K

MatRazor retweeted

Ivan Fioravanti ᯅ

@ivanfioravanti

15 days ago

This series of articles is great! Understanding System Design is key to be able to drive your coding agents correctly!

1

99

13

86

11K
