Houssein Ben-Ameur @houssein - Twitter Profile

Houssein retweeted

about 1 year ago


@ArtificialAnlys

DeepSeek’s R1 leaps over xAI, Meta and Anthropic to be tied as the world’s #2 AI Lab and the undisputed open-weights leader

DeepSeek R1 0528 has jumped from 60 to 68 in the Artificial Analysis Intelligence Index, our index of 7 leading evaluations that we run independently across all leading models. That’s the same magnitude of increase as the difference between OpenAI’s o1 and o3 (62 to 70).

This places DeepSeek R1 above xAI’s Grok 3 mini (high), NVIDIA’s Llama Nemotron Ultra, Meta’s Llama 4 Maverick and Alibaba’s Qwen 3 235B, and level with Google’s Gemini 2.5 Pro.

Breakdown of the model’s improvement:
🧠 Intelligence increases across the board: Biggest jumps seen in AIME 2024 (Competition Math, +21 points), LiveCodeBench (Code generation, +15 points), GPQA Diamond (Scientific Reasoning, +10 points) and Humanity’s Last Exam (Reasoning & Knowledge, +6 points)

🏠 No change to architecture: R1-0528 is a post-training update with no change to the V3/R1 architecture - it remains a large 671B model with 37B active parameters

🧑‍💻 Significant leap in coding skills: R1 is now matching Gemini 2.5 Pro in the Artificial Analysis Coding Index and is behind only o4-mini (high) and o3

🗯️ Increased token usage: R1-0528 used 99 million tokens to complete the evals in the Artificial Analysis Intelligence Index, 40% more than the original R1’s 71 million tokens, i.e. the new R1 thinks for longer than the original R1. This is still not the highest token usage we have seen: Gemini 2.5 Pro uses 30% more tokens than R1-0528

Takeaways for AI:
👐 The gap between open and closed models is smaller than ever: open-weights models have continued to match the intelligence gains of proprietary models. DeepSeek’s R1 release in January was the first time an open-weights model achieved the #2 position, and DeepSeek’s R1 update today brings it back to the same position

🇨🇳 China remains neck and neck with the US: models from China-based AI labs have all but caught up to their US counterparts, and this release continues that trend. As of today, DeepSeek leads US-based AI labs including Anthropic and Meta in the Artificial Analysis Intelligence Index

🔄 Improvements driven by reinforcement learning: DeepSeek has shown substantial intelligence improvements with the same architecture and pre-training as its original DeepSeek R1 release. This highlights the continually increasing importance of post-training, particularly for reasoning models trained with reinforcement learning (RL) techniques. OpenAI disclosed a 10x scaling of RL compute between o1 and o3; DeepSeek has just demonstrated that, so far, it can keep up with OpenAI’s RL compute scaling. Scaling RL demands less compute than scaling pre-training and offers an efficient way of achieving intelligence gains, supporting AI labs with fewer GPUs

See further analysis below 👇

Houssein Ben-Ameur @Houssein

over 3 years ago

🇲🇦 Moroccooooo ❤️ #FIFAWorldCup

Houssein Ben-Ameur @Houssein

over 3 years ago

Sad 🇧🇷

Houssein Ben-Ameur @Houssein

almost 4 years ago

Magic! #OnsJabeur #Wimbledon

Wimbledon

@Wimbledon

almost 4 years ago

Still thinking about this incredible @Ons_Jabeur winner 😅 #Wimbledon | #CentreCourt100

Houssein Ben-Ameur @Houssein

almost 4 years ago

"I am a proud Tunisian woman standing here today" #OnsJabeur #TeamOJ #Wimbledon

Houssein retweeted

Valérie Plante

@Val_Plante

over 4 years ago

Thank you, Montréal 💙

Houssein Ben-Ameur @Houssein

over 4 years ago

Yes. ♥️ @Val_Plante in Montréal, @CathFournierQc (age 29) in Longueuil, and Stéphane Boyer (age 33) in Laval. I couldn't hope for better. #polmtl https://t.co/2i47TW1wJx

Houssein retweeted

Dr. Mona Nemer @ChiefSciCan

about 6 years ago

As we fight the #COVID19 pandemic together, it is important for all of us to do our part by following the guidelines of public health officials and protecting our health care system. This is a useful table of symptoms. Via @ChiefSciAdvisor https://t.co/k83X0qOyn6

Houssein Ben-Ameur @Houssein

about 6 years ago

Poor woman.

Daniel Lewis @Daniel_Lewis3

about 6 years ago

Here is Dr. Birx's reaction when President Trump asks his science advisor to study using UV light on the human body and injecting disinfectant to fight the coronavirus.

Houssein Ben-Ameur @Houssein

about 6 years ago

Excellent piece by Patrick Lagacé @kick1972 that gives us a first analysis of the situation in the CHSLDs (Quebec long-term care homes), one that goes well beyond orderlies' wages... https://t.co/HJDrcFwdAt

Houssein retweeted

Doc Vadeboncoeur @Vadeboncoeur_Al

about 6 years ago

In a chaotic situation, adding people who have no expertise in basic care and reorganization (profs, docs, etc.) may not help. We need to bring in people who know how to provide care AND reorganize: nurses, paramedics, physician assistants and MDs... from the military.

Houssein Ben-Ameur @Houssein

about 6 years ago

#NewProfilePic

Houssein retweeted

Journal Métro

@journalmetro

over 6 years ago

Ok boomer - the column by @Houssein https://t.co/yhlJLly5ih

Houssein Ben-Ameur @Houssein

over 6 years ago

My column this morning in @journalmetro https://t.co/fjxGxxIr49

Houssein Ben-Ameur @Houssein

over 6 years ago

The @sjb_caq method:
1- propose an unjust law decried by the opposition, civil society and public opinion
2- pretend not to back down
3- grant an acquired-rights exception that gives the impression of seeking consensus
Third time he has pulled this on us #PolQC

Houssein retweeted

Journal Métro

@journalmetro

over 6 years ago

Lessons from an election - A column by @Houssein https://t.co/vEX7fumlw6

Houssein Ben-Ameur @Houssein

over 6 years ago

My column this morning in @journalmetro https://t.co/E7hB4TuCpR "Montrealers, whether 'old-stock' or not, essentially voted for the party that had the best chance of beating the Conservatives" #polcan #canadaelection2019