Remi Montroty @remimontroty - Twitter Profile

Senator Chris Murphy speaks the truth! Trump's Ukraine "peace deal" is just a mafia corruption scheme to enrich Trump's family and friends by selling out Ukraine. @ChrisMurphyCT

126

4K

1K

114

55K

remimontroty retweeted

General Intuition @gen_intuition

7 months ago

Yann LeCun (Chief AI Scientist, Meta, @ylecun), @PimDeWitte (CEO, General Intuition), and Aude Durand (Kyutai, @aude_drn), talk about world models, embodied agents, Yann's new company, and the limitations of LLMs

0:00 - Introduction to World Models
5:00 - Why World Models, Intuition & Introducing Yann's new company
10:00 - Architectures + Merging Language & Interaction Data towards General Agents
20:00 - Open Source, Sovereign AI & @kyutai_labs Partnership

Keynote for #aiPULSE2025 at Station F in Paris 🇫🇷

29

644

123

608

111K

Remi Montroty @remimontroty

7 months ago

And .... testing!

0

3

Remi Montroty @remimontroty

8 months ago

"I'd like to be able to think of something and have ChatGPT respond to it": Merge Labs, Sam Altman's new start-up, could upend brain-machine interfaces and tie us even more closely to artificial intelligence https://t.co/wgBWy0aGaY via @techandco

0

36

Remi Montroty @remimontroty

11 months ago

@elonmusk maybe start your own media channel too, on top of a new political party. You might need people with integrity like Colbert.

Elizabeth Warren

@SenWarren

12 months ago

CBS canceled Colbert’s show just THREE DAYS after Colbert called out CBS parent company Paramount for its $16M settlement with Trump – a deal that looks like bribery. America deserves to know if his show was canceled for political reasons. Watch and share his message.

20K

138K

31K

12K

9M

0

20

remimontroty retweeted

Elizabeth Warren

@SenWarren

12 months ago

CBS canceled Colbert’s show just THREE DAYS after Colbert called out CBS parent company Paramount for its $16M settlement with Trump – a deal that looks like bribery. America deserves to know if his show was canceled for political reasons. Watch and share his message.

20K

138K

31K

12K

9M

remimontroty retweeted

Elon Musk

@elonmusk

12 months ago

That was how I felt when asking Grok 4 questions about materials science that are not in any books or on the Internet

4K

45K

5K

3K

16M

remimontroty retweeted

DAZN France

@DAZN_FR

12 months ago

💥 | Kylian Mbappé opens his account in the competition with a splendid goal! 🔝🔥 #FIFACWC #RMABVB #RealMadrid #Dortmund Follow the match @realmadrid 🆚 @BlackYellow for free here: https://t.co/kAQFrmIwIx ! 👈

13

180

33

17

19K

Remi Montroty @remimontroty

about 1 year ago

An Apple study calls into question the progress in AI "reasoning" touted by OpenAI, Google and Anthropic: their LRMs suffer a "complete collapse in accuracy" when faced with complex problems https://t.co/MaCkb4VLTU via @developpez

0

42

remimontroty retweeted

Lisan al Gaib

@scaling01

about 1 year ago

The Ultimate LLM Benchmark list:
SimpleBench: https://t.co/51rkwsB7pZ
SOLO-Bench: https://t.co/Zymtspj83V
AidanBench: https://t.co/5lpH3CGhl0
SEAL by Scale: https://t.co/mAFyIfod7V (particularly the MultiChallenge leaderboard)
LMArena: https://t.co/CIOyTQ9ufe (with Style Control)
LiveBench: https://t.co/1fsq2IOsy1
ARC-AGI: https://t.co/bKh8xsI9WX
Thematic Generalization by LechMazur: https://t.co/W9FIyRedE6 (other ones by Lech Mazur: https://t.co/vPDH3Aj5OO, https://t.co/SrtUI7KYEZ, ...)
EQBench: https://t.co/g7zmT8Ilkq (especially the Longform writing leaderboard)
Fiction-Live Bench: https://t.co/NSA1d7LEGe
MC-Bench: https://t.co/JpXYWvjk3Z (ordered by winrate, not by Elo)
TrackingAI - IQ Bench: https://t.co/rWoTwz1eu9
Dubesor LLM: https://t.co/FyF32AKDa4
Balrog-AI: https://t.co/ZLwDpixw2E
Misguided Attention: https://t.co/2VMdPg5J4m
Snake-Bench: https://t.co/dEcvZYsVqz
SmolAgents LLM: https://t.co/iBock5Q4V4 (just because of GAIA and SimpleQA)
Context-Arena (MRCR and Graphwalks): https://t.co/bXn2wwMK6L
OpenCompass: https://t.co/GQbKwZDq8k
HHEM (Hallucination Benchmark): https://t.co/Z23lcd7XMc

Coding, Math and Agentic Benchmarks
Aider-Polyglot-Coding: https://t.co/aRGODg2PUA
BigCodeBench: https://t.co/HxNMp3GLk9
WebDev-Arena: https://t.co/sQB8tBLekG
WeirdML: https://t.co/38CA9RBml4
Symflower Coding: https://t.co/WxYMXjcHpZ
PHYBench: https://t.co/gyp0bGXxzt
MathArena: https://t.co/QVzZSeW9t9
Galileo Agent: https://t.co/Igs3TW3s1I
XLANG Agent: https://t.co/NZwxnbGMry

Important for tracking AI take-off
METR long task benchmarks: https://t.co/IYzI5SGUFd (incl. RE Bench)
PaperBench: https://t.co/uLLybqtwIg
SWE-Lancer: https://t.co/amsmTZYK7n
MLE-Bench: https://t.co/2DkbRKdVA5
SWE-Bench: https://t.co/TFJyzWqURA

other classics I ALWAYS want to see when a new model is released
GPQA-Diamond: https://t.co/t2HV6IiyaC
SimpleQA: https://t.co/lhpHqQcxJf
Tau-bench: https://t.co/TZT7fQc6cc
SciCode: https://t.co/J8HmPK9kiU
MMMU: https://t.co/rMHpsZQvRJ
Humanity's Last Exam (HLE): https://t.co/OHSoyPZ9nY

Overview for classical benchmarks (GPQA, SimpleQA, AIME, MMLU, ...)
Simple-Evals: https://t.co/3sQysnQaVd
Vellum AI: https://t.co/E1g047GWk7
Artificial Analysis: https://t.co/sALmriQ4qC

Benchmarks I literally don't care about - saturated / no signal
MMLU, HumanEval, BBH, DROP, MGSM, basically all math benchmarks like GSM8K, MATH, AIME

22

635

94

845

208K

remimontroty retweeted

Rob Wiblin

@robertwiblin

about 1 year ago

OpenAI catches a lot of shit because it promised the public so much and is now falling short. But don't forget that it remains miles ahead of:

• Meta
• xAI
• Microsoft
• DeepSeek

which all get off light because they've only ever promised little or nothing! https://t.co/AOzzHFFak0

27

166

15

34

18K

remimontroty retweeted

Rob Wiblin

@robertwiblin

about 1 year ago

AI models currently have a 50% chance of doing something that takes a human expert one hour. This doubles every 7 months. In 2 years? They could automate full workdays. In 4 years? A full month.

I discuss the most important graph in AI today with Beth Barnes, the CEO of METR, which uncovered this rule of AI progress. Her bottom line: "It really doesn't seem like 2 years would be surprising for recursively self-improving AI."

Beth also explains: where company safety testing fails, why there are no true closed-weight models, AI undermines leading powers, why she's come around on open weighting, and why models might be about to start playing dumb much more often.

Enjoy! Available on the 80,000 Hours Podcast in all apps. Links below.

1:51 Can we see AI scheming in the chain of thought?
12:50 Alignment faking
17:33 We have to test models before they're even used inside AI companies
31:56 Each 7 months models can do tasks twice as long
51:31 METR's research finds AIs are solid at AI research already
58:18 AI may turn out to be strong at novel and creative research
1:07:55 Recursively self-improving AI might even be here in two years
1:14:29 Could evaluations backfire?
1:39:55 Do we need external auditors doing AI safety tests?
1:54:09 Why not work at AI companies
2:08:40 The new more dire situation has forced changes to METR's strategy
2:21:49 Overrated: Interpretability research
2:32:55 Overrated: Major AI companies' contributions to safety research
2:39:15 Could we ban using AI to enhance AI, or is that just naive?
2:45:31 Open-weighting models is often good
2:50:22 What we can learn about AGI from the nuclear arms race
3:10:43 AI is more like bioweapons because it undermines the leading power
3:42:09 What research METR plans to do next

11

233

25

198

94K
