Louis @lcb0b - Twitter Profile

LCB0B retweeted

2 days ago

The “3 Chinese people test”: beyond the buzz, Gilles Gressani, director of Le Grand Continent, explains what our poor knowledge of this country, which could well become the world's leading power, reveals. ➡️ https://t.co/lVckYxDBma

5

90

23

52

11K

LCB0B retweeted

Anthropic

@AnthropicAI

1 day ago

The speedup isn’t just in volume. On open-ended coding problems where answers are unclear, Claude’s success rate is now 76%—a 50 point jump in just 6 months. Many engineers also say Claude’s code quality is now on par with human code; we expect it to be better within the year.

36

2K

108

196

440K

LCB0B retweeted

Nobuhiro Mifune @NobuMifune

10 days ago

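An analysis of 25 years of data from fashion shows and similar sources apparently finds that the diversity of models' body types has widened, but that this widening is skewed toward non-white models, and that the median model body type has not changed.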

0

4

1

0

537

Louis @LCB0B

10 days ago

@NobuMifune https://t.co/m5luzFmdyz

0

1

0

723

LCB0B retweeted

Nobuhiro Mifune @NobuMifune

10 days ago

Cultural evolution of beauty standards | PNAS https://t.co/v3AxpCaJfx

1

21

5

9

2K

Louis @LCB0B

15 days ago

There is more in the paper: past attempts to regulate the industry (when they succeeded and when they failed), and a network analysis of industry prestige. https://t.co/DxZSxehlkc

0

33

Louis @LCB0B

15 days ago

Is the fashion industry actually becoming more inclusive? We analyzed 793,199 records over 25 years to find out. Our results are published today in PNAS. https://t.co/DxZSxehlkc

1

0

1

54

Louis @LCB0B

15 days ago

This pattern aligns with decades of intersectionality scholarship. What we add is large-scale empirical evidence from the fashion industry.

1

0

31

LCB0B retweeted

Bojie Li

@bojie_li

about 1 month ago

Closed labs hide model sizes. They can't hide what their models know, and what a model knows is an indicator on how big it is.

Reasoning compresses. Factual knowledge doesn't. So you can size a frontier model from black-box API calls alone, and across releases you can literally watch a single fact arrive in the parameters over time.

For three years, my friends Jiyan He and Zihan Zheng have been asking frontier LLMs the same question: "what do you know about USTC Hackergame?", a CTF contest. May 2024: GPT-4o invented fake titles. Feb 2025: Claude 3.7 Sonnet listed 19 verified 2023 challenges. By April 2026, frontier models recall specific challenges across consecutive years.

After DeepSeek-V4 dropped, I instructed my agent to spend four days autonomously turning that habit into Incompressible Knowledge Probes (IKP) — 1,400 questions, 7 tiers of obscurity, 188 models, 27 vendors. Three findings:

1/ You can approximately size any black-box LLM from factual accuracy alone. Penalized accuracy is log-linear in log(params), R² = 0.917 on 89 open-weight models from 135M to 1.6T params. Project closed APIs onto the curve → GPT-5.5 ~9T, Claude Opus 4.7 ~4T, GPT-5.4 ~2.2T, Claude Sonnet 4.6 ~1.7T, Gemini 2.5 Pro ~1.2T (90% CI: 0.3-3x size).

2/ Citation count and h-index don't predict whether a frontier model recognizes a researcher. Two researchers with similar citation profiles get very different responses. Models memorize impact — work that shaped a field, not many incremental papers.

3/ Factual capacity doesn't compress over time. Across 96 open-weight models across 3 years, the IKP time coefficient is statistically zero, rejecting the Densing-Law prediction of +0.0117/month at p<10⁻¹⁵. Reasoning benchmarks saturate; factual capacity keeps scaling with parameters.

Website: https://t.co/CkwJsXqnsX
Paper: https://t.co/eNUdC9ye7w

71

2K

235

1K

390K
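
Finding 1/ in Bojie Li's post above (fit a curve on open-weight models, then project closed models onto it) can be sketched roughly as follows. This is a minimal illustration, not the IKP code: it assumes the relationship is a straight line in log10(params), the function names `fit_log_linear` and `estimate_params` are hypothetical, and the data points are synthetic values generated from a made-up line, not results from the paper.

```python
import math

def fit_log_linear(params, accuracy):
    """Least-squares fit of accuracy = a + b * log10(params)."""
    xs = [math.log10(p) for p in params]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def estimate_params(acc, a, b):
    """Invert the fitted line to project an accuracy onto a parameter count."""
    return 10 ** ((acc - a) / b)

# Synthetic "open-weight" points lying exactly on a made-up line
# (accuracy = -0.8 + 0.1 * log10(params)); for illustration only.
open_params = [1e9, 1e10, 1e11, 1e12]
open_acc = [-0.8 + 0.1 * math.log10(p) for p in open_params]

a, b = fit_log_linear(open_params, open_acc)

# Project a hypothetical closed model that scores 0.35 onto the fitted line.
estimated = estimate_params(0.35, a, b)
print(f"slope={b:.3f}, intercept={a:.3f}, estimated params={estimated:.3g}")
```

Because the estimate comes from inverting a fit in log space, small errors in accuracy translate into multiplicative errors in size, which is consistent with the post quoting its interval as a factor range (0.3-3x) rather than as an absolute number.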

LCB0B retweeted

Lujain Ibrahim @lujainmibrahim

about 1 month ago

🚨Very excited to see our work on warmth & sycophancy in LLMs out in @Nature today!🚨 We study what happens when LLMs are fine-tuned to be warmer, and find that warmth and sycophancy can be linked, with warm models showing higher errors on a range of benchmarks (🔗s below)

14

269

61

138

37K

Louis @LCB0B

about 1 month ago

@francoisfleuret

1

0

73

LCB0B retweeted

Ethan Mollick

@emollick

about 2 months ago

With max thinking Opus 4.7 is quite impressive, with a real sense of style

In two prompts: "implement the Tower of Babel, in 3D, in as sophisticated and visually interesting a way as possible. It should be interactive" and then "make it better."

Play: https://t.co/JWTVewpwZ9

33

775

51

278

93K

LCB0B retweeted

François Chollet

@fchollet

5 months ago

If you're wondering whether saturating ARC-AGI-1 or 2 means we have AGI now... I refer you to what I said when we launched ARC-AGI-2 last year (which is also the same thing I said when we announced ARC-AGI-2 was coming, in Spring 2022, before the rise of LLM chatbots)...

The ARC-AGI series is not an AGI threshold, it's a compass that points the research community toward the right questions.

ARC-AGI-1 is a minimal test of fluid intelligence -- to pass it, you needed to show nonzero fluid intelligence. This required AI to move past the classic deep learning / LLM paradigm of pretraining scaling + static models at inference, toward test-time adaptation.

ARC-AGI-2 is the same, but with tasks that probe deeper levels of reasoning complexity (particularly with regard to concept composition). Still, these are tasks that are solvable in minutes by regular people with no external tool use (we hired our test takers off the street), so it does not represent the upper bound of what human fluid intelligence can achieve (say, solving a Millennium problem).

ARC-AGI-3 (launching March 2026) probes interactive reasoning: we evaluate how systems explore unknown environments, model them, set their own goals, and plan/execute towards these goals, autonomously, without instructions.

We have also started work on ARC-AGI-4 and ARC-AGI-5, which I am pretty excited about!

94

1K

93

271

212K

LCB0B retweeted

Crémieux

@cremieuxrecueil

8 months ago

The tiny Caribbean island of Anguilla now derives almost half of its state budget from the sale of .ai domain names.

97

10K

574

743

479K

LCB0B retweeted

Melanie Weber @mweber_PU

9 months ago

An implementation of Contrastive Poincaré Maps with sample notebooks for the biomedical case studies is now publicly available: https://t.co/6g5YS6qM8A

0

15

4

1

1K

Louis @LCB0B

9 months ago

📢More from our recent @NatureHumBehav article from the Technical University of Denmark: Our study shows that behind the apparent complexity of human mobility lies a simple rule shaped by geography and distance. 🔗 https://t.co/xEHNVbN0V6 DOI: 10.1038/s41562-025-02282-7

0

3

0

103
