francois

@fozenne

lead data scientist. AI for high-expertise domains, functional programming and domain-driven design

Versailles, France

Joined January 2013

104 Following

70 Followers

582 Posts

francois @fozenne

6 months ago

The Three-Body Problem novel is about AI doom

francois @fozenne

6 months ago

2026 prediction (MD5): d1c5c969fc61989992d0a5128c1a42b1. Let’s see how long it takes to realize 👀
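
The tweet publishes only an MD5 digest, which works as a simple hash commitment: the text can be revealed later, and anyone can re-hash it to check that it matches. Below is a minimal Python sketch of that check. The helper names and the example string are hypothetical, since the committed text itself has not been revealed.

```python
import hashlib
import hmac


def md5_commitment(prediction: str) -> str:
    """Hex MD5 digest of the UTF-8 bytes of a prediction string."""
    return hashlib.md5(prediction.encode("utf-8")).hexdigest()


def reveal_matches(prediction: str, published_digest: str) -> bool:
    """Check a revealed prediction against a digest published earlier."""
    # Normalize the hex case; compare_digest compares without short-circuiting
    return hmac.compare_digest(md5_commitment(prediction), published_digest.lower())


# Digest from the tweet; the text it commits to is unknown.
PUBLISHED = "d1c5c969fc61989992d0a5128c1a42b1"

# Hypothetical round trip: commit to some text, then verify the reveal.
guess = "hypothetical prediction text"
digest = md5_commitment(guess)
print(reveal_matches(guess, digest))               # True
print(reveal_matches("a different text", digest))  # False
```

The check is byte-exact. A trailing newline, different capitalization, or a different Unicode normalization of the revealed text produces a different digest.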

francois @fozenne

6 months ago

@gchampeau @_mcorbin @le_trappiste Every heavy-usage user (and therefore the ones who drive the roadmap) uses IaC for reproducibility and boto3 / the CLI to monitor usage. That doesn't rule out these same tools often being complex, but it isn't a web UI issue.

fozenne retweeted

Terrible Maps

@TerribleMaps

6 months ago

Mind blown.. Germany’s 5 biggest cities lie perfectly on a 4th-degree polynomial by u/BarisSayit
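
The map works as a joke because the fit is guaranteed. Any n points with distinct x-coordinates lie on exactly one polynomial of degree at most n - 1. Assuming the curve is a function of the horizontal coordinate, five cities at distinct east-west positions will always fit a 4th-degree polynomial. The small Python sketch below uses Lagrange interpolation to show this. The coordinates are made-up illustration values, not real city positions, and the helper name is hypothetical.

```python
from fractions import Fraction


def lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial of degree <= len(points) - 1 through points."""
    xs = [px for px, _ in points]
    if len(set(xs)) != len(xs):
        raise ValueError("x-coordinates must be distinct")
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        # Lagrange basis L_i: equals 1 at xi and 0 at every other xj
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (Fraction(x) - xj) / (Fraction(xi) - xj)
        total += term
    return total


# Hypothetical (x, y) positions for five points; not real city coordinates.
cities = [(1, 7), (3, 2), (4, 9), (6, 5), (8, 1)]
print(all(lagrange_eval(cities, x) == y for x, y in cities))  # True

# Uniqueness: five samples of y = x**2 interpolate back to x**2 itself.
squares = [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
print(lagrange_eval(squares, 5))  # 25
```

With integer inputs, exact `Fraction` arithmetic keeps the equality checks free of floating-point rounding.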

fozenne retweeted

Justin Mitchel

@JustinMitchel

6 months ago

So... Postgres is now basically a search engine? pg_textsearch was just open sourced. It enables BM25 to search your database.... massive upgrade for key word search. Google uses BM25 in their search engine. Claude told me: "if you're already on Postgres, you can now skip the whole sync-your-data-to-Elasticsearch dance for search." (ps, how can you not love Claude). Now I got to figure out how to implement in my Django querysets... future course? Grab it at https://t.co/bMwRSgtOcO #sponsored
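
BM25 (Okapi BM25) ranks documents by summing, over the query terms, an IDF weight multiplied by a saturated, length-normalized term frequency. The Python sketch below is a generic illustration of the scoring formula and does not reproduce pg_textsearch's API or internals. It assumes a whitespace tokenizer, the common defaults k1 = 1.2 and b = 0.75, and the IDF variant ln(1 + (N - n + 0.5) / (n + 0.5)) used by Lucene.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every document for a whitespace-tokenized query."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # absent terms contribute nothing
            n = df[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            freq = tf[term]
            # Term frequency saturates via k1; b scales the document-length penalty
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(tokens) / avgdl))
        scores.append(score)
    return scores


docs = [
    "postgres full text search",
    "elasticsearch cluster sync",
    "bm25 ranking in postgres",
]
scores = bm25_scores("postgres search", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1]
```

k1 controls how quickly repeated occurrences of a term stop adding score. b controls how strongly long documents are penalized, and b = 0 disables length normalization.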

fozenne retweeted

Mistral AI

@MistralAI

6 months ago

Mistral OCR 3 sets new benchmarks in both accuracy and efficiency, outperforming enterprise document processing solutions as well as AI-native OCR.

francois @fozenne

6 months ago

And that’s why you in-house them

alex fazio

@alxfazio

6 months ago

friend at accenture told me they don’t do evals when building llm wrappers for clients 🤡

fozenne retweeted

Hunter Leath

@jhleath

6 months ago

an interesting update: the team is starting to move away from AI coding completely (devin/claude/etc) because it's so much harder to review the AI code than writing things themselves

fozenne retweeted

Simon Willison

@simonw

7 months ago

This one is pretty nasty - it tricks Antigravity into stealing AWS credentials from a .env file (working around .gitignore restrictions using cat) and then leaks them to a webhooks debugging site that's included in the Antigravity browser agent's default allow-list

fozenne retweeted

Jeffrey Emanuel

@doodlestein

7 months ago

Just read through the new LeJEPA paper by Yann LeCun and Randall Balestriero. I’ve been curious to know what Yann’s been working on lately, especially considering all his criticisms of LLMs (which I disagree with, as I think LLMs will keep improving and will take us to ASI fairly soon).

Anyway, there are several threads already on X about the paper and what it introduces. The short version is that it’s a principled, theoretically justified, and parsimonious approach to self-supervised learning that replaces a complex hodgepodge of ad-hoc, hacky heuristics for preventing mode collapse, which is the bane of self-supervised learning.

That’s where the model screws up and starts mapping all inputs to nearly identical embeddings or to a narrow subspace of embeddings, collapsing down all the richness of the problem into a pathologically simple and wrong correspondence.

The first pillar of the new approach is their proof that isotropic Gaussian distributions uniquely minimize worst-case downstream prediction risk.
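
A rough way to observe collapse and (an)isotropy in practice is to inspect the covariance of a batch of embeddings. A total variance near zero means everything maps to one point. A covariance far from a scaled identity means the variance is concentrated in a few directions. The Python sketch below computes both numbers. It is a hypothetical diagnostic written for illustration, not the regularizer or statistical test used in the LeJEPA paper, and the synthetic data is made up.

```python
import math
import random


def covariance(vectors):
    """Sample covariance matrix (d x d) of a list of equal-length vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for v in vectors:
        centered = [v[k] - mean[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j] / (n - 1)
    return cov


def isotropy_report(vectors, eps=1e-12):
    """Return (total variance, relative Frobenius distance from an isotropic covariance)."""
    cov = covariance(vectors)
    d = len(cov)
    trace = sum(cov[i][i] for i in range(d))
    if trace < eps:
        return 0.0, float("inf")  # all embeddings (nearly) identical: complete collapse
    target = trace / d  # scaled identity with the same total variance
    diff = math.sqrt(sum((cov[i][j] - (target if i == j else 0.0)) ** 2
                         for i in range(d) for j in range(d)))
    norm = math.sqrt(sum(cov[i][j] ** 2 for i in range(d) for j in range(d)))
    return trace, diff / norm


rng = random.Random(0)
spread = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2000)]
line = [[rng.gauss(0.0, 1.0), 0.0, 0.0, 0.0] for _ in range(2000)]  # variance on one axis only
point = [[0.1, 0.2, 0.3, 0.4] for _ in range(50)]                   # everything maps to one vector

_, spread_ratio = isotropy_report(spread)
_, line_ratio = isotropy_report(line)
print(isotropy_report(point))  # (0.0, inf)
```

When all variance sits on one of four axes, the relative distance is sqrt(3/4) ≈ 0.87. Well-spread Gaussian samples give a value close to 0.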

As soon as I read that, I immediately thought of CMA-ES, the best available black-box optimization algorithm for when you don’t have access to the gradient of the function you’re trying to minimize, but can only do (expensive/slow) function evaluations.

Nikolaus Hansen has been working on CMA-ES since he introduced it way back in 1996. I’ve always been fascinated by this approach and used it with a lot of success to efficiently explore hyper-parameters of deep neural nets back in 2011 instead of doing inefficient grid searches.

Anyway, the reason why I bring it up is because there’s a striking parallel and deep connection between that approach and the core of LeJEPA.

CMA-ES says: Start with an isotropic Gaussian because it's the maximum entropy (least biased) distribution given only variance constraints. Then adapt the covariance to learn the problem's geometry.

LeJEPA says: Maintain an isotropic Gaussian because it's the maximum entropy (least biased) distribution for unknown future tasks.

Both recognize that isotropy is optimal under uncertainty for three reasons:

1. The maximum entropy principle: among all distributions with a fixed total variance, the isotropic Gaussian has maximum entropy, i.e., it makes the fewest assumptions.
2. No directional bias: equal variance in all directions means you're not pre-committing to any particular problem structure.
3. Worst-case optimality: it minimizes the maximum regret across all possible problem geometries.
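
For Gaussians, the first point can be checked numerically. The differential entropy of a d-dimensional Gaussian with covariance Σ is ½ ln((2πe)^d det Σ). If the trace of Σ (the total variance) is held fixed, then det Σ, the product of the eigenvalues, is largest when all eigenvalues are equal, by the AM-GM inequality. The Python sketch below compares an isotropic and a stretched covariance with the same trace. The variance values are arbitrary, and the comparison covers only Gaussians. Extending the argument to all distributions relies on the separate fact that the Gaussian maximizes entropy for a given covariance.

```python
import math


def gaussian_entropy(variances):
    """Differential entropy (nats) of a Gaussian whose covariance has these eigenvalues."""
    d = len(variances)
    return 0.5 * (d * math.log(2 * math.pi * math.e) + sum(math.log(v) for v in variances))


total = 4.0                        # fixed total variance (trace of the covariance)
isotropic = [total / 4] * 4        # equal variance in every direction
stretched = [2.5, 1.0, 0.4, 0.1]   # same trace, variance piled onto one axis

h_iso = gaussian_entropy(isotropic)
h_stretched = gaussian_entropy(stretched)
gap = h_iso - h_stretched
print(round(gap, 4))  # 1.1513
```

Here the gap works out to ½ ln 10 nats, because the stretched variances multiply to 0.1 while the isotropic ones multiply to 1.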

So then what’s the difference? It comes down to adaptation timing. CMA-ES can adapt during optimization; it starts isotropic but then becomes anisotropic as it learns the specific optimization landscape.

In contrast, LeJEPA has to stay isotropic because it's preparing for unknown downstream tasks that haven't been seen yet.
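
The adaptation described above can be sketched with a heavily simplified evolution strategy. It starts from an identity covariance and applies only a rank-μ style covariance update: equal weights over the best μ of λ samples, with steps measured from the previous mean. This is not full CMA-ES. It has no evolution paths, no separate step-size control, and no weighted recombination, and the test function, population sizes, and learning rate are arbitrary assumptions. For real use, Nikolaus Hansen's own `cma` Python package is the place to look.

```python
import math
import random


def valley(p):
    """Ill-conditioned quadratic: 100x steeper along the second axis."""
    return p[0] ** 2 + 100.0 * p[1] ** 2


def sample(mean, cov, rng):
    """Draw one point from N(mean, cov) for a 2x2 covariance via its Cholesky factor."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(max(c - l21 * l21, 0.0))  # clamp tiny negative rounding error
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return [mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2]


def simplified_es(f, mean, iterations=300, lam=10, mu=5, c_mu=0.3, seed=0):
    """Evolution strategy with only a rank-mu style covariance update (not full CMA-ES)."""
    rng = random.Random(seed)
    cov = [[1.0, 0.0], [0.0, 1.0]]  # start isotropic: no preferred direction
    for _ in range(iterations):
        population = sorted((sample(mean, cov, rng) for _ in range(lam)), key=f)
        best = population[:mu]
        # Steps of the selected points, measured from the previous mean
        steps = [[p[0] - mean[0], p[1] - mean[1]] for p in best]
        # Blend the old covariance with the second moment of the successful steps
        cov = [[(1 - c_mu) * cov[i][j] + c_mu * sum(s[i] * s[j] for s in steps) / mu
                for j in range(2)] for i in range(2)]
        # Move the mean to the average of the selected points
        mean = [sum(p[k] for p in best) / mu for k in range(2)]
    return mean, cov


start = [3.0, 3.0]
final_mean, final_cov = simplified_es(valley, start)
print(valley(start), valley(final_mean))
print(final_cov[0][0] / final_cov[1][1])  # variance ratio, flat axis vs. steep axis
```

On this valley, the learned covariance typically ends up with much more variance along the flat first axis than along the steep second one. In other words, it becomes anisotropic once it has seen the specific landscape.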

This parallel suggests LeJEPA is applying a fundamental principle from optimization theory to representation learning. It's essentially saying:

“The optimal search distribution for black-box optimization is also the optimal embedding distribution for transfer learning.”

This makes sense because both problems involve navigating unknown landscapes: for CMA-ES, the unknown optimization landscape; for LeJEPA, the unknown space of downstream tasks.

This difference then makes me wonder: could we have "adaptive LeJEPA" that starts isotropic but adapts its embedding distribution once we know the downstream task, similar to how CMA-ES adapts during optimization? That would be like meta-learning the right anisotropy for specific task families.

Anyway, I thought I’d share my thoughts on this. It’s fascinating to see the connections between these different areas. The black-box optimization community has always been pretty separate and distinct from the deep learning community, and there’s not much cross-pollination there.

This makes sense, because if you have a gradient, you’d be crazy not to use it. But there are strong connections.

fozenne retweeted

Jack Morris

@jxmnop

7 months ago

there are dozens or perhaps a couple hundred ex-{OpenAI, xAI, Google DeepMind} researchers founding companies in the current climate

there are, as far as i know, zero people leaving to found startups out of Anthropic

really makes you think

fozenne retweeted

special k | CEO of stressed out era

@specialkdelslay

8 months ago

This is supposed to be the thermodynamic quantum computer? it looks like a 3d printed plastic toy with demon symbols on the side or sum, 14 million in seed funding?? fill me in on what I'm missing here

fozenne retweeted

Simo Ryu

@cloneofsimo

8 months ago

I'm confused about the "10,000 more efficient" part. This means you can train a stable-diffusion-3-like model with ~$20 worth of electricity. What stops them from building a model and demonstrating it, beyond *checks note* ... Fashion MNIST? I'm genuinely curious what's stopping them from demonstrating something like imagenet-1k, which should take less than a dollar of electricity (if my math is right) for 200k steps of training

francois @fozenne

8 months ago

@Sauers_ Plenty of fish in this pond

fozenne retweeted

Anthropic

@AnthropicAI

8 months ago

New Anthropic research: Signs of introspection in LLMs. Can language models recognize their own internal thoughts? Or do they just make up plausible answers when asked about them? We found evidence for genuine—though limited—introspective capabilities in Claude.

fozenne retweeted

Wirelyss 👁️‍🗨️💫

@wirelyss

8 months ago

Luckily since the Louvre made NFTs of their jewelry, even though the crowns physically were stolen, they still own the same assets. Because the tokens still exist and are in limited supply just as before. Nothing has changed. few understand blockchain technology.

fozenne retweeted

terminally onλine εngineer

@tekbog

8 months ago

multi cloud multi az systems engineers right now

fozenne retweeted

terminally onλine εngineer

@tekbog

8 months ago

this is basically how open source works for big tech

fozenne retweeted

Simon Willison

@simonw

8 months ago

I grabbed a full copy of the folder and shared it on GitHub here: https://t.co/sMWO6E09xr - here are my notes so far: https://t.co/b9S1kMb4Jj

francois @fozenne

9 months ago

Nice! Data extraction via web search tool calls was a vulnerability we were worried about early on. Glad it has been properly documented.

Simon Willison

@simonw

9 months ago

Classic prompt injection attack here against Notion: hidden text (white on white) in a PDF which, when processed by Notion, causes their agent to gather confidential data from other pages and append it into a query string that gets passed to their functions_search() tool
