@gchampeau @_mcorbin @le_trappiste Every heavy-consumption user (i.e., the ones who dictate the roadmap) does IaC for reproducibility and uses boto3 / the CLI to monitor usage.
That doesn't rule out these same tools often being complex, but that's not a web UI issue.
So... Postgres is now basically a search engine?
pg_textsearch was just open sourced. It brings BM25 ranking to searches over your database... a massive upgrade for keyword search.
Google uses BM25 in their search engine.
Claude told me: "if you're already on Postgres, you can now skip the whole sync-your-data-to-Elasticsearch dance for search."
(PS: how can you not love Claude?)
Now I've got to figure out how to implement it in my Django querysets... future course?
Grab it at https://t.co/bMwRSgtOcO
#sponsored
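For anyone wondering what BM25 actually computes, here is a minimal sketch of the textbook Okapi BM25 score in plain Python. It is not the pg_textsearch implementation or API (the extension may use a different variant); the function name `bm25_scores`, the whitespace tokenizer, the toy documents, and the defaults `k1=1.2`, `b=0.75` are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25.

    Hypothetical helper for illustration; assumes `docs` is non-empty.
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates (k1) and is length-normalized (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus, made up for this example.
docs = [
    "postgres full text search with ranking",
    "sync your data to elasticsearch",
    "django querysets and postgres",
]
scores = bm25_scores("postgres search", docs)
```

The first document matches both query terms and ranks highest, the third matches only "postgres", and the second matches nothing (the tokenizer treats "elasticsearch" as a single token), so it scores zero.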
an interesting update: the team is starting to move away from AI coding completely (devin/claude/etc) because it's so much harder to review the AI code than to write things themselves
This one is pretty nasty - it tricks Antigravity into stealing AWS credentials from a .env file (working around .gitignore restrictions using cat) and then leaks them to a webhooks debugging site that's included in the Antigravity browser agent's default allow-list
Just read through the new LeJEPA paper by Yann LeCun and Randall Balestriero. I’ve been curious to know what Yann’s been working on lately, especially considering all his criticisms of LLMs (which I disagree with, as I think LLMs will keep improving and will take us to ASI fairly soon).
Anyway, there are several threads already on X about the paper and what it introduces. The short version is that it’s a principled, theoretically justified, and parsimonious approach to self-supervised learning that replaces a complex hodgepodge of ad-hoc, hacky heuristics for preventing mode collapse, which is the bane of self-supervised learning.
That’s where the model screws up and starts mapping all inputs to nearly identical embeddings or to a narrow subspace of embeddings, collapsing down all the richness of the problem into a pathologically simple and wrong correspondence.
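One rough way to see the collapse described above is to measure how many directions a batch of embeddings actually uses. The sketch below is a generic diagnostic, not something taken from the LeJEPA paper: the helper `effective_dims` (a participation ratio over the covariance eigenvalues) and the synthetic data are assumptions for illustration.

```python
import numpy as np


def effective_dims(embeddings):
    """Participation ratio of the covariance spectrum: (sum λ)^2 / sum λ^2.

    Close to the embedding dimension when variance is spread evenly,
    close to 1 when almost all variance sits in a single direction.
    Not defined for a total collapse to one point (all eigenvalues zero);
    checking the total variance separately covers that case.
    """
    centered = embeddings - embeddings.mean(axis=0)
    cov = centered.T @ centered / (len(embeddings) - 1)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(eig.sum() ** 2 / (eig ** 2).sum())


rng = np.random.default_rng(0)

# Healthy case: 2000 embeddings drawn from an isotropic Gaussian in 8-D.
healthy = rng.standard_normal((2000, 8))

# Collapsed case: everything lies along one direction, plus tiny noise.
direction = rng.standard_normal((1, 8))
collapsed = rng.standard_normal((2000, 1)) @ direction
collapsed = collapsed + 1e-3 * rng.standard_normal((2000, 8))

healthy_dims = effective_dims(healthy)      # close to 8
collapsed_dims = effective_dims(collapsed)  # close to 1
```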
The first pillar of the new approach is their proof that isotropic Gaussian distributions uniquely minimize worst-case downstream prediction risk.
As soon as I read that, I immediately thought of CMA-ES, the best available black-box optimization algorithm for when you don’t have access to the gradient of the function you’re trying to minimize, but can only do (expensive/slow) function evaluations.
Nikolaus Hansen has been working on CMA-ES since he introduced it way back in 1996. I’ve always been fascinated by this approach and used it with a lot of success to efficiently explore hyper-parameters of deep neural nets back in 2011 instead of doing inefficient grid searches.
Anyway, the reason I bring it up is that there’s a striking parallel and deep connection between that approach and the core of LeJEPA.
CMA-ES says: Start with an isotropic Gaussian because it's the maximum entropy (least biased) distribution given only variance constraints. Then adapt the covariance to learn the problem's geometry.
LeJEPA says: Maintain an isotropic Gaussian because it's the maximum entropy (least biased) distribution for unknown future tasks.
Both recognize that isotropy is optimal under uncertainty for three reasons:
The maximum entropy principle: among all distributions with a fixed total variance, the isotropic Gaussian has maximum entropy, i.e., it makes the fewest assumptions.
There’s no directional bias: equal variance in all directions means you're not pre-committing to any particular problem structure.
You get worst-case optimality: it minimizes the maximum regret across all possible problem geometries.
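The maximum entropy point can be checked numerically, at least within the Gaussian family. The sketch below uses the closed-form differential entropy of a zero-mean Gaussian and compares two covariances with the same total variance (same trace). The specific matrices are made up for illustration, and this only compares Gaussians with each other; it is not the proof from the paper.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy in nats of N(0, cov): 0.5 * log det(2*pi*e*cov)."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)


# Two 4-D covariances with the same trace (total variance = 4.0).
isotropic = np.diag([1.0, 1.0, 1.0, 1.0])
anisotropic = np.diag([2.5, 1.0, 0.4, 0.1])

h_iso = gaussian_entropy(isotropic)
h_aniso = gaussian_entropy(anisotropic)
# h_iso is larger: for a fixed trace, the determinant (and so the entropy)
# is maximized when all eigenvalues are equal, by the AM-GM inequality.
```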
So then what’s the difference? It comes down to adaptation timing. CMA-ES can adapt during optimization; it starts isotropic but then becomes anisotropic as it learns the specific optimization landscape.
In contrast, LeJEPA has to stay isotropic because it's preparing for unknown downstream tasks that haven't been seen yet.
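To make the "starts isotropic, becomes anisotropic" behavior concrete, here is a heavily simplified evolution-strategy sketch. It is not real CMA-ES: it leaves out step-size control, evolution paths, weighted recombination, and learning rates, and simply re-estimates the covariance from the best samples each generation. The objective `f`, the function name `adapt_search_distribution`, and all parameter values are assumptions for illustration.

```python
import numpy as np


def f(x):
    # Ill-conditioned quadratic: much steeper along x[1] than along x[0].
    return x[0] ** 2 + 100.0 * x[1] ** 2


def adapt_search_distribution(f, mean, generations=30, popsize=40, elite=10, seed=0):
    """Toy covariance adaptation (NOT full CMA-ES), using only f evaluations."""
    rng = np.random.default_rng(seed)
    dim = len(mean)
    cov = np.eye(dim)  # start with an isotropic Gaussian
    for _ in range(generations):
        samples = rng.multivariate_normal(mean, cov, size=popsize)
        order = np.argsort([f(x) for x in samples])
        # Steps from the current mean to the best `elite` samples.
        steps = samples[order[:elite]] - mean
        # Re-estimate the covariance from the successful steps; the small
        # jitter keeps the matrix positive definite as it shrinks.
        cov = steps.T @ steps / elite + 1e-12 * np.eye(dim)
        # Move the mean to the average of the selected samples.
        mean = mean + steps.mean(axis=0)
    return mean, cov


start = np.array([3.0, 3.0])
final_mean, final_cov = adapt_search_distribution(f, start)
```

Comparing `final_cov` with the identity matrix it started from shows how the search distribution has been reshaped by the landscape, which is exactly the adaptation that a fixed isotropic embedding target does not perform.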
This parallel suggests LeJEPA is applying a fundamental principle from optimization theory to representation learning. It's essentially saying:
“The optimal search distribution for black-box optimization is also the optimal embedding distribution for transfer learning.”
This makes sense because both problems involve navigating unknown landscapes; for CMA-ES, this is the unknown optimization landscape; for LeJEPA, this is the unknown space of downstream tasks.
This difference then makes me wonder: could we have "adaptive LeJEPA" that starts isotropic but adapts its embedding distribution once we know the downstream task, similar to how CMA-ES adapts during optimization? That would be like meta-learning the right anisotropy for specific task families.
Anyway, I thought I’d share my thoughts on this. It’s fascinating to see the connections between these different areas. The black-box optimization community has always been pretty separate and distinct from the deep learning community, and there’s not much cross-pollination there.
This makes sense, because if you have a gradient, you’d be crazy not to use it. But there are strong connections.
there are dozens or perhaps a couple hundred ex-{OpenAI, xAI, Google DeepMind} researchers founding companies in the current climate
there are, as far as i know, zero people leaving to found startups out of Anthropic
really makes you think
This is supposed to be the thermodynamic quantum computer?
it looks like a 3d printed plastic toy with demon symbols on the side or sum, 14 million in seed funding?? fill me in on what I'm missing here
I'm confused about the "10,000x more efficient" part. That would mean you can train a Stable Diffusion 3-like model with roughly $20 of electricity. What stops them from building a model and demonstrating it on something beyond *checks notes* ... Fashion MNIST?
I'm genuinely curious what's stopping them from demonstrating something like ImageNet-1k
which should take less than a dollar of electricity (if my math is right) for 200k steps of training
New Anthropic research: Signs of introspection in LLMs.
Can language models recognize their own internal thoughts? Or do they just make up plausible answers when asked about them? We found evidence for genuine—though limited—introspective capabilities in Claude.
Luckily since the Louvre made NFTs of their jewelry, even though the crowns physically were stolen, they still own the same assets. Because the tokens still exist and are in limited supply just as before. Nothing has changed. few understand blockchain technology.
Classic prompt injection attack here against Notion: hidden text (white on white) in a PDF which, when processed by Notion, causes their agent to gather confidential data from other pages and append it into a query string that gets passed to their functions_search() tool