Xavi Giró

@DocXavi

Applied scientist at @amazonscience Barcelona, Catalonia. Made at @la_upc & @columbia. Promoting @dlbcnai. Opinions my own.

Badalona, Catalonia

Joined July 2012

1.8K Following

3K Followers

6.5K Posts

Pinned Tweet

Xavi Giró @DocXavi

4 months ago

X and @elonmusk have failed to promote the values of democracy and human rights. Time to leave this platform. We learned a lot here; thanks to those who made it possible. Find me on LinkedIn and Bluesky.

Universitat Politècnica de Catalunya (UPC) @la_UPC

over 1 year ago

#UPC is no longer posting on X, in order to keep its communication in environments that guarantee the quality and truthfulness of information. The decision was taken by consensus by the #ConsellGovernUPC on 19 February. 🔗https://t.co/K75TpS0hL3

DocXavi retweeted

Gabriele Berton

@gabriberton

6 days ago

I followed up on two misconduct cases at top ML conferences. TLDR; academic dishonesty pays 😓 Bans (especially cross-venue bans) are non-existent and hard to enforce [1/3]

Xavi Giró @DocXavi

5 days ago

@dimadamen @CSProfKGD @ETH_en @mapo1 @xiwang1212 @leto__jean @JitendraMalikCV @pulkitology Hiking while wearing the badge is next-level. Greetings!

DocXavi retweeted

Clément Chadebec

@CChadebec

7 days ago

📢 New @heyjasper release! 📢 MONET 🌸: An Apache2.0 deduped and recaptioned dataset of 105M samples unlocking reproducible text-to-image research. Nano T2I 🖌️: A codebase to train your own T2I model 🤗 @huggingface: https://t.co/x6gEhQIaFV 💻: https://t.co/K6VIU2wjtW Very excited about this new release, pushing the boundaries of open and reproducible T2I research. Congrats to the team! Benjamin Aubin Gonzalo Quintana @onurxtasar @UlaLaParis @_jeev2 @dh7net @clipdropapp @heyjasperai

DocXavi retweeted

Amazon Science

@AmazonScience

8 days ago

Announcing the #AmazonResearchAwards fall 2025 recipients: 🔍 68 researchers 🏫 49 universities 🌏 11 countries Each gains access to 800+ Amazon public datasets and AWS AI/ML tools. Meet the cohort: https://t.co/47jUdPuRrV

DocXavi retweeted

hardmaru

@hardmaru

8 days ago

For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (https://t.co/PK5h0mqQSo), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.

Xavi Giró @DocXavi

12 days ago

@StevenGlinert Greetings from Catalonia.

DocXavi retweeted

Nando de Freitas

@NandoDF

13 days ago

This is the way for AI to solve environmental challenges and ensure a safe and prosperous future for our children. Congratulations and thanks @cusp_ai team 👏🙏

DocXavi retweeted

clem 🤗

@ClementDelangue

2 months ago

After @Pinterest @Airbnb @NotionHQ @cursor_ai, today it’s @eoghan @intercom publicly sharing that they’re finding it better, cheaper, faster to use and train open models themselves rather than use APIs for many tasks. And hundreds of other companies are doing the same without sharing. Ultimately, I believe the majority of AI workflows will be in-house based on open-source (vs API). It took much more time than we anticipated but it’s happening now!

Xavi Giró @DocXavi

2 months ago

@ychngji6 After all this compelling motivation, I'm missing a proposed solution. Even if the problem is not fully solved, could you provide pointers to some approaches aligned with your vision?

DocXavi retweeted

Chongjie(CJ) Ye

@ychngji6

2 months ago

https://t.co/k4DHsjmXJi

DocXavi retweeted

International Conference on 3D Vision (3DV) @3DVconf

2 months ago

The #3DV2026 Keynote and Award Talk recordings are officially live! 🎥🍿 Revisit all the fantastic presentations from our insightful speakers and keep the 3D vision inspiration going! See the links below⬇️

DocXavi retweeted

Brian Roemmele

@BrianRoemmele

2 months ago

LeWorldModel: Yann LeCun's Radical Simplification of World Models Just Made Physics-Aware AI Practical

In the race for artificial general intelligence, two paths have emerged. One is the familiar "scale everything" route: bigger LLMs trained on ever-larger text corpora. The other, championed for years by Yann LeCun, is building world models: compact systems that learn the underlying physics of reality directly from raw sensory data (pixels) so AI can plan, predict, and act in the physical world the way a robot or self-driving car actually would.

Until now, the second path has been frustratingly difficult. Joint-Embedding Predictive Architectures (JEPAs), LeCun's elegant framework for learning predictive representations without reconstructing every pixel, kept collapsing during training. Researchers had to resort to a laundry list of hacks: multi-term loss functions (up to six hyperparameters), frozen pre-trained encoders, stop-gradients, exponential moving averages, and other duct-tape tricks just to keep the model from mapping every input to the same useless output.

LeCun's team (Mila, NYU, Samsung SAIL, and Brown University) dropped a bombshell:

LeWorldModel (LeWM) - the first JEPA that trains stably end-to-end from raw pixels using only two loss terms. No more house-of-cards engineering. Just a clean, simple recipe that works on a single GPU in a few hours with only 15 million parameters.

The Core Breakthrough: SIGReg Saves the Day

LeWorldModel's secret weapon is a new regularizer called SIGReg (for spherical isotropic Gaussian regularizer). It enforces a simple Gaussian distribution on the latent embeddings.

This single term prevents representation collapse without any of the previous heuristics.

The training objective now has just two parts:

1. Next-embedding prediction loss - the model predicts what the next latent state should be.

2. SIGReg - keeps the latent space well-behaved and diverse.

That's it. Hyperparameters drop from six to one. Training becomes stable, reproducible, and dramatically cheaper.
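The two-term objective described above can be illustrated with a minimal sketch in plain Python. It is not the LeWorldModel code (the official code link is the place for that): the second term here is a simple moment-matching penalty, swapped in for SIGReg, that pushes the batch mean of the embeddings toward zero and their covariance toward the identity. The weight `lam` stands for the single remaining hyperparameter; its name and default value are assumptions.

```python
# Minimal sketch (not the LeWorldModel implementation) of a two-term
# JEPA-style objective. The regularizer is a simple moment-matching penalty
# used as a stand-in for SIGReg; `lam` is an assumed name for the single weight.

Vector = list[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def prediction_loss(predicted: list[Vector], target: list[Vector]) -> float:
    """Term 1: average error between predicted and actual next embeddings."""
    return sum(mse(p, t) for p, t in zip(predicted, target)) / len(predicted)


def gaussian_moment_penalty(embeddings: list[Vector]) -> float:
    """Term 2 (stand-in for SIGReg): squared deviation of the batch mean from 0
    plus squared deviation of the batch covariance from the identity matrix."""
    n, d = len(embeddings), len(embeddings[0])
    mean = [sum(e[j] for e in embeddings) / n for j in range(d)]
    penalty = sum(m ** 2 for m in mean)
    for i in range(d):
        for j in range(d):
            cov_ij = sum((e[i] - mean[i]) * (e[j] - mean[j]) for e in embeddings) / n
            identity_ij = 1.0 if i == j else 0.0
            penalty += (cov_ij - identity_ij) ** 2
    return penalty


def total_loss(predicted: list[Vector], target: list[Vector],
               embeddings: list[Vector], lam: float = 0.1) -> float:
    """Prediction loss plus `lam` times the Gaussian penalty."""
    return prediction_loss(predicted, target) + lam * gaussian_moment_penalty(embeddings)


# A collapsed batch (every embedding identical) is penalized...
print(gaussian_moment_penalty([[0.0, 0.0]] * 4))  # 2.0
# ...while a zero-mean, unit-variance, uncorrelated batch is not.
print(gaussian_moment_penalty([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]))  # 0.0
```

The usage lines show why such a term prevents collapse: when every input maps to the same embedding, the covariance is zero and the penalty equals the embedding dimension, so the optimizer is pushed to spread the embeddings out.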

The model learns directly from raw video frames (no pre-trained vision encoders needed) and produces a compact latent world model that can be used for fast planning.
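To illustrate what planning in a learned latent space can look like, here is a hypothetical random-shooting planner: it samples candidate action sequences, rolls each one out with a latent dynamics model, and keeps the sequence whose final embedding lands closest to a goal embedding. The function names, the toy dynamics, and the choice of random shooting are all assumptions for illustration; the tweet does not say which planner LeWorldModel uses.

```python
# Hypothetical random-shooting planner in a latent space. `dynamics` stands in
# for a learned latent world model; random shooting is an assumed choice.
import random
from typing import Callable

Latent = tuple[float, ...]
Action = tuple[float, ...]


def squared_distance(a: Latent, b: Latent) -> float:
    """Squared Euclidean distance between two latent states."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def plan_random_shooting(
    z0: Latent,
    goal: Latent,
    dynamics: Callable[[Latent, Action], Latent],
    action_dim: int,
    horizon: int = 5,
    num_candidates: int = 256,
    seed: int = 0,
) -> list[Action]:
    """Sample random action sequences, roll each one out in latent space, and
    return the sequence whose final latent state is closest to `goal`."""
    rng = random.Random(seed)
    best_plan: list[Action] = []
    best_cost = float("inf")
    for _ in range(num_candidates):
        plan = [tuple(rng.uniform(-1.0, 1.0) for _ in range(action_dim))
                for _ in range(horizon)]
        z = z0
        for action in plan:
            z = dynamics(z, action)  # predict the next latent state
        cost = squared_distance(z, goal)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan


def toy_dynamics(z: Latent, a: Action) -> Latent:
    """Toy stand-in for a learned model: each action nudges the latent state."""
    return tuple(zi + 0.1 * ai for zi, ai in zip(z, a))


plan = plan_random_shooting((0.0, 0.0), (0.3, -0.2), toy_dynamics, action_dim=2)
print(len(plan))  # 5 actions, one per planning step
```

Because every rollout happens on small embedding vectors rather than on pixels, each candidate is cheap to evaluate. Random shooting is the simplest sampling-based planner; methods such as the cross-entropy method refine it by resampling around the best candidates.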

Impressive Results on Real Benchmarks

Despite its tiny size, LeWorldModel punches way above its weight:

- Trains on a single GPU in a few hours.
- Plans actions up to 48 times faster than foundation-model-based world models.
- Uses roughly 200 times fewer tokens than alternatives.
- Matches or beats far larger models on diverse 2D and 3D control tasks (e.g., manipulation, navigation).
- Its latent space encodes meaningful physical quantities (position, velocity, etc.) - proven by direct probing.
- It reliably detects physically implausible surprise events, showing genuine causal understanding.

Crucially, adding a decoder and reconstruction loss hurts performance on downstream control tasks. The pure JEPA objective already captures everything needed for planning - extra visual details just get in the way.

Project website: https://t.co/KhGR9LiIQZ
Official code: https://t.co/s1lI9kevJS

Why This Matters for the Future of AI

LeCun has been saying since 2022 that world models (not next-token predictors) are the key to real intelligence. Critics always pointed to the training instability. LeWorldModel removes that objection with elegant simplicity.

This is a philosophical reset: AI can learn physics the way babies do - by watching the world unfold - without needing supercomputers or endless text.

The implications for robotics, autonomous vehicles, and embodied agents are enormous. Suddenly, building a physically grounded planner is something a researcher (or even a hobbyist) can do on consumer hardware.

1 of 2

DocXavi retweeted

Amazon Science

@AmazonScience

2 months ago

📣 Amazon Research Awards spring 2026 call for proposals is now open for submissions. Successful applicants will receive unrestricted funds, AWS promotional credits, and training resources. Deadline for submissions is May 6. https://t.co/TVTKV9yDxS

Xavi Giró @DocXavi

4 months ago

Humanity's well-being: 2030s? https://t.co/N4W5CUZA5e

Pedro Domingos

@pmddomingos

4 months ago

Decade in which each subfield of AI went from not being for real to being for real:
Search: 1960s
Machine learning: 1990s
Vision: 2010s
NLP: 2020s
Reasoning, planning, robotics, etc.: TBD

Xavi Giró @DocXavi

4 months ago

@pmddomingos Humanity's well-being: 2030s?

DocXavi retweeted

#CVPR2026 @CVPR

4 months ago

Before you hit submit: Check if your paper title is included. It must be there to comply with the #CVPR2026 rebuttal template. 🔍

DocXavi retweeted

Angjoo Kanazawa @akanazawa

5 months ago

In an effort to better understand VLMs, we found that they are fragile in surprising ways. Just changing the color of pointing markers (red circle → blue circle) can completely change the results!

DocXavi retweeted

ELLISBarcelona @ELLISBarcelona

5 months ago

✨ Kind reminder! The ELLIS Unit Barcelona is hosting its fourth Scientific Seminar. Join us for the Scientific Seminar on January 28th with a talk by Prof. @PascalMettes on "Hyperbolic Deep Learning". Don't miss out ➡️https://t.co/3nx3L3f9D4

DocXavi retweeted

#CVPR2026 @CVPR

5 months ago

The #CVPR2026 review deadline has now passed. If you have not yet submitted your review, please contact your Area Chair (AC) immediately to confirm your status and submission plan!
