Thiago Matheus @thimabru - Twitter Profile

thimabru retweeted

2 months ago

Demis Hassabis’s “Einstein test” for defining AGI: Train a model on all human knowledge but cut it off at 1911, then see if it can independently discover general relativity (as Einstein did by 1915); if yes, it’s AGI.

Thiago Matheus @thimabru

7 months ago

Specifically, computer vision.

Thiago Matheus @thimabru

7 months ago

Which are the best universities in Europe for a PhD in Artificial Intelligence?

thimabru retweeted

Yashas

@YashasGunderia

7 months ago

ChatGPT (OpenAI) servers are facing issues: conversations have disappeared and pages are failing to load.

Thiago Matheus @thimabru

7 months ago

@rohanpaul_ai Could reinforcement learning be this idea of trial-and-error continual learning?

thimabru retweeted

chastronomic

@chastronomic

7 months ago

If you're an "ML Engineer" and you don't understand how the Jacobian of your network evolves during training, you're missing the mechanism that controls feature learning, signal flow, and training stability.

Concept 14: The Jacobian Flow of Deep Networks

For a model fθ, the Jacobian with respect to the inputs is:

J(x) = ∂fθ(x) / ∂x

Its singular values determine how information is amplified or compressed. Training moves this spectrum along a predictable trajectory called Jacobian Flow.

1. At Initialization, Spectrum Controls Trainability (signal propagation)
For a deep network to be trainable, the singular value distribution of J(x) must be balanced. If σ_max is too large, signals explode; if σ_min is too small, signals vanish. For linear networks, J = Πₗ Wₗ, so singular values compound multiplicatively with depth (for square layers, log |det J| is exactly the sum of the per-layer log |det Wₗ|), which makes initialization crucial.

2. Early Training, Singular Value Expansion (representation shaping)
As features form, the Jacobian spectrum expands:
• large singular values grow
• small singular values shrink
• anisotropy increases
The model stretches informative directions in the data and compresses noise. This matches Phase 1 in Concept 8.

3. Mid Training, Alignment with Data Geometry (anisotropy matching structure)
The Jacobian aligns with the principal components of the data. If Σ_x has eigenvectors uᵢ, training pushes J(x) so that its dominant singular vectors align with uᵢ. This amplifies discriminative directions and contracts irrelevant ones.

4. Late Training, Collapse Toward Low Rank (converged representation)
Near convergence:
• intermediate Jacobians change little
• singular values stabilize
• effective rank decreases
• the Jacobian develops a spiked spectrum
This connects to Neural Collapse, where intra-class features contract and class means form a simplex.

5. Overparameterization Enhances Stability (wide models and depth scaling)
In wide networks:
• J(x) concentrates around its expectation
• the singular value spread narrows
• gradient norms remain stable across depth
This is part of why larger models train more reliably and avoid exploding or vanishing gradients. NTK behavior emerges when the Jacobians stop evolving significantly.

6. Jacobian Flow Predicts Generalization (curvature and conditioning)
Sharp minima correlate with Jacobians that have a large σ_max and a high condition number; flat minima correspond to lower operator norms and better conditioning. Generalization improves when ‖J(x)‖₂ is controlled and cond(J) = σ_max / σ_min is low.

7. Practical Implications for ML Engineers (actionable insights)
• Residual connections stabilize Jacobian flow
• LayerNorm constrains singular values
• Dropout reduces anisotropy
• Weight decay lowers operator norms
• Gradient clipping prevents spectrum spikes
• Warmup smooths early spectral expansion
• Large models maintain healthier Jacobians automatically
Monitoring Jacobian norms can warn of instability before divergence.

TL;DR: Deep networks train by reshaping the Jacobian spectrum. Early training expands it to create features, mid training aligns it with data structure, and late training collapses it toward a stable low-rank form. This Jacobian flow is a key reason deep learning yields expressive, stable, and generalizable solutions.
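A minimal pure-Python sketch of the quantities the thread names, under stated assumptions: a hypothetical 2→2 toy network f(x) = W2·tanh(W1·x) with made-up weights, a central-difference estimate of J(x), and exact 2×2 singular values taken from the eigenvalues of JᵀJ. The second half illustrates depth compounding in a deep linear network whose layers all equal one diagonal matrix (again a hypothetical choice, picked so the arithmetic stays transparent). A real model would use autodiff and a library SVD instead.

```python
import math

# Hypothetical toy network f(x) = W2 @ tanh(W1 @ x), mapping R^2 -> R^2.
# The weights are arbitrary illustrative values, not from any trained model.
W1 = [[1.5, -0.4], [0.3, 0.9]]
W2 = [[0.8, 0.2], [-0.5, 1.1]]


def matvec(W, v):
    """Matrix-vector product for row-major nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matmul(A, B):
    """Matrix-matrix product for row-major nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def f(x):
    return matvec(W2, [math.tanh(z) for z in matvec(W1, x)])


def jacobian_fd(fn, x, eps=1e-6):
    """Central-difference estimate of J(x) = d fn(x) / d x (rows: outputs, columns: inputs)."""
    cols = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(fn(xp), fn(xm))])
    return [list(row) for row in zip(*cols)]  # transpose columns into rows


def singular_values_2x2(J):
    """Exact singular values (sigma_max, sigma_min) of a 2x2 matrix via the eigenvalues of J^T J."""
    (a, b), (c, d) = J
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d  # entries of J^T J
    mean = (p + r) / 2
    rad = math.sqrt(((p - r) / 2) ** 2 + q * q)
    return math.sqrt(mean + rad), math.sqrt(max(mean - rad, 0.0))


def deep_linear_jacobian(W, depth):
    """Input Jacobian of the deep linear net x -> W @ ... @ W @ x (depth copies of W)."""
    J = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(depth):
        J = matmul(W, J)
    return J


if __name__ == "__main__":
    s_max, s_min = singular_values_2x2(jacobian_fd(f, [0.2, -0.7]))
    print(f"toy net: sigma_max={s_max:.4f} sigma_min={s_min:.4f} cond={s_max / s_min:.2f}")
    for depth in (1, 10, 20):
        hi, lo = singular_values_2x2(deep_linear_jacobian([[1.1, 0.0], [0.0, 0.9]], depth))
        print(f"depth {depth:2d}: sigma_max={hi:.4f} sigma_min={lo:.5f} cond={hi / lo:.1f}")
```

With every layer equal to diag(1.1, 0.9), depth 20 gives σ_max = 1.1²⁰ ≈ 6.73 and σ_min = 0.9²⁰ ≈ 0.122, a condition number of about 55: a per-layer imbalance that looks harmless compounds with depth, which is the trainability point made in section 1.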

thimabru retweeted

AI Breakfast

@AiBreakfast

7 months ago

3D Gaussian Splatting is becoming incredibly good. Pretty soon you'll be able to recreate your own environments in video games just by snapping a few photos.

thimabru retweeted

Google DeepMind @GoogleDeepMind

8 months ago

We built a web app that lets you fly a spaceship through a 3D constellation of music, powered by our Lyria RealTime model. 🎶 Space DJ is an interactive visualization where every star represents a different music genre. As you explore, your path is translated into prompts for the API, creating a continuously evolving soundtrack. ↓

thimabru retweeted

Rohan Paul

@rohanpaul_ai

9 months ago

A beautiful paper from MIT + Harvard + @GoogleDeepMind 👏

It explains why Transformers fail at multi-digit multiplication and shows a simple bias that fixes it.

The researchers trained two small Transformer models on 4-digit-by-4-digit multiplication.

One used a special training method called implicit chain-of-thought (ICoT): the model first sees every intermediate reasoning step, and those steps are then gradually removed as training continues.

This forces the model to “think” internally rather than rely on the visible steps.

That model learned the task perfectly: it produced the right answer for every example (100% accuracy).

The other model was trained the normal way, with standard fine-tuning, where it saw only the input numbers and the final answer, never the reasoning steps.

That model almost completely failed, getting only about 1% of the answers correct.

In short, the model trained with implicit chain of thought (ICoT) reaches 100% on 4x4 multiplication, while standard training could not learn the task at all.
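A minimal sketch of the ICoT curriculum described above, under assumptions the tweet does not spell out: the reasoning steps here are the partial products of a by each digit of b, the string format `a*b= +step ... ### answer` is made up for illustration, and steps are dropped from the front of the chain, a fixed number per stage. The paper's actual tokenization and removal schedule may differ; all function names are hypothetical.

```python
def partial_products(a: int, b: int) -> list[int]:
    """Hypothetical CoT steps for a*b: a times each digit of b, least significant digit first."""
    return [a * int(digit) * 10 ** i for i, digit in enumerate(reversed(str(b)))]


def icot_example(a: int, b: int, stage: int, removed_per_stage: int = 1) -> str:
    """One training string at a given curriculum stage.

    Assumption: CoT steps are removed from the front of the chain,
    `removed_per_stage` steps per stage, until only the question and answer remain.
    """
    steps = partial_products(a, b)
    kept = steps[min(stage * removed_per_stage, len(steps)):]
    cot = "".join(f" +{s}" for s in kept)
    return f"{a}*{b}={cot} ### {a * b}"


def curriculum(a: int, b: int, removed_per_stage: int = 1) -> list[str]:
    """The same example viewed at every stage, from full CoT down to answer-only."""
    n_steps = len(partial_products(a, b))
    n_stages = -(-n_steps // removed_per_stage)  # ceiling division
    return [icot_example(a, b, s, removed_per_stage) for s in range(n_stages + 1)]


if __name__ == "__main__":
    for line in curriculum(1234, 5678):
        print(line)
```

At the final stage each example has the same shape as standard fine-tuning data (inputs and answer only); in this sketch the only difference is that the ICoT model reaches that format gradually instead of starting there.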

thimabru retweeted

Junxuan Wang @JunxuanWang0929

10 months ago

🔍 Ever notice how attention layers only tweak the residual stream in a low-dimensional way? That low-rank writing is exactly why so many SAE features stay dead, until Active Subspace Init rescues them. 👇
#AI #ML #MechInterp https://t.co/9kns55hLVj
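The tweet does not describe how Active Subspace Init works, so the sketch below does not attempt it. It only illustrates one structural reason attention writes are low-dimensional: a single head's contribution passes through an output projection W_O of shape d_model × d_head, so everything that head adds to the residual stream lies in a subspace of dimension at most d_head. The sizes, the seed, and the integer-valued random weights are hypothetical; integers keep the exact rational rank computation clean.

```python
import random
from fractions import Fraction


def matvec(W, v):
    """Matrix-vector product for row-major nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rank(rows):
    """Exact matrix rank via Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    n_cols = len(M[0]) if M else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r


D_MODEL, D_HEAD, N_TOKENS = 8, 2, 20  # hypothetical sizes
rng = random.Random(0)

# Hypothetical output projection W_O (d_model x d_head): maps one head's output into the residual stream.
W_O = [[rng.randint(-3, 3) for _ in range(D_HEAD)] for _ in range(D_MODEL)]
# Hypothetical per-token head outputs z_t (each d_head-dimensional).
head_outputs = [[rng.randint(-5, 5) for _ in range(D_HEAD)] for _ in range(N_TOKENS)]
# What the head adds to the residual stream at each token: W_O @ z_t.
writes = [matvec(W_O, z) for z in head_outputs]

if __name__ == "__main__":
    print(f"rank of {N_TOKENS} head writes in a {D_MODEL}-dim residual stream: {rank(writes)}")
```

With d_model = 8 and d_head = 2, twenty writes from the head span at most 2 dimensions, whereas twenty generic 8-dimensional vectors would typically span all 8. A full layer sums several heads, so its structural bound is the number of heads times d_head; the tweet's stronger empirical claim about whole layers is not something this toy demonstrates.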

thimabru retweeted

Google DeepMind @GoogleDeepMind

11 months ago

What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵

thimabru retweeted

🥱 Sleepy (ML/DL)

@KrishnaNaraKun

12 months ago

Another arXiv book 📖 😃

"The Principles of Deep Learning Theory"

A very good 471-page textbook, ideal for students and researchers interested in AI. Freely available on arXiv.

Link in comments 😎 👇 https://t.co/1p8WxpoFKa

Thiago Matheus @thimabru

12 months ago

@TheVariational Where can I buy the Variational book? I signed up for the mailing list and only received a short copy. I'm writing my master's thesis on diffusion models, and the book would be a great help.

thimabru retweeted

Paul Couvert

@itsPaulAi

12 months ago

Finally! Google has just released Gemini CLI, an AI agent that brings Gemini directly into your terminal.

→ 1,000 free requests PER DAY
→ Open source

You can use it as a coding agent, automate tasks, use MCPs, generate videos & images, etc.

Steps to install and use it: https://t.co/unauuwMXJh

thimabru retweeted

10X AI

@10X_AI_

about 1 year ago

6. His Personal Regret

At 77, Hinton's biggest regret isn't professional. "I wish I'd spent more time with my wife and with my children when they were little." Both of his wives died of cancer. He was "obsessed with work" and now warns others about the true value of time.

thimabru retweeted

Google DeepMind @GoogleDeepMind

about 1 year ago

Introducing MedGemma, our most capable open model for multimodal medical text and image comprehension. 🩻 MedGemma is available now as part of Health AI Developer Foundations → https://t.co/GfJkvBHjTF