Félix Robles

@findeton

Startup & AI Enthusiast, Engineering @sequentech

Joined June 2018

190 Following

15 Followers

68 Posts

findeton retweeted

Yasir Ai

@AiwithYasir

2 months ago

🚨Just IN: MIT proved you can delete 90% of a neural network without losing accuracy.

Researchers found that inside every massive model, there is a "winning ticket", a tiny subnetwork that does all the heavy lifting.

They proved if you find it and reset it to its original state, it performs exactly like the giant version.

But there was a catch that killed adoption instantly..

you had to train the massive model first to find the ticket. nobody wanted to train twice just to deploy once. it was a cool academic flex, but useless for production.

The original 2018 paper was mind-blowing:

But today, after 8 years…

We finally have the silicon-level breakthrough we were waiting for: structured sparsity.

Modern GPUs (NVIDIA Ampere+) don’t just “simulate” pruning anymore.

They have native support for block sparsity (2:4 patterns) built directly into the hardware.

It’s not theoretical, it’s silicon-level acceleration.

The math is terrifyingly good: a 90% sparse network = 50% less memory bandwidth + 2× compute throughput. Real speed.. zero accuracy loss.

Three things just made this production-ready in 2026:

- pruning-aware training (you train sparse from day one)
- native support in pytorch 2.0 and the apple neural engine
- the realization that ai models are 90% redundant by design

Evolution over-parameterizes everything. We’re finally learning how to prune.

The era of bloated, inefficient models is officially over. The tooling finally caught up to the theory, and the winners are going to be the ones who stop paying for 90% of weights they don’t even need.

The future of AI is smaller, faster, and smarter.
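
A minimal sketch of the 2:4 pattern mentioned in the post above, assuming NumPy and a simple magnitude-based rule for choosing which weights to drop (the rule and the name `prune_2_to_4` are illustrative assumptions, not taken from the post). In a 2:4 pattern, two of every four consecutive values are zero, so the pattern itself corresponds to 50% sparsity.

```python
import numpy as np

def prune_2_to_4(weights: np.ndarray) -> np.ndarray:
    """Zero the two smallest-magnitude values in every group of four.

    Hypothetical helper: magnitude-based selection is an assumption.
    """
    if weights.shape[-1] % 4 != 0:
        raise ValueError("last dimension must be a multiple of 4")
    groups = weights.reshape(-1, 4)
    # Indices of the two smallest |w| in each group of four.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))
w_sparse = prune_2_to_4(w)

# Fraction of zeroed weights after applying the 2:4 mask.
sparsity = float(np.mean(w_sparse == 0.0))
```

This only builds the mask. Any speedup depends on hardware and kernels that exploit the pattern, and the effect on accuracy has to be measured for each model.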

894

201

606

52K

findeton retweeted

alphaXiv

@askalphaxiv

2 months ago

"Hyperloop Transformers"

This paper proposes a memory-efficient LLM via looped Transformers.

They basically reuse the middle block across depth, then add hyper-connections only between loops.

Key result is that this restores flexibility lost from weight sharing, letting the model beat depth-matched Transformers with ~50% fewer parameters. The result still holds after INT4 quantization too.
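
The weight-sharing idea in the post above can be illustrated with a toy sketch. It assumes NumPy and a stand-in "block" made of one weight matrix with a residual connection. All names here (`make_block`, `apply_block`, `run_stack`, `unique_params`) are hypothetical. The sketch reuses one block at every depth rather than only the middle block, and it does not model hyper-connections, so it is not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16        # hidden width (hypothetical)
n_loops = 4   # number of times the block is applied (hypothetical)

def make_block() -> np.ndarray:
    # Stand-in for a Transformer block: a single weight matrix.
    return rng.normal(scale=0.1, size=(d, d))

def apply_block(w: np.ndarray, h: np.ndarray) -> np.ndarray:
    # Residual update, so repeated application refines the hidden state.
    return h + np.tanh(h @ w)

def run_stack(blocks, h: np.ndarray) -> np.ndarray:
    for w in blocks:
        h = apply_block(w, h)
    return h

def unique_params(blocks) -> int:
    # Count each distinct weight array once, however often it is reused.
    return sum(w.size for w in {id(w): w for w in blocks}.values())

shared = make_block()
looped = [shared] * n_loops                        # same weights at every depth
unshared = [make_block() for _ in range(n_loops)]  # fresh weights per layer

h0 = rng.normal(size=(2, d))
out_looped = run_stack(looped, h0)
out_unshared = run_stack(unshared, h0)
```

Both stacks have the same depth, but the looped one stores one block's worth of parameters instead of `n_loops` blocks' worth.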

382

242

20K

findeton retweeted

Peihao Wang @peihao_wang

2 months ago

Latent space reasoning via looped transformers has gained attention lately. It is rooted in optimization unrolling, where each loop implicitly models a GD step on hidden states. Our ICLR paper studied the question: what if we explicitly run GD in latent space at test time?
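
As a toy illustration of the "each loop is a GD step on hidden states" view, the sketch below runs explicit gradient descent on a hidden vector `h` under an assumed quadratic energy. The energy, the matrix `A`, the vector `b`, and the step size are all hypothetical choices for illustration, and this is not the method from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))  # hypothetical fixed "model" matrix
b = rng.normal(size=6)       # hypothetical target

def energy(h: np.ndarray) -> float:
    # Assumed latent objective: E(h) = 0.5 * ||A h - b||^2
    r = A @ h - b
    return 0.5 * float(r @ r)

def grad(h: np.ndarray) -> np.ndarray:
    return A.T @ (A @ h - b)

# Step size 1/L, with L the largest eigenvalue of A^T A.
lr = 1.0 / np.linalg.eigvalsh(A.T @ A).max()

h = np.zeros(4)
energies = [energy(h)]
for _ in range(20):          # each iteration plays the role of one "loop"
    h = h - lr * grad(h)
    energies.append(energy(h))
```

With this step size the energy does not increase from one loop to the next, which is the behaviour an unrolled optimizer is meant to mimic.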

381

326

36K

findeton retweeted

Grigory Sapunov

@che_shr_cat

2 months ago

1/
Flat minima theory is breaking. At modern scale, gradient descent doesn't settle into a nice convex bowl. It bounces chaotically at the "Edge of Stability."

Turns out, this chaos is exactly why massively overparameterized networks generalize. 🧵 https://t.co/CFKJK05srd
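
For context on the "Edge of Stability" term in the post above: on a one-dimensional quadratic with curvature (sharpness) `s`, plain gradient descent with learning rate `lr` converges only when `s < 2 / lr`. The sketch below checks both sides of that threshold. The function name and the specific numbers are illustrative assumptions, and a quadratic is far simpler than a real loss landscape.

```python
def gd_on_quadratic(sharpness: float, lr: float, x0: float = 1.0, steps: int = 50) -> float:
    """Run GD on f(x) = 0.5 * sharpness * x**2 and return |x| at the end."""
    x = x0
    for _ in range(steps):
        x = x - lr * sharpness * x   # gradient of f is sharpness * x
    return abs(x)

lr = 0.1
threshold = 2.0 / lr                 # stability threshold on the sharpness

below = gd_on_quadratic(0.9 * threshold, lr)  # contracts by a factor of 0.8 per step
above = gd_on_quadratic(1.1 * threshold, lr)  # grows by a factor of 1.2 per step
```

Below the threshold the iterate shrinks toward zero, and above it the iterate oscillates with growing amplitude.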

230

284

15K

findeton retweeted

Juan Ramón Rallo

@juanrallo

2 months ago

Indeed: why redistribute (through taxes, and coercively) outside your personal, family, or local sphere? Priority goes to the personal, the family, and the local.

797

101

71K

findeton retweeted

Paul Graham

@paulg

2 months ago

Hamming's talk is so important that I reproduced it on my site. It's one of the only things on my site written by someone else. https://t.co/kWvKdwIiOm

428

790K

findeton retweeted

Federico Alves, Econ.

@federicoalves

2 months ago

Communism is still alive and kicking

270

findeton retweeted

La Fuerza @lafuerzacarajo

2 months ago

Status of the Labor Law

findeton retweeted

Mario Nawfal

@MarioNawfal

2 months ago

🇫🇷 A French tax official was arrested for selling crypto investors' home addresses and financial records to criminal networks.

41 kidnappings followed. One every 2.5 days since January 2026.

The criminals didn't need to hack anything. They bought a list from someone inside the government.

France is the most dangerous country in the world right now if you hold crypto and someone knows about it 💀

Source: Le Mond

346

10K

839K

findeton retweeted

Argentina Potencia

@argypotencia

2 months ago

We have finally reached strongly positive net figures.

395

findeton retweeted

Juan Ramón Rallo

@juanrallo

2 months ago

This afternoon I will interview @Alvaro_DMaria about Bitcoin and the challenges it faces going forward. https://t.co/890j1o9IDH

258

40K

findeton retweeted

Lawrence M. Krauss

@LKrauss1

2 months ago

Our newest @OriginsProject podcast, What's New in Science with @skdh & Lawrence Krauss: From Ghost Murmers to AI Cures, will premiere at 4 PM ET today. Don't miss it! https://t.co/y2zc0KEJc2 via @YouTube

14K

findeton retweeted

Paul Graham

@paulg

2 months ago

Whoah, self-driving cars compete with airlines. I never considered that till now.

11K

findeton retweeted

Handre

@Handre

2 months ago

The Japanese railway privatization of 1987 stands as one of the most devastating defeats ever dealt to statist transportation mythology. The government split the bloated Japan National Railways into seven regional companies, sold them off, and watched private ownership transform a bankruptcy-bound disaster into the world's most efficient rail system.

JNR hemorrhaged money for decades before privatization. By 1987, the state railway carried debt equivalent to $200 billion in today's money while delivering mediocre service plagued by strikes and inefficiency. Politicians treated it as a jobs program rather than a transportation service. The predictable result: chronic losses, deteriorating infrastructure, and customer service that reflected government monopoly arrogance.

Private ownership changed everything overnight. The new JR companies slashed operating costs by 40% within five years while dramatically improving service quality. JR East alone now generates annual profits exceeding $3 billion. These companies invest billions in cutting-edge technology, maintain punctuality rates above 99%, and operate the world's most advanced high-speed rail networks. They achieved this without a single yen of operational subsidies.

The transformation reveals a core dynamic of transportation infrastructure: private companies must satisfy customers to survive, while government monopolies need only satisfy politicians. JR companies diversified into real estate, retail, and hospitality around their stations, creating integrated profit centers that cross-subsidize rail operations. Government railways never innovate this way because bureaucrats face no market pressure to generate returns.

Meanwhile, Amtrak burns through $2 billion in annual subsidies while delivering third-world service across most routes, and European state railways require massive taxpayer bailouts every few years to stay solvent.

741

15K

findeton retweeted

DAN DEL FUTURO @MonstruOSOGordo

2 months ago

HOW LITTLE PEOPLE TALK ABOUT THE BATTLE @SPettovelloOK IS FIGHTING. SHE SHARES THE PODIUM WITH TOTO CAPUTO

420

25K

findeton retweeted

Ravid Shwartz Ziv

@ziv_ravid

2 months ago

Attention sinks and compression valleys? Same coin.

Presenting at #ICLR2026 this morning: "Attention Sinks and Compression Valleys in LLMs are Two Sides of the Same Coin." Two phenomena everyone's been studying separately turn out to have the same root cause - massive activations in the residual stream. We prove it, show it across models from 410M to 120B, and use it to propose Mix-Compress-Refine: a three-phase view of how transformers organize computation in depth.

w/ Enrique* @arroyo_alvr @fedzbar @epomqo @mmbronstein @ylecun

Pavilion 3, P3-#2002, 10:30 AM local time

findeton retweeted

Fundación Faro

@fundfaro

2 months ago

We are honored to invite you to a debate led by the President of the Nation, @JMilei, together with national deputy @AdrianRavier and Juan Carlos de Pablo, at the Palacio Libertad, which will examine John Maynard Keynes's "The General Theory of Employment, Interest and Money" and its consequences for modern economies.

Registration link: https://t.co/hoA8MSv2sJ

IMPORTANT INFORMATION: Registration is free via an online form, with seating on a first-come, first-served basis. Registration does not guarantee a seat, and capacity is limited.

📅 April 28
🕡 6:00 PM
📍 Palacio Libertad

349

104

41K

findeton retweeted

Ian Miles Cheong

@ianmiles

2 months ago

Marc Andreessen reveals the exact framework Elon Musk uses to run six companies at once and outpace entire industries. It comes down to a rare combination of old-school industrialism and extreme, hands-on engineering.

"The CEO has to not just be a great CEO, they also have to be like a great technologist," Andreessen explains. While most executives rely on distant memories of being a programmer, Elon has the encyclopedic knowledge to sit down with a chip designer at 2 AM in Austin and actually figure out what is wrong with the hardware. He is able to go hands-on with rocket designers, AI engineers, and everything in between.

Instead of traditional corporate management, he treats everything as a production line. Every week, he maps out the entire operation on monitors, identifies the one critical bottleneck slowing things down, and goes directly to the engineers to solve it.

This is the secret to his speed. While a normal company might take six months to clear a single issue, Elon is fixing the critical production bottleneck at his companies 52 times a year himself. He runs this loop over and over again.

This relentless approach creates what one former SpaceX employee described as a "zone of shocking competence." Because Elon talks directly to the people actually doing the work, he instantly sniffs out incompetence. Anyone who cannot cut it is let go.

But it is also the ultimate talent magnet. The absolute best engineers in the world want to work for him because he is the rare CEO who can actually be an engineering peer.

It is a highly systematic way of optimizing a company to take on profound challenges and solve them at an unmatched speed.

205

11K

findeton retweeted

Ravid Shwartz Ziv

@ziv_ravid

2 months ago

New episode of The Information Bottleneck is out, this time with @liuzhuang1234 (Princeton). We talked about ConvNeXt and whether architecture still matters; dataset bias and what "good data" actually looks like; ImageBind and why vision is the natural bridge across modalities; CLIP's blind spots; memory as the real bottleneck behind the agent hype; whether LLMs have world models; and Transformers Without Normalization.

For years, the vision community debated what actually matters: architecture, inductive bias, self-attention vs convolution. After a lot of back-and-forth, we ended up in a funny place: ViT and ConvNet give roughly the same performance once you tune the details. What I find interesting is that once you reach a certain performance level, it becomes much easier to swap and tweak components without really changing the outcome.

Talking to Zhuang on this episode, I kept wondering whether the same is now true for LLMs. If we spent serious time on an alternative architecture today, would we actually get a meaningfully different model, or just land on the same Pareto curve with extra steps? I'm starting to suspect it's the latter. Architecture matters less than we think. Data, compute, and a handful of pillars do most of the work.

26K

findeton retweeted

HevercastroB

@HeverCastroB

2 months ago

It is no coincidence that yesterday the U.S. chargé d'affaires in Venezuela, John Barrett, met with Delcy and Diosdado, and that OFAC immediately issued a license allowing Delcy to pay the multimillion-dollar fees of Maduro and Cilia's lawyers.
What did Delcy give up in return? https://t.co/xEA0Sh36dA

245

894

165

143K
