Preetam Mukherjee @therealpreetam - Twitter Profile

Preetam Mukherjee

@therealpreetam

13 days ago

@datarade Skills like these will help a lot: https://t.co/VuhhL27AzS

0

1

0

36

therealpreetam retweeted

Kumar🇺🇸

@datarade

19 days ago

If you're building a startup that makes 510(k) medical devices for elective procedures in the USA, I might be interested in investing. DM me. I'm weak on biology, but I have trained in operations research/industrial engineering + polymer-textile-fiber eng.

4

18

2

6

3K

Preetam Mukherjee

@therealpreetam

about 1 month ago

Sage words!

Kumar🇺🇸

@datarade

about 1 month ago

In a world of AI, as a software developer, why are you building a platform that requires hours of user configuration and meddling? F* that. Take the customer to the promised land they could only half fathom before your existence.

0

14

3

2

1K

0

1

0

65

Preetam Mukherjee

@therealpreetam

about 1 month ago

@doodlestein 😂

0

1

0

10

about 2 months ago

We’ve been struggling with GEMM efficiency on M=16,K=4096,N=16 shapes (common in cross‑attention for video). ‘DecomposeK’ + fusion of elementwise ops could be a game‑changer for our per‑step training time. Congrats to the PyTorch team!

Paul Zhang

@pz_ai1

about 2 months ago

Super excited to share some work the torch.compile team has done on generating state-of-the-art GEMMs through Inductor! We present DecomposeK, a new way to do Split-k GEMM initially presented at PyTorch conference Europe that regularly beats cuBLAS for split-k shapes. 🧵👇

🧵👇 https://t.co/jDpuKn5ChL

5

83

9

53

9K

0

1

0

112
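For readers unfamiliar with split-K, here is a minimal NumPy sketch of the general idea behind the posts above: when M and N are small and K is large, the K dimension is cut into chunks, each chunk is multiplied independently as one entry of a batched matmul, and the partial products are summed. This is an illustration only, not the Inductor implementation; the function name `split_k_matmul` and the `splits` parameter are hypothetical.

```python
import numpy as np


def split_k_matmul(a, b, splits):
    """Compute a @ b by splitting the shared K dimension into `splits` chunks.

    Hypothetical helper for illustration; not a PyTorch or Inductor API.
    """
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b:
        raise ValueError("inner dimensions must match")
    if k % splits != 0:
        raise ValueError("K must be divisible by the number of splits")
    chunk = k // splits

    # (M, K) -> (splits, M, chunk): each batch entry holds one slice of K.
    a_parts = a.reshape(m, splits, chunk).transpose(1, 0, 2)
    # (K, N) -> (splits, chunk, N): rows of b are already contiguous along K.
    b_parts = b.reshape(splits, chunk, n)

    # Batched matmul yields one (M, N) partial product per K slice.
    partial = np.matmul(a_parts, b_parts)

    # Summing the partial products recovers the full result.
    return partial.sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # The M=16, K=4096, N=16 shape mentioned in the post above.
    a = rng.standard_normal((16, 4096))
    b = rng.standard_normal((4096, 16))
    print(np.allclose(split_k_matmul(a, b, splits=32), a @ b))
```

On a GPU the point of the decomposition is that the independent chunks expose more parallel work than a single 16x16 output tile would; the NumPy version only shows that the arithmetic is equivalent.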

Preetam Mukherjee

@therealpreetam

about 2 months ago

😂

TrendSpider @TrendSpider

about 2 months ago

your friend who doesn’t use a stop loss:

40

2K

125

396

147K

0

27

Preetam Mukherjee

@therealpreetam

about 2 months ago

@MohapatraHemant That's a WILD story indeed 😅

0

989

Preetam Mukherjee

@therealpreetam

2 months ago

Games simulate choice. Vibe coding is choice.

Naval

@naval

2 months ago

Vibe coding is more addictive than any video game ever made (if you know what you want to build).

2K

30K

3K

5K

2M

0

63

therealpreetam retweeted

Naval

@naval

3 months ago

Pure software is rapidly becoming un-investable.

1K

24K

2K

4K

6M

Preetam Mukherjee

@therealpreetam

4 months ago

Most “AI productivity” is just faster procrastination. True leverage is using it to confront the hard questions sooner.

1

2

0

72

Preetam Mukherjee

@therealpreetam

4 months ago

AI doesn’t steal jobs; it steals mediocrity. The rest gets amplified.

2

1

0

70

Preetam Mukherjee

@therealpreetam

4 months ago

Luck compounds faster when you stop calling it luck and start calling it positioning.

1

0

57

Preetam Mukherjee

@therealpreetam

4 months ago

@doodlestein Hello!

0

1

0

285

Preetam Mukherjee

@therealpreetam

4 months ago

The metacrisis isn’t alignment. It’s that humans keep outsourcing alignment to the same systems that optimize for engagement over enlightenment.

1

0

45

Preetam Mukherjee

@therealpreetam

4 months ago

@cyantist @andrewfarah Very in for this. Not sure if you can swing it but @anirbanbandyo would make for an incredible guest. https://t.co/laaWlQR7jF

AnirbanBandyopadhyay @anirbanbandyo

5 months ago

I don’t know why some people argue that the brain is a receiver of consciousness. The brain could at most be a sensor, a receiver, an ammeter, and a processor of consciousness. We are writing a paper on an SWP device, that is, sensing whole in part. So the universe is a fractal network of brains.

8

58

2

14

3K

0

52

Preetam Mukherjee

@therealpreetam

4 months ago

Beff is right: backprop isn't going away just because we found a "cleaner" algorithm. It's dying because the hardware is finally catching up to the laws of physics. We're on our way to Thermodynamic Equilibrium Learning hybrids, which will train by physically annealing chips that are analog or stochastic (Extropic's TSUs, noisy oscillators, and photonic arrays). Learned representations are the same as equilibrium states. Built-in thermal noise means free exploration (no more hand-made schedulers or dropout). No activation storage, no backward pass, and no von Neumann wall. Early 2026 demos show that non-toy tasks (CIFAR/ImageNet subsets, medical imaging, and even seq modeling) can use 10 to 25 times less energy while still matching backprop accuracy. By 2027, frontier pilots should be able to handle real generative workloads with less than 20% of the energy draw....physics wins the race for efficiency. For pure-digital superclusters, backprop hangs on for a little while longer, but then it becomes obsolete, the way perceptrons are today. *****The biggest change since transformers? Hardware-software co-design that lets the chip optimize itself in real time.***** 🤯 Physics goes beyond von Neumann bottlenecks. Who's betting against the heat? 🔥

Beff (e/acc)

@beffjezos

4 months ago

Backprop is going to die soon. Mark my words.

64

376

19

136

36K

1

0

55

Preetam Mukherjee

@therealpreetam

4 months ago

Even with strategies like paged attention and flash offloading, which helped companies like SanDisk reach $150 billion or higher valuations by resolving inference latency through DMA-overlapped HBM loads, Transformers' quadratic scaling is hitting walls. However, SSMs (like the Mamba variants) are linear-time monsters that are ideal for condensing lengthy sequences into small states. Hybrid Transformer-SSM distills, which save 5–6 times the memory on 1B+ models, retain only 2% of attention heads for retrieval while offloading the remainder to SSMs. This changes by 2027: Consider pre-encoding a video database or a corpus of 10M tokens into an SSM "trajectory" bundle, which is essentially a learned recurrent path that records dependencies, without keeping track of each KV pair. With the help of end-to-end fine-tuning that incorporates selection logic, the model only unrolls the pertinent sub-paths during queries.

0

28
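To make the "linear-time, small state" point in the post above concrete, here is a minimal sketch of a plain diagonal linear state-space recurrence in NumPy. It is not Mamba (there is no input-dependent selection) and not the hybrid distillation the post describes; the function name `ssm_scan` and the parameter names `a`, `b`, `c` are assumptions chosen for illustration. What it shows is that the carried state has a fixed size no matter how long the sequence is.

```python
import numpy as np


def ssm_scan(x, a, b, c):
    """Run h_t = a * h_{t-1} + b @ x_t, y_t = c @ h_t over a sequence.

    Hypothetical shapes:
      x: (T, d_in)   input sequence
      a: (d_state,)  diagonal state transition
      b: (d_state, d_in)
      c: (d_out, d_state)
    Returns the outputs (T, d_out) and the final state (d_state,).
    """
    h = np.zeros(a.shape[0])
    ys = []
    for x_t in x:
        # One step costs O(d_state * d_in); the whole scan is linear in T.
        h = a * h + b @ x_t
        ys.append(c @ h)
    return np.stack(ys), h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_in, d_state, d_out = 8, 16, 4
    a = rng.uniform(0.5, 0.99, size=d_state)  # decay factors below 1 keep the state bounded
    b = rng.standard_normal((d_state, d_in))
    c = rng.standard_normal((d_out, d_state))
    for seq_len in (10, 1000):
        y, h = ssm_scan(rng.standard_normal((seq_len, d_in)), a, b, c)
        print(seq_len, y.shape, h.shape)  # state shape is the same for both lengths
```

By contrast, attention keeps one key and one value vector per past token, so its per-query cost and cache grow with sequence length.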

Preetam Mukherjee

@therealpreetam

4 months ago

Bloated KV caches or even CAG hybrids won't be used in the future for multi-modal LLMs over large datasets. Instead, the field will switch to "State-Space Compression" (SSC), which involves offline distillation of databases and multi-modal inputs (text, images, and video) into recurrent SSM trajectories that are then unrolled on demand during inference. By learning the compression heuristics for you, gradient descent will transform retrieval into smooth state transitions, likely reducing memory by 70–90% while allowing for "infinite" context without causing attentional explosions.

2

1

0

30
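As a rough way to see why KV caches get called "bloated" in the post above, the sketch below compares how cache size and recurrent state size scale. The KV formula is the standard count of one key and one value vector per token, per layer, per KV head; the recurrent state formula and all model dimensions in the example are hypothetical, and the sketch says nothing about whether the 70–90% figure holds.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # The factor 2 accounts for storing both a key and a value per token.
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_value


def recurrent_state_bytes(n_layers, d_inner, d_state, bytes_per_value=2):
    """Bytes for a fixed-size recurrent state (assumed d_inner x d_state per layer)."""
    # No seq_len term: the state does not grow with context length.
    return n_layers * d_inner * d_state * bytes_per_value


if __name__ == "__main__":
    # Hypothetical model dimensions, chosen only for illustration.
    layers, kv_heads, head_dim = 24, 16, 64
    state = recurrent_state_bytes(layers, d_inner=2048, d_state=16)
    for tokens in (1_000, 100_000, 10_000_000):
        cache = kv_cache_bytes(tokens, layers, kv_heads, head_dim)
        print(f"{tokens:>10} tokens: KV cache {cache:,} B, recurrent state {state:,} B")
```

The KV cache grows linearly with the number of tokens while the recurrent state stays constant, which is the trade-off the post is pointing at; how much retrieval quality survives the compression is a separate question.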

Preetam Mukherjee

@therealpreetam

4 months ago

@sarahookr 1. https://t.co/dDnbv1l59i 2. https://t.co/210I87MzXK

0

1

0

459

Preetam Mukherjee

@therealpreetam
