Lukas Fisch

Verified account

@codingfisch

Don't panic! Just build!

Joined December 2013

102 Following

194 Followers

130 Posts

Pinned Tweet

almost 2 years ago

🧠 Exciting news for #neuroscience! Launching deepmriprep: Voxel-based Morphometry (VBM) preprocessing via neural networks in ~10 seconds per #brain image 🚀 🔗 Preprint: https://t.co/Es2lMhEDUu 🔗 GitHub: https://t.co/e7IDmkqvNT Install via "pip install deepmriprep"

5

279

65

155

26K

about 2 months ago

@OosGergVer @__tinygrad__ He should reconsider: Selling chips with solid firmware and an OSS stack on top -> other people fixing the OSS stack -> possible alternative to the CUDA moat. CUDA is why NVIDIA is worth that much.

1

3

0

1

279

about 2 months ago

@__tinygrad__ These straightforward comparisons to LLMs are an insult to the brain. The brain runs so much more efficient software. LLM vs brain is like Windows 11 vs TempleOS. I liked the estimate in your blog post "Brain FLOPS": 162 GFLOPS.

0

3

0

0

424

4 months ago

@gadevenyi Oops, this one should work: https://t.co/e7IDmkpXYl

1

2

0

0

78

4 months ago

"deepmriprep: VBM preprocessing via deep neural networks" is published in Nature Computational Science 🧠💻 🔗 Paper: https://t.co/2ZQdiACU9z VBM preprocessing in ~10 seconds per #brain image 🚀 🔗 GitHub: https://t.co/PIA6NeGr2r… Install via "pip install deepmriprep"

2

89

30

60

5K

codingfisch retweeted

TimHahn @TheRealTimHahn1

4 months ago

This really speeds up preprocessing and shows - yet again - that neural networks are eating software. Great work, @codingfisch !! Happy to see this finally published.

0

1

1

0

152

4 months ago

Not convinced? How about this: Christian Gaser (author of CAT12) is building his new SBM toolbox around deepmriprep: https://t.co/3DLlZgd9S1

0

5

1

1

283

codingfisch retweeted

Nature Computational Science @NatComputSci

4 months ago

📢Out now! @codingfisch and colleagues present deepmriprep, a tool that leverages neural networks to enable 37x faster Voxel-based Morphometry preprocessing of MRI data than existing methods. https://t.co/WETn1yRrov

0

27

13

21

4K

6 months ago

@__tinygrad__ @ID_AA_Carmack RISC for array ops sounds too elegant to be impossible (as George said @clattner_llvm told him). I hope you will find the right abstractions to crush this problem soon!

0

0

0

0

147

codingfisch retweeted

8 months ago

Our GPU stack for both NVIDIA and AMD, aside from minimal pieces of signed firmware, is 100% open source and pure Python except for the compiler. It's not using vendor drivers, frameworks, or libraries. That's why it's so easy to make it work on Mac. For compilers, on AMD, we use upstream LLVM, and on NVIDIA, we use the NAK compiler from the MESA project. We plan to replace the compiler with pure tinygrad in a year or two as well. With RANGEIFY merged, our lowering stuff now matches the state of the art, TVM style. We're studying ThunderKittens and TileLang for speed at that level, and should have all this stuff ready in 200 days for the due date of our AMD Llama 405B training contract. Due to tinygrad's small size and pure Python nature, it's the easiest ML library to make progress on, aka fastest slope of improvement. With Megakernel style for scheduling, MODeL_opt style for planning, and E-graph style for symbolic, we should blow past the state of the art in PyTorch and JAX speed. If we do that, NVIDIA's moat is over. It's 1000 lines at most to add a new accelerator to tinygrad. And I don't mean to add a new accelerator with help from a kernel driver, compiler, and libraries. Just 1000 lines of software for the *whole* accelerator speaking right on the PCIe BARs, like what tinygrad is doing with the NVIDIA and AMD GPUs now.

25

820

55

205

61K

8 months ago

This!

george hotz archive @geohotarchive

8 months ago

the solution is simple but you aren’t demoralized enough yet https://t.co/z5zHLQwQ83

11

269

21

87

20K

0

0

0

0

93

8 months ago

These are special times 📈

8 months ago

Theorem: The maximum possible duration of the computational singularity is 470 years. Proof: The FLOPs capacity of all computers which existed in the year 1986 is estimated to be at most 4.5e14 (Hilbert et al. 2011). Based on public Nvidia revenue and GPU specs, this capacity has grown to at least 1e22 FLOPs as of 2025. This difference implies an average growth rate of 55% per year since 1986. Now observe that the physical universe can support at most 10^104 FLOPs (Lloyd 2000). Therefore, even if we allow for the discovery of faster than light travel, the computational singularity — i.e., the historical period of elevated social and technological unpredictability driven by rapid growth in worldwide computational capacity — cannot persist for longer than (2025 -1986) + (104-22)/log_10(1.55) ~= 470 years. References: S. Lloyd, “Ultimate physical limits to computation,” *arXiv preprint quant-ph/9908043*, 1999, doi:10.48550/arXiv.quant-ph/9908043. M. Hilbert and P. López, “The world’s technological capacity to store, communicate, and compute information,” *Science*, vol. 332, no. 6025, pp. 60–65, Apr. 2011, doi:10.1126/science.1200970.

61

686

52

397

344K
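The arithmetic in the quoted theorem can be checked with a few lines of Python. This is only a sketch of the calculation as stated in the tweet: the capacity figures (4.5e14, 1e22, 10^104) and the 55% growth rate are taken from the tweet as-is, and the function and variable names are made up for illustration.

```python
import math

# Figures quoted in the tweet (not independently verified here).
FLOPS_1986 = 4.5e14        # estimated worldwide capacity in 1986
FLOPS_2025 = 1e22          # estimated worldwide capacity in 2025
LIMIT_LOG10 = 104          # log10 of the quoted physical limit
YEARS_ELAPSED = 2025 - 1986


def implied_growth_rate(start: float, end: float, years: int) -> float:
    """Average yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


def years_until_limit(current_log10: float, limit_log10: float, rate: float) -> float:
    """Years of compound growth at `rate` needed to close the gap in log10 space."""
    return (limit_log10 - current_log10) / math.log10(1 + rate)


# Growth rate implied by the two capacity estimates.
rate = implied_growth_rate(FLOPS_1986, FLOPS_2025, YEARS_ELAPSED)

# Total duration using the tweet's rounded 55% growth rate.
total_years = YEARS_ELAPSED + years_until_limit(
    math.log10(FLOPS_2025), LIMIT_LOG10, 0.55
)

print(f"implied growth rate: {rate:.1%}")
print(f"maximum duration: {total_years:.0f} years")
```

The implied rate comes out slightly below the rounded 55% used in the tweet. Plugging the rounded 55% into the formula gives a total close to 470 years.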

0

0

0

0

117

8 months ago

@eigenron Even better because only rollout is even faster! You can use the rollout function (see flashrl/main.py).

1

0

0

0

45

8 months ago

@eigenron Here is the repo: https://t.co/QYENXgUwKd

1

0

0

0

29

8 months ago

@eigenron Try "pip install flashrl" to make it even faster (on CPU and GPU). Write 6 lines of code to train Pong in a few seconds!

1

1

0

3

174

8 months ago

Good summary but "frontier LLM researchers...shifted a little too much into exploit mode" is an understatement. A large chunk of ALL AI researchers bet on scaling up LLMs to AGI. If this bet fails we spent a lot of researcher FLOPs in a local optimum. New small-scale RL ideas are needed!

0

0

0

0

187

8 months ago

@jsuarez @ID_AA_Carmack @clashluke So no Adam vs Muon? If not, why? Would be of interest (at least for @ID_AA_Carmack and me 😄)

1

0

0

0

98

8 months ago

@jsuarez @ID_AA_Carmack @clashluke It would really help if you would quantify stuff like this rigorously. Muon vs Adam(W) across different envs (with hyperparameter sweeps if needed) with (~10) different random seeds. Would be interesting to see what changes in puffer bring the largest performance increase

1

0

0

0

106

8 months ago

@fchollet Google "Solomonoff induction" and "AIXI model" 😉 @mhutter42 formalized this idea very nicely. Unfortunately uncomputable 🙈

0

0

0

0

84

8 months ago

@charliermarsh After "pip install viiew" it also supports scrolling through your array/tensor/dataframe

2

11

1

7

493
