Liling Tan @alvations - Twitter Profile

Liling Tan

@alvations

4 days ago

@lauriewired And the video: https://t.co/laa7NCIkaM

0

299

Liling Tan

@alvations

4 days ago

Learnt some fun computing history from @lauriewired =) Makes you wonder #llm https://t.co/TKvKq1hHeM And also https://t.co/sLeZKp3vrH has a fun website.

1

0

1

209

alvations retweeted

Isalia20

@Is36E

6 days ago

🚀 open sourced metalBLAS, hand-tuned Metal matmul kernels for Apple Silicon, callable from PyTorch on mps. Matches/beats MPS Graph (torch) matmuls on bf16/fp16, 2-3x faster on fp32 (TF32-relaxed) across the bench suite on M5 Pro. Next step is to upstream this to PyTorch! https://t.co/EMGdZaagXP

1

78

11

37

9K

Liling Tan

@alvations

7 days ago

And this makes it an even more attractive idea to pursue! 😊

0

150

Liling Tan

@alvations

7 days ago

Apparently this is a wild enough idea that people are saying it sounds ridiculous, yet we’re not questioning pseudo #opensource, translucent “open weights” #llm models, though they’re also binaries that generate code, and then the code generates the results.

Liling Tan

@alvations

13 days ago

What if #llm coding agents don't need to write human-readable code: no #Python, no #Java, no C++, no COBOL, just binaries? Binaries that we don't understand, but the CLI does the job we need. The ultimate source: not closed source, #opensource, but pure binaries.

2

0

385

1

0

1

263

Liling Tan

@alvations

12 days ago

Okay, now let's think carefully (as if I'm prompting an #llm LOL): wouldn't any argument that such a binary cannot be governed, audited, checked, or relied upon apply equally to any "open weights" model? #foodforthought for the weekend.

0

42

Liling Tan

@alvations

13 days ago

https://t.co/HKQzrvzdKp 🤔

0

54

Liling Tan

@alvations

13 days ago

With all these agents / #llm models in the wild, the true value will be created when human connection is placed first: the barista who makes the coffee with a robot but talks to you while the robot is making it or fetching it from the preorder rack.

0

1

92

Liling Tan

@alvations

21 days ago

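The era of "what has long been united must divide" (合久必分) should be over; "what has long been divided must unite" (分久必合) is next, and remember that the affairs of the world are ever-changing (世事變化無常).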

0

64

Liling Tan

@alvations

21 days ago

Startup labs doing #llm should start doing mergers and acquisitions. What’s coming is going to wipe out a lot of high-burn-rate companies… Think how much more powerful it’ll be if talent, compute, and data are consolidated across these labs 😏

2

1

0

1

121

alvations retweeted

Stanford NLP Group

@stanfordnlp

22 days ago

Many roughly know how a transformer works. To REALLY understand modern neural LMs (MoEs, GPU tiling, kernels, RLHF, data), you need CS336 by @tatsu_hashimoto and @percyliang. The 2026 edition appears on YouTube with a ~2-week delay: https://t.co/iEWTqEivvB Materials: https://t.co/E1pzUSC6Tr

12

2K

220

3K

294K

Liling Tan

@alvations

24 days ago

@MIT_CSAIL They can only spend money, not make it. What should happen is that agents make money by themselves and become self-sustaining. If it’s really so useful and clever, we shouldn’t need to pay a subscription for it; it should go find money to pay for itself 😉

0

2

0

1

529

Liling Tan

@alvations

26 days ago

254K -_- https://t.co/a4qH9G7P4Z Okay, I give up...

0

48

Liling Tan

@alvations

26 days ago

When stressed, build stuff =) https://t.co/RisGfOoK9U

1

0

108

Liling Tan

@alvations

26 days ago

And this person hunted some of them down... https://t.co/120G0TDTFj

1

0

76

Liling Tan

@alvations

26 days ago

https://t.co/Zsls3giWtM

0

22

alvations retweeted

hardmaru

@hardmaru

26 days ago

The human brain🧠 is incredibly efficient because it only activates the specific neurons needed for a thought. Modern LLMs naturally try to do this too (> 95% of neurons in feedforward layers stay silent for any given word), but our hardware punishes them for it.

One of the most frustrating paradoxes in deep learning: making a model do less math often makes it run slower. Why? Because unstructured sparsity introduces irregular memory access, and GPUs are built for predictable, dense blocks of math.

We teamed up with @NVIDIA to try to fix this hardware mismatch. Instead of forcing the GPU to adapt to the sparsity, we built a "Hybrid" format that reshapes the sparsity to fit the GPU. Our sparsity format (TwELL) dynamically routes the 99% of highly sparse tokens through a fast path, and uses a dense backup matrix as a safety valve for the rare, heavy tokens.

Through TwELL and a new set of custom CUDA kernels for both LLM inference and training, we translated theoretical sparsity into actual wall-clock speedups: >20% faster training and inference on H100 GPUs, while also cutting energy consumption and memory requirements.

Paper: https://t.co/rqIY9SYBDe
Blog: https://t.co/oRjNbpJKha
Code: https://t.co/FAFaJwpxAJ
⚡️

52

3K

503

3K

429K
