Vivian Score Fang @fangscore - Twitter Profile

Vivian Score Fang @FangScore

20 days ago

@Unveiled_ChinaX It should be 89-64 Tiananmen Square Memorials instead!

0

19

Vivian Score Fang @FangScore

about 2 months ago

@KrisPatel99 New inference innovations (weight quantization, new KV compression, new attention variants) are all easy to implement with CUDA but difficult to do on TPU. They need to pay extra to maintain convertibility to TPU.

0

1

0

51

Vivian Score Fang @FangScore

about 2 months ago

@UziCryptoo Get your facts straight! Where do you get a 12% tax rate? The standard deduction is $16,000!

0

3

Vivian Score Fang @FangScore

about 2 months ago

@BichonRedux Biggest fairy tale I have ever heard

0

20

Vivian Score Fang @FangScore

about 2 months ago

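@yao30059829 Even if the model itself does not change, the flexibility of Nvidia cards is very important. New quantization algorithms have been coming out frequently lately, and every one of them can be programmed directly on Nvidia cards. TPU cannot do that.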

0

1

0

60

Vivian Score Fang @FangScore

about 2 months ago

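@BTCdayu TPU actually has no clear advantage. Its HBM bandwidth is worse than the GPU's, and the networking part is bought in, from Jensen Huang's company no less. During inference, as long as you do load balancing well and add MTP, you can reach large batches. The biggest problem is that TPU does not support custom programming: for example, the new quantization algorithms that keep coming out lately are easy to program on GPU, but not on TPU.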

0

206

Vivian Score Fang @FangScore

about 2 months ago

@bboczeng This is a boatload of bullshit. One person’s agent load may be bursty, but the GPU cluster handles requests from large numbers of people’s agents. All of the GPU requests queue up to form batches, and GB200s are perfectly suited to handle those.

0

1K