I want to feel excited about the future, not afraid of it.
I want to thrive with what I’m given, not fear AI.
I want to do things for the love of the game, not because of fomo :)
I’ve been pretty inactive lately because of some college-related stuff, but this was something I was reading about, so I built this interactive demo. will try to add more details, cards etc. to it :)
Thanks @GPU_MODE for improving accessibility of GPU programming resources :)
@jino_rohit i see, the only problem i have with nano-vllm (while its code is very clean and easy to understand) is that it’s based on the deprecated v0 architecture, which is quite different from the current v1 and advanced model runner ..
i actually think local bug fixes in vllm (which i did a few times) are easier than understanding the whole system, or at least they feel like different skills.
also tbh i don’t know what else to regularly contribute to vllm besides bug fixes, since the framework’s getting more and more mature
currently reviewing common llm parallelism methods in inference: tp, dp, pp, and ep. this first note focuses on tensor parallelism.
it covers the column and row parallelism introduced in Megatron-LM, plus vocab parallelism for embeddings/LM heads. also a case study on how these ideas show up in the vLLM codebase. hope it helps.
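to make the column / row / vocab split concrete, here’s a minimal single-process NumPy sketch. everything in it is an assumption for illustration: the function names are hypothetical (not Megatron-LM or vLLM APIs), "ranks" are just list entries, the all-reduce is simulated with a plain sum, ReLU stands in for the activation, and dimensions are assumed divisible by `world_size`.

```python
import numpy as np

def column_parallel_linear(x, w, world_size):
    # Split W along its output (column) dim: rank k holds W[:, cols_k]
    # and produces its own slice of the output. No communication needed.
    w_shards = np.split(w, world_size, axis=1)
    return [x @ w_k for w_k in w_shards]

def row_parallel_linear(x_shards, w, world_size):
    # Split W along its input (row) dim: rank k holds W[rows_k, :] and
    # consumes the matching slice of the input. Each rank produces a
    # full-shape partial result; summing them stands in for an all-reduce.
    w_shards = np.split(w, world_size, axis=0)
    partials = [x_k @ w_k for x_k, w_k in zip(x_shards, w_shards)]
    return sum(partials)

def tp_mlp(x, w1, w2, world_size):
    # Column-parallel first layer, row-parallel second layer: the
    # elementwise activation runs per shard, so only one (simulated)
    # all-reduce is needed at the very end.
    h_shards = column_parallel_linear(x, w1, world_size)
    h_shards = [np.maximum(h, 0.0) for h in h_shards]  # ReLU, an assumption
    return row_parallel_linear(h_shards, w2, world_size)

def vocab_parallel_embedding(token_ids, table, world_size):
    # Each rank owns a contiguous range of vocab rows. Ids outside the
    # range are masked to zero vectors, then partials are summed
    # (again standing in for an all-reduce).
    vocab_per_rank = table.shape[0] // world_size
    out = np.zeros((len(token_ids), table.shape[1]))
    for rank in range(world_size):
        start = rank * vocab_per_rank
        shard = table[start:start + vocab_per_rank]
        in_range = (token_ids >= start) & (token_ids < start + vocab_per_rank)
        local_ids = np.where(in_range, token_ids - start, 0)
        out += shard[local_ids] * in_range[:, None]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w1 = rng.standard_normal((8, 16))
w2 = rng.standard_normal((16, 8))
mlp_reference = np.maximum(x @ w1, 0.0) @ w2
mlp_sharded = tp_mlp(x, w1, w2, world_size=4)

table = rng.standard_normal((12, 8))
token_ids = np.array([0, 5, 11, 3])
emb_reference = table[token_ids]
emb_sharded = vocab_parallel_embedding(token_ids, table, world_size=4)

assert np.allclose(mlp_reference, mlp_sharded)
assert np.allclose(emb_reference, emb_sharded)
```

the point of pairing column-then-row is that the intermediate activations never need to be gathered; in a real multi-GPU setup the `sum` would be a collective across devices.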
i’ve been regretting not going hands-on enough with what I’m learning lately.
you feel like you’re absorbing so much by reading papers and articles and taking notes on them. that’s all good, but if you don’t actually implement (at least some part of) it, there’s a massive gap. you end up understanding things only “abstractly” (which is often very leaky) without ever verifying it in a real system.
ml (especially mlsys) is an extremely hands-on field. you really have to build and tinker with it to truly understand it.