And just to follow up - your launcher isn’t accepting a custom path for reinstall: c:\star_citizen. Doesn’t save on commit; resets back to its default path.
@CloudImperium curious if error 8004 means anything to you? Deleted and reinstalled, over 1TB free on the ssd. Rebooted and all that. Error hits so quick it feels like it’s not querying my PC. Any ideas? Should I update my gpu drivers?
Vin Scully and Hank Stram's CBS broadcast call of the #49ers' game-winning 89-yard touchdown drive, culminating in "The Catch," to defeat the #Cowboys in the 1981 NFC Championship at Candlestick.
The ensuing kickoff and Dallas's final possession are included.
January 10, 1982
@maxhbain howdy - I’m a jack-of-all-trades product manager who likes getting into the weeds. Something appears to have changed with whisperx in the last 48-72 hours; I can’t for the life of me get transcription + diarization and labeling running; suspect pants update and I’m lost.
@neatprompts @Hesamation People who want to learn should start from scratch. Starting with someone else’s repo gives a false sense of security; too much gamification and min/maxing obfuscate the science and limitations behind this tech.
this is the most comprehensive and in-depth blog to understand vLLM. must read if you are into inference and ML systems and also helpful for beginners who want to contribute to vLLM. thank you aleksa!!
New in-depth blog post - "Inside vLLM: Anatomy of a High-Throughput LLM Inference System". Probably the most in-depth explanation of how LLM inference engines, and vLLM in particular, work!
Took me a while to get this level of understanding of the codebase and then to write this one up - i quickly realized i underestimated the effort. 😅 It could have easily been a book/booklet (lol).
I covered:
* Basics of inference engine flow (input/output request processing, scheduling, paged attention, continuous batching)
* "Advanced" stuff: chunked prefill, prefix caching, guided decoding (grammar-constrained FSM), speculative decoding, disaggregated P/D
* Scaling up: going from smaller LMs that can be hosted on a single GPU all the way to trillion+ params (via TP/PP/SP) -> multi-GPU, multi-node setup
* Serving the model on the web: going from offline deployment to multiple API servers, load balancing, DP coordinator, multiple engines setup :)
* Measuring perf of inference systems (latency (ttft, itl, e2e, tpot), throughput) and GPU perf roofline model
Lots of examples, lots of visuals!
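For anyone new to the latency metrics in the list above, here's a minimal sketch of how they're commonly computed from per-token timestamps. The function name and its inputs are hypothetical (not taken from the blog post or from vLLM), and TPOT is assumed here to mean decode time averaged over the output tokens after the first one.

```python
from statistics import mean


def latency_metrics(request_time, token_times):
    """Compute per-request latency metrics, all in seconds.

    request_time: when the request was sent (hypothetical input).
    token_times:  arrival time of each output token, in order.
    """
    if not token_times:
        raise ValueError("need at least one output token")

    n = len(token_times)
    # TTFT: time to first token (queueing + prefill).
    ttft = token_times[0] - request_time
    # E2E: time from request to the last output token.
    e2e = token_times[-1] - request_time
    # ITL: gaps between consecutive output tokens.
    itl = [b - a for a, b in zip(token_times, token_times[1:])]
    # TPOT (assumed definition): decode time per token after the first.
    tpot = (e2e - ttft) / (n - 1) if n > 1 else 0.0
    # Per-request output throughput in tokens per second.
    throughput = n / e2e if e2e > 0 else float("inf")

    return {
        "ttft": ttft,
        "itl": itl,
        "mean_itl": mean(itl) if itl else 0.0,
        "e2e": e2e,
        "tpot": tpot,
        "throughput": throughput,
    }
```

For a single request, mean ITL and TPOT come out the same under these definitions; they tend to differ only in how they get aggregated across many requests.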
---
I realize i've been silent on social - many of you noticed and thanks for reaching out! :) --> I'm so back! lots of things happened.
Also, in general, I'm a bit sick of superficial content, it really is an equivalent of junk food (h/t @karpathy).
I want to do the best/deepest technical work of my life over the next years and write much more in depth (high quality organic food ;)) so I might not be as frequent around here as i used to be (? we'll see). I'll make it a goal to share a few paper summaries a week or stuff that's relevant / in the zeitgeist.
If you have any topics from the past few weeks/months you'd like covered, drop them in the comments; i might focus on some of those in my next posts.
---
Huge thank you to @Hyperstackcloud for giving me an H100 node to run some of the experiments and analysis that i needed to write this up. The team there led by Christopher Starkey is amazing!
Also a big thank you to Nick Hill (who did a very thorough review of the post - basically a code review lol; Nick's a core vLLM contributor and principal SWE at Red Hat) and to my friends Kyle Krannen (NVIDIA Dynamo), @marksaroufim (PyTorch), and @ashVaswani (goat) for taking the time over the weekend when they didn't have to!
@LinkedIn I noticed I can’t select more than a single paragraph in articles on the iOS mobile app. Not saying your product manager wrote a user story to remove that functionality, but regardless I’m using Safari on mobile as a result. Lmk when you can restore functionality - thanks!
@Sumanth_077 Nice! Do you have the roadmap for an intuitive and useful Windows 11 start menu? What about folders with thousands of files that load quickly?
@CloudImperium food for thought: it’d be nice to be able to disable MFD casts as a global general setting. They’re a nice feature; team should be proud - but I prefer MFD panels. Nice to have for crafting: upgrade cockpits to support additional MFDs as well.