@chamath When CEOs hire a PwC type consulting firm vs. an AI lab, isn't CYA aka "the consultant confirmed this was a great idea" a big reason the CEO still would want to hire a consulting firm?
Did a very different format with @reinerpope - a blackboard lecture where he walks through how frontier LLMs are trained and served.
It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk.
It's a bit technical, but I encourage you to hang in there - it's really worth it.
There are fewer than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner does. It was a real delight to learn from him.
Recommend watching this one on YouTube so you can see the chalkboard.
0:00:00 - How batch size affects token cost and speed
0:31:59 - How MoE models are laid out across GPU racks
0:47:02 - How pipeline parallelism spreads model layers across racks
1:03:27 - Why Ilya said, "As we now know, pipelining is not wise."
1:18:49 - Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 - Deducing long context memory costs from API pricing
2:03:52 - Convergent evolution between neural nets and cryptography
Great comment from BG on where AI software vs. hardware opportunity may land in the long run… and how new open source voice models are changing business lines
People really don't understand what a competitive strategic weapon open source has become and how it works. Alfred Marshall would be proud. Always good to reread the cathedral and bazaar. https://t.co/Il8BsDuDeI
I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year and a failure to claim the boost feels decidedly like skill issue. There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering. Clearly some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude 9 earthquake is rocking the profession. Roll up your sleeves to not fall behind.
@BenBajarin is it odd if the goal was an acqui-hire type deal? Been seeing more of these in the software space in SV, maybe time it comes to chips
Today Groq entered into a non-exclusive licensing agreement with Nvidia for Groq's inference technology. Along with other members of the Groq team, I'll be joining Nvidia to help integrate the licensed technology. GroqCloud will continue to operate without interruption.
Learn more here: https://t.co/1Lgv1EKZNH