Jane Street put Cornell professor Nate Foster on Signals and Threads.
Your H100 sits idle most of the run.
Not broken. Waiting for its peers to finish exchanging tensors. Every uncovered microsecond is burned cash.
That's why NVIDIA quietly split the world into scale-up and scale-out. The whole training arms race is now a networking arms race.
The same tricks that move trades in nanoseconds across an exchange now move gradients across a GPU cluster.
A hedge fund analyst code-named Tipper X made four trades on inside information.
Total profit: $46,000. The smallest of all 81 people arrested.
A professor later did the math on the career he threw away. $23 million.
Before the FBI caught him, he stuffed $15,000 in cash into his socks and shirt to clear airport security.
Then they gave him a choice. Wear a wire or go to prison.
He wore it dozens of times and helped build 20 of the biggest insider cases in a generation.
Jane Street brought in Horace He to explain how to squeeze the most out of a single GPU. And then how to scale it to tens of thousands.
Graduated from Cornell in 2020. Works on the PyTorch team.
Training Llama 3 takes roughly 40 trillion trillion floating-point operations.
He wrote the two things those models run on: torch.compile and FlexAttention.
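To get a feel for what 40 trillion trillion operations means, here is a back-of-the-envelope sketch. Only the total comes from the text above. The per-GPU peak, the utilization, and the cluster size are round-number assumptions chosen for illustration, not figures from the episode.

```python
# Back-of-the-envelope only: every input except TOTAL_FLOPS is an assumption.
TOTAL_FLOPS = 40e12 * 1e12       # "40 trillion trillion" = 4e25 operations

PEAK_FLOPS_PER_GPU = 1e15        # assumed: about 1 PFLOP/s peak per GPU (rounded)
UTILIZATION = 0.40               # assumed: fraction of peak actually sustained
NUM_GPUS = 10_000                # assumed: low end of "tens of thousands"


def training_days(total_flops, peak_per_gpu, utilization, num_gpus):
    """Wall-clock days if every GPU sustains peak_per_gpu * utilization."""
    sustained = peak_per_gpu * utilization * num_gpus  # cluster FLOP/s
    seconds = total_flops / sustained
    return seconds / 86_400


days = training_days(TOTAL_FLOPS, PEAK_FLOPS_PER_GPU, UTILIZATION, NUM_GPUS)
print(f"{days:.0f} days on {NUM_GPUS:,} GPUs")
```

Under these assumptions the run takes about 116 days. Set `NUM_GPUS` to 1 and the same job takes over 3,000 years, which is why the second half of the conversation is about scaling.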
J.P. Morgan Private Bank had an analyst named David. For years, everyone ran to him with questions about funds.
A small research team handled thousands of investment products by hand. A client would ask in a meeting why a fund was terminated, and the advisor would dig through emails and databases for hours.
So they built an AI agent and named it after the real David. Now they ask the agent, not the man.
Under the hood it's a swarm of agents: a supervisor routes the work, sub-agents pull data, a separate agent checks the answer, a human stays in the loop on the big calls.
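The shape of that swarm can be sketched in a few lines. This is a toy, not the bank's system: every function name, the keyword routing, the two-source check, and the rule for what counts as a big call are hypothetical stand-ins, and the sub-agents return canned strings instead of querying anything.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Answer:
    text: str
    sources: List[str] = field(default_factory=list)
    approved: bool = False      # set by the checking agent
    needs_human: bool = False   # set when a person must sign off


# Hypothetical sub-agents: each pulls one kind of data.
def fund_data_agent(question: str) -> str:
    return "fund-db: status=terminated"


def email_agent(question: str) -> str:
    return "email: termination memo found"


SUB_AGENTS: Dict[str, Callable[[str], str]] = {
    "fund": fund_data_agent,
    "email": email_agent,
}


def supervisor_route(question: str) -> List[str]:
    """Supervisor: pick sub-agents (keyword matching as a stand-in)."""
    q = question.lower()
    routes = ["fund"]
    if "why" in q or "terminated" in q:
        routes.append("email")
    return routes


def checker(answer: Answer) -> bool:
    """Separate checking agent: here it only requires two sources."""
    return len(answer.sources) >= 2


def is_big_call(question: str) -> bool:
    """Assumed rule for when a human stays in the loop."""
    return "terminated" in question.lower()


def ask(question: str) -> Answer:
    findings = [SUB_AGENTS[name](question) for name in supervisor_route(question)]
    answer = Answer(text="; ".join(findings), sources=findings)
    answer.approved = checker(answer)
    # Escalate big calls, and anything the checker did not approve.
    answer.needs_human = is_big_call(question) or not answer.approved
    return answer


result = ask("Why was the fund terminated?")
print(result.approved, result.needs_human)
```

The point of the split is that the agent that writes the answer is not the one that approves it, and anything the checker rejects goes to a person.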