I thought pregnancy would be ok because everyone offers to do things for you and you get to eat whatever you want but instead people don’t really care so you have to do the same amount of stuff except with excruciating back pain and you can’t eat because of acid reflux
MEMORY IS THE MOAT
@nikesharora, Chairman & CEO of @PaloAltoNtwks , interviewed by @HarryStebbings (@20vcFund )
Summary: Nikesh Arora took Palo Alto Networks from an $18 billion company to one worth $225 billion, and his read on enterprise AI is blunt: most companies are doing it wrong, and most of the products are not ready. His core claim is that consumers forgive AI's mistakes while enterprises cannot, so the money will flow to whoever builds the depth (the context, the memory, and the edge-case training) that lets an agent act without a human catching its errors. The companies that win will redesign themselves around AI instead of adding it to yesterday's workflow, and the lasting advantage will be the memory a system builds up about you. He expects token prices to fall 90%, half of G&A roles to disappear in 3 years, and more engineers and salespeople, not fewer.
1. Context Stickiness. The lasting advantage in AI is the context a system holds about you, not the model itself. Arora says the frontier labs are racing to remember what you asked over the last 30, 60, 90 days so each new answer gets easier and you stop wanting to leave. The more a model knows about a user, the higher the cost of switching, and that stickiness is the moat. For enterprises the same logic holds: the company that owns its context wins, not the one renting the smartest model.
2. Breadth Versus Depth. The frontier model problem is a breadth versus depth problem. Consumers tolerate false positives and enterprises have none to spare. Arora had Gemini write a passable investment memo in 4 minutes, and a wrong line or two did not matter because a person was sitting in the middle to catch it. An agent acting on its own has no person in the middle, so a false positive becomes a live failure. Consumer AI wins on breadth and brand, while real enterprise revenue comes from depth.
3. The Waymo Standard. Waymo is the biggest agentic product in the world, and it shows what depth actually costs. Replacing one human, the driver, took tens of billions of dollars of edge-case training and data that exists nowhere on the internet. You cannot drop the next Anthropic model into your Mercedes and tell it to drive you home. Every enterprise agent that truly replaces a person needs that same depth, which is why most agentic enterprise products are not ready.
4. Rethink The Workflow. Most enterprises are losing because they add a little AI to an old workflow instead of redesigning the workflow around AI. Arora's example: scanning an invoice 20% faster is the trap, while the real win is letting AI do 80% of the thinking, like reading every CV and telling you which 20 people to interview and what to ask each one. That means giving up human control, which is exactly what companies resist. The winners over the next 3 years rethink the company with AI, not the task.
5. Software With Opinions. The next wave of enterprise software will have opinions, and that is the real change Arora is pointing at. Coded SaaS gives you the output you defined for the input you fed it. An AI marketing assistant reads your copy, tells you it is off-brand, and says how to fix it. That opinion makes an average employee smarter, which is why Arora expects half the people in G&A functions like marketing, finance, and HR to be gone within 3 years.
6. More Engineers, Not Fewer. The fear that AI shrinks headcount is half wrong. Process-heavy G&A roles compress, but Arora wants more technical and more sales people. His teams keep asking for resources to rework marketing and HR, and for people who can prompt frontier models, build harnesses, and bring in data nobody else has. A good product also needs more sellers: he met 20 customers in Europe last week and half did not know what his 20-year-old company already ships.
7. Tokens At One-Tenth. Long-term token pricing should be a tenth of what it is today. Compute costs 2 to 4 times what it did 2 years ago because more than half of it feeds loss-making consumer AI, which forces the pricing pressure onto enterprise and coding workloads that have to pay. As compute gets more efficient and consumer usage gets capped, prices fall hard over the next 3 to 5 years. The model from 2 years ago was already good enough for 90% of tasks; the problem was it cost too much to run.
8. The Token Allocation Trap. Capping token spend punishes your best people. Arora runs a "use judiciously" model, not a free-for-all, because the smartest AI-savvy employee can burn 20 times the tokens of an average one. Playing whack-a-mole with cost hurts the high performers most and slows the learning you need. The better move is to track usage, leave the power users alone, and cap only the genuine outliers.
9. The Attacker's New Edge. Powerful coding models cut both ways. Trained to write good code, they are just as good at finding bad code. Pointed at his own systems, a model found in 6 weeks what would have taken his team 5 to 6 years. It cannot safely auto-patch, because it would "fix" 30% of things that are not broken, so it arms attackers faster than defenders. The result is urgency: every enterprise has to fix its systems faster, which is good for security companies.
10. The FDE Tell. If a startup needs forward-deployed engineers to sell into the enterprise, the product is not finished. Arora's read: enterprise AI is barely 12 months old, agents keep changing what the product even is, so vendors send engineers to build the product inside the customer while the technology keeps moving. A real forward-deployed engineer brings code back and folds it into the product; many are just adoption consultants. Expect customers to churn from one tool to the next, the way coding went from Windsurf and Devin to Codex, Claude, and Factory.
11. Three Missed Tricks. Miss one trick and you survive, miss two and you are partly impaled, miss three and you could be obsolete. This is why Arora spends more time than ever learning, pinging founders building things he does not yet understand. He buys early and cheap on conviction, treating an acquisition as a 10x or 100x bet where paying 1 or 2 times more does not matter, rather than waiting to buy the proven winner for a billion. He runs a twice-weekly "AI EIO" meeting so his top 15 leaders compete to show what they shipped.
12. The Sunk Cost Walk. A board member taught Arora to separate effort from wanting the outcome. After months grinding through a near-billion-dollar acquisition, he was told to take a long walk and ask one question: if this deal walked in the door right now with zero effort, would I still write the check? You have not spent a dollar yet, so the only thing that counts is whether it stands on its own merits. The same trap catches investors who confuse beating 8 VCs to a term sheet with the deal being good.
I’m non-technical but want to deeply understand AI.
@karpathy's “Intro to LLMs” is the best resource I’ve found so far.
Here are my biggest takeaways and questions from his 60-minute talk:
1. A large language model is “just two files.”
Under the hood, an LLM like LLaMA‑2‑70B is literally (1) a giant parameters file (the learned weights) and (2) a small run file (code that implements the neural net and feeds data through it).
Question: If the architecture code is tiny and public, what actual moat is left besides the weights?
2. Open‑weights vs closed models.
LLaMA‑2 is open‑weights: architecture + weights + paper are public. GPT‑4, Claude, etc. are closed: you get an API/web UI but not the actual model.
Question: For a company, when is “renting” a closed model strategically worse than owning an open‑weights model?
3. Training vs inference: training is the hard, expensive part.
Running the model (inference) is cheap; getting the weights (training) is a major industrial process.
Question: Where is the greatest axis of innovation in front of us to lower the cost of training significantly?
4. Pre‑training compresses ~10 TB of internet text.
LLaMA‑2‑70B is trained on roughly 10 TB of scraped internet text, compressed into 140 GB of parameters—a ~100× lossy compression of “internet knowledge.”
Question: Given that we’ve run out of knowledge on the internet to pre-train models on, is new data going to be the limiting factor on model improvement moving forward?
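A quick sanity check on the numbers in point 4, as a rough sketch. It assumes each parameter is stored as a 2-byte (16-bit) float and uses decimal units (1 TB = 1e12 bytes); neither assumption is spelled out in the thread. Under those assumptions the 140 GB figure falls straight out of the parameter count, and the ratio lands near 70×, so "~100×" is best read as an order-of-magnitude figure.

```python
# Back-of-the-envelope check of the figures quoted in point 4.
PARAMS = 70e9             # LLaMA-2-70B parameter count
BYTES_PER_PARAM = 2       # assumption: weights stored as 16-bit floats
TRAIN_TEXT_BYTES = 10e12  # ~10 TB of text, decimal units (assumption)

weights_bytes = PARAMS * BYTES_PER_PARAM
weights_gb = weights_bytes / 1e9
compression_ratio = TRAIN_TEXT_BYTES / weights_bytes

print(f"weights file: {weights_gb:.0f} GB")
print(f"compression:  ~{compression_ratio:.0f}x")
```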
5. Training scale: ~6,000 GPUs × 12 days ≈ ~$2M for LLaMA‑2‑70B.
That’s already described as “rookie numbers” compared to modern frontier models, which are ~10× bigger in data/compute and cost tens to hundreds of millions.
Question: How far are we from “more compute” no longer being a competitive advantage?
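The training figures in point 5 can be turned into an implied price per GPU-hour. This is only arithmetic on the three numbers quoted above (6,000 GPUs, 12 days, ~$2M); the variable names are made up for illustration, and the result says nothing about what any provider actually charges.

```python
# Implied cost per GPU-hour from the figures quoted in point 5.
GPUS = 6_000
DAYS = 12
TOTAL_COST_USD = 2_000_000

gpu_hours = GPUS * DAYS * 24
implied_rate = TOTAL_COST_USD / gpu_hours

print(f"{gpu_hours:,} GPU-hours, about ${implied_rate:.2f} per GPU-hour")
```

That works out to 1,728,000 GPU-hours, or a little over a dollar per GPU-hour.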
6. Frontier models just scale this up by another ~10×.
State‑of‑the‑art models (e.g., GPT‑5) simply dial up parameters, data, and compute by large factors relative to LLaMA‑2‑70B.
Question: How much of GPT‑5‑style capability is just more scale vs genuinely new algorithms?
7. Core objective of an LLM: predict the next word in a sequence.
LLMs are trained to take a sequence like “the cat sat on the” and predict the probability distribution over the next word (“mat” with ~97%, etc.).
Question: The beauty and the curse of LLMs is them being probabilistic. How can we create the right constraints such that people trust LLMs in enterprise settings?
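To make "predict the probability distribution over the next word" concrete, here is a toy sketch. The four-word vocabulary and the scores in `toy_logits` are invented for illustration; a real model produces scores over tens of thousands of tokens from its weights. The softmax step and the sampling step are the standard way such scores become a probability distribution and then a chosen word, which is also where the "probabilistic" behavior in the question comes from.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - top) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores a model might assign after "the cat sat on the".
toy_logits = {"mat": 6.0, "sofa": 2.0, "floor": 1.5, "moon": -1.0}
probs = softmax(toy_logits)

def sample_next(probs, rng):
    """Pick one word at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
next_word = sample_next(probs, rng)

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:>6}: {p:.3f}")
print("sampled:", next_word)
```

Because the word is sampled rather than looked up, the same prompt can produce different continuations on different runs unless the seed is fixed.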
8. Architecture is known: the Transformer.
We know all the math and wiring (layers, attention, etc.); that part is transparent and simple relative to behavior.
Question: If the architecture is commoditized, where exactly do you build sustainable differentiation? And how much more shelf life is there on the Transformer before a new architecture takes over?
9. Parameters are a black box.
Billions of weights cooperate to solve next‑word prediction, but we don’t really know “what each one does”—only how to adjust them to lower loss.
Rabbit hole: Read about mechanistic interpretability work.
10. Treat LLMs as empirical artifacts, not engineered machines.
They’re less like cars (fully understood mechanisms) and more like organisms we poke, test, benchmark, and characterize behaviorally.
Rabbit hole: Understand the current process for evals & if/what limitations exist in today’s eval tools.
11. Pre‑training vs. fine-tuning.
Pre-training favors quantity over quality; fine-tuning flips that: maybe ~100k really good dialogs matter more than another terabyte of web junk.
Question: How much incremental performance can fine-tuning and RLHF drive for models? Is it a fraction of what pre-training does for performance, or is it more meaningful than that?
12. Knowledge vs behavior.
Pre-training loads the model with world knowledge; fine-tuning teaches it to be helpful, harmless, and to respond in Q&A format.
Rabbit hole: I’d love to deeply understand how exactly a model is fine-tuned from beginning to end.
13. Reinforcement learning from human feedback (RLHF) via comparisons.
It’s often easier for labelers to rank several options vs. write the best one from scratch; RLHF uses these rankings to further improve the model.
Question: When exactly does it make sense to fine-tune a model vs. use RLHF, and does the answer depend on the domain of knowledge the model will be used for?
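One way to picture how rankings become a training signal, as a sketch and not necessarily what any particular lab does: a labeler's ranking is split into (better, worse) pairs, and a reward model is penalized whenever it scores the worse answer higher. The pairwise loss below, -log(sigmoid(r_better - r_worse)), is a common formulation in the RLHF literature; the thread itself does not specify one, and the function names here are hypothetical.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_best_first):
    """Expand one ranking into every (better, worse) pair it implies."""
    return list(combinations(ranked_best_first, 2))

def preference_loss(reward_better, reward_worse):
    """-log(sigmoid(r_better - r_worse)), written as log(1 + exp(-diff))."""
    diff = reward_better - reward_worse
    return math.log1p(math.exp(-diff))

pairs = ranking_to_pairs(["answer A", "answer B", "answer C"])
print(pairs)

# Low loss when the reward model agrees with the labeler, high when it disagrees.
print(round(preference_loss(2.0, 0.0), 3))
print(round(preference_loss(0.0, 2.0), 3))
```

Ranking three answers yields three training pairs, which is part of why comparisons are a cheap way to collect signal.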
14. Closed vs open models.
Closed models are stronger but opaque; open‑weights models are weaker but hackable, fine‑tunable, and deployable on your own infra.
Question: As companies deploy agents, what is the most important consideration to make as they think about their AI tech stack?
15. Scaling laws: performance is a smooth, predictable function of model size and data.
Given parameters (N) and data (D), you can predict next‑token accuracy with surprising reliability, and the curve hasn’t obviously saturated yet.
Question: If capabilities keep scaling smoothly, what non‑technical bottlenecks (data rights, energy, chips, regulation) become the real limiters?
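A hedged sketch of what "a smooth, predictable function of N and D" can look like. The power-law form below (an irreducible term plus one term that shrinks with parameters and one that shrinks with data) is a shape commonly used in published scaling-law fits. Every constant in it is an illustrative placeholder, not a fitted value, so only the trend is meaningful.

```python
def predicted_loss(n_params, n_tokens,
                   E=1.7, A=400.0, B=400.0, alpha=0.34, beta=0.28):
    """Loss = E + A / N^alpha + B / D^beta.

    All constants are illustrative placeholders, not fitted values.
    """
    return E + A / n_params**alpha + B / n_tokens**beta

# Hypothetical (parameters, training tokens) settings.
for n, d in [(7e9, 1e12), (70e9, 2e12), (700e9, 20e12)]:
    print(f"N={n:.0e}, D={d:.0e} -> predicted loss {predicted_loss(n, d):.3f}")
```

The curve keeps falling as N and D grow but flattens toward E, which is one way to frame the question of when "more compute" stops paying off.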
16. GPU and data “gold rush” is driven by scaling law confidence.
Since everyone believes “more compute → better model,” there’s a race to grab GPUs, data, and money.
Question: Let’s assume scaling laws no longer scale. Who is most screwed when the music stops?
17. LLMs as tool-using agents, not just text predictors.
Modern LLMs don’t just “think in text”; they orchestrate tools.
Given a natural-language task, the model decides to (1) browse the web, (2) call a calculator or write Python to compute ratios and extrapolations, (3) generate plots with matplotlib, and (4) even hand off to an image model (like DALL·E) to create visuals.
The intelligence is increasingly in the coordination layer: the LLM becomes a kind of “foreman” that plans, calls tools, checks outputs, and weaves everything back into a coherent answer.
18. How do LLMs know when to make a tool call?
“It emits special words, e.g. |BROWSER|. It captures the output that follows, sends it off to a tool, comes back with the result and continues the generation. How does the LLM know to emit these special words? Finetuning datasets teach it how and when to browse, by example.”
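The loop described in that quote can be sketched in a few lines. Everything here is a stand-in: `toy_model` is a scripted function rather than an LLM, `toy_browser` returns canned text rather than searching, and the `|END|` and `[RESULT]` markers are hypothetical delimiters added so the example can tell where the query stops and where the tool output starts. Only `|BROWSER|` comes from the quote.

```python
BROWSER = "|BROWSER|"
END = "|END|"        # hypothetical delimiter, not from the talk
RESULT = "[RESULT]"  # hypothetical marker for tool output

def toy_model(context):
    """Scripted stand-in for an LLM: asks to browse once, then answers."""
    if RESULT not in context:
        return f"{BROWSER}example query{END}"
    return "Based on the search results, here is a summary."

def toy_browser(query):
    """Stand-in for a real search tool."""
    return f"(pretend search results for '{query}')"

def generate(prompt, max_steps=5):
    """Run the model, executing tool calls until it produces a plain answer."""
    context = prompt
    for _ in range(max_steps):
        chunk = toy_model(context)
        if chunk.startswith(BROWSER):
            # Capture what follows the special word and send it to the tool.
            query = chunk[len(BROWSER):].split(END)[0]
            context += f"\n{chunk}\n{RESULT}{toy_browser(query)}"
            continue  # go back to the model with the result in context
        return context + "\n" + chunk
    return context

transcript = generate("User: look something up for me.")
print(transcript)
```

The model never runs the tool itself. It only emits text, and the surrounding program decides what that text triggers.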
19. System 1 vs System 2 thinking applied to LLMs.
Concept popularized in Thinking, Fast and Slow.
System 1 = fast, instinctive; System 2 = slower, deliberate, tree‑searchy reasoning.
Right now LLMs mostly operate in System 1 mode: same “chunk time” per token.
Rabbit hole: Explore how “chain‑of‑thought” method works & what limitations still exist in System 2 thinking for LLMs.
20. Desired future: trade time for accuracy.
This was before the first reasoning model (OpenAI's o1) came out.
At the time, Karpathy talked about this idea of wanting to be able to say: “Here’s a hard problem, take 30 minutes,” and get a more accurate answer than a quick reply; currently, the models can’t do that in a principled way.
21. Model self‑improvement example: AlphaGo’s two stages.
AlphaGo first imitates human Go games, then surpasses humans via self‑play and a simple, cheap reward signal (did you win?).
Question: What’s the best way to improve models in domains where there isn’t a simple reward function, like creative writing or design?
22. Retrieval‑augmented generation (RAG) as “local browsing.”
Instead of searching the internet, the model searches your uploaded files and pulls snippets into its context before answering.
Question: Where does RAG break down in production?
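A minimal sketch of the "search your files, pull snippets into context" step. Production systems typically rank snippets with embeddings and a vector index; this example swaps in plain word overlap so it runs with no dependencies. The documents, function names, and prompt wording are all made up for illustration.

```python
import re

def words(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, documents, k=2):
    """Return the k documents sharing the most words with the question."""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=2):
    """Place the retrieved snippets ahead of the question in the model's context."""
    snippets = "\n".join(f"- {s}" for s in retrieve(question, documents, k))
    return f"Use these snippets:\n{snippets}\n\nQuestion: {question}"

docs = [
    "Refund policy: refunds are accepted within 30 days.",
    "Office hours are 9 to 5.",
    "Shipping takes 3 days.",
]
question = "What is the refund policy?"
print(build_prompt(question, docs, k=1))
```

Even this toy version hints at one answer to the question above: if retrieval ranks the wrong snippet first, the model answers confidently from the wrong context.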
23. Think of LLMs as the kernel process of a new operating system.
This process is coordinating resources including tools, memory, and I/O for problem-solving.
A future LLM will:
- read/generate text
- have more knowledge than any single human about all subjects
- browse the internet
- use existing software infrastructure
- see and generate images and video
- hear and speak and generate music
- think for a long time using system 2
- “self-improve” in domains with a reward function
- be customized and fine-tuned
- communicate with other LLMs
Rabbit hole: Draw out the LLM OS and explain it to someone. This will show how well you understand the technology.
24. The LLM OS is reminiscent of today’s operating systems.
The finite context window is like working memory; browsing and RAG are like paging data in from disk or the internet; there is a rapidly growing ecosystem of closed and open models, much like proprietary and open-source operating systems; and managing what’s in context is a core challenge.
Rabbit hole: Explore techniques for working across many context windows & longer-running tasks.
25. New computing stack → new security problems.
Just as operating systems created new attack surfaces (malware, exploits), LLM‑centric stacks create their own families of attacks: jailbreaks, adversarial prompting, adversarial suffixes, and prompt injection.
Question: security for AI systems seems orders of magnitude harder than traditional software because the # of edge cases feels infinite. Is this assumption right or wrong?
26. LLMs are a new computing paradigm with huge promise and serious challenges.
They compress internet‑scale knowledge, act as operating‑system‑like kernels, orchestrate tools and modalities, and open up both transformative products and novel security risks.
Question: what is the most nascent part of the LLM OS that needs to be built up in order to accelerate diffusion of the technology?
Link to the full “Intro to LLMs” video below
MY TAKEAWAY FROM TODAY'S OPENAI AI BROWSER LAUNCH:
the internet just got hands.
the average person won’t google, click, compare, or fill out forms within the next 24 months.
they’ll just say “book my trip”, “find me a job”, “launch my store” and the agent will do the 20 steps behind the scenes.
that means whole industries... travel, e-commerce, real estate, insurance, education are about to get rebuilt around outcomes instead of pages.
you won’t “go” to Expedia, you’ll just get the trip.
if you’re a founder, this is the moment to think in verbs. don’t build platforms people visit. build agents that finish what people start.
that’s where the next $100B companies come from.
the web is shifting from human browsing to agent doing.
internet hands.
Exclusive: Russian hackers (i.e., APT28/Fancy Bear/Fighting Ursa) are using fake luxury car ads to target diplomats, according to new @PaloAltoNtwks research
https://t.co/a9SxQ6BWOW
What in the hell?!
A group of cybercriminals has filed an SEC complaint against a company for not disclosing a data breach.
Here's what we know and what this might mean for the future of ransomware:
All Americans should be horrified and outraged by the brazen terrorist attacks on Israel and the slaughter of innocent civilians. We grieve for those who died, pray for the safe return of those who’ve been held hostage, and stand squarely alongside our ally, Israel, as it dismantles Hamas. As we support Israel’s right to defend itself against terror, we must keep striving for a just and lasting peace for Israelis and Palestinians alike.
Boost #CyberResilience with tips from Unit 42 VP Sam Rubin — he shares insights from testifying on #ransomware attacks before the US House of Representatives Oversight and Accountability Committee. Read now: https://t.co/6zY6wTp6Av
Sam Rubin, VP and global head of operations at Unit 42, represents Palo Alto Networks as he testifies today before the House Oversight joint subcommittee hearing on “Combating Ransomware Attacks.” Watch now: https://t.co/jszvcTQIHx
Congratulations 🎉 to @wendiwhitmore! She's been named an honoree in @SCMagazine's 2023 list of Women in IT Security along with 20 other accomplished leaders. Read the full list of her fellow honorees here: https://t.co/M6h5jGNjka
Cyberattacks are on the rise, and #AI is at the forefront. Tune in as @wendiwhitmore, SVP of Unit 42, sheds light on the evolving threat landscape and the role of AI in the latest Unit 42 #ThreatVector podcast episode. Listen here: https://t.co/kPhrucZsqX
Today Lockbit ransomware group issued a poll to all of their affiliates.
Lockbit is considering implementing new rules for Lockbit affiliates due to their frustration with ransomware negotiators. Currently, Lockbit ransomware group has no rules in place for how much (or how little) affiliates can ransom a company for. They are considering "regulating" ransom demands.
They state that newer affiliates are giving large discounts to victim companies out of desperation for money, whereas more experienced affiliates do not cave to the payments negotiators propose on behalf of victims.
Lockbit administrative staff are proposing the following options.
1. No changes in payment policy, payment options will remain "unregulated" and remain up to the affiliates.
2. New rules in place which set the minimum payment allowed to be 3% of the victim company's annual revenue with the option of a 50% discount, bringing it down to 1.5% of annual revenue.
3. Establish a new rule where affiliates can only grant a 50% discount of the original ransom price.
4. Establish a new rule where they will not accept a payment below the victim's maximum ransomware insurance policy.
5. Establish a new rule where they will accept a minimum payment of 50% of the victim's ransomware insurance policy.
With regard to this poll, National Hazard Agency, a subdivision of Lockbit ransomware group, has stated they will no longer accept payments below 3% of the company's annual revenue. They will immediately retaliate against any negotiator who approaches them with an offer of less than 3% of the company's revenue. The retaliation will be complete destruction of company data.
Image 1. Original Lockbit poll (Russian)
Image 2. Lockbit poll (English)
Image 3. Message from National Hazard Agency
Notorious phishing platform affecting 70,000 users in 43 countries shut down, and 3 arrests made, in an international police operation coordinated by INTERPOL.
🚨Learn more here ⏬
https://t.co/F2WTksHhNa
"The first advert was real. The second, cheaper, repurposed notice was laced with malware." A recent #phishing campaign by #CozyBear — which we track as #CloakedUrsa — is highlighted by the @FinancialTimes. #Russia
Malicious packages disguised as legitimate software pose a threat to #cloud systems. Our new research spotlights a technical analysis of six packages meant for #CredentialStealing, personal data stealing and more found in the Python Package Index (#PyPI). https://t.co/CsYrQK5mO5