@kleinerperkins is thrilled to lead the Series A of @sailresearchco and I'm pumped to have joined the board. Agents executing complex work end to end is the ultimate promise of AI, but they are also extremely token hungry. With exploding token volumes, latency stops being the bottleneck. Cost and throughput become everything. Businesses simply cannot afford their inference bills in the age of agents and we're already seeing signs of the tokenmaxxing era running into budget caps, leaving tons of demand for intelligence unmet.
@neilmovva and @blintzbase are two of the most precocious builders I've met and are building Sail to meet the moment. They are building *the* inference platform for long horizon agents that will enable every business to run their agents with maximum efficiency and lowest cost. In addition, they have built world class observability (Sail Voyages) and a stateful sandbox (Sailboxes) positioning Sail to be the one place where you operate and run your most critical agents.
As someone who worked at Palantir when the FDE model was laughed at and misunderstood, here are three observations about what made it successful:
1. FDE != Solutions Engineering:
Solutions Engineering was a subset of what FDEs did. When Foundry was emerging, FDEs had to not just implement it via data integrations, but actually write custom software on top that bridged the gap between the platform and customer value. So, the work was implementation, full stack engineering and some DevOps.
2. FDE <> Software Eng Virtuous Cycle:
Palantir hired FDEs whose backgrounds and pedigree were identical to those of the software engineers it hired. This not only allowed them to perform 1., but also created a virtuous cycle of intelligent product feedback. Because FDEs were deeply technical and understood the customer pain points so well, they were high signal PMs for product dev teams who could then build the right abstractions into Foundry. This in turn meant you required fewer FDEs on every successive deployment.
3. You need to Whale Hunt:
The FDE model as described only makes sense when pursuing massive, massive contracts. Otherwise, you're better off with solutions engineers. The work of FDEs was not linear. There was a lot of problem solving, blocking and tackling and strong engineering work that they had to perform.
Reliability is the name of the game for agents, and it's unlikely to be solved purely at the model layer for the foreseeable future. This is creating green shoots for infrastructure builders, with a few interesting trends starting to emerge:
1. Simulation as CI for agents:
a) The most valuable piece of data today is trajectory data i.e. collections of task (P) -> {t1, t2... tk} mappings. With more trajectory data, agents can be improved with techniques like RFT.
b) Since these trajectories can be quite specific to a company's underlying data (D), you need to be able to actually simulate the behavior of agents within your environment rather than rely on 3P trajectory data.
So, how might you do this?
- Maintain an agent and MCP registry for an enterprise, and a staging environment. Bootstrap a metadata layer that contains the objective of each agent, the tools it has access to, the scope of each agent vis-à-vis each tool etc. Your SDK may need to generate MCP servers on the fly for certain internal applications.
- Execute scenarios in staging for each agent by providing prompt / task variations, inspecting the tool calls produced and evaluating performance against a multi-objective reward function (e.g. performance against the objective, minimization of tool invocations).
- A critical component is accurately providing quantifiable reward functions for each agent that unlock high-fidelity evals and close the loop for reliable CI.
- All of this needs to be productized: easy-to-adopt infrastructure that developers can extend, but with batteries included. You can start to see a new paradigm forming—not unit tests for code, but simulation harnesses for agents.
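To make the harness idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `AgentSpec`, `Scenario` and `ToolCall` types, the `toy_agent` stand-in, the `crm.lookup` tool name, the 0.1 per-call penalty and the 0.7 pass threshold are all hypothetical. The agent is modeled as a plain function from a prompt to a trajectory plus an answer, and the reward combines task success with a penalty on tool invocations.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass
class ToolCall:
    tool: str
    args: dict


@dataclass(frozen=True)
class AgentSpec:
    """Registry metadata for one agent (hypothetical shape)."""
    name: str
    objective: str
    allowed_tools: FrozenSet[str]


@dataclass(frozen=True)
class Scenario:
    prompt: str
    expected: str


# An agent maps a task prompt to (trajectory of tool calls, final answer).
Agent = Callable[[str], Tuple[List[ToolCall], str]]


def reward(spec: AgentSpec, scenario: Scenario,
           trajectory: List[ToolCall], answer: str,
           call_penalty: float = 0.1) -> float:
    """Multi-objective reward: task success minus a cost per tool call."""
    # Calling a tool outside the agent's registered scope is a hard failure.
    if any(call.tool not in spec.allowed_tools for call in trajectory):
        return 0.0
    success = 1.0 if answer == scenario.expected else 0.0
    return max(0.0, success - call_penalty * len(trajectory))


def run_suite(spec: AgentSpec, agent: Agent, scenarios: List[Scenario],
              threshold: float = 0.7) -> Tuple[bool, List[float]]:
    """Run every scenario in 'staging' and gate on the mean reward."""
    scores = [reward(spec, s, *agent(s.prompt)) for s in scenarios]
    return sum(scores) / len(scores) >= threshold, scores


def toy_agent(prompt: str) -> Tuple[List[ToolCall], str]:
    """Stand-in for a real agent: one lookup call, keyword-based answer."""
    trajectory = [ToolCall("crm.lookup", {"query": prompt})]
    answer = "ACME" if "acme" in prompt.lower() else "unknown"
    return trajectory, answer


spec = AgentSpec("account-finder", "resolve account names",
                 frozenset({"crm.lookup"}))
scenarios = [
    Scenario("Which account is Acme Corp?", "ACME"),  # prompt variation 1
    Scenario("find the acme account", "ACME"),        # prompt variation 2
]
passed, scores = run_suite(spec, toy_agent, scenarios)
```

In a real setup the exact-match check would give way to a per-agent reward function, and `passed` would gate a deploy the same way a failing unit test does today.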
What happens when you get trajectory data?
2. Enterprises will move to "context lakes":
- An evolving, queryable memory layer that serves as a hub for agent trajectories enriched by enterprise data stored in the delta lake / SNOW. A potent mix of a knowledge base, a semantic cache, and an execution log.
- Extremely fast reads for inference-time retrieval that supports high QPS.
- As mentioned in a prior post, the semantic cache (really interesting opportunity for startups) will cluster task–trajectory pairs (e.g., via k-means), enabling fast retrieval and “result fusing” during planning or tool selection.
Agents will dip into the context lake constantly. High QPS, low-latency context fetch will become as important as fast embedding search is today.
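A rough sketch of the semantic cache piece, in Python with only the standard library. It assumes task embeddings already exist (the 2-d vectors below are made up), uses a tiny hand-rolled k-means with a deterministic "first k points" init purely for readability, and the `SemanticCache` class, its `lookup` method and the tool names are hypothetical. The lookup goes to the nearest cluster first, then ranks only that cluster's task-trajectory pairs, which is what keeps reads cheap.

```python
import math
from collections import defaultdict


def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, iters=20):
    """Plain k-means; init from the first k points (sketch only)."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = defaultdict(list)
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Keep the old centroid if a cluster ended up empty.
        new = [mean(clusters[i]) if clusters[i] else centroids[i]
               for i in range(k)]
        if new == centroids:
            break
        centroids = new
    return centroids


class SemanticCache:
    """Clusters (embedding, task, trajectory) entries for fast retrieval."""

    def __init__(self, entries, k):
        self.centroids = kmeans([e[0] for e in entries], k)
        self.buckets = defaultdict(list)
        for entry in entries:
            self.buckets[self._nearest(entry[0])].append(entry)

    def _nearest(self, emb):
        return min(range(len(self.centroids)),
                   key=lambda i: math.dist(emb, self.centroids[i]))

    def lookup(self, emb, top_n=2):
        """Return the closest cached (task, trajectory) pairs in one cluster."""
        bucket = self.buckets.get(self._nearest(emb), [])
        ranked = sorted(bucket, key=lambda e: math.dist(emb, e[0]))
        return [(task, traj) for _, task, traj in ranked[:top_n]]


# Made-up embeddings and tool names, for illustration only.
entries = [
    ((0.9, 0.1), "refund order 123", ["orders.get", "payments.refund"]),
    ((0.1, 0.9), "send invoice to Bob", ["billing.create", "email.send"]),
    ((0.8, 0.2), "refund a duplicate charge", ["payments.search", "payments.refund"]),
    ((0.2, 0.8), "resend last invoice", ["billing.get", "email.send"]),
]
cache = SemanticCache(entries, k=2)
hits = cache.lookup((0.85, 0.1))
```

The planner would then take `hits` as candidate trajectories to fuse or adapt. A production version would presumably swap the toy k-means for an approximate nearest-neighbor index, but the shape of the read path is the same.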
3. Agent authentication becomes a first-class concern:
- Traditional OAuth and API key models break down when agents act on behalf of users and themselves, across long-lived sessions.
- You need a framework for agent identity, delegation, and scoping: one that supports things like tool level permissions, task bound credentials and delegation graphs.
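One way to picture those three pieces together, as a minimal Python sketch. This is not an existing standard or library: the `Credential` type and the `delegate` and `authorize` helpers are hypothetical, and times are plain floats to keep it self-contained. Each credential is bound to one task, carries a tool-level scope, and points back to whoever delegated it, so the chain of `parent` links forms a (linear) delegation graph.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Credential:
    subject: str                      # who holds it, e.g. "agent:planner"
    task_id: str                      # task-bound: valid for one task only
    tools: FrozenSet[str]             # tool-level permissions
    expires_at: float                 # absolute expiry time
    parent: Optional["Credential"] = None  # edge in the delegation graph


def delegate(parent: Credential, subject: str, tools: Iterable[str],
             ttl: float, now: float) -> Credential:
    """Issue a child credential that can only narrow the parent's scope."""
    tools = frozenset(tools)
    if not tools <= parent.tools:
        raise PermissionError("cannot delegate tools the parent does not hold")
    # A child never outlives its parent.
    expires_at = min(parent.expires_at, now + ttl)
    return Credential(subject, parent.task_id, tools, expires_at, parent)


def authorize(cred: Credential, tool: str, task_id: str, now: float) -> bool:
    """Allow a call only if every link in the delegation chain permits it."""
    node: Optional[Credential] = cred
    while node is not None:
        if (node.task_id != task_id or tool not in node.tools
                or now >= node.expires_at):
            return False
        node = node.parent
    return True


# A user grants a task-scoped credential, which an agent narrows further.
root = Credential("user:alice", "task-42",
                  frozenset({"crm.read", "email.send"}), expires_at=1000.0)
planner = delegate(root, "agent:planner", {"crm.read"}, ttl=60.0, now=0.0)
```

Walking the whole chain on every check means that revoking or expiring any ancestor cuts off everything delegated beneath it, which is the property long-lived agent sessions seem to need.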
We’re entering an era where testing software means simulating behavior, querying software means retrieving context, and securing software means authenticating autonomous agents.
I remember that at @Columbia engineering career fairs, the companies recruiting on campus that turned out to be the best in hindsight (Pure Storage, Palantir, Dropbox etc.) would all pose one or two tricky math and technical brain teasers to thin out the lines of students. This did two things:
1. Gave those companies immediate screening efficiency.
2. Created mystique around their brands and instant respect for the engineers posing those questions.
Two decades ago @ParisHeymann and I used to compete against each other in the US chess circuit. We were both top 7 in the country.
Now we’re finally running it back ♟️
Bumped into the man himself, Jensen, and convinced him to spend a few minutes with a group we were hosting at dinner.
He showed up and spent 40 minutes. Warmest, most humble and authentic leader I've ever met. And technically so deep across a broad range of topics!
How fun was this!! Thank you https://t.co/OiLcpt9tix for having me as a panelist! Love seeing a growing group of South Asian women taking the corporate world by storm. Cheers to broader representation, and breaking barriers in the boardroom! 🚀💪🏽💯
https://t.co/cUmGijmJPk
Published some thoughts on how we may be able to unlock “AI coworkers”. https://t.co/hQcP27Metz
This will be a new class of AI native product across domains where precision is of paramount importance and correctness can be quantified.
I'm excited about the startups building towards this future, unlocking digital employees for enterprises writ large!
What an inspiring evening it was!
We are immensely grateful to Ms. Indra Nooyi for gracing us with a dynamic and power-packed session. Ms. Nooyi’s passion for her work and her dedication to inspiring women were truly evident in every word she spoke.
Despite outward enthusiasm, businesses want better software guardrails for LLMs before adopting them more broadly. Without more work on building compliance standards around models and applications, enterprise adoption will be slower than expected.
Published some thoughts on the gaps that need to be closed as I see it. I talk about why role based permissions need to be a first order primitive, along with the need to tame hallucinations, inconsistencies and bias.
Perhaps a formal standard will emerge analogous to SOC-2?
https://t.co/XIuhkNrSUm
We love our founders who make the impossible possible. Through all the ups and downs, we are here with you and can't wait to continue building together. Thank you!