Awesome keynote by @alighodsi at the @databricks summit.
Some takeaways below. TL;DR: Databricks is now a data processing platform, a data platform, an agents platform, and an apps platform.
Architecture updates
Iceberg / table formats
- v3 is now GA: the unified data layer. Delta & Iceberg files are laid out identically on disk, so sharing data across formats requires no rewriting
- v4 (targeting Q4 '26) finishes the job by unifying metadata. After that, Delta and Iceberg are effectively one format end-to-end
Lakeflow ingestion (3 modalities, all GA):
- ZeroBus: kills the need for Kafka. Hit one API endpoint with bursty, tiny-row data at a high rate: no buffering, no millions of small files
- Spark Real-Time Mode: sub-10ms latency in open-source Spark, closing the gap with Flink (Spark's old micro-batch floor was ~1s)
- Lakeflow Designer: Alteryx-style drag-and-drop; you talk to Genie and it generates inspectable, version-controllable Spark under the hood
- 100+ connectors now (Salesforce, Workday, NetSuite, Meta, Google Analytics, and more)
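Why "no millions of small files" matters: if every tiny event landed as its own file, file count would grow one-for-one with event count. The sketch below is a minimal illustration that assumes a hypothetical server-side sink which buffers incoming rows and flushes size-bounded files, so clients just send rows and never run a buffering tier themselves. `CoalescingWriter` and its parameters are made up for illustration; the keynote did not describe ZeroBus internals.

```python
from dataclasses import dataclass, field


@dataclass
class CoalescingWriter:
    """Hypothetical server-side sink: accepts tiny rows, writes size-bounded files."""
    target_rows_per_file: int = 10_000
    pending: list = field(default_factory=list)
    files: list = field(default_factory=list)  # each "file" is just a list of rows here

    def ingest(self, row: dict) -> None:
        # Clients push one small row at a time; batching happens on the server side.
        self.pending.append(row)
        if len(self.pending) >= self.target_rows_per_file:
            self.flush()

    def flush(self) -> None:
        # Write whatever is pending as one file (no-op when nothing is pending).
        if self.pending:
            self.files.append(self.pending)
            self.pending = []


def naive_file_count(n_rows: int) -> int:
    # One file per tiny write: the classic small-files problem.
    return n_rows


if __name__ == "__main__":
    writer = CoalescingWriter(target_rows_per_file=10_000)
    for i in range(1_000_000):
        writer.ingest({"event_id": i})
    writer.flush()
    print(naive_file_count(1_000_000), "files naive vs", len(writer.files), "coalesced")
```

A real system would also flush on a timer so a quiet stream doesn't sit in memory; that is left out to keep the idea visible.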
Lakebase + LTAP:
- Lakebase: open-source Postgres on the lake. Serverless autoscaling to zero, plus branching: clone a petabyte DB in <1s via copy-on-write (agents love it)
- LTAP: unifies the transactional (Lakebase) and analytical (Lakehouse) layers, a breakthrough the industry has chased for 40 years
- Reyden: new query engine hitting tens-of-ms latency
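How can a petabyte clone take under a second? With copy-on-write, a branch copies no data up front; it only records pages written after the fork and reads everything else from the shared, frozen base. A minimal sketch of that general idea follows. `Layer`, `Database`, and the page model are hypothetical and are not Lakebase's actual storage engine, which the keynote didn't detail.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Layer:
    # Pages written while this layer was the active head; older pages live in parents.
    pages: dict = field(default_factory=dict)
    parent: Optional[Layer] = None


class Database:
    """Hypothetical copy-on-write page store with O(1) branching."""

    def __init__(self, base: Optional[Layer] = None) -> None:
        self.head = Layer(parent=base)

    def read(self, page_id: int) -> bytes:
        # Walk from the newest layer down to the oldest shared one.
        layer = self.head
        while layer is not None:
            if page_id in layer.pages:
                return layer.pages[page_id]
            layer = layer.parent
        raise KeyError(page_id)

    def write(self, page_id: int, data: bytes) -> None:
        # Writes only touch this database's head layer; shared layers are never mutated.
        self.head.pages[page_id] = data

    def branch(self) -> Database:
        # Freeze the current head and put a fresh empty layer on top of it for
        # both sides. No pages are copied, so cost is independent of DB size.
        frozen = self.head
        self.head = Layer(parent=frozen)
        return Database(base=frozen)


if __name__ == "__main__":
    prod = Database()
    for i in range(100_000):
        prod.write(i, b"v1")
    dev = prod.branch()        # instant clone for an agent to experiment on
    dev.write(42, b"v2")       # visible only on the branch
    prod.write(7, b"v1-prod")  # visible only on prod
    print(prod.read(42), dev.read(42), dev.read(7))
```

Each branch adds a layer to the read path, so a production system would periodically compact layers; the sketch skips that.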
Unity AI Gateway: one pane of glass for all AI
- Single entry point for every agent, harness & model
- Commit spend to Databricks, buy tokens directly from OpenAI / Anthropic / Gemini on any cloud
- Budgets + alerts down to the individual level
- Guardrails, auditing, identity management for every agent
- Register any MCP server, authenticate once
- Free + open source (part of Unity Catalog + MLflow)
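The "single entry point + budgets + alerts + auditing" combination is easiest to picture as one choke point that every model call passes through. The sketch below assumes a hypothetical `Gateway` class with per-user USD budgets, an alert at 80% of budget, and a simple audit log; none of these names or thresholds come from the Unity AI Gateway API, which the keynote did not show.

```python
from collections import defaultdict
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    pass


@dataclass
class Gateway:
    """Hypothetical single entry point that meters and audits spend per user."""
    budgets_usd: dict              # user -> budget in USD
    alert_ratio: float = 0.8       # alert once spend crosses 80% of budget
    spend_usd: dict = field(default_factory=lambda: defaultdict(float))
    alerts: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def call(self, user: str, model: str, tokens: int, price_per_1k: float) -> str:
        cost = tokens / 1000 * price_per_1k
        budget = self.budgets_usd.get(user, 0.0)  # unknown users get no budget
        before = self.spend_usd[user]
        if before + cost > budget:
            # Reject before forwarding, so a runaway agent cannot overspend.
            raise BudgetExceeded(f"{user} would exceed ${budget:.2f}")
        self.spend_usd[user] = before + cost
        if before < self.alert_ratio * budget <= self.spend_usd[user]:
            self.alerts.append(f"{user} crossed {self.alert_ratio:.0%} of budget")
        self.audit_log.append((user, model, tokens, cost))
        return f"[{model} response]"  # stand-in for forwarding to the real provider


if __name__ == "__main__":
    gw = Gateway(budgets_usd={"ana": 10.0})
    for _ in range(5):
        gw.call("ana", "model-x", 1000, 2.0)
    print(gw.spend_usd["ana"], gw.alerts)
```

Checking the budget before forwarding, rather than reconciling afterwards, is what makes per-user limits enforceable instead of merely reportable.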
Genie: The Agents Platform, powered by Ontology
Instead of agents for-looping through your data live (slow, expensive, inaccurate), Genie Ontology runs in the background and builds a knowledge graph of your most important assets across lakehouse + Drive + SharePoint + email + many other sources. It ranks importance using "OntoRank" - basically PageRank for enterprise assets. Databricks' own instance has 4.5M ontology snippets. That context feeds four agents:
- Genie One: universal interface for any business user to ask questions across all data
- Genie Agents: turn any Genie conversation into a deployable autonomous agent
- Genie Code: coding agent that's elite at data engineering + ML/data science
- Genie Zero Ops: monitors pipelines, fixes a 2am break in an isolated branch, pings you for one-click approval
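OntoRank's exact formula wasn't shown, but "basically PageRank" suggests the classic idea: an asset matters if important assets reference it. Below is plain power-iteration PageRank over a toy asset graph, assuming an edge means "this asset references that one" (e.g. a dashboard reading a table, a deck citing a dashboard). The asset names and edge semantics are illustrative assumptions, not Databricks' ontology.

```python
from __future__ import annotations


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Classic PageRank by power iteration. `links` maps each asset to the assets it references."""
    nodes = set(links) | {dst for dsts in links.values() for dst in dsts}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Teleport term: every asset gets a baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this asset's rank evenly across everything it references.
                share = damping * rank[node] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling asset (references nothing): spread its rank over all assets.
                for dst in nodes:
                    new_rank[dst] += damping * rank[node] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    assets = {
        "revenue_dashboard": ["sales_table", "orders_table"],
        "q3_board_deck": ["revenue_dashboard", "sales_table"],
        "team_wiki_page": ["sales_table"],
        "sales_table": ["orders_table"],
        "orders_table": [],
    }
    for asset, score in sorted(pagerank(assets).items(), key=lambda kv: -kv[1]):
        print(f"{asset:20s} {score:.3f}")
```

Ranks always sum to 1, so scores are directly comparable across assets; a background job can recompute them as the graph changes, which fits the "runs in the background" framing above.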
New vertical apps
- Lakewatch: agentic SIEM / "security lakehouse" (also acquiring Panther Labs)
- Customer Lake: agentic CDP with LLM-powered identity dedupe + one-to-one "infinity campaigns"
The ideal system for machines, for humans, and for the agents that are now both is:
strict at the boundary
congruent in what it sends
capability-scoped in what it trusts
fast to fail loudly instead of corrupting silently
and brave enough to run exactly one deliberate vulnerability loop with the node it has chosen