Delta Sharing became one of the most popular ways to exchange data thanks to its open cross-platform nature. We’ve now expanded it to also support any Iceberg client and to share AI assets like agent skills and unstructured data. It needed a new name, so welcome OpenSharing!
This is a critical post to read if you’re building an applied AI company right now.
“An application earns its place in the untrainable corner by doing unglamorous work: arranging a company's private reality so a model can act on it, handing the model the tools to act, working with the customer to change the reality of its workforce. A company that brings the translation is tough to copy – and the translation never ends. Integration and maintenance run as long as the relationship does, won by teams that put domain-specialized engineers and tools next to the customer.”
There’s still an insanely large gulf between model capabilities and what it takes to apply them to specific corporate workflows. Some of that is technology that needs to be built, a lot is access to (and formatting of) the right data to work with, and a ton more is the change management and specific implementation work (FDEs, etc.) it takes to make AI work in any specific corporate setting.
Two things can be very true at once: frontier models and labs will continue to grow an incredible amount, and there will be a vast ecosystem of software and services companies that emerge to bring the power of these models to real enterprises. This makes room for new infrastructure providers, applied AI companies in every vertical, new versions of system integrators, and more players.
Incredibly exciting time on all fronts.
Cool work from the Databricks Model Serving team and Superhuman to scale their custom LLM serving to 200K QPS with sub-second P99 latency! Here's how our teams got a +60% throughput gain vs. the previous engine to serve over 40 million daily users. https://t.co/csjPLQzQ7C
There’s a ton of interest in custom model tuning as agents reach production and scale up. Here is how we made Databricks Knowledge Assistant 3x faster using our new Instructed Retriever model trained end-to-end to do parallel test-time compute. It’s rolling out to customers now!
one of the quotes i find most inspiring on a hard day:
"Whatever your hand finds to do, do it with all your might, for in the realm of the dead, where you are going, there is neither working nor planning nor knowledge nor wisdom"
Ecclesiastes 9:10
We have moved on to entirely new moral panics, such as [squints, checks notes] water consumption in datacenters. And in a few years (or months, or weeks, or days), that will be completely forgotten too.
.@satyanadella just put the whole "water" debate to rest.
Datacenters run on closed-loop cooling systems; the water usage of a datacenter for an entire year is roughly equivalent to that of 1 restaurant!
"You can outsource thinking, but not understanding."
I still find writing toy code one of the best ways to build real understanding. It catches the nuances that skimming code and explanations lets you skip.
So I wrote nanoRL (nanoGPT, but for post-training).
SFT, DPO, GRPO, PPO: four single files, ~150 lines each, converging on a toy task in ~30 steps on a MacBook. Readable end-to-end.
Then I continued RL training on Qwen2.5-0.5B-Instruct on GSM8K with this toy code + autoresearch. Interestingly, the accuracy improves even though it's an already-trained model.
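For a feel of how small the core of one of these methods is, here is a minimal sketch of the group-relative advantage step commonly used in GRPO: sample several completions for one prompt, score them, and normalize each reward against the group's mean and standard deviation. This is not taken from nanoRL; the function name `group_relative_advantages` and the `eps` default are assumptions for illustration.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper (not from nanoRL): each advantage is the reward minus
    the group mean, divided by the group standard deviation plus eps.
    """
    mean = sum(rewards) / len(rewards)
    # Population variance over the group.
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    # eps keeps the division finite when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled answers to one GSM8K-style question, scored 1 if correct else 0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers end up with positive advantages and wrong ones with negative advantages, so no separate value network is needed to provide a baseline.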
Legal AI superempowers normal individuals with no legal background to fight big institutions in bureaucracies and in courts on a level knowledge/skill playing field, for the first time in human history. As such, it is one of the most inspiring applications of AI.
i’m increasingly convinced that the best agent evals will come from mining real agent failure traces. my view is that every failed trace contains a potential eval but not in its raw form. raw traces are messy, long and too specific. the research problem is to distill them into clean reproducible tests. the pipeline i’m interested in (and currently working on) is:
failure trace → failure attribution → earliest divergence point → minimal reproducible state → targeted eval → regression suite
this turns trace data from passive observability into an active improvement loop. like can we extract the exact decision point where the agent should have behaved differently? and can we convert that into an eval that catches the same failure class in the future? i guess this matters because most agent failures are trajectory-level failures and not just output-level failures.
personally i think this is much more realistic than relying only on hand-written benchmarks (imo they should look more like failure memory systems). hand-written evals encode what we think agents will fail on. traces encode what agents actually failed on. also once you have the mechanism, you can mutate the trace into variants. that is basically fuzzing for agents.
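a rough sketch of the trace-to-eval idea, under heavy assumptions: a trace is a list of (state, action) steps, and a known-good reference trace exists to compare against (in practice failure attribution is the hard part and may not have a reference at all). every name here (`Step`, `Eval`, `earliest_divergence`, `distill`, `mutate`) is hypothetical, not an existing library.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Step:
    state: str   # what the agent observed before acting
    action: str  # what the agent did in that state


@dataclass(frozen=True)
class Eval:
    state: str            # minimal state to reproduce the decision
    expected_action: str  # what the agent should have done

    def passes(self, policy: Callable[[str], str]) -> bool:
        return policy(self.state) == self.expected_action


def earliest_divergence(failed: List[Step], reference: List[Step]) -> Optional[int]:
    """Index of the first step where the failed trace acts differently."""
    for i, (f, r) in enumerate(zip(failed, reference)):
        if f.action != r.action:
            return i
    return None


def distill(failed: List[Step], reference: List[Step]) -> Optional[Eval]:
    """Turn a failed trace into one targeted eval at the divergence point."""
    i = earliest_divergence(failed, reference)
    if i is None:
        return None
    return Eval(state=failed[i].state, expected_action=reference[i].action)


def mutate(ev: Eval, perturbations: List[Callable[[str], str]]) -> List[Eval]:
    """Fuzz the eval: same expected action, perturbed versions of the state."""
    return [replace(ev, state=p(ev.state)) for p in perturbations]


failed = [Step("inbox open", "read"), Step("delete prompt", "delete"), Step("done", "quit")]
reference = [Step("inbox open", "read"), Step("delete prompt", "confirm"), Step("confirmed", "delete")]

ev = distill(failed, reference)
regression_suite = [ev] + mutate(ev, [str.upper, lambda s: s + " (retry)"])
```

the regression suite here is just a list of `Eval` objects; running it means calling `passes` on each one with the current agent policy and tracking which failure classes come back.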
i have seen enough proof now that using a coding agent is a deep skill
it's confusing because the people you see heavily using them produce horrible results
but that's because it's a skill! you can get better and the ceiling seems pretty high - this is very exciting to me