Principal R&D Manager, Lead of Hyperspace, .NET for Spark & Dev Lead for Azure Synapse Analytics at Microsoft. I love growth hacking/building large teams!
Our humble attempt at giving back to the open-source community. I am thrilled to announce that we are open-sourcing Hyperspace v0.1!!
Congratulations to the entire team!
https://t.co/GoArPaUSCW
https://t.co/wSD8m4soK2
#SparkAISummit #Microsoft #Azure
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
Databricks started later. It built a more complex architecture. It focused on unstructured data: images, documents, logs, audio. Though vast within the enterprise, this data had historically produced little insight. Too hard to process. Too messy to query. Too expensive to store in formats that mattered.
Snowflake took the opposite bet. Structured data. Clean tables. SQL queries that ran fast & returned answers executives could read. The market agreed. Snowflake went public at a $70 billion valuation. Databricks raised private rounds at half that.
Then AI arrived. Suddenly the data that was too messy to query became the data that models needed to train. Unstructured data wasn’t a liability. It was the asset.
Databricks has overtaken Snowflake in revenue. Two years ago, Snowflake led by $220 million per quarter. Today, Databricks leads by $120 million. Databricks’ growth rate is accelerating at scale, from 50% to 55% to 65% year over year. Growth rates rarely accelerate at $5 billion in revenue.
The crossover happened because AI is an architectural transition, not a feature addition.
Most enterprise data never made it into Snowflake. It sat in object storage, unstructured, waiting. Databricks built tools to use it there. No migration required.
Watch Le Xu's (@happyandslow) presentation of our NSDI 2021 paper on Cameo! Joint work with @luo_mai, @RahulPotharaju, and Shiv Venkataraman.
* Cameo NSDI Talk Video (12 mins): https://t.co/4l5fI2Outt
* DPRG Youtube Channel: https://t.co/eOSfXE1qR0
* NSDI Paper: https://t.co/324wv5btpv
Move Fast and Meet Deadlines! Don't miss the cameo by Cameo, streaming tomorrow (Tuesday) at NSDI 2021's morning session! Talk by Le Xu (@happyandslow), with @luo_mai, @RahulPotharaju & Shivaram Venkataraman. #nsdi21
Our recent work on actor-based stream processing got into NSDI'21. This is joint work with Le Xu, Shivaram Venkataraman & @indygupta! Don't miss this awesome talk from @happyandslow on 4/13 in the 10-11:30 EST slot! #NSDI '21 @Azure @Microsoft https://t.co/4Mqk9w1kNQ
Actor-based stream processing! New NSDI 2021 paper by PhD student Le Xu (@happyandslow), a collaboration with @RahulPotharaju at Microsoft. Presentation next Tuesday at NSDI!
Cameo Paper Preprint: https://t.co/324wv5btpv
USENIX NSDI page: https://t.co/ufGUZxc6uA
#DotNetForSpark has been updated to version 1.1.1, as have my corresponding #Docker runtime and development images.
For a full list of the different versions that are currently available, please refer to the tag list at https://t.co/squRt1B8m6
@MikeDoesBigData @RahulPotharaju
#AzureSynapse is now generally available (GA) worldwide! This includes #DotNetForSpark support in Spark Pools and the shared metadata experience, which makes #Spark tables backed by Parquet directly queryable by #SynapseSQL Serverless (two projects I PMed 🙂).
Today's Quarantine Database Speaker: Nico Bruno + Cesar Galindo-Legaria from @Microsoft will talk about the internals of the @SQLServer Cascades query optimizer. Zoom talk is open to the public at 5:00pm ET. YouTube video will be available afterwards: https://t.co/pdi0Wl4ofq
This Thursday, I’m looking forward to sharing our vision for the future of data and analytics, along with @Kevin_Johnson, Hooi Ling Tan, Emma Walmsley, and other leaders. I hope you’ll join us. https://t.co/3c24e9BCpZ
Top 3 reasons to attend the #Azure data and analytics digital event:
1. @SatyaNadella and other CEOs discussing data as a strategic asset
2. New announcements and demos
3. Live Q&A with Microsoft engineers
Register: https://t.co/6jqHjnyrPY
✨New educational materials!✨
Today I'm announcing two new, free resources I've written: an 8-lecture course on fundamentals of distributed systems, and a 30-page tutorial on elliptic curve cryptography. https://t.co/afIJbOm271
We will be presenting Hyperspace (https://t.co/GoArPaDhLo) at the #DataAISummit 2020. Don't forget to register for free here: https://t.co/K5Dk9a3sQb
The latest v0.3 brings support for incremental indexing and hybrid scan. Read all about it here: https://t.co/miuEjZ25vB
In case you missed the talk @jeremylikness and I gave at #dotNETConf 2020 about #DotnetForSpark, you can still catch up with the recording. Thanks again to the .NET for Spark community for helping us reach the 1.0 milestone! https://t.co/Ast0RnOAj3