At AGU I talked to NASA people about how agencies could better support open-source tools they rely on. I argued that our recent collaboration between Xarray and NASA ESDIS on xarray.DataTree was a good model to copy - read about how it happened here!
https://t.co/uTJ6VFLjHD
We're moving over to BlueSky and LinkedIn for all our future announcements. Follow us at https://t.co/CfejMIiPuH to find out more about tomorrow's showcase 😉 (p.s., it's on Xpublish at Scale at 4 PM EST 🚀) Connect with us on LinkedIn at https://t.co/b0duHratPH
🌤️ #AMS2025 is just around the corner! We are taking AMS by storm with an exhibitor booth (booth 353), two talks from @_jhamman and @rabernat, and a @pangeo_data Community Happy Hour (register here: https://t.co/7CujhX87AC)!
@alekpetty I'm hoping that virtual zarr datasets will make it easier to cloud-optimize data that was dumped in a bucket in a legacy format, and allow creating aggregated datasets with relevant derived information alongside it.
https://t.co/x39eXZ82Zf
Completely agree - "in theory" we have the simple scalability of the cloud, but in practice it's often a headache, for no good reason, which prevents adoption by most users (including many scientists)
New Post: Cloud Computing is Broken
https://t.co/Ode3eXkGFO
Investor asks: "What's next for Data/Cloud Infrastructure?"
My answer: "Boring stuff. People struggle with basics."
Cloud feels like MP3 players before iPod. In theory everything is good. In practice adoption is low
@alekpetty Makes total sense.
On (1) and (2) some intermediate services (e.g. Coiled, Modal) would like to sell you the solution to this, but it's annoying that NASA + AWS can't just get it right first time
On (3) - is your data at least in the cloud, even if not in a cloud-optimized format?
@mouthofmorrison @rabernat @betolink @EarthmoverHQ @steadyflux That said, it isn't 100% clear that NASA's best move is to immediately convert 10000+ data sets into cutting-edge ARCO formats. Kerchunk and Virtual Zarr offer benefits of ARCO while keeping data in the native formats.
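To make the "keep data in the native format" idea concrete, here is a minimal sketch of a chunk reference manifest: a mapping from chunk keys to (path, byte offset, length) inside an untouched file. It is not the Kerchunk or Virtual Zarr API. The file layout, the `temperature/N` keys, and the function names are all hypothetical, and the sketch records offsets while writing its own toy file, whereas real tools scan files that already exist.

```python
import os
import struct
import tempfile

# Hypothetical "legacy" layout: a 16-byte header, then three chunks of four float32 values.
HEADER = b"LEGACYFORMAT\x00\x00\x00\x00"
CHUNKS = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0], [8.0, 9.0, 10.0, 11.0]]


def write_legacy_file(path):
    """Write the toy file and return a manifest: chunk key -> (path, offset, length)."""
    manifest = {}
    with open(path, "wb") as f:
        f.write(HEADER)
        for i, values in enumerate(CHUNKS):
            payload = struct.pack("<4f", *values)
            # Record where this chunk lives; the bytes themselves are never copied elsewhere.
            manifest[f"temperature/{i}"] = (path, f.tell(), len(payload))
            f.write(payload)
    return manifest


def read_chunk(manifest, key):
    """Read one chunk with a byte-range read, using only the manifest entry."""
    path, offset, length = manifest[key]
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read(length)
    return list(struct.unpack("<4f", raw))


def demo():
    """Build the file in a temp directory and read the second chunk back via the manifest."""
    with tempfile.TemporaryDirectory() as tmp:
        manifest = write_legacy_file(os.path.join(tmp, "legacy.bin"))
        offsets = [manifest[f"temperature/{i}"][1] for i in range(len(CHUNKS))]
        return offsets, read_chunk(manifest, "temperature/1")


offsets, second_chunk = demo()
```

The manifest is small and can live separately from the data, which is why this approach avoids rewriting thousands of existing files.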
Are you heading to #AGU24 next month? Consider joining us for a bonus day of hacking on @pangeo_data. I'll be there representing @EarthmoverHQ and helping folks work with #icechunk and @zarr_dev.
Details and signup here: https://t.co/kgUuokUo3k
We've talked a lot about #Icechunk's performance this week 🚀. But the Zarr-Python 3 results are also very encouraging! We're a few weeks away from the 3.0 launch but what this chart shows is that the new AsyncIO + multi-threading functionality in Zarr is going to be really good.
All these integrations represent literally years' worth of effort, all coming out at once 🤯
And that's not even mentioning all the other changes you see in a typical xarray release!
ALSO this release is the first to be compatible with the much-anticipated v3 implementation of zarr-python! (still on its beta branch right now)
This brings (a) big performance benefits when reading @zarr_dev on S3 via async and (b) compatibility with @EarthmoverHQ 's Icechunk.
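A rough illustration of why async helps on a high-latency store like S3: when each chunk request mostly waits on the network, issuing requests concurrently overlaps the waits. This sketch uses only the standard library and simulates latency with `asyncio.sleep`. `fetch_chunk`, the chunk keys, and the 50 ms latency are assumptions for illustration, not zarr-python internals.

```python
import asyncio
import time

LATENCY_S = 0.05  # assumed per-request latency, standing in for an object-store round trip


async def fetch_chunk(key):
    """Hypothetical chunk fetch: wait for the simulated network, then return fake bytes."""
    await asyncio.sleep(LATENCY_S)
    return f"bytes-for-{key}"


async def fetch_sequential(keys):
    # One request at a time: total time is roughly len(keys) * LATENCY_S.
    return [await fetch_chunk(k) for k in keys]


async def fetch_concurrent(keys):
    # All requests in flight at once: total time is roughly one LATENCY_S.
    return list(await asyncio.gather(*(fetch_chunk(k) for k in keys)))


def timed(coro_fn, keys):
    """Run an async fetch function to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(keys))
    return result, time.perf_counter() - start


keys = [f"c/{i}" for i in range(10)]
seq_result, seq_time = timed(fetch_sequential, keys)
con_result, con_time = timed(fetch_concurrent, keys)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both paths return the same chunks in the same order; only the waiting overlaps.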