To all my dear #BigData colleagues!
You're invited to submit an abstract to #BigDataTechWarsaw 2020, so that we can meet in Warsaw, talk about data and have a beer 🍺
This is the conference that I co-organise :) The CfP is open until Sep 30th.
https://t.co/khTWHhIIhC
A couple of good links, both from @DataEngWeekly:
1. Review of idempotence complete with elevator analogy: https://t.co/B6Bx0C6QeW
2. How to learn to build distributed systems: https://t.co/BcuGl52Vz8
What changed in the big data landscape from 2013 to 2019: https://t.co/6N3ASbKldL - interesting analysis from @abbassmarouni based on articles in @DataEngWeekly
Aaand another chapter is done! 🎉 The Early Release of "Stream Processing with @ApacheFlink" was updated with a new chapter about connectors and end-to-end consistency. Only two chapters ("Setup & Configuration", "Operations") are left. I should start looking for another hobby 🤔
Data Eng Weekly 279
▪︎ Benchmarking Hive on MR3
▪︎ Event Sourcing
▪︎ Efficiently writing to a db
▪︎ Jepsen for Dgraph
▪︎ PulsarIO
▪︎ Scheduling of notebooks at Netflix
... and more!
https://t.co/FOpJydux59
Apache Airflow 1.10.0 is out ❤️🎉 !!
Highlights:
- New RBAC web interface in beta
- First-class Kubernetes operator
- Experimental Kubernetes executor
- Timezone support
- Performance optimizations for large DAGs
- Many GCP and S3 integration improvements
- Tons of bug fixes
We've just released Apache Arrow 0.10.0, the biggest release yet with 4 months of work and nearly 500 issues closed. We've added 3 new languages to the project: Go, Ruby, and Rust. Read more https://t.co/i3mQp6vjMc
The @ApacheFlink community reached 10,000 issues in JIRA.
Thanks to everyone who participated!
Let's file another 10,000 feature requests/bug reports and keep the community alive!
https://t.co/4874p9TeGe
Data Eng Weekly Issue #274
Lots of stream processing coverage this week—Apache Kafka, Wallaroo, Apache Samza, WSO2, and Amazon SQS + a couple of posts on Kubernetes, db monitoring + 2 new books + a proposed data ethics checklist.
https://t.co/4BaVOKdUgb
ERA5 atmospheric data is now available on S3 as a public data set. Data is currently available from 2008 onwards; all 9 petabytes, dating back to 1950, will be released incrementally. https://t.co/lHnA72EfgJ
HDP 3.0 delivers new capabilities for the enterprise: agile application deployment, new #machinelearning and deep learning workloads, a real-time database, and security and governance. Learn more about the enhancements here: https://t.co/Y9NSPn0luo #BigData
Data Eng Weekly #273 is out. It was a tough one—so much great content to choose from.
Coverage includes Scio, make at ProPublica, PayPal's NameNode analytics, MySQL on Kubernetes, Kinesis+Lambda, data replication at https://t.co/1QMJAhC6Ge, & much more.
https://t.co/tvkAbotoxp