My @InfoQ talk 🎙️ on the "Future of Data Engineering" is up! I cover the six stages of data pipeline maturity:
0. None
1. Batch
2. Realtime
3. Integration
4. Automation
5. Decentralization
Check it out! 👀
(I'm so sorry for the link picture)
https://t.co/1CiJQB0Zou
The Apache Beam community is pleased to announce that a new @ApacheSpark runner based on the Spark Structured Streaming framework is available on master for testing! See https://t.co/tRpzahfCsJ for the current set of features.
Big thanks to Keunsoo Park, Ilma Janutyte and @KenTTallakstad for sharing their experience with the data engineering community in Oslo 👏👏👏 #databricks #apachespark #deltalake Notebook here:
https://t.co/C7OpAATBHK
BigQuery has launched beta support for querying Parquet and ORC file formats in Cloud Storage, joining other federated querying capabilities. Learn more on @GCPcloud's commitment to building an open and accessible data warehouse https://t.co/q2cHQlrmAh
Splunk has agreed to acquire Streamlio, major contributors to the Apache Pulsar project. Congrats to the @streamlio team! #ApachePulsar #gettingnoticed
https://t.co/OOAF0lI46B
1/ Post-Map/Reduce (second generation) data processing systems (Spark, Flink, Dataflow, Samza) have been about unifying batch and streaming.
@confluentinc (with Kafka Streams, KSQL) is focused on unifying streaming and databases.
🎉 The wait is over! TensorFlow 2.0 is finally here.
Driven by community feedback, this release provides a complete set of tools for developers, enterprises, and researchers to easily build ML applications.
Read the blog ↓ https://t.co/eUKQVZ4HmS
Streaming data is the new database, as every company becomes software. @jaykreps, founder and CEO of @dcvc portfolio co @confluentinc, talks about fundamental changes to the database to support this new data ecosystem: https://t.co/pVQhSdmrZ8
This was a fascinating exploration of how the MinIO team have managed to innovate on object storage, while maintaining compatibility with the full S3 API, the importance of a standardized interface for multi-cloud workloads, and using it for ML workloads. https://t.co/bTwCwATCZp
If you are interested in learning about Data Mesh and happen to be in Berlin, join my keynote and hands-on tutorial at #OReillySACon @OReillySACon #DataMesh https://t.co/H5LIMQTIDN and https://t.co/ePsf1YmGUe
Great news: @Google just open sourced its #Kubernetes Operator for Apache Flink! Check out the GitHub repo for information on installation and how to contribute to the project: https://t.co/QJ5r44XXQ7 #ACNA19 #BeamSummit