Bartosz Konieczny

8 months ago

⏰ Final Reminder – Delta Lake Webinar Tomorrow! Wondering if data engineering design patterns can unlock new insights into Delta Lake? Or how Delta Lake can become a key part of your streaming data architecture? Join @newfront (@bufbuild) and @waitingforcode as they tackle these questions head-on! 🗓️ Oct 14 @ 9AM PT 🎥 Live on LinkedIn, YouTube & X 📍 Reserve your spot today: https://t.co/ERn2SbT0qZ #opensource #oss #deltalake #streaming #dataengineering

waitingforcode retweeted

Jack Vanlightly

@vanlightly

8 months ago

Why don’t Iceberg or Delta Lake have secondary indexes? Because analytics workloads and OLTP workloads optimize for opposite I/O patterns. See my dive into data layout, pruning, and what “indexing” really means in open table formats: https://t.co/beurjdS8u4

waitingforcode retweeted

Shroff Publishers @shroffpub

8 months ago

Are you wondering if general concepts like data engineering design patterns can help you learn about #DeltaLake? Or, if it's possible to leverage Delta Lake within your streaming data architecture? In this webinar, Scott Haines and Bartosz Konieczny will answer these two questions. Scott, who gained streaming expertise at Yahoo, Twilio, and Nike, will share with you best practices for leveraging Delta Lake as a component of your streaming architecture. ✅ Bartosz, who recently published Data Engineering Design Patterns, will reverse-engineer a few of these design patterns to explain which Delta Lake features make everything tick. 🗓️ Tuesday, Oct 14 🕝 9AM PT Don't miss it! 🔗 Register today: https://t.co/ERn2SbT0qZ #opensource #oss #dataarchitecture #dataengineering @waitingforcode

waitingforcode retweeted

about 1 year ago

Releasing soon! Pre-order now: https://t.co/qAV65je7Du Data Engineering Design Patterns by Bartosz Konieczny (@waitingforcode), with @OReillyMedia. The book focuses on various aspects of data engineering, including data ingestion, data quality, idempotency, and more. #dataengineering

waitingforcode retweeted

Jack Vanlightly

@vanlightly

over 1 year ago

If you want to understand the consistency models of the mentioned table formats of the paper, I've written about it extensively and written formal models.
* https://t.co/JE0oPUBtAt
* https://t.co/1E1F9WaXJz
* https://t.co/qAQF6HUSNJ
* https://t.co/nxZljyLHuw

almost 2 years ago

@AdiPolak I'm not that new anymore, but "Stream Processing with Apache Flink" was my first learning resource; well structured, covering IMO the most important parts to start. Now, I'm deeply appreciating Flink Forward technical deep dives to go further 🤩

waitingforcode retweeted

Leanpub

@leanpub

almost 2 years ago

Data Engineering patterns on the cloud by Bartosz Konieczny is on sale on Leanpub! Its suggested price is $39.00; get it for $24.65 with this coupon: https://t.co/lD2ADxSXkA @waitingforcode #CloudComputing #AmazonWebServices #GoogleCloudPlatform #MicrosoftAzure

waitingforcode retweeted

about 2 years ago

Join @newfront and @waitingforcode and learn all about streaming Delta Lake tables with Apache Spark Structured Streaming! 🦀 🗓 March 21st 🕝 9:00AM PT / 12:00PM ET 💻 Join this webinar via LinkedIn, YouTube, or Zoom! Learn more: https://t.co/FYjB9Uy2Fz #deltalake #streaming

waitingforcode retweeted

Jim Dowling

@jim_dowling

over 2 years ago

I have been busy the last few months writing a book for O'Reilly about how to build ML systems (batch, real-time, and LLMs), distilling much of what I have learnt from both working with customers as well as students. Why could the book interest you?
* Data Scientists - transition from training models to building ML systems
* ML Engineers - learn about how to build batch, real-time, LLM systems in modular parts that you compose into a ML system
* Data Engineers - learn about the data transformation taxonomy for ML and how badly structured DAGs prevent reuse in ML systems
* Architects - divide et impera - learn how modularity helps you build faster and better ML systems.

Early access to the first chapter (52 pages) is available here: https://t.co/px4BmxCnUV

waitingforcode retweeted

Gwen (Chen) Shapira

@gwenshap

over 2 years ago

I don't want to start a flame war here, but IMO it is a mistake to jump straight to distributed databases (and 90% of the content below is distributed databases) without first learning fundamentals on single node databases. Here's my 10 things to understand about databases:

1. Relational model. Primary keys, foreign keys, normal form.
2. SQL language. Ideally with advanced SQL (CTE, analytics)
3. ACID and how transactions work
4. Write-ahead log (or binlog) and how it is used. Especially around restarts, recovery and replication.
5. Buffer cache, disk storage layout and how they interact
6. What happens when databases start? when they shut down?
7. Indexes, cluster tables, partitions and other types of database structures.
8. Query parsing, planning and optimizing.
9. MVCC and how to deal with its quirks in your DB of choice
10. Security - authentication, authorization, encryption on wire and at rest.
11. (Bonus) Investigating performance issues and making sense of benchmarks.

Entire world, stuff that 99% of developers use daily. You can be a deep expert without ever looking at distributed databases. And this also serves as strong foundation once you do.

And if you use Postgres, I found this free book super helpful in making sense of things: https://t.co/cPNk493KU5

waitingforcode retweeted

Leanpub

@leanpub

over 2 years ago

Data Engineering patterns on the cloud by Bartosz Konieczny is on sale on Leanpub! Its suggested price is $39.00; get it for $26.10 with this coupon: https://t.co/5xtfyLnFcZ @waitingforcode #CloudComputing #AmazonWebServices #GoogleCloudPlatform #MicrosoftAzure

waitingforcode retweeted

Jack Vanlightly

@vanlightly

over 2 years ago

Chapter 4 of The Architecture of Serverless Data Systems: CockroachDB (serverless). https://t.co/jm4iwBndHl

waitingforcode retweeted

over 2 years ago

The early release of Delta Lake: The Definitive Guide is here! 🎉 The latest edition includes the addition of Chapter 12: Performance Tuning. Download here ➡️ https://t.co/rXMjhs4dyV Authors @dennylee, Prashanth Babu, Tristen Wentling, & @newfront #opensource #deltalake #oss

waitingforcode retweeted

Leanpub

@leanpub

over 2 years ago

Data Engineering patterns on the cloud: How to solve common data engineering problems with cloud services? https://t.co/s70rgr8RRD by Bartosz Konieczny is the featured book on the Leanpub homepage! https://t.co/7B8N80e7nt @waitingforcode #CloudComputing #AmazonWebServices

over 2 years ago

Last week I spent some time understanding the #PySpark applyInPandasWithState. This week I'm refactoring the code, hoping to still understand it 2 months from now ;) 👉 https://t.co/qja12phovZ

over 2 years ago

In the previous release, #PySpark got an interesting streaming feature: arbitrary stateful processing. Its API differs from the Scala version but is better adapted to the Python world. More 👉 https://t.co/KfzgtIby32

waitingforcode retweeted

Antón @antonmry

about 3 years ago

A list of articles I share again and again when developers ask me about Kafka 🧵

waitingforcode retweeted

Apache Spark

@ApacheSpark

over 2 years ago

[ANNOUNCEMENT] Congrats to the Apache Spark community and all the contributors! The Apache Spark 3.5.0 release is here. Try it out! https://t.co/o8YcLnSysZ

over 2 years ago

It's not a rebranding but more a regrouping 😉 All my additional #dataengineering content is now available from there https://t.co/5VjHc37ZsL (planning to add some stream processing materials soon)
