Dimitri Stafylarakis @undecisiveness - Twitter Profile

about 2 months ago

I wrote Deep Learning with Python to be the definitive guide to how deep learning works and how to best make use of it. Tens of thousands of people got their career start via this book. 120,000 copies sold, and downloaded by millions more. And now it's free to read online: https://t.co/3CbcQ7hmjp

82

4K

558

4K

720K

Dimitri Stafylarakis @undecisiveness

3 months ago

@markburgess_osl coffee with a fly in it?

0

1

0

7

Dimitri Stafylarakis @undecisiveness

7 months ago

@mitchellh thank you for everything you've shared with us so far 🙏

0

1

0

27

undecisiveness retweeted

Ο πρώτος του χωριού ® @1stinthevillage

over 1 year ago

Ό,τι καλύτερο έχει βγάλει το @LubenMag τον τελευταίο καιρό. Αν δεν έχετε θέματα υπέρτασης, δείτε το ολόκληρο. #28_Φλεβαρη #τεμπη_εγκλημα #Τεμπη_συγκάλυψη #Τεμπη_δικαιωση #Τεμπη #μητσοτακησ_τελος #μητσοτακη_παραιτησου

18

4K

1K

477

185K

Who to follow

Mircea Cadariu

@mcadariu

software engineer • data systems builder • code mender • flâneur • highlands explorer

Freelance Full stack web developer at @Robeco Posts about web tech stuff

undecisiveness retweeted

Andy Pavlo (@andypavlo.bsky.social) @andy_pavlo

over 1 year ago

Buckle up because we're crashing into the new year with my annual database retrospective: License change blowbacks! @databricks vs. @SnowflakeDB gangwar! @DuckDB shotgun weddings! Buying a college quarterback with database money for your new lover! https://t.co/NnFHGElFNy

18

722

161

361

88K

undecisiveness retweeted

Mitchell Hashimoto

@mitchellh

over 1 year ago

https://t.co/39Xj39wheA 👻

296

9K

1K

2K

1M

Dimitri Stafylarakis @undecisiveness

almost 2 years ago

@IanColdwater good chance it'll be a better experience, compared to some interactions I've had with human recruiters

0

12

undecisiveness retweeted

Stanislav Kozlovski

@kozlovski

about 2 years ago

Databricks spent $1-2 BILLION dollars to acquire a ~30 person company from the creators of Apache Iceberg. A revolution is going on in the Big Data space and its centering around Iceberg. 🧊 Why would Databricks spend this outrageous amount of money on such a small company? Figure out here (2-minute read) 👇 The story of Iceberg is a classic disruption story. You set out to solve a relatively niche problem and the solution ends up “accidentally” solving a larger problem. 🏆 The relatively niche problem in this case was Apache Hive. 🐝 Apache Hive was a popular query engine for big data sets, and it implicitly used a simple table format. ✋ Pause. Quick primer on the terminology used here: • 📁 file format - a format which adds additional metadata to a file to help you organize the file’s data (e.g so you can read only what you need, so you can modify and evolve the file’s contents, etc) - stuff like CSV, Parquet, ORC, Avro • 🗃️ table format - a format which adds additional metadata to a COLLECTION of files, so that again you enhance what you can do with them - e.g only read the file you need, modify the structure and organization of the files, add ACID capabilities, etc. This includes open table formats like Iceberg, Delta, Hudi and many implicit ones (e.g MySQL’s InnoDB, PostgreSQL, Snowflake) Hive had done a few things quite well: • its format was simple and easy to understand 👍 • this made it ubiquitous - Hive tables are WIDELY supported in most query engines - Hive, Spark, Presto, Flink, Pig • the gain was that the whole ecosystem could use the same at-rest data 👌 But Hive also had a few problems, succinctly: • non-atomic writes when writing to multiple partitions of the data, resulting in mishaps (deleted data, half-done jobs) 😨 • inefficient with relation to cloud object storage ☁️ • scale challenges Ryan Blue and Daniel Weeks, the creators of Iceberg, figured out that the main bottleneck was the table format itself. So, while at Netflix, they set out to create a new format that solved for these issues. The new format - Iceberg - improved on the following: 1. all changes are atomic with serializable isolation 2. support for many concurrent writers ⚡️ 3. native cloud object store support 🌤️ 4. no gotchas & surprises (e.g renaming a Parquet column in Hive breaks a ton of stuff) 5. … a lot more In classic disruption fashion, the first three improvements ended up solving a much bigger problem. 🏆 Which problem was that? The problem of Shared Database Storage. With an open table format like Iceberg, you can store your data in one single source of truth (e.g S3) and have many different engines access and modify the data at the same time. 🤯 This is the rise of the so-called headless data architecture, where the storage layer (data) is decoupled from the query layers (engines) that use it. 💡 It is the key enabler of the growing trend called zero copy. Zero Copy means that you do NOT have to spend millions in expensive cloud networking costs to copy petabytes of data to have it be used by the right processing engine – you can use the same set of data in the standardized Iceberg table format. 🧊 The two layers have always been tightly coupled because the query layer relies deeply on optimizations in the storage layer which allow for data to be fetched efficiently for faster querying. And because the table format essentially defines the storage layer, you have big dogs like Snowflake and Databricks outbidding each other for Tabular (a company founded by the Iceberg creators) and aggressively competing with each other on the table formats. 💸 Just in the last few weeks we had some major announcements: • June 3: Snowflake’s Open Source Polaris Iceberg Catalog announced • June 4: Databricks acquires Tabular • June 13: Databricks’ Unity Delta Catalog open sourced And it seems like this is just the beginning… Interested in more concise, simple content around the table format wars and the lakehouse revolution? 1. Follow me here - ✅ @kozlovski 2. Retweet this story so your network learns too. It takes 5 seconds to do, and it takes me 5 hours to write 🙏

kozlovski's tweet photo. Databricks spent $1-2 BILLION dollars to acquire a ~30 person company from the creators of Apache Iceberg.

A revolution is going on in the Big Data space and its centering around Iceberg. 🧊

Why would Databricks spend this outrageous amount of money on such a small company?

Figure out here (2-minute read) 👇

The story of Iceberg is a classic disruption story. You set out to solve a relatively niche problem and the solution ends up “accidentally” solving a larger problem. 🏆

The relatively niche problem in this case was Apache Hive. 🐝

Apache Hive was a popular query engine for big data sets, and it implicitly used a simple table format.

✋ Pause. Quick primer on the terminology used here:

• 📁 file format - a format which adds additional metadata to a file to help you organize the file’s data (e.g so you can read only what you need, so you can modify and evolve the file’s contents, etc) - stuff like CSV, Parquet, ORC, Avro

• 🗃️ table format - a format which adds additional metadata to a COLLECTION of files, so that again you enhance what you can do with them - e.g only read the file you need, modify the structure and organization of the files, add ACID capabilities, etc.

This includes open table formats like Iceberg, Delta, Hudi and many implicit ones (e.g MySQL’s InnoDB, PostgreSQL, Snowflake)

Hive had done a few things quite well:
• its format was simple and easy to understand 👍
• this made it ubiquitous - Hive tables are WIDELY supported in most query engines - Hive, Spark, Presto, Flink, Pig
• the gain was that the whole ecosystem could use the same at-rest data 👌

But Hive also had a few problems, succinctly:
• non-atomic writes when writing to multiple partitions of the data, resulting in mishaps (deleted data, half-done jobs) 😨
• inefficient with relation to cloud object storage ☁️
• scale challenges

Ryan Blue and Daniel Weeks, the creators of Iceberg, figured out that the main bottleneck was the table format itself.

So, while at Netflix, they set out to create a new format that solved for these issues.

The new format - Iceberg - improved on the following:
1. all changes are atomic with serializable isolation
2. support for many concurrent writers ⚡️
3. native cloud object store support 🌤️
4. no gotchas & surprises (e.g renaming a Parquet column in Hive breaks a ton of stuff)
5. … a lot more

In classic disruption fashion, the first three improvements ended up solving a much bigger problem. 🏆

Which problem was that?

The problem of Shared Database Storage.

With an open table format like Iceberg, you can store your data in one single source of truth (e.g S3) and have many different engines access and modify the data at the same time. 🤯

This is the rise of the so-called headless data architecture, where the storage layer (data) is decoupled from the query layers (engines) that use it. 💡

It is the key enabler of the growing trend called zero copy.

Zero Copy means that you do NOT have to spend millions in expensive cloud networking costs to copy petabytes of data to have it be used by the right processing engine – you can use the same set of data in the standardized Iceberg table format. 🧊

The two layers have always been tightly coupled because the query layer relies deeply on optimizations in the storage layer which allow for data to be fetched efficiently for faster querying.

And because the table format essentially defines the storage layer, you have big dogs like Snowflake and Databricks outbidding each other for Tabular (a company founded by the Iceberg creators) and aggressively competing with each other on the table formats. 💸

Just in the last few weeks we had some major announcements:
• June 3: Snowflake’s Open Source Polaris Iceberg Catalog announced
• June 4: Databricks acquires Tabular
• June 13: Databricks’ Unity Delta Catalog open sourced

And it seems like this is just the beginning…

Interested in more concise, simple content around the table format wars and the lakehouse revolution?

1. Follow me here - ✅ @kozlovski
2. Retweet this story so your network learns too. It takes 5 seconds to do, and it takes me 5 hours to write 🙏

20

731

144

864

121K

Dimitri Stafylarakis @undecisiveness

about 2 years ago

Would pay for AI-free services..

0

1

0

12

undecisiveness retweeted

nixCraft 🐧 @nixcraft

about 2 years ago

Google Ad from 1998.

242

17K

2K

1K

1M

undecisiveness retweeted

Curiosity

@CuriosityonX

about 2 years ago

From a million miles away, NASA captures Moon crossing face of Earth. (Yes, this is a real image) Credit: NASA/NOAA

3K

110K

8K

9K

16M

undecisiveness retweeted

Corey Quinn

@QuinnyPig

about 2 years ago

The fundamental error all of these GenAI use cases make is in assuming that people will want to read something that other people couldn't be bothered to write.

6

604

137

79

70K

undecisiveness retweeted

vx-underground

@vxunderground

over 2 years ago

The xz situation is absolutely insane and almost certainly state sponsored. This is an excellent example of a widely used software being maintained by basically one person. Read this web article and then frown and become sad. https://t.co/0nHEh7hsBY

54

5K

852

2K

770K

undecisiveness retweeted

nixCraft 🐧 @nixcraft

over 2 years ago

Freenginx: Core Nginx developer announces fork

56

5K

1K

540

381K

Dimitri Stafylarakis @undecisiveness

over 2 years ago

Christmas came early this year!

Randall Munroe

@xkcd

over 2 years ago

What If? is now on YouTube! Check out the first video for the answer to “What if we aimed the Hubble Telescope at Earth?” https://t.co/H7hLedk6CO

26

2K

226

229

137K

0

1

0

31

undecisiveness retweeted

daniel:// stenberg:// @bagder

over 2 years ago

We are cutting the release cycle short and will release curl 8.4.0 on October 11, including a fix for a severity HIGH CVE. Buckle up.

22

1K

430

129

272K

undecisiveness retweeted

Alasdair Allan @aallan

almost 3 years ago

Good morning, and say "Hello!" to Raspberry Pi 5! The #Pi5 is ×2–3 faster, has PCIe support, a new RTC, and our own silicon designed in‑house here in Cambridge The everything computer. Optimised, https://t.co/tsjQaESmiH. More at https://t.co/6PKTuSiDxq. #RaspberryPi #RaspberryPi5

aallan's tweet photo. Good morning, and say "Hello!" to Raspberry Pi 5! The #Pi5 is ×2–3 faster, has PCIe support, a new RTC, and our own silicon designed in‑house here in Cambridge The everything computer. Optimised, https://t.co/tsjQaESmiH. More at https://t.co/6PKTuSiDxq. #RaspberryPi #RaspberryPi5 https://t.co/RCcw21a6FK

174

7K

1K

425

1M

undecisiveness retweeted

Alex Kaplan

@alexkaplan0

almost 3 years ago

Today might have seen the biggest physics discovery of my lifetime. I don't think people fully grasp the implications of an ambient temperature / pressure superconductor. Here's how it could totally change our lives.

2K

115K

19K

36K

31M

Dimitri Stafylarakis @undecisiveness

almost 3 years ago

@rmaranhao @elonmusk x your reply

0

1

0

32

Dimitri Stafylarakis @undecisiveness

almost 3 years ago

@kelseyhightower Kubernetes from source?

0

77

Dimitri Stafylarakis

@undecisiveness

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users