My old Mac mini (mid 2010) showed the white screen of death today. RIP my little friend, you served me well for many years. I guess you don’t mind if I reuse some parts of you in other projects.
Caved and got o1 Pro.
I asked it my favorite question to ask these models, and it gave much more interesting answers than either o1 preview or Claude 3.5 Sonnet.
They're not quite as mind-bending as I'd want (I've seen a version of all of them except #7), but they're still better:
The CSS Working Group has just published the first public draft of the CSS Values and Units Level 5 Module. Although still in its early stages, this release introduces many new features and exciting improvements.
Here's a summary of some of the changes.
Revox - New Airdrop on Binance Web3 Wallet
Steps:👇
- Open Binance > Web3 Wallet
- Click Discover tab > Revox
- Connect wallet
- Enter invite code: C0V5FC
- Do all tasks, collect passes (7 days left)
You can receive up to $30 equivalent.
Databricks spent $1-2 BILLION to acquire a ~30-person company founded by the creators of Apache Iceberg.
A revolution is going on in the Big Data space, and it's centered on Iceberg. 🧊
Why would Databricks spend this outrageous amount of money on such a small company?
Find out here (2-minute read) 👇
The story of Iceberg is a classic disruption story. You set out to solve a relatively niche problem and the solution ends up “accidentally” solving a larger problem. 🏆
The relatively niche problem in this case was Apache Hive. 🐝
Apache Hive was a popular query engine for big data sets, and it implicitly used a simple table format.
✋ Pause. Quick primer on the terminology used here:
• 📁 file format - a format that adds metadata to a file to help you organize the file’s data (e.g. so you can read only what you need, modify and evolve the file’s contents, etc.) - stuff like CSV, Parquet, ORC, Avro
• 🗃️ table format - a format that adds metadata to a COLLECTION of files, so that, again, you can do more with them - e.g. read only the files you need, change the structure and organization of the files, add ACID capabilities, etc.
This includes open table formats like Iceberg, Delta and Hudi, and many implicit ones (e.g. MySQL’s InnoDB, PostgreSQL, Snowflake)
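To make "only read the files you need" concrete, here is a toy sketch of table-level metadata: a list of data files plus per-file min/max stats for one column, which lets a reader skip files that cannot match a filter. The `DataFile`/`TableMetadata` classes, the `ts` column and the S3 paths are all hypothetical; real formats like Iceberg track much richer metadata (manifests, partitions, snapshots), so treat this only as an illustration of the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataFile:
    """One data file (e.g. a Parquet file) tracked by the table's metadata."""
    path: str
    min_ts: int  # smallest value of the hypothetical `ts` column in this file
    max_ts: int  # largest value of `ts` in this file

@dataclass(frozen=True)
class TableMetadata:
    """Toy table-format metadata: a collection of files plus per-file column stats."""
    files: tuple[DataFile, ...]

def files_to_scan(table: TableMetadata, lo: int, hi: int) -> list[str]:
    """Return only the files whose [min_ts, max_ts] range overlaps [lo, hi].

    Files that cannot contain matching rows are skipped without being opened.
    """
    return [f.path for f in table.files if f.max_ts >= lo and f.min_ts <= hi]

table = TableMetadata(files=(
    DataFile("s3://bucket/t/a.parquet", 0, 99),
    DataFile("s3://bucket/t/b.parquet", 100, 199),
    DataFile("s3://bucket/t/c.parquet", 200, 299),
))
print(files_to_scan(table, 150, 210))  # ['s3://bucket/t/b.parquet', 's3://bucket/t/c.parquet']
```

A file format alone can't do this across files: the pruning decision needs metadata about the whole collection, which is exactly what a table format adds.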
Hive had done a few things quite well:
• its format was simple and easy to understand 👍
• this made it ubiquitous - Hive tables are WIDELY supported in most query engines - Hive, Spark, Presto, Flink, Pig
• the gain was that the whole ecosystem could use the same at-rest data 👌
But Hive also had a few problems. In short:
• non-atomic writes when writing to multiple partitions of the data, resulting in mishaps (deleted data, half-done jobs) 😨
• inefficient on cloud object storage ☁️
• scale challenges
Ryan Blue and Daniel Weeks, the creators of Iceberg, figured out that the main bottleneck was the table format itself.
So, while at Netflix, they set out to create a new format that solved these issues.
The new format - Iceberg - improved on the following:
1. all changes are atomic with serializable isolation
2. support for many concurrent writers ⚡️
3. native cloud object store support 🌤️
4. no gotchas & surprises (e.g. renaming a Parquet column in Hive breaks a ton of stuff)
5. … a lot more
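Points 1 and 2 rest on one idea: every commit produces a new immutable table version, and the catalog atomically swaps a single "current version" pointer, so readers see either the old table or the new one, never a half-written state. Concurrent writers use optimistic concurrency: build a new version, try a compare-and-swap, and retry if someone else committed first. Iceberg's spec describes commits as an atomic swap of one table metadata file for another; the `ToyCatalog`, `Snapshot` and `append_files` names below are hypothetical and only sketch that idea in memory.

```python
import threading
from dataclasses import dataclass

@dataclass(frozen=True)
class Snapshot:
    """An immutable table version: the full list of data files at that point."""
    version: int
    files: tuple[str, ...]

class ToyCatalog:
    """Holds one pointer to the current snapshot and swaps it atomically."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._current = Snapshot(version=0, files=())

    def current(self) -> Snapshot:
        with self._lock:
            return self._current

    def compare_and_swap(self, expected: Snapshot, new: Snapshot) -> bool:
        """Install `new` only if nobody committed since `expected` was read."""
        with self._lock:
            if self._current is not expected:
                return False  # someone else committed first: caller must retry
            self._current = new
            return True

def append_files(catalog: ToyCatalog, new_files: list[str], max_retries: int = 10) -> Snapshot:
    """Optimistic commit: read the base, build a new snapshot, CAS; retry on conflict."""
    for _ in range(max_retries):
        base = catalog.current()
        candidate = Snapshot(base.version + 1, base.files + tuple(new_files))
        if catalog.compare_and_swap(base, candidate):
            return candidate
    raise RuntimeError("too many concurrent commits, giving up")

# Eight concurrent writers; every append lands exactly once, none are lost.
catalog = ToyCatalog()
writers = [
    threading.Thread(target=append_files, args=(catalog, [f"part-{i}.parquet"]))
    for i in range(8)
]
for t in writers:
    t.start()
for t in writers:
    t.join()
print(catalog.current().version, len(catalog.current().files))  # 8 8
```

Contrast this with Hive's approach, where a job writing into several partition directories can fail halfway and leave some partitions updated and others not.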
In classic disruption fashion, the first three improvements ended up solving a much bigger problem. 🏆
Which problem was that?
The problem of Shared Database Storage.
With an open table format like Iceberg, you can store your data in one single source of truth (e.g S3) and have many different engines access and modify the data at the same time. 🤯
This is the rise of the so-called headless data architecture, where the storage layer (data) is decoupled from the query layers (engines) that use it. 💡
It is the key enabler of the growing trend called zero copy.
Zero Copy means that you do NOT have to spend millions in expensive cloud networking costs copying petabytes of data just so a different processing engine can use it. Every engine works off the same copy of the data, stored in the standardized Iceberg table format. 🧊
The two layers have historically been tightly coupled, because the query layer relies heavily on storage-layer optimizations that let data be fetched efficiently for faster querying.
And because the table format essentially defines the storage layer, you have big dogs like Snowflake and Databricks outbidding each other for Tabular (a company founded by the Iceberg creators) and aggressively competing with each other on the table formats. 💸
Just in the last few weeks we had some major announcements:
• June 3: Snowflake’s Open Source Polaris Iceberg Catalog announced
• June 4: Databricks acquires Tabular
• June 13: Databricks’ Unity Delta Catalog open sourced
And it seems like this is just the beginning…
Interested in more concise, simple content around the table format wars and the lakehouse revolution?
1. Follow me here - ✅ @kozlovski
2. Retweet this story so your network learns too. It takes 5 seconds to do, and it takes me 5 hours to write 🙏
If you have a Mac and don't use its GPU, I'm about to change your life.
I recorded a video with step-by-step instructions showing how to run PyTorch, TensorFlow, and JAX on Metal.
Spoiler alert: It's fast!
https://t.co/1KvzKGNI2l
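For PyTorch specifically, the Mac GPU is reached through the `mps` (Metal Performance Shaders) backend. A minimal sketch of picking it when available, assuming PyTorch is installed (the helper name `pick_device` is hypothetical, and TensorFlow and JAX use their own separate Metal plugins, not shown here):

```python
def pick_device() -> str:
    """Return the best available PyTorch device name, falling back to "cpu"."""
    try:
        import torch  # optional dependency; without it we just report "cpu"
    except ImportError:
        return "cpu"
    # Apple GPUs are exposed through the Metal Performance Shaders backend.
    # getattr guards against older PyTorch builds that predate torch.backends.mps.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

print(pick_device())
```

You would then create tensors and models on that device, e.g. `torch.ones(3, device=pick_device())` or `model.to(pick_device())`.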
My first big interview this year. Most people won't like it. But someone will thank me for it… It's time to start talking about and living in reality, an unpleasant, sad reality: we are not going to win this war.
But we haven't lost it yet. There is still time, there are still people, we can still pull together and change everything. It's not too late.
This year will be decisive. How all of us, the whole of society, face up to reality is exactly what will determine everything that follows: borders, terms, losses, the future economy and, more broadly, the future itself. And whether there will be one at all.
There is still time. There are still people. There is still motivation.
Let's finally unite around our shared future. Because if not, then what is all of this for?
Video here👇
https://t.co/AMlPMSeUNO
A rule for choosing between a URL path and a search param: search params are optional.
So, if it’s required to render the page, make it part of the URL's path.
If it’s optional (you can fall back to a default), make it a search param.
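A minimal sketch of the rule, assuming a hypothetical `/products/<product_id>?sort=<order>` route: the product id is required to render anything, so it lives in the path, while the sort order has a sensible default, so it's a search param. The route, `parse_product_page` and `DEFAULT_SORT` are illustrative names, not from any framework.

```python
from urllib.parse import parse_qs, urlsplit

DEFAULT_SORT = "newest"  # hypothetical fallback when ?sort= is absent

def parse_product_page(url: str) -> tuple[str, str]:
    """Parse the hypothetical route /products/<product_id>?sort=<order>."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Required data: without a product id there is no page to render.
    if len(segments) != 2 or segments[0] != "products":
        raise ValueError(f"not a product URL: {url!r}")
    product_id = segments[1]
    # Optional data: missing search params fall back to a default.
    sort = parse_qs(parts.query).get("sort", [DEFAULT_SORT])[0]
    return product_id, sort

print(parse_product_page("https://example.com/products/42"))            # ('42', 'newest')
print(parse_product_page("https://example.com/products/42?sort=price"))  # ('42', 'price')
```

Note the asymmetry: a missing path segment is an error (no page), while a missing search param silently becomes the default.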