Ever wondered why Apache Calcite is so popular among database folks? Because it combines a great community with balanced technological decisions. We discuss the mechanics of Apache Calcite's success in our new blog post: https://t.co/q8xXpzjywK
Apache Calcite is the most popular query optimization framework and a prominent member of the "composable data systems" movement. Our new blog post analyzes the drivers of Calcite's success and how we can use this knowledge to push innovation further.
https://t.co/Z0YMsUeYGn
Subquery unnesting is absolutely critical for distributed query engines, since they usually don't support nested-loop joins. But what is more interesting is that generic unnesting may offer some new features, such as efficient UDFs, as described in https://t.co/mlnDZe2hws
Surprised to learn that there is a general solution for unnesting SQL subqueries — any SQL query that has subqueries can be automatically rewritten by an optimizer into a query w/o subqueries to improve performance.
Amazing finding and seems like this should have a big impact on databases!
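To make the idea concrete, here is a minimal sketch of one hand-written unnesting, using Python's built-in sqlite3 module. The `orders` table and its rows are made up for illustration, and this covers only one simple correlated pattern, not the general algorithm the posts above refer to.

```python
import sqlite3

# Hypothetical data set, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'alice', 10), (2, 'alice', 30),
        (3, 'bob', 5), (4, 'bob', 5), (5, 'carol', 7);
""")

# Correlated form: the inner query is logically re-evaluated per outer row.
correlated = """
    SELECT o.id FROM orders o
    WHERE o.amount > (SELECT AVG(i.amount) FROM orders i
                      WHERE i.customer = o.customer)
    ORDER BY o.id
"""

# Unnested form: aggregate once per customer, then join on the correlation key.
unnested = """
    SELECT o.id FROM orders o
    JOIN (SELECT customer, AVG(amount) AS avg_amount
          FROM orders GROUP BY customer) a
      ON a.customer = o.customer
    WHERE o.amount > a.avg_amount
    ORDER BY o.id
"""

correlated_ids = [row[0] for row in conn.execute(correlated)]
unnested_ids = [row[0] for row in conn.execute(unnested)]
print(correlated_ids == unnested_ids)
```

Both queries return the same orders, but the second one contains no subquery that depends on the outer row, so an engine without nested-loop joins can run it as an ordinary aggregation plus join.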
Love this clip from Jensen Huang (Nvidia CEO) on why innovation requires failure.
"Unless you have a tolerance for failure, you will never experiment. If you don't experiment you won't innovate, and if you don't innovate you won't succeed."
To scan or not to scan, that is the question. Especially when reading large data sets. Our new article is about dynamic filtering — a critical optimization in analytical engines that allows you to read less data when doing joins. Enjoy!
How to avoid a full scan when joining a large fact table without predicates with a small dimension table? Read our new blog post about dynamic filtering, a must-have optimization that skyrockets your analytical engine's performance. We use Trino as an example. https://t.co/bTanQof2vL
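A toy sketch of the idea in plain Python, assuming made-up in-memory tables and hypothetical helper names (`scan`, `hash_join`). This is not Trino's implementation. It only shows the principle: collect the join keys from the small side first, then use them as a predicate on the fact scan.

```python
# Hypothetical in-memory tables; a real engine would read these from storage.
fact_sales = [{"store_id": i % 100, "amount": i} for i in range(10_000)]
dim_stores = [{"store_id": 3, "city": "Oslo"}, {"store_id": 42, "city": "Rome"}]


def scan(rows, predicate=None):
    """Return the rows passing the predicate and how many rows the scan emitted."""
    out = [r for r in rows if predicate is None or predicate(r)]
    return out, len(out)


def hash_join(build_rows, probe_rows, key):
    """Hash join: build a table on the small side, probe it with the large side."""
    table = {}
    for b in build_rows:
        table.setdefault(b[key], []).append(b)
    return [{**p, **b} for p in probe_rows for b in table.get(p[key], [])]


# Without dynamic filtering: every fact row flows into the join.
all_rows, emitted_plain = scan(fact_sales)
plain = hash_join(dim_stores, all_rows, "store_id")

# With dynamic filtering: gather build-side keys at runtime and push them
# into the fact scan as a predicate before the join runs.
build_keys = {d["store_id"] for d in dim_stores}
filtered_rows, emitted_filtered = scan(
    fact_sales, lambda r: r["store_id"] in build_keys
)
filtered = hash_join(dim_stores, filtered_rows, "store_id")

print(emitted_plain, emitted_filtered, plain == filtered)
```

In this sketch the filter only reduces the rows handed to the join. In a real engine the same key set can also let the scan skip whole partitions or file chunks, which is where the reduced I/O comes from.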
Today I learned that in most office apps you can add a dash by typing two hyphens. Before that, I used to open the "Dash" Wikipedia article and copy-paste the dash character from there. What a productivity boost!
@ClickHouseDB Hey folks. How do you generate that image? We have a similar problem - many concurrent tasks with some unfortunate blocking, and we would like to build a similar visualization from stats data.
Hey folks. Have you ever wondered how the DISTINCT keyword is actually processed by query engines? Our new blog post explores how Apache Calcite and Trino optimizers rewrite distinct aggregations. TLDR: Calcite rewrites them to joins, and Trino rewrites them to window functions.
Aggregation is one of the most frequently encountered operations in analytics. Our new blog post discusses how Apache Calcite and Trino optimizers deal with distinct aggregations and why you may need joins and window functions here. https://t.co/OhxMeAxDuL
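A rough sketch of the join-style rewrite, run through Python's built-in sqlite3 module. The `sales` table is made up, and the rewritten SQL is hand-written in the spirit of the approach. It is not the exact plan Calcite produces, and it assumes the grouping column is never NULL.

```python
import sqlite3

# Hypothetical data set, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT NOT NULL, customer TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('eu', 'alice', 10), ('eu', 'alice', 20), ('eu', 'bob', 5),
        ('us', 'carol', 7), ('us', NULL, 3);
""")

# Original query: a regular and a distinct aggregate in one GROUP BY.
original = """
    SELECT region, SUM(amount), COUNT(DISTINCT customer)
    FROM sales GROUP BY region ORDER BY region
"""

# Rewritten query: compute each aggregate in its own subquery and join the
# results on the grouping key. The distinct aggregate becomes a plain COUNT
# over deduplicated (region, customer) pairs; COUNT(customer) skips NULLs,
# matching COUNT(DISTINCT customer).
rewritten = """
    SELECT a.region, a.total, d.customers
    FROM (SELECT region, SUM(amount) AS total
          FROM sales GROUP BY region) a
    JOIN (SELECT region, COUNT(customer) AS customers
          FROM (SELECT DISTINCT region, customer FROM sales)
          GROUP BY region) d
      ON a.region = d.region
    ORDER BY a.region
"""

original_rows = conn.execute(original).fetchall()
rewritten_rows = conn.execute(rewritten).fetchall()
print(original_rows == rewritten_rows)
```

After the rewrite no operator has to track distinct values and regular aggregates at the same time: each branch is an ordinary GROUP BY.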
One data architecture I expect we'll see more in 2023 is #SQLite/#DuckDB deployed as caches at the edge, updated via change feeds from system-of-record: stellar read performance due to close local proximity to users and fully queryable data models tailored for specific use cases.
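A minimal sketch of what such an edge cache could look like, using Python's built-in sqlite3 module. The event format (`op`, `row`, `id`) and the `products` table are assumptions made for illustration. A real change feed from a system of record would define its own schema and delivery guarantees.

```python
import sqlite3

# Local, fully queryable cache; a file path could be used instead of :memory:.
cache = sqlite3.connect(":memory:")
cache.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)


def apply_change(conn, event):
    """Apply one change event (hypothetical format) to the local cache."""
    if event["op"] == "upsert":
        row = event["row"]
        conn.execute(
            "INSERT OR REPLACE INTO products (id, name, price) VALUES (?, ?, ?)",
            (row["id"], row["name"], row["price"]),
        )
    elif event["op"] == "delete":
        conn.execute("DELETE FROM products WHERE id = ?", (event["id"],))
    else:
        raise ValueError(f"unknown op: {event['op']}")


# Hypothetical feed: two inserts, one update, one delete.
feed = [
    {"op": "upsert", "row": {"id": 1, "name": "mug", "price": 4.0}},
    {"op": "upsert", "row": {"id": 2, "name": "pen", "price": 1.5}},
    {"op": "upsert", "row": {"id": 1, "name": "mug", "price": 5.0}},
    {"op": "delete", "id": 2},
]
for event in feed:
    apply_change(cache, event)
cache.commit()

# Reads are served locally with plain SQL.
cached = cache.execute(
    "SELECT id, name, price FROM products ORDER BY id"
).fetchall()
print(cached)
```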
There have been several calls to build applications with transformed data from the warehouse. I think this is directionally correct: data is not impactful until it's activated and I genuinely believe that the creative use of data/models will unlock better user experiences.
Top free writing tools I use:
1. Grammarly: Correct mistakes.
2. Hemingway App: Make your writing bold and clear.
3. Quillbot: Rewrite and enhance any sentence.
4. Coschedule Headline Analyzer: Sharpen headlines to drive maximum traffic.
Get 10X writing help.
A job title I am seeing more and more often:
Product (Software) Engineer / Product Developer
Startups especially are starting to use it more, to indicate that software engineers are directly impacting the product, and look for (and empower) product-minded engineers.
Listened to 5 bootstrapped B2C tech startups with 10-30 engineers each sharing details on how their business evolved. Common themes:
1. Infra costs. A huge focus. Almost all use @Cloudflare to reduce bills.
2. "We use pragmatic, proven tech, and are cautious with new stuff."
The more you want to get done, the more efficient you have to be per unit of time. My discovery of the year is information hygiene. Delete unnecessary apps, unsubscribe from negativity, block the jerks. I really regret that I only grew into this now.
One somewhat unfortunate thing about blogging culture is that (unlike journals) there's no strong norm of explaining where your work sits with respect to existing human understanding.
This is an enormous advantage journals have in pushing forward understanding.