Heneli @henelidotdev - Twitter Profile

12 days ago

This is generally right, but in practice it's important to differentiate between point reads (get this key) and scans (get this range of keys). SQL-like workloads have a mix of both, and their proportion changes the trade-offs here substantially. Let's go back to the 70s!
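A minimal sketch of the point-read vs. scan distinction, assuming a toy in-memory store (the class names `HashStore` and `SortedStore` are hypothetical, not from the thread). A hash index answers point reads directly but has to visit every key to serve a range, while a sorted layout can seek once and then read contiguous entries.

```python
import bisect

class HashStore:
    """Toy hash-indexed store: cheap point reads, scans must visit every key."""
    def __init__(self, items):
        self.data = dict(items)

    def get(self, key):
        return self.data.get(key)

    def scan(self, lo, hi):
        # No ordering is available: filter all keys, then sort the matches.
        return sorted((k, v) for k, v in self.data.items() if lo <= k < hi)

class SortedStore:
    """Toy sorted store: point reads and scans both start with a binary search."""
    def __init__(self, items):
        pairs = sorted(items)
        self.keys = [k for k, _ in pairs]
        self.vals = [v for _, v in pairs]

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.vals[i]
        return None

    def scan(self, lo, hi):
        # Seek to the first key >= lo, then read contiguous entries below hi.
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_left(self.keys, hi)
        return list(zip(self.keys[start:end], self.vals[start:end]))

items = [(f"user:{i:03d}", i) for i in range(100)]
h, s = HashStore(items), SortedStore(items)
```

Both stores return the same answers; what differs is how much work a range query costs, which is why the read mix matters when choosing a layout.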


henelidotdev retweeted

Felipe O. Carvalho @_Felipe

17 days ago

Amethyst: Adaptive Compaction for LSM Trees via Segment-Level Policy Selection


henelidotdev retweeted

Ian Cook @ianmcook

26 days ago

📣 Introducing databow: one command-line tool to query them all, built with Rust and ADBC. Query any SQL source that has an ADBC driver (30+ and counting) right from your terminal with one simple CLI. To install 👉 uv tool install databow Link to announcement in the comments👇


henelidotdev retweeted

Felipe O. Carvalho @_Felipe

27 days ago

dbt Fusion is the Python-to-Rust rewrite that started at SDF Labs before it was acquired by dbt Labs. After a huge refactoring effort over the last few months, we are announcing dbt Core 2.0 – the open-source slice of the Fusion codebase that is continuously updated w/ Copybara.



henelidotdev retweeted

Felipe O. Carvalho @_Felipe

29 days ago

A pre-condition that can make a lexer/parser run much faster is assuming that the input is already validated and the job of the parser is just extracting the contents from the input.
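A minimal sketch of that pre-condition, assuming the input is an unsigned decimal integer (both function names are hypothetical). The checked version validates every character; the trusted version only extracts the value and is correct only if validation already happened elsewhere.

```python
def parse_uint_checked(s):
    """Validate while parsing: reject empty input and non-digit characters."""
    if not s:
        raise ValueError("empty input")
    value = 0
    for ch in s:
        d = ord(ch) - 48  # 48 is ord("0")
        if not 0 <= d <= 9:
            raise ValueError(f"not a digit: {ch!r}")
        value = value * 10 + d
    return value

def parse_uint_trusted(s):
    """Assume s was validated upstream: no per-character checks."""
    value = 0
    for ch in s:
        value = value * 10 + (ord(ch) - 48)
    return value
```

The trade-off is that the trusted version silently returns a wrong number on malformed input instead of failing, so it only belongs behind a validation step.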


henelidotdev retweeted

Andrew Lamb @andrewlamb1111

about 1 month ago

I just had the chance to watch Samyak Sarnayak's talk about cancellation safety and async Rust (and how a `&mut` can lead to a deadlock). If this is a topic that interests you, I recommend checking it out: https://t.co/TX85yIgNgE


henelidotdev retweeted

Cloudflare @Cloudflare

about 2 months ago

When a partitioning change to our petabyte-scale ClickHouse cluster caused critical billing jobs to stall, standard metrics showed no obvious errors. Here's how we identified severe lock contention in ClickHouse's query planner and built upstream patches to fix it. https://t.co/C4UF6RJTp6


henelidotdev retweeted

Simon Eskildsen

@Sirupsen

about 2 months ago

get-if-not-match is important for building fast databases on object storage. used in e.g. tpuf for the WAL check to make sure the cache has the latest data. of the big 3 (Azure/S3/GCS), it may surprise many that Azure comes out the winner! (S3X is S3 one-zone, GCR is GCP's equivalent)
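A minimal sketch of the conditional-read pattern, using an in-memory stand-in rather than any real object-store client. In HTTP terms this is a GET with an `If-None-Match` header: the server answers 304 Not Modified, with no body, when the caller's ETag is still current. The classes `ObjectStore` and `CachedReader`, and the choice of SHA-256 as the ETag, are assumptions for illustration.

```python
import hashlib

class ObjectStore:
    """In-memory stand-in for an object store that supports conditional GETs."""
    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        etag = hashlib.sha256(body).hexdigest()  # assumed ETag scheme
        self._objects[key] = (etag, body)
        return etag

    def get(self, key, if_none_match=None):
        etag, body = self._objects[key]
        if if_none_match == etag:
            return 304, etag, None  # Not Modified: no body is transferred
        return 200, etag, body

class CachedReader:
    """Serves from a local cache after confirming the cached copy is current."""
    def __init__(self, store):
        self.store = store
        self.cache = {}  # key -> (etag, body)

    def read(self, key):
        cached = self.cache.get(key)
        known_etag = cached[0] if cached else None
        status, etag, body = self.store.get(key, if_none_match=known_etag)
        if status == 304:
            return cached[1], "cache"
        self.cache[key] = (etag, body)
        return body, "store"

store = ObjectStore()
reader = CachedReader(store)
```

Every read still costs one round trip, but an unchanged object costs no data transfer, which is why the latency of this one call matters so much for a cache-validation check.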


henelidotdev retweeted

Andy Pavlo (@andypavlo.bsky.social) @andy_pavlo

2 months ago

The founders of FloeDB (@markcusack + Kurt Westerfeld) gave an interesting talk with @CMUDB about their new @ApacheIceberg-compatible query engine. Two key takeaways from their talk: 1⃣ Floe is a hard fork of @YellowbrickData. 2⃣ Floe is building a "catalog-of-catalogs" https://t.co/BzovMq4AVP


henelidotdev retweeted

Marc Brooker @MarcJBrooker

3 months ago

At AWS we're big Rust users. Lambda, DSQL, S3, EC2, Bedrock, and many more run Rust code. Dial9 is a new tool, built at AWS, for diving deep into the performance of tokio-based applications. Good work, Russel!


henelidotdev retweeted

Richard Artoul

@richardartoul

4 months ago

load balancing long-lived connections is so much harder than load balancing small requests


henelidotdev retweeted

Chris @criccomini

4 months ago

The 2nd edition of Designing Data-Intensive Applications, by @martinkl and me, is finished and sent to the printers! Ebooks available next week, and print books in 3–4 weeks. Sigh of relief. 😅 (BTW, this is a good opportunity to support your favourite local bookshop!)


henelidotdev retweeted

DuckDB

@duckdb

4 months ago

🦆 ↔️ 🦀 DuckDB Labs is looking for a Rust engineer to join our team in Amsterdam. 📝 See the details and application page at https://t.co/HUBZOnX4bm


henelidotdev retweeted

Julian Hyde @julianhyde

4 months ago

Much of the credit should go to sqlglot’s test suite. For projects of this type, the test suite is the “source” and the code can be generated pretty much any way you like.


henelidotdev retweeted

RisingWave

@RisingWaveLabs

4 months ago

Ever wondered how an engine actually reads an Iceberg table?

Iceberg read path in one line:
Catalog → Metadata → Manifest list → Manifest files → Data files

Apache Iceberg Read Path (Engine → Table)
When an engine reads an Iceberg table, it walks this chain from top to bottom:

1) Catalog
The starting point.
Stores a pointer to the table’s current metadata file, which represents the latest snapshot reference.

2) Metadata File
Defines the table schema, lists snapshots, and references the manifest list for the snapshot being read.

3) Manifest List
Tracks all manifest files associated with the selected snapshot.

4) Manifest Files
Contain metadata about data files, including partition values and file-level statistics, which help determine which files should be read.

5) Data Files
The actual table data is stored in object storage. This is what the engine ultimately reads.

Why this matters
During reads, Iceberg resolves the snapshot through the catalog and metadata layers, then uses manifest metadata to identify the exact set of data files for that snapshot.
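The five-step chain above can be sketched with plain dictionaries. This is a simplified illustration, not the Iceberg API: the file names, the `partition` field layout, and the `plan_scan` function are all hypothetical, and real Iceberg metadata carries many more fields.

```python
# Hypothetical stand-ins for the layers of the read path.
catalog = {"db.events": "meta/v2.json"}  # 1) table name -> current metadata file

files = {
    # 2) metadata file: snapshots and the manifest list for each one
    "meta/v2.json": {
        "current_snapshot_id": 2,
        "snapshots": {1: "snap-1.avro", 2: "snap-2.avro"},
    },
    # 3) manifest lists: the manifests that belong to each snapshot
    "snap-1.avro": ["manifest-a.avro"],
    "snap-2.avro": ["manifest-a.avro", "manifest-b.avro"],
    # 4) manifest files: per-data-file metadata used for pruning
    "manifest-a.avro": [
        {"path": "data/day=1/f1.parquet", "partition": {"day": 1}},
    ],
    "manifest-b.avro": [
        {"path": "data/day=2/f2.parquet", "partition": {"day": 2}},
        {"path": "data/day=3/f3.parquet", "partition": {"day": 3}},
    ],
}

def plan_scan(table, day=None, snapshot_id=None):
    """Walk catalog -> metadata -> manifest list -> manifests -> data file paths."""
    metadata = files[catalog[table]]
    if snapshot_id is None:
        snapshot_id = metadata["current_snapshot_id"]
    manifest_list = files[metadata["snapshots"][snapshot_id]]
    paths = []
    for manifest in manifest_list:
        for entry in files[manifest]:
            # Partition values let the planner skip files without opening them.
            if day is None or entry["partition"]["day"] == day:
                paths.append(entry["path"])  # 5) data files the engine will read
    return paths
```

Passing an older `snapshot_id` resolves a different manifest list, so the same walk returns the table as it looked at that snapshot.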


henelidotdev retweeted

Felipe O. Carvalho @_Felipe

4 months ago

Using Cap’n Proto in Rust to get zero-copy deserialization, but Codex insists on writing de/serialization functions that build a struct full of heap-allocated strings. Because that’s what dominates the training set of models and programmers everywhere.
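A minimal sketch of the contrast, in Python rather than Rust and without Cap’n Proto itself. The wire format here (a little-endian `u32` length followed by UTF-8 bytes per record) and the names `encode`, `decode_eager`, and `Reader` are assumptions for illustration. The eager decoder copies every field into a new string; the reader hands out views into the original buffer.

```python
import struct

def encode(names):
    """Hypothetical wire format: u32 little-endian length, then UTF-8 bytes."""
    out = bytearray()
    for name in names:
        raw = name.encode("utf-8")
        out += struct.pack("<I", len(raw)) + raw
    return bytes(out)

def decode_eager(buf):
    """Copying style: every field becomes a newly allocated str."""
    names, pos = [], 0
    while pos < len(buf):
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        names.append(buf[pos:pos + n].decode("utf-8"))
        pos += n
    return names

class Reader:
    """Accessor style: fields are views into the buffer, not copies."""
    def __init__(self, buf):
        self.view = memoryview(buf)
        self.offsets = []  # (start, length) of each field's payload
        pos = 0
        while pos < len(buf):
            (n,) = struct.unpack_from("<I", buf, pos)
            self.offsets.append((pos + 4, n))
            pos += 4 + n

    def name(self, i):
        start, n = self.offsets[i]
        return self.view[start:start + n]  # slice shares the buffer's memory

buf = encode(["ada", "grace"])
reader = Reader(buf)
```

The reader pays only for the fields a caller actually touches, at the cost of keeping the whole buffer alive for as long as any view exists.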


henelidotdev retweeted

Chris Allen

@theodorvaryag

4 months ago

SIMD can produce insane yields but it's worth bearing in mind that at least some of the yield isn't from the SIMD instructions, it's from disciplining the programmer into writing branchless functions and pipelines. The SIMD is the cheese at the end of the bit-hacking maze.
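A minimal sketch of what "branchless" means in practice, with hypothetical function names. Python is used only to show the shape of the transformation; the performance benefit applies to compiled code, where removing the data-dependent branch is what lets a loop vectorize.

```python
def count_below_branchy(values, threshold):
    """Data-dependent branch inside the loop body."""
    count = 0
    for v in values:
        if v < threshold:
            count += 1
    return count

def count_below_branchless(values, threshold):
    """Same result: the comparison becomes a 0/1 value that is always added."""
    count = 0
    for v in values:
        count += v < threshold  # bool is added as 0 or 1
    return count

def select_min_branchless(a, b):
    """Pick the smaller integer with a mask instead of an if/else."""
    mask = -(a < b)  # -1 (all bits set) when a < b, otherwise 0
    return (a & mask) | (b & ~mask)
```

Both rewrites turn a control-flow decision into arithmetic on the comparison result, which is the same move a SIMD lane mask makes across many elements at once.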


henelidotdev retweeted

Felipe O. Carvalho @_Felipe

4 months ago

@fredine "hand-written" recursive descent parser on a Vec<Token>. 40k lines of Rust as expected for a language as complex as SQL.
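A minimal sketch of the same shape, a hand-written recursive descent parser over a list of tokens, shown in Python for a tiny arithmetic grammar rather than SQL. The grammar, the tuple-based tree, and all names are assumptions for illustration; each grammar rule maps to one method.

```python
import re

TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Lex the whole input up front so the parser works on a token list."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):  # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):  # term := atom (('*' | '/') atom)*
        node = self.atom()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.atom())
        return node

    def atom(self):  # atom := NUMBER | '(' expr ')'
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing tokens")
    return tree
```

Operator precedence falls out of which rule calls which: `expr` calls `term`, so multiplication binds tighter than addition without any precedence table.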


henelidotdev retweeted

Tobias Müller @TobiM

4 months ago

Introducing polyglot - A Rust SQL transpiler for more than 30 SQL dialects. It has 100% coverage for sqlglot's test fixtures. https://t.co/0EMZRNyh2v


henelidotdev retweeted

Andrew Lamb @andrewlamb1111

5 months ago

A somewhat academic talk about the AI usecases driving changes in @ApacheParquet and new formats in "Column Storage for the AI Era" Recording: https://t.co/f4HxgyMZcb Slides: https://t.co/sq3auKoojo

