Jack Vanlightly

Verified account

@vanlightly

Likes breaking ideas and systems, writing, picking systems apart @confluentinc Ex @Splunk, @VMware ESO/B. Tafreshi

Barcelona, Spain

Joined November 2016

235 Following

4.7K Followers

1.6K Posts

Jack Vanlightly

about 10 hours ago

New blog post: Broker-Visible vs Client-Local Parallelism. In my benchmarking of share groups I've been focusing on parallel processing because it turns out that a few settings materially impact how effectively messages are distributed across consumers. But share groups weren't created solely for the purpose of escaping the confines of the partition as the unit of parallelism. Share groups exist to add queue semantics, which naturally leads to the consumer as the unit of parallelism instead of the partition. But if you are considering share groups solely for escaping the confines of the partition, then this post might be worth a read. https://t.co/0fp58ol1Oa

Jack Vanlightly

6 days ago

@Pipeline_papi That's how it starts. Shock, OMG this is good. Then you settle into the day to day and find that you, a human, are still very much needed

vanlightly retweeted

Robin Moffatt 🍻🏃🥓

6 days ago

It May be time … for the May edition of Interesting Links in the Data & AI World! https://t.co/9EV2hBrLm7 Thanks to the folk continuing to publish great content that I have the pleasure to link to, including @vanlightly, @VictorRentea, @MichelTricot , @jthandy, @marklit82, @DataMozart, @pdrmnvd, @AMdatalakehouse, @teivah, @jamessewell, @BenjDicken, @shanselman, @croloris, @MikeMcQuaid, and many more :)

Jack Vanlightly

8 days ago

This guy saw it coming 3 years ago https://t.co/ygQq4DyV1B

vanlightly retweeted

Armin Ronacher ⇌

8 days ago

This is such a good post. https://t.co/IdmAnh18Nt https://t.co/kGVBOwRneQ

Jack Vanlightly

8 days ago

I see a slight improvement https://t.co/SLOwMbf4AW

@Grady_Booch

8 days ago

AGI is closer than you think. https://t.co/GYC85XwI1r

Jack Vanlightly

8 days ago

New post: Kafka Share Groups and Parallelizing Consumption - Part 2. Kafka Share Groups make parallel consumption possible within a partition, but tuning them isn’t just about consumer count or max.poll.records (part 1). In Part 2, I look at how producer batch size, consumer share.acquire.mode and max.poll.records interact to impact the effective parallelism of a workload. We'll take the final workload from part 1 but double the batch size, dropping the target consume rate from 60K msg/s to 37K, then look at options to fix the consumer parallelism. https://t.co/9NUmorqT1D

Jack Vanlightly

10 days ago

New post: Kafka Share Groups and Parallelizing Consumption - Part 1. We introduce consumer processing time to the benchmarks to reveal the underlying mechanisms that determine the effective parallelism of share group consumers. With 300 consumers and 5ms processing time, the theoretical max was 60K msg/s. With the defaults, I got 4.8K msg/s. Read to understand about the importance of max.poll.records and how a bad value can still look ok, until suddenly it doesn't. https://t.co/8TuTEnLrcH

Jack Vanlightly

13 days ago

I just published the first set of Dimster benchmarks comparing Apache Kafka consumer groups and share groups. This isn’t really the interesting part yet. The goal here was just to establish a baseline:
* raw overhead
* raw scaling
* latency behavior under non-stressful loads
Basically: what does share group coordination cost? In the next post after this we'll get to the interesting stuff, testing the workloads that share groups were designed for. https://t.co/pxF4CsPFQN

Jack Vanlightly

15 days ago

Link to the repo: https://t.co/IE7cPl1PVg

Jack Vanlightly

15 days ago

I've been performance testing Apache Kafka extensively over the last couple of months, using a new performance tool I've developed inside Confluent (with some help from Claude). The goals were fairly simple:
1. Make it easy to run sophisticated, reproducible benchmarks (no stitching together a pile of scripts and dashboards).
2. Make it really easy to share results with colleagues which contain all the information to reproduce the test.
3. Make chart generation and log gathering automatic (rolling them into the result package).
4. Support powerful workload modelling, latency/throughput analysis, and benchmark orchestration with ease-of-use as a primary concern.
Dimster is my attempt to make Kafka benchmarking more structured and reproducible. I'll be publishing my performance analysis of share groups, using Dimster, very soon. https://t.co/L2c2ZA64Rz

Jack Vanlightly

15 days ago

@criccomini @diptanu Idempotency still needs to be implemented by the user, exactly-once continues to be a mirage

Jack Vanlightly

15 days ago

@richardartoul There's a protocol for that

Jack Vanlightly

16 days ago

I asked Claude if HdrHistogram was still being maintained and Claude told me no, Gil Tene had passed away. Shocked I googled him and saw he was alive and giving talks. I told Claude and it updated my CLAUDE.md with an entry reminding it to check if people are still alive before saying they are dead 😂

Jack Vanlightly

16 days ago

https://t.co/96a5X2LanY

Jack Vanlightly

16 days ago

Link to my sketches: https://t.co/hzQ7IMUgDr

Jack Vanlightly

17 days ago

AI generated based on my previous (non-AI generated) sketch https://t.co/or9hvPMH99 Too lazy to draw a new one myself these days.

Jack Vanlightly

17 days ago

AI in the hands of non-coders https://t.co/e7O5I7Qs2g

Jack Vanlightly

21 days ago

Seeing the recent NVMe/S3 debate, the use of the word "fast", plus I've been doing benchmarking for the last month, I thought I'd republish my "benchmarketing" guide from 2022: https://t.co/egTAdhuY5N

Jack Vanlightly

22 days ago

Seen some discussion of unit tests vs end-to-end/integration tests in the age of AI. A few years ago I proposed the "testing cafetiere" to replace the "test pyramid". These days I think the randomized system testing could be replaced by, or at least augmented with, deterministic simulation testing.

Jack Vanlightly

about 7 years ago

In my #heisenbug talk on #distributedsystems testing I introduced the "Testing Cafetiere" as a distributed systems specific test strategy.

- Use randomness to find your edge cases --> add your edge cases to your regression tests
- Test your implementation
- Verify your design https://t.co/Co14gekWFa
