Daniel Ritter @dritter_hd - Twitter Profile

Pinned Tweet

over 4 years ago

Great project on Cloud Databases and Hardware Acceleration (verified achievement from @sap @SAPInMemory). https://t.co/u7uMMHbozn

1

3

0

dritter_hd retweeted

Peter Kraft

@petereliaskraft

over 1 year ago

How did Google build the world’s most scalable database?

Spanner is an incredibly impressive system: the first data store to provide globally distributed transactions at scale. However, when it was first built, it was hard to use, offering only key-value semantics. This paper tells the story of how Spanner evolved into a full database, with support for SQL and all the features application developers expect, at a scale at which they had never been offered before.

Like any database, Spanner compiles a SQL query into a physical query plan, then optimizes it. However, to run queries at massive scale, it needs new distributed operators. The most important of these is distributed union, which ships a subquery to each shard of the underlying data and concatenates the results. This is a building block for performing distributed aggregations or joins over sharded tables. To make Spanner work, a distributed union has to be inserted above every table in a query plan. Because a distributed union is expensive, they do a lot of work to push operations into the union (particularly filters). Moreover, joins are aggressively rewritten to minimize the number and size of the distributed unions performed.

Spanner optimizes the performance of distributed queries using a coprocessor framework: each remote call is addressed not to a particular server, but to a range of data. This gives the runtime leeway to execute each query in the most efficient manner, routing each subquery request to the nearest replica that can serve the request. It also gives the runtime freedom to filter which shards are queried based on the requested keys, so shards don’t receive irrelevant requests. Moreover, it allows transparent masking of transient failures, as any subquery is automatically served from an available replica, even if other replicas are offline.

What are the main takeaways? First, SQL semantics help adoption. Spanner always allowed users to reliably store and query data at scale, but adding SQL made it easier to write faster queries (thanks to the optimizer) and more complex queries. Second, even with full SQL support, using databases at scale is hard. The optimizer makes it much easier to write performant queries, but even then there are many pitfalls, and the paper makes it clear that (as of 2017) the Spanner team still needs to work with internal customers to make sure their queries won’t hit scaling bottlenecks.
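The distributed union described in the tweet above can be illustrated with a toy, in-memory sketch. This is not Spanner's implementation or API: `Shard`, `run_subquery`, and `distributed_union` are hypothetical names invented for illustration, and each shard is assumed to own a contiguous integer key range. The sketch shows the two ideas the tweet mentions: the filter is pushed into the per-shard subquery, and shards whose key range cannot match the requested keys are skipped.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Shard:
    """Hypothetical shard owning the key range [key_lo, key_hi)."""
    key_lo: int
    key_hi: int
    rows: list = field(default_factory=list)

    def run_subquery(self, predicate: Callable[[dict], bool]) -> list:
        # The filter runs on the shard, so only matching rows are returned.
        return [row for row in self.rows if predicate(row)]


def distributed_union(shards, predicate, key_range: Optional[tuple] = None):
    """Ship the subquery to each relevant shard and concatenate the results.

    Returns (rows, number_of_shards_contacted).
    """
    results = []
    contacted = 0
    for shard in shards:
        if key_range is not None:
            lo, hi = key_range
            # Shard pruning: skip shards that cannot hold any requested key.
            if shard.key_hi <= lo or shard.key_lo >= hi:
                continue
        contacted += 1
        results.extend(shard.run_subquery(predicate))
    return results, contacted


shards = [
    Shard(0, 10, [{"id": 1, "amount": 10}, {"id": 5, "amount": 80}]),
    Shard(10, 20, [{"id": 12, "amount": 60}, {"id": 15, "amount": 40},
                   {"id": 17, "amount": 90}]),
    Shard(20, 30, [{"id": 25, "amount": 70}]),
]


def wanted(row: dict) -> bool:
    # Stands in for: WHERE id >= 12 AND id < 18 AND amount > 50
    return 12 <= row["id"] < 18 and row["amount"] > 50


pruned_rows, pruned_contacted = distributed_union(shards, wanted, key_range=(12, 18))
all_rows, all_contacted = distributed_union(shards, wanted)
```

Both calls return the same rows, but the pruned call contacts one shard instead of three. A real system would also choose which replica serves each range, which this sketch leaves out.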

2

378

55

337

27K

dritter_hd retweeted

Pınar Tözün @pinartozun

over 1 year ago

📢📢We have a new data systems faculty position @ITUkbh @dasyaITU. Application deadline: Nov 28. For more information, see the link below. Reach out to me if you have any questions. https://t.co/Er5TZ5kQQb

0

17

7

1

1K

dritter_hd retweeted

Lukas Vogel @VogelLu

almost 2 years ago

Don't miss your first chance to try out CedarDB right in your browser!

0

1

0

233

dritter_hd retweeted

Andy Pavlo (@andypavlo.bsky.social) @andy_pavlo

about 2 years ago

CedarDB: The 🇩🇪German-powered, PostgreSQL-compatible freak-of-nature database management system based on TUM's Umbra (Thomas Neumann + team) is out of stealth and now available: https://t.co/c7BoxGnD02 /cc @cedar_db

9

335

63

185

50K

dritter_hd retweeted

Immanuel Trummer

@ImmanuelTrummer

about 2 years ago

Honored to be promoted to Associate Professor with Indefinite Tenure at Cornell University. My thanks go to my amazing students, collaborators, mentors, colleagues, and my beloved family. @Cornell @CornellCIS #CornellUniversity #Tenure #AssociateProfessor

31

189

3

1

11K

Daniel Ritter @dritter_hd

about 2 years ago

@ImmanuelTrummer @andy_pavlo @Cornell @CornellCIS Congratulations!

1

0

116

dritter_hd retweeted

Manos Athanassoulis @manathan1984

about 2 years ago

#NEDB24 program is out https://t.co/5y0GiFG8wg We have 3 exciting keynotes, 14 talks, and 44 posters! Registration is now open: https://t.co/Zli6x5UNoV Enjoy this wonderful program at NEDB24, held at BU's new iconic CCDS! #NEDB #BU #CCDS photo credit: https://t.co/7dczdmHnVA

0

14

4

1

8K

Daniel Ritter @dritter_hd

about 2 years ago

Paper with @SRinderleMa accepted in the Information Systems journal @ElsevierConnect: "Responsible composition and optimization of integration processes under correctness preserving guarantees": https://t.co/sylW5Yja78. More information: https://t.co/pBHPKV9K8P. @sapbtp @SAP

0

4

0

196

dritter_hd retweeted

Jeremy Taylor @refset

about 2 years ago

SQL is turning 50 years old later this week 🎉 In your opinion, which are the best bits? Which are the worst?

5

94

23

22

16K

dritter_hd retweeted

Matthias Boehm @matthiasboehm7

over 2 years ago

Just returned from an awesome (and my first) Dagstuhl seminar on "Robust Query Processing in the Cloud" last week - lots of memories, new friendships, and great topics. https://t.co/2RpK6jFPhL

4

37

1

0

1K

dritter_hd retweeted

Carsten Binnig @cbinnig

over 2 years ago

We have an opening for a professorship @CS_TUDarmstadt @Hessian_AI in the intersection of #Systems & #AI. We seek outstanding researchers (open rank) who work on Systems for AI / AI for Systems: https://t.co/Qi9ZV6gEhT. Please share or reach out to me if you have any questions.

3

43

18

4

10K

dritter_hd retweeted

Manos Athanassoulis @manathan1984

over 2 years ago

Our book on "Data Structures for Data-Intensive Applications" co-authored with Stratos Idreos (@HarvardDASlab) and @DennisShasha can be downloaded for free from the publisher for the next few days (until Feb 12): https://t.co/0lsOjSC6hy #FNT #DataStructures #Textbook #Databases

2

172

24

144

15K

Daniel Ritter @dritter_hd

over 2 years ago

I am happy and proud to share that Jonas Dann, the first PhD student I recruited and supervised @SAP, successfully defended his thesis at @UniHeidelberg. See the full article here: https://t.co/oRpjEATkaM

0

3

0

52

dritter_hd retweeted

Andy Pavlo (@andypavlo.bsky.social) @andy_pavlo

over 2 years ago

I'm back again with my annual retrospective of the last year in the world of databases. Major highlights include vector databases, @MariaDB problems, SQL:2023, the FAA database crash, and the most expensive password change ever: https://t.co/BoHTfX5QOW

6

379

94

142

45K

Daniel Ritter @dritter_hd

about 3 years ago

Don't miss our new work on "Elastic Use of Far Memory for In-Memory Database Management Systems" at the @SIGMODConf conference in Seattle: https://t.co/kO7PLY1gTy. It will be presented at DaMoN (https://t.co/At6kSkPjqD) this week.

0

6

3

0

701

Daniel Ritter @dritter_hd

over 3 years ago

Some open challenges to focus on.

0

72

Daniel Ritter @dritter_hd

over 3 years ago

We had a great keynote today by @andy_pavlo from @OtterTuneAI and @CMUDB at @BTWconference in @dresden on "Why Machine Learning for Automatically Optimizing Databases Doesn't Work" or does it? https://t.co/ymdy2S2vVG

1

2

0

222

Daniel Ritter @dritter_hd

over 3 years ago

@conor_power23 @joe_hellerstein @conor_power23 in action. Well done!

0

1

0

114

dritter_hd retweeted

Peter Boncz @peterabcz

over 3 years ago

Many attended the memorial, but many also missed it, so it is good this is online. Martin Kersten instilled in me core values (e.g. systems impact over papers, and DB architecture as foundational research). Also 🙏 @arjenpdevries Arno @ailamaki Yannis & @andy_pavlo for speaking!

0

27

5

0

6K

dritter_hd retweeted

Peter Boncz @peterabcz

over 3 years ago

@cidrdb is a wrap! Keynotes by Gustavo Alonso on DB research in the hardware age and @hfmuehleisen on creating DB systems from academia; both were thought-provoking. Great in-person presentations of 32 papers and 6 sponsor talks, a gong show+quiz by @andy_pavlo & a startup panel.

1

24

2

3K
