I hear a lot of criticism of my work on DynamoDB Single Table Design (STD) these days. I even saw @alexbdebrie apologizing for having been an advocate the other day. The core concept behind STD, however, has always been the same as it is for all #NoSQL databases.
What is accessed together should be stored together, and how that data is stored should be influenced by how it is accessed.
Anyone who tells you a different story from the one that follows is wrong. I led the team that invented the Single Table Design pattern. I have facts; they have, at best, half-informed opinions. There are 3 people in this world who could tell this story. The other two could surely add more color, but the picture they draw would look the same.
Eight years ago, we were limited to 5 global secondary indexes (GSIs) per table, every index had its own capacity allocation, and there was no such thing as on-demand pricing. Even when the limit was increased to 25 GSIs, you still had to allocate capacity for each one. It was a major pain. Eventually all of these things were fixed, but to make things work before any of that happened, we introduced a design pattern called Index Overloading into our data modeling guidance on the NoSQL Blackbelt team at Amazon.
In summary, an index only needs to know which attributes to use as its alternate partition and sort keys. If those attributes have generic names like "GSI1PK" and "GSI1SK", then different Item types can store different values in them and be indexed with type-specific sort orders on shared GSIs. The pattern was a workaround that let teams index each type of Item stored on the table 5 different ways, effectively eliminating the 5 GSI limit.
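The pattern can be sketched in a few lines of plain Python. This is an in-memory simulation, not the DynamoDB API: `build_gsi` and `query_gsi` are hypothetical helpers standing in for a GSI and a Query against it, and the item shapes and key prefixes such as `EMAIL#` and `STATUS#` are made up for illustration. The point is that a Customer and an Order write different kinds of values into the same generic `GSI1PK`/`GSI1SK` attributes, so one index serves two unrelated access patterns.

```python
from collections import defaultdict
from typing import Any

# Hypothetical items: two Item types share one table and one overloaded GSI.
# Each type decides for itself what goes into the generic GSI1PK/GSI1SK attributes.
items: list[dict[str, Any]] = [
    {"PK": "CUSTOMER#c1", "SK": "PROFILE", "type": "Customer",
     "GSI1PK": "EMAIL#ana@example.com", "GSI1SK": "CUSTOMER#c1"},
    {"PK": "CUSTOMER#c1", "SK": "ORDER#o1", "type": "Order",
     "GSI1PK": "STATUS#SHIPPED", "GSI1SK": "2024-01-05#o1"},
    {"PK": "CUSTOMER#c2", "SK": "ORDER#o2", "type": "Order",
     "GSI1PK": "STATUS#SHIPPED", "GSI1SK": "2023-12-30#o2"},
]


def build_gsi(items, pk_attr="GSI1PK", sk_attr="GSI1SK"):
    """Simulate a GSI: only items carrying both key attributes are indexed."""
    index = defaultdict(list)
    for item in items:
        if pk_attr in item and sk_attr in item:
            index[item[pk_attr]].append(item)
    # Within each index partition, items are kept ordered by the sort key.
    for partition in index.values():
        partition.sort(key=lambda it: it[sk_attr])
    return index


def query_gsi(index, pk, sk_prefix="", sk_attr="GSI1SK"):
    """Rough analogue of a Query: partition key equality plus begins_with on the sort key."""
    return [it for it in index.get(pk, []) if it[sk_attr].startswith(sk_prefix)]


gsi1 = build_gsi(items)
# Access pattern 1 (Customer): look up a customer by email.
by_email = query_gsi(gsi1, "EMAIL#ana@example.com")
# Access pattern 2 (Order): shipped orders sorted by date, on the same index.
shipped = query_gsi(gsi1, "STATUS#SHIPPED")

print([it["SK"] for it in by_email])  # ['PROFILE']
print([it["SK"] for it in shipped])   # ['ORDER#o2', 'ORDER#o1']
```

Items that lack the generic attributes never appear in the simulated index, which mirrors how DynamoDB leaves such items out of a GSI (a sparse index).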
The drawback was that indexes became more and more polluted with unrelated Item types as the number of access patterns they supported increased. Because of this, an index could not easily be dropped and recreated without table scans and batch updates, which became expensive at scale. The pattern also placed a heavy cognitive load on developers: because index attributes had abstract names, it was not always apparent from looking at the data how it was being indexed, unless the values assigned to the generic keys were self-explanatory.
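The migration cost follows from the same generic naming. The sketch below is a hypothetical in-memory migration, assuming each item carries a `type` attribute that identifies its Item type; `retire_access_pattern` is an illustrative helper, not an AWS API. Because nothing in the name `GSI1PK` says which access pattern an item belongs to, every item has to be read to find the few that need rewriting.

```python
from typing import Any


def retire_access_pattern(items, item_type, pk_attr="GSI1PK", sk_attr="GSI1SK"):
    """Hypothetical migration: remove one Item type's values from an overloaded GSI.

    Returns (items_scanned, items_updated).
    """
    scanned = 0
    updated = 0
    for item in items:  # stands in for a paginated Scan over the whole table
        scanned += 1
        if item.get("type") == item_type and pk_attr in item:
            # stands in for an update call that REMOVEs the generic key attributes
            item.pop(pk_attr)
            item.pop(sk_attr, None)
            updated += 1
    return scanned, updated


# Hypothetical table: 990 Customer items and 10 Invoice items share GSI1.
table: list[dict[str, Any]] = [
    {"PK": f"CUSTOMER#{i}", "SK": "PROFILE", "type": "Customer",
     "GSI1PK": f"EMAIL#{i}@example.com", "GSI1SK": f"CUSTOMER#{i}"}
    for i in range(990)
]
table += [
    {"PK": f"CUSTOMER#{i}", "SK": f"INVOICE#{i}", "type": "Invoice",
     "GSI1PK": "INVOICE#OPEN", "GSI1SK": f"2024-01-{i % 28 + 1:02d}"}
    for i in range(10)
]

scanned, updated = retire_access_pattern(table, "Invoice")
print(scanned, updated)  # 1000 10
```

In a real table the loop would be a paginated Scan followed by one write per matching item, so the read cost scales with the whole table rather than with the handful of items being changed.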
All of these things were tradeoffs of applying the Index Overloading pattern, not core issues with Single Table Design itself. They were often deemed acceptable inconveniences given the benefit of effectively unlimited GSIs. Most of the problems that drove the need for Index Overloading have been resolved over the years as DynamoDB added support for 25 GSIs, introduced on-demand pricing, and eliminated the need to allocate capacity individually for each index. As a result, the pattern should really be considered deprecated today.
Over the years, many people have also taken STD to extremes that were never intended: mixing configuration and operational data, maintaining a single table across service boundaries, or storing unrelated data that is not accessed together in the same table. Although some people are trying very hard to rewrite history around this, none of these things was ever recommended as a best practice.
It is easy to look at the product as it exists today and criticize the design patterns of yesterday that were invented to deal with API deficiencies that no longer exist. Every now and then I read some serious garbage written by people I feel should know better. Those people had very little exposure to the process of solving the problems we faced when the patterns they criticize were introduced.
Would I tell teams today to do some of the things I advocated for 6 years ago? Of course not. Does that invalidate the guidance we gave years before the features that made it obsolete were introduced? Of course not, and none of it invalidates Single Table Design. The core concepts of STD cannot be ignored without incurring cost and performance penalties. It is the same with all #NoSQL databases:
Data that is accessed together should be stored together, either embedded in the same Item or stored on the same table/collection where it can be easily indexed. To maximize efficiency of the system, the data structures used should be influenced by how the application accesses the data.
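The principle can be illustrated with a hypothetical item collection, again as plain in-memory Python rather than DynamoDB calls. A customer's profile and orders share the partition key `CUSTOMER#c1`, and the sort keys are chosen so that a single query returns exactly the data the application reads together, in a useful order. The table contents, key formats and the `query` helper are assumptions for illustration.

```python
from typing import Any

# Hypothetical table: a customer's profile and orders share a partition key,
# forming one item collection that a single query can return.
table: list[dict[str, Any]] = [
    {"PK": "CUSTOMER#c1", "SK": "ORDER#2024-01-05", "total": 42},
    {"PK": "CUSTOMER#c2", "SK": "PROFILE", "name": "Bo"},
    {"PK": "CUSTOMER#c1", "SK": "PROFILE", "name": "Ana"},
    {"PK": "CUSTOMER#c1", "SK": "ORDER#2023-12-30", "total": 17},
]


def query(table, pk, sk_prefix=""):
    """Rough in-memory analogue of a Query: one partition, ordered by sort key."""
    # Linear filter over every item for simplicity; DynamoDB would read only the one partition.
    matches = [it for it in table if it["PK"] == pk and it["SK"].startswith(sk_prefix)]
    return sorted(matches, key=lambda it: it["SK"])


# One request fetches the profile and all orders for the customer page.
page = query(table, "CUSTOMER#c1")
print([it["SK"] for it in page])  # ['ORDER#2023-12-30', 'ORDER#2024-01-05', 'PROFILE']

# A sort-key prefix narrows the same collection to just the orders.
orders = query(table, "CUSTOMER#c1", "ORDER#")
print(sum(o["total"] for o in orders))  # 59
```

Embedding the orders as a list attribute on the profile Item is the other option mentioned above; it avoids separate items entirely but is bounded by DynamoDB's 400 KB item size limit, so it suits small, bounded child collections.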
Introducing 'Prompt Engineering with Llama 2' — an interactive guide covering prompt engineering & best practices for developers, researchers & enthusiasts working with large language models.
Access the notebook in the llama-recipes repo ➡️ https://t.co/TbLWc7xlD5