Yeah, you probably should. Same logic, same answer: columns in both tables. There's no reason to treat the meta differently between them. Like you said, you might query charts by range someday, and it compresses too, so charts wants the same column approach as rankings. Consistency is good, in this case at least.
The only field I can think of that differs between the two is podcastList, the ordered array. That's the actual modeling question, imho, not the meta.
You pretty much made my earlier point for me too: if both tables carry the exact same meta columns, and a chart is just the ordered list while a ranking is one row per position, that's the same data at two zoom levels. Makes me think charts could be a view or continuous aggregate over rankings instead of a second table you write to. Might not need both.
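A rough sketch of the "charts as a view over rankings" idea, done in plain Python rather than SQL so it runs anywhere. The row shape and the `rankings_to_charts` name are made up for illustration; the grouping key reuses the (date, country_code, genre_id, type) columns mentioned elsewhere in this thread. In a real database this would be a view or continuous aggregate, not application code.

```python
from collections import defaultdict

# Assumed chart identity: the same columns used as the conflict key in the thread.
CHART_KEY = ("date", "country_code", "genre_id", "type")

def rankings_to_charts(rows):
    """Rebuild each chart's ordered podcast list from one-row-per-position rankings."""
    grouped = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in CHART_KEY)
        grouped[key].append((row["rank"], row["platform_podcast_id"]))
    # Sorting by rank restores the order the chart's array would have had.
    return {key: [pid for _, pid in sorted(entries)] for key, entries in grouped.items()}

# Hypothetical rows, deliberately out of order.
rows = [
    {"date": "2024-01-01", "country_code": "us", "genre_id": 26, "type": "top",
     "rank": 2, "platform_podcast_id": "pod-b"},
    {"date": "2024-01-01", "country_code": "us", "genre_id": 26, "type": "top",
     "rank": 1, "platform_podcast_id": "pod-a"},
]
charts = rankings_to_charts(rows)
```

If the rebuilt list always matches what the charts collection stores, that's a decent sign the second table is derivable rather than primary.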
Caveat on all of this: some of it is me shooting from the hip a bit, making with the tweety tweets, so it's not necessarily 100% gold standard architecture :D
Congrats on getting back on the mic! Here are some thoughts. They're a bit disjointed, but hopefully helpful :D
On the schema stuff, good news: the write cost you're worried about basically isn't real.
You already figured out the fix! Resolve the chart_id once per chart (your 34k/hour), then hand it to all the rankings underneath. The 10M ranking writes never touch the charts table. And you don't even have to *read* to get the id. `INSERT ... ON CONFLICT (date, country_code, genre_id, type) ... RETURNING id` gets you one round-trip per chart w/ no lookup.
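Here's a minimal sketch of that pattern. To keep it runnable without a server, SQLite (3.35 or newer, for `RETURNING`) stands in for PostgreSQL; the upsert syntax shown is the same shape in both. The table layouts and the helper names `resolve_chart_id` and `write_chart` are assumptions for illustration. The no-op `DO UPDATE` is also my assumption for the elided middle of the statement: with `DO NOTHING`, `RETURNING` gives back no row when the chart already exists.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE charts (
    id           INTEGER PRIMARY KEY,
    date         TEXT NOT NULL,
    country_code TEXT NOT NULL,
    genre_id     INTEGER NOT NULL,
    type         TEXT NOT NULL,
    UNIQUE (date, country_code, genre_id, type)
);
CREATE TABLE rankings (
    chart_id            INTEGER NOT NULL,
    rank                INTEGER NOT NULL,
    platform_podcast_id TEXT NOT NULL
);
""")

def resolve_chart_id(conn, date, country_code, genre_id, type_):
    """Insert the chart or hit the existing one, and get its id back in one statement."""
    row = conn.execute(
        """
        INSERT INTO charts (date, country_code, genre_id, type)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (date, country_code, genre_id, type)
        DO UPDATE SET type = excluded.type  -- no-op so RETURNING yields a row
        RETURNING id
        """,
        (date, country_code, genre_id, type_),
    ).fetchone()
    return row[0]

def write_chart(conn, key, podcast_ids):
    """Resolve chart_id once, then write every ranking row with it."""
    chart_id = resolve_chart_id(conn, *key)
    conn.executemany(
        "INSERT INTO rankings (chart_id, rank, platform_podcast_id) VALUES (?, ?, ?)",
        [(chart_id, index + 1, pid) for index, pid in enumerate(podcast_ids)],
    )
    return chart_id

key = ("2024-01-01", "us", 26, "top")
first_id = write_chart(conn, key, ["pod-a", "pod-b"])
second_id = resolve_chart_id(conn, *key)  # same chart again: same id, no new row
```

The ranking inserts only ever see the integer `chart_id`, so the charts table is touched once per chart no matter how long the list is.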
For the rankings table, I'd maybe suggest: don't normalize the meta, and don't cram it into a single jsonb column either. Promote each key to its own column (country_code, genre_id, platform, type, platform_podcast_id), because compression and segmentby work on columns. Those low-cardinality meta fields repeat constantly, so as flat columns they compress super well and you can segmentby them. Buried in a jsonb blob you lose both the compression and the fast range-filtering. So for your high-volume table, columns are both faster to write (no lookup) and cheaper to store. The thing you were worried about just poofs away.
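To make "promote each key to its own column" concrete, here's a small sketch of the reshaping step on the way into Postgres. The input document shape (`time`, `rank`, and a nested `meta` dict) is a guess at what a time-series document might look like, not something confirmed in this thread, and `flatten_ranking` is a made-up name.

```python
# The five meta keys named above become five flat columns.
META_COLUMNS = ("country_code", "genre_id", "platform", "type", "platform_podcast_id")

def flatten_ranking(doc):
    """Turn a document with nested meta into a flat row, one key per column."""
    meta = doc["meta"]
    row = {"time": doc["time"], "rank": doc["rank"]}
    for col in META_COLUMNS:
        row[col] = meta[col]  # a missing key raises KeyError instead of hiding in a blob
    return row

# Hypothetical source document.
doc = {
    "time": "2024-01-01T00:00:00Z",
    "rank": 3,
    "meta": {
        "country_code": "us",
        "genre_id": 26,
        "platform": "apple",
        "type": "top",
        "platform_podcast_id": "pod-c",
    },
}
row = flatten_ranking(doc)
```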
Also, this idea might save you a whole table: your "charts" and "rankings" are kind of the same data at two zoom levels. A chart is the ordered podcastList, a ranking is one row per position (rank = index + 1). So rankings is really the relational-native version of charts. You might not need to migrate "charts" as its own time-series thing at all. It could be a view or a continuous aggregate over rankings. Worth a look. I might be wrong though. I'm spitballing a bit :)
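The "rank = index + 1" mapping is short enough to sketch. This assumes a chart is a dict with some meta fields plus a `podcastList` of podcast ids; the exact element type of `podcastList` isn't stated here, so treat that and the `chart_to_rankings` name as illustrative.

```python
def chart_to_rankings(chart):
    """Explode one chart into one ranking row per list position."""
    # Everything except the ordered list is shared meta, copied onto each row.
    meta = {key: value for key, value in chart.items() if key != "podcastList"}
    return [
        {**meta, "platform_podcast_id": pid, "rank": index + 1}
        for index, pid in enumerate(chart["podcastList"])
    ]

# Hypothetical chart document.
chart = {
    "date": "2024-01-01",
    "country_code": "us",
    "genre_id": 26,
    "type": "top",
    "podcastList": ["pod-a", "pod-b", "pod-c"],
}
ranking_rows = chart_to_rankings(chart)
```

Nothing is lost going in this direction, which is the argument for treating rankings as the primary table.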
Also also, with platform as a column you probably don't need separate apple_rankings / spotify_rankings tables. One hypertable, segmentby platform, gets you the same isolation without maintaining two of everything. Although I know you had other reasons for doing that.
@michaelfreedman @theDanielJLewis @Podgagement @TimescaleDB @convex Hi Daniel! I’m a longtime fan of your podcast from years past :)
I wrote up a gist with some things to help you get started. I’m also happy to help more as you work on this migration, answer any questions, and give more guidance.
https://t.co/OxqDr4JWrB
WHOA!
I just tested moving Podgagement's 4 TB #podcast-rankings database from @MongoDB with time-series collections to PostgreSQL with @TimescaleDB.
The results absolutely shocked me!
I ran repeated tests generating and inserting essentially identical data on both databases. Both experiments ran on identical @Hetzner_Online servers.
MongoDB averaged 11:55, maxing out all 4 vCPUs, and generated about 500 MB of data.
#PostgreSQL with TimescaleDB, however, averaged 15:10, using about 68% of all 4 vCPUs, and generated _34_ MB of data.
WHAT?
I think #MongoDB storage gets "more" efficient at scale, because this test created lots of time-series "buckets" with only 1 record in each. But even then, PostgreSQL and #TimescaleDB compressed the same data so much better.
I can't fully test read performance with such a small dataset, but simply counting items in the database was _much_ faster with Postgres.
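For scale, here's the arithmetic on the numbers quoted above (11:55 vs 15:10, about 500 MB vs 34 MB). The only assumption is that the times are minutes:seconds; the ratio comes out the same if they're hours:minutes.

```python
def to_seconds(clock):
    """Convert an 'MM:SS' string to total seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)

mongo_seconds = to_seconds("11:55")   # 715
postgres_seconds = to_seconds("15:10")  # 910

time_ratio = postgres_seconds / mongo_seconds  # how much longer the Postgres run took
size_ratio = 500 / 34                          # how much more data MongoDB wrote

print(f"Postgres run time: {time_ratio:.2f}x MongoDB")
print(f"MongoDB data size: {size_ratio:.1f}x Postgres")
```

So roughly 27% more wall-clock time for the insert in exchange for storage about 14.7 times smaller, on this small test.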
Introducing TigerFS - a filesystem backed by PostgreSQL, and a filesystem interface to PostgreSQL.
Idea is simple: Agents don't need fancy APIs or SDKs, they love the file system. ls, cat, find, grep. Pipelined UNIX tools. So let’s make files transactional and concurrent by backing them with a real database.
There are two ways to use it:
File-first: Write markdown, organize into directories. Writes are atomic, everything is auto-versioned. Any tool that works with files -- Claude Code, Cursor, grep, emacs -- just works. Multi-agent task coordination is just mv'ing files between todo/doing/done directories.
Data-first: Mount any Postgres database and explore it with Unix tools. For large databases, chain filters into paths that push down to SQL: .by/customer_id/123/.order/created_at/.last/10/.export/json. Bulk import/export, no SQL needed, and ships with Claude Code skills.
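A minimal sketch of the file-first coordination idea: a worker claims a task by moving it from todo/ to doing/. This runs against an ordinary temp directory, not a TigerFS mount, and `claim_task` is a made-up helper. On a local POSIX filesystem a rename is atomic, so only one worker wins; that a TigerFS mount behaves the same way for `mv` is an assumption based on the atomic-writes claim above.

```python
import os
import tempfile

def claim_task(root, name):
    """Move a task from todo/ to doing/. Return True if this caller claimed it."""
    src = os.path.join(root, "todo", name)
    dst = os.path.join(root, "doing", name)
    try:
        os.rename(src, dst)
        return True
    except FileNotFoundError:
        return False  # someone else already moved it out of todo/

# Stand-in for a mounted workspace: a plain temp directory.
root = tempfile.mkdtemp()
for stage in ("todo", "doing", "done"):
    os.makedirs(os.path.join(root, stage))
with open(os.path.join(root, "todo", "task-1.md"), "w") as handle:
    handle.write("# write the migration\n")

first_claim = claim_task(root, "task-1.md")   # this worker gets the task
second_claim = claim_task(root, "task-1.md")  # a later attempt finds nothing to move
```

No lock files or queue service involved: the directory a file sits in is the task's state.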
Every file is a real PostgreSQL row. Multiple agents and humans read and write concurrently with full ACID guarantees. The filesystem /is/ the API.
Mounts via FUSE on Linux and NFS on macOS, no extra dependencies. Point it at an existing Postgres database, or spin up a free one on Tiger Cloud or Ghost.
I built this mostly for agent workflows, but curious what else people would use it for. It's early but the core is solid. Feedback welcome.
https://t.co/IPhieopOSP