Today's ¡Databases! Seminar speaker: Ben Hannel will talk about dynamically typed SQL execution in @RocksetCloud. The Zoom talk is open to the public at 4:30pm ET. A YouTube video will be available afterwards: https://t.co/ov0pq12zA6
I’m absolutely in ❤️ love with Rockset.
We now have streaming ETL using their write API (including real-time deletes).
Every field is indexed! Including nested JSON fields.
Queries that used to take 6 sec now take 500 ms.
I just bought back so much of our team's time.
The two major trends in data are speed & scale. The team at @RocksetCloud @iamveeve @dhruba_rocks @tudor @igorcanadi have broken the speed/scale barrier to build modern data applications on a real-time analytics database.
@RocksetCloud is on a mission to eliminate all the cost and complexity associated with real-time analytics. Today's Rollup launch is a very important milestone in this journey. Read more👇
Very excited about our Series B today! A great milestone on our mission to build some of the finest database tech, optimized for developers. P.S. We are hiring - DMs are open.
@narayanarjun @MarkCallaghanDB This is how we use it, exactly - we download all files locally. Still investigating if we can keep a portion of the data in the cloud without a local cache. Likely possible for scan queries - S3's throughput is impressive.
@MarkCallaghanDB @iamveeve Surprisingly accurate. ;) We allow updates to existing documents, so we keep PK -> rowid index. We use RocksDB's merge operator, but the efficiency of reading chunks that have merge updates is still an open question for us.
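A minimal Python sketch of the merge-operator trade-off mentioned above, assuming a toy in-memory store. The class `MergeKV`, the `patch` merge function, and the `pk_to_rowid` dict are hypothetical and are not RocksDB's API. Writes only record an operand, so updates are cheap; every read has to fold the pending operands onto the base value, which is where the read-efficiency question comes from.

```python
from typing import Callable, Dict, List, Optional


class MergeKV:
    """Toy store mimicking merge-operator semantics (hypothetical, not RocksDB's API)."""

    def __init__(self, merge_fn: Callable[[Optional[dict], dict], dict]):
        self.base: Dict[str, dict] = {}
        self.operands: Dict[str, List[dict]] = {}
        self.merge_fn = merge_fn

    def put(self, key: str, value: dict) -> None:
        # A full write replaces the base value and discards pending operands.
        self.base[key] = value
        self.operands.pop(key, None)

    def merge(self, key: str, operand: dict) -> None:
        # Cheap write: append the operand, no read-modify-write.
        self.operands.setdefault(key, []).append(operand)

    def get(self, key: str) -> Optional[dict]:
        # Read cost grows with the number of pending operands for this key.
        value = self.base.get(key)
        for operand in self.operands.get(key, []):
            value = self.merge_fn(value, operand)
        return value


def patch(existing: Optional[dict], update: dict) -> dict:
    """Hypothetical merge function: shallow field-level update of a document."""
    merged = dict(existing or {})
    merged.update(update)
    return merged


# Hypothetical primary-key index: resolves a document's PK to its internal row id.
pk_to_rowid = {"doc-a": 0}

store = MergeKV(patch)
row_key = f"row/{pk_to_rowid['doc-a']}"
store.put(row_key, {"name": "a", "count": 1})
store.merge(row_key, {"count": 2})
store.merge(row_key, {"tag": "x"})
print(store.get(row_key))  # {'name': 'a', 'count': 2, 'tag': 'x'}
```

Compacting the operands back into the base value (as RocksDB does during compaction) would bound the read cost; the sketch leaves that out.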
@vfonic Strongly disagree. GraphQL is exposing business logic, which is rarely fully expressible in SQL. Opening up a SQL interface to the world is also very risky -- you need strong protection against expensive queries.
What's the fastest we can load the data into @RocksDB? New blog post listing some of the optimizations that resulted in a 20x performance win at @RocksetCloud: bigger batches, parallel writes, no memtable, no compactions. https://t.co/ziGHBU09Bp
@janicduplessis Hey @janicduplessis, I work at Rockset and would love to have you try it out and share your thoughts. The first 2GB are free; happy to help if you get stuck with anything.
@tlipcon @markcallaghan @MongoDBEng @RocksetCloud That’s correct. We have specialized IValue types for arrays of same-type scalars, on which we can do vectorized execution. For mixed-type arrays (columns) you are out of luck, but when you have an implicit schema the speed should be similar to that with an explicit schema.
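To illustrate the same-type vs. mixed-type distinction, here is a small Python sketch built on a hypothetical `make_column`/`add_constant` pair; it is not Rockset's actual IValue design. A homogeneous column is stored in a typed `array` buffer and handled with one type decision per batch, while a mixed-type column falls back to per-element type dispatch. Python cannot show real SIMD speedups, so this only mirrors the control flow.

```python
from array import array


def make_column(values):
    """Pick a specialized typed buffer when every value has the same scalar type."""
    if values and all(type(v) is int for v in values):
        return ("int64", array("q", values))
    if values and all(type(v) is float for v in values):
        return ("float64", array("d", values))
    return ("mixed", list(values))


def add_constant(column, k: int):
    """Add an integer constant to every value in the column."""
    kind, data = column
    if kind in ("int64", "float64"):
        # Homogeneous buffer: one type decision for the whole batch, then a tight loop.
        return (kind, array(data.typecode, (x + k for x in data)))
    # Mixed-type column: every element needs its own type check.
    out = []
    for v in data:
        if isinstance(v, (int, float)) and not isinstance(v, bool):
            out.append(v + k)
        else:
            out.append(None)  # in this sketch, non-numeric + number yields NULL
    return ("mixed", out)


print(add_constant(make_column([1, 2, 3]), 10))
print(add_constant(make_column([1, "a", 2.5]), 10))  # ('mixed', [11, None, 12.5])
```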
@markcallaghan @MongoDBEng @RocksetCloud Still early, but we have a project to make columnar better by combining many values into a single key-value in RocksDB through merge operators. This is a user-space approach though; haven't explored improvements to the LSM itself.
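A rough Python sketch of packing many column values into a single key through merge operands, using a toy in-memory stand-in for RocksDB. `ChunkedColumnStore`, `CHUNK_ROWS`, and the `(field, chunk_id)` key layout are hypothetical. Each write appends a `(rowid, value)` operand to its chunk's key, and a read folds the operands so the latest write per row wins; a columnar scan then touches one key per chunk instead of one key per value.

```python
from collections import defaultdict

CHUNK_ROWS = 4  # hypothetical chunk size; a real system would use far larger chunks


class ChunkedColumnStore:
    def __init__(self):
        # (field, chunk_id) -> pending (rowid, value) merge operands
        self._operands = defaultdict(list)

    def merge(self, field, rowid, value):
        # Append-only write into the chunk that owns this row.
        self._operands[(field, rowid // CHUNK_ROWS)].append((rowid, value))

    def read_chunk(self, field, chunk_id):
        # Fold operands in write order: later writes to the same rowid win.
        latest = {}
        for rowid, value in self._operands.get((field, chunk_id), []):
            latest[rowid] = value
        return [latest[r] for r in sorted(latest)]

    def scan(self, field):
        chunk_ids = sorted(c for f, c in self._operands if f == field)
        for chunk_id in chunk_ids:
            yield from self.read_chunk(field, chunk_id)

    def key_count(self):
        return len(self._operands)


store = ChunkedColumnStore()
for rowid, price in enumerate([10, 11, 12, 13, 14, 15]):
    store.merge("price", rowid, price)
store.merge("price", 1, 99)  # an update is a new operand, not a rewrite of the chunk
print(list(store.scan("price")))  # [10, 99, 12, 13, 14, 15]
print(store.key_count())  # 2 keys hold 6 rows
```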
@tlipcon @markcallaghan @MongoDBEng @RocksetCloud Schemaless is fun; the only trouble is the lack of existing schemaless SQL optimizers out there (like Apache Calcite). What made you go the fixed-schema route? BTW, have you seen https://t.co/MjjpGsaeAa ?
@FranckPachot @andy_pavlo @markcallaghan @MongoDBEng @RocksetCloud @RocksetCloud's converged indexing is simply indexing almost everything, both at the top level and for nested fields. We detect the type of the value and index it in a type-appropriate manner: lexicographic ordering for strings, arithmetic ordering for numbers, etc. 1/2
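One common way to get type-appropriate ordering out of a byte-ordered key-value store is an order-preserving key encoding, sketched below in Python. This illustrates the general technique, not Rockset's actual key format. The one-byte type tags are hypothetical. Numbers are mapped through IEEE-754 doubles, so integers beyond 2**53 lose precision and NaN is not handled. Strings compare by their UTF-8 bytes, which matches code-point order.

```python
import struct

# Hypothetical one-byte type tags: all numbers sort before all strings.
TYPE_TAG = {"number": b"\x01", "string": b"\x02"}


def encode_number(x) -> bytes:
    """Map a double to 8 bytes whose unsigned byte order matches numeric order."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", float(x)))
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF  # negative: flip all bits so larger magnitudes sort first
    else:
        bits |= 1 << 63  # non-negative: set the sign bit so it sorts after negatives
    return struct.pack(">Q", bits)


def encode_key(value) -> bytes:
    """Prefix a type tag, then encode the value so byte order equals value order."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, (int, float)):
        return TYPE_TAG["number"] + encode_number(value)
    if isinstance(value, str):
        return TYPE_TAG["string"] + value.encode("utf-8")
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(sorted([10, -3.5, 2, 0.25], key=encode_key))  # [-3.5, 0.25, 2, 10]
print(sorted(["10", "2"], key=encode_key))  # ['10', '2'] (lexicographic)
```

The two calls show why the type matters: as numbers, 2 sorts before 10, while as strings "10" sorts before "2".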
@FranckPachot @markcallaghan @MongoDBEng @RocksetCloud We build search indexes on columns, which allow us to quickly evaluate a conjunction of predicates (by intersecting posting lists). There are some queries that would be faster with multi-column indexes, I agree. Future work. :)
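Evaluating a conjunction of predicates by intersecting posting lists can be sketched in Python as an inverted index plus a sorted-list merge. This is a minimal illustration under simple assumptions (one posting list per exact `(field, value)` pair, row ids assigned in insertion order); `build_index`, `intersect`, and `conjunction` are hypothetical names, not Rockset APIs. Intersecting the shortest lists first keeps intermediate results small.

```python
from collections import defaultdict


def build_index(rows):
    """Inverted index: (field, value) -> sorted list of row ids containing it."""
    index = defaultdict(list)
    for rowid, row in enumerate(rows):
        for field, value in row.items():
            index[(field, value)].append(rowid)  # ids arrive in increasing order
    return index


def intersect(a, b):
    """Two-pointer intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def conjunction(index, predicates):
    """Row ids matching every (field, value) equality predicate."""
    lists = sorted((index.get(p, []) for p in predicates), key=len)
    if not lists:
        return []
    result = lists[0]
    for posting_list in lists[1:]:
        result = intersect(result, posting_list)
        if not result:
            break  # nothing can match; skip the remaining lists
    return result


rows = [
    {"city": "SF", "status": "active"},
    {"city": "NYC", "status": "active"},
    {"city": "SF", "status": "banned"},
    {"city": "SF", "status": "active"},
]
index = build_index(rows)
print(conjunction(index, [("city", "SF"), ("status", "active")]))  # [0, 3]
```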