The state timeline view just got a bunch of improvements:
🔸 Support for numeric and boolean components, not just strings.
🔸 Drag a component right from the streams tree into a state timeline.
🔸 Highlight the time range of a state by hovering over it.
🔸 Clear state by logging an empty string or a Clear message: the timeline shows a gap until the next state value.
This and more in the latest 0.33 release.
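To make the gap behavior concrete, here is a minimal pure-Python sketch (not Rerun's implementation) of how a sequence of logged state values could be turned into the spans a state timeline draws. It assumes that an empty string or `None` (standing in for a Clear message) ends the current state; `StateSpan` and `state_spans` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StateSpan:
    start: int            # time the value was logged
    end: Optional[int]    # time of the next value or clear; None while still active
    value: object


def state_spans(events: List[Tuple[int, object]]) -> List[StateSpan]:
    """Turn (time, value) events into spans; "" or None clears the state (assumption)."""
    spans = []
    current = None  # (start, value) of the state currently shown, or None during a gap
    for time, value in sorted(events, key=lambda e: e[0]):
        if current is not None:
            spans.append(StateSpan(current[0], time, current[1]))
        # A clear closes the current span and opens a gap until the next real value.
        current = None if value is None or value == "" else (time, value)
    if current is not None:
        spans.append(StateSpan(current[0], None, current[1]))
    return spans


# Strings, numbers, and booleans all work as state values.
events = [(0, "idle"), (10, "moving"), (25, ""), (40, True)]
for span in state_spans(events):
    print(span)
```

No span covers t=25 to t=40, which is the gap the timeline would draw. Note that `0` and `False` are kept as real states; only `""` and `None` clear.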
Rerun 0.33.0 is out with more focused updates for ML/robotics data workflows:
Usability improvements to dataset review
Faster targeted extraction from large RRDs via push-down filtering
State timeline view now supports numbers/booleans, drag-drop, hover ranges, and gaps
New headless Viewer for automation + screenshots (!!!)
Check out the full release notes.
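Push-down filtering generally means evaluating a query's filters against cheap per-chunk metadata before reading any data, so whole chunks can be skipped. The sketch below illustrates only that general idea; the `ChunkMeta` fields and the `prune_chunks` helper are hypothetical and do not describe the actual .rrd layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChunkMeta:
    """Hypothetical per-chunk statistics, readable without decoding the chunk."""
    byte_offset: int   # where the chunk starts in the file
    entity_path: str
    time_min: int
    time_max: int


def matches_entity(path: str, prefix: str) -> bool:
    # Match the entity itself or anything below it, respecting path boundaries.
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def prune_chunks(index: List[ChunkMeta], entity_prefix: str,
                 t_start: int, t_end: int) -> List[ChunkMeta]:
    """Keep only chunks whose metadata overlaps the filter; the rest are never read."""
    return [
        c for c in index
        if matches_entity(c.entity_path, entity_prefix)
        and c.time_max >= t_start      # chunk does not end before the window
        and c.time_min <= t_end        # chunk does not start after the window
    ]


index = [
    ChunkMeta(0, "/robot/camera", 0, 99),
    ChunkMeta(4096, "/robot/arm/joints", 0, 99),
    ChunkMeta(8192, "/robot/arm/joints", 100, 199),
    ChunkMeta(9000, "/robot/armature", 100, 199),
]
to_read = prune_chunks(index, "/robot/arm", 150, 180)
print([c.byte_offset for c in to_read])  # [8192]
```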
Hardware acquired 🫡 Thanks to @boosterobotics, I finally have something I can use for imitation learning, and thanks to @_gijsdj for building out the @rerundotio integration.
I'm going to my first @CVPR this year!! I'll be there representing @rerundotio. I'm not entirely sure what to expect, but feel free to say hi. I'd love any advice on how to get the most out of CVPR. My DMs are open if you want to reach out!
I also wanted to share a much deeper breakdown of what I've been working on. Take a look and let me know what you think!
TLDW:
A breakdown of the full pipeline: ingestion -> .rrd file creation -> catalog registration -> dataset review -> data enrichment using a PyTorch data loader
All made possible with the latest 0.32 @rerundotio open-source SDK release.
Low-level chunk processing made ergonomic
As a roboticist, you constantly run into these papercuts that make working with data difficult. You're dealing with incomplete data requiring preprocessing, custom data types, bugs in the recorded data, data spread across multiple files in different formats, and much more.
We've recently released our Chunk Processing API, a flexible, chunk-centric API for data ingestion, transformation, and conversion pipelines. It covers I/O from common robotics file formats, powerful declarative data-wrangling primitives, and a multithreaded native engine for pipeline execution. The API is designed to support distributed execution in the future.
We've put together an example showing how to use the API to assemble a robot recording from multiple file sources, including preprocessing steps that modify or augment the data.
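Assembling one recording from several files boils down to fixing each source and then interleaving all of them by timestamp. This stdlib-only sketch is not the Chunk Processing API; the record layout, the constant clock-offset fix, and the helper names are assumptions made for illustration.

```python
import heapq
from typing import Iterable, Iterator, Tuple

# (timestamp in integer ticks, entity path, payload); this layout is an assumption.
Record = Tuple[int, str, object]


def shift_clock(records: Iterable[Record], offset: int) -> Iterator[Record]:
    """Preprocessing example: correct a constant clock offset in one source."""
    for t, path, payload in records:
        yield t + offset, path, payload


def merge_sources(*sources: Iterable[Record]) -> Iterator[Record]:
    """Interleave several time-sorted sources into one time-ordered stream."""
    return heapq.merge(*sources, key=lambda r: r[0])


# Hypothetical inputs: joint states decoded from an MCAP file whose clock runs
# 5 ticks behind, and gripper readings parsed from a custom CSV file.
joint_states = [(10, "/arm/joints", [0.0, 0.1]), (30, "/arm/joints", [0.0, 0.2])]
gripper = [(20, "/arm/gripper", 0.5), (40, "/arm/gripper", 0.0)]

recording = list(merge_sources(shift_clock(joint_states, 5), gripper))
for t, path, _ in recording:
    print(t, path)
```

`heapq.merge` streams lazily, so each source only needs to be sorted on its own; nothing is loaded into one big list until the final `list(...)`.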
What storage format should you use if you are serious about robot learning?
Robotics teams use formats like Parquet, MCAP, Lance, and NCore. Each solves a subset of the problems you face in robot learning.
Parquet is strong for columnar analytics, but dense row groups make it a poor fit for sparse, multi-rate robotics data.
MCAP is write-optimized and excellent for recording robot logs. However, it is essentially a container format of opaque messages (encoded with JSON, protobuf, CBOR, etc.) and not optimized for columnar analytical queries. Big scans and joins are slow, since you need to decode each message.
Lance supports multimodal data, random access, and schema evolution, but still uses row-aligned fragments that can bloat with nulls for multi-rate streams.
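A quick way to see the multi-rate problem is to count cells. The sketch below assumes a naive row-aligned table with one row per distinct timestamp across all streams and one column per stream. Real columnar formats encode nulls compactly (for example with validity bitmaps), so this counts cells rather than bytes, and the stream rates are made-up example values.

```python
from typing import Dict, List


def count_cells(streams: Dict[str, List[int]]) -> Dict[str, int]:
    """Count values and nulls if multi-rate streams are forced into aligned rows.

    Row-aligned: one row per distinct timestamp across all streams; a stream
    with no sample at that timestamp gets a null. Keeping each stream in its
    own columns at its original timestamps produces no nulls at all.
    """
    all_times = set()
    for ts in streams.values():
        all_times.update(ts)
    rows = len(all_times)
    values = sum(len(set(ts)) for ts in streams.values())
    nulls = sum(rows - len(set(ts)) for ts in streams.values())
    return {"rows": rows, "values": values, "nulls": nulls}


imu = [i * 5_000 for i in range(200)]       # 200 Hz for 1 s, timestamps in microseconds
camera = [i * 33_333 for i in range(30)]    # roughly 30 Hz for 1 s
print(count_cells({"imu": imu, "camera": camera}))  # {'rows': 229, 'values': 230, 'nulls': 228}
```

Only t=0 is shared, so almost every row is missing one of the two streams: nearly as many nulls as real values.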
Every feature your team needs that a file format doesn't provide means another pipeline to keep synchronized and another tool your team has to learn. Rerun's .rrd format is the only option that can serve all the use cases needed to transform robot recordings into intelligence. It keeps data at its original timestamps while letting you query and view it from the same data source and stream it to training. It has enough structure to build unified data systems on top, yet is flexible enough to be optimized for different read and write patterns.
Learn more about .rrd
If you're serious about robot learning, .rrd is for you!
Robot learning data is multi-rate and multimodal: cameras, IMUs, joint states, GPS, video, transforms, and derived signals all arrive at different frequencies and sizes.
.rrd is designed around that reality. It keeps physical data at original timestamps, makes chunks addressable through an index, and supports visualization, dataframe/SQL queries, transformations, and training from the same source.
The format is built around column chunks. Each chunk stores a subset of rows and columns. Internally, each column chunk is encoded as an Apache Arrow record batch together with semantic metadata describing how the data should be interpreted. Apache Arrow is the industry standard for data science, and using Arrow gives us a fast, zero-copy path into DataFusion, Pandas, and Polars.
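The toy model below illustrates the column-chunk idea: one entity's rows sorted by a time index, component columns stored side by side, and semantic metadata attached. Real chunks are Arrow record batches; plain Python lists are used here so the sketch has no dependencies. `ColumnChunk` and `latest_at` are hypothetical names for the kind of "latest value at time t" lookup a viewer needs.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ColumnChunk:
    """Toy column chunk: one entity's rows, sorted by a time index column."""
    entity_path: str
    times: List[int]                  # index column, ascending, original timestamps
    columns: Dict[str, List[Any]]     # component name -> values (None = no data in that row)
    metadata: Dict[str, str] = field(default_factory=dict)  # how to interpret each column

    def latest_at(self, component: str, t: int) -> Optional[Any]:
        """Most recent non-null value of `component` at or before time t."""
        values = self.columns[component]
        i = bisect.bisect_right(self.times, t) - 1  # last row with time <= t
        while i >= 0 and values[i] is None:         # skip rows where this column is empty
            i -= 1
        return values[i] if i >= 0 else None


chunk = ColumnChunk(
    entity_path="/robot/base",
    times=[0, 10, 20],
    columns={"position": [(0.0, 0.0), (1.0, 0.0), None],
             "battery": [0.9, None, 0.8]},
    metadata={"position": "meters, robot frame", "battery": "fraction of full charge"},
)
print(chunk.latest_at("position", 25))  # (1.0, 0.0): row 20 has no position
print(chunk.latest_at("battery", 15))   # 0.9
```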
With Rerun 0.32, the file format is stable enough for teams to build on top of it.
Learn more about storing physical data in column chunks
Still buzzing from our SF Community event last evening!
If you want to meet us in person, we'll host the next event in Brooklyn, NYC, this evening. It’s a casual meetup meant for good conversations, meeting new faces, and getting a feel for what others are building.
A new UI for rapid review of datasets for training and evaluation
In 0.32 we're shipping the first version of our (experimental) dataset review tool. It lets you quickly eyeball many recordings at once to hunt for anomalies and build intuition for your data. It also lets you flag recordings, making it useful as a simple annotation tool. The view is configured using a normal Rerun blueprint.
New State Timeline View
A common request from the community has been the ability to visualize state changes over time, and 0.32 brings a new experimental State Timeline View. If you've been waiting for this view in Rerun, we'd love your feedback on what else you'd like to see from it.
The robotics ecosystem has been missing a unified framework flexible enough for the full lifecycle of robot learning data. Rerun 0.32 is the point where the pieces start to come together: record, inspect, query, transform, review, and train without constantly changing representations.
All the details in our release post
Expanded built-in support for MCAP, ROS 2 types, and robotics visualizations
We think it's critical that all robotics data is easy to ingest and make useful in Rerun. At the same time, a lot of data can be handled perfectly well without customization, and we keep improving that out-of-the-box experience with every release.
0.32 brings improved performance and more out-of-the-box support for MCAP and common ROS 2 types, as well as an expanded set of visualizations. See the updated list of messages with built-in support here. Occupancy grids (2D maps shown in 3D) are important for mobile robots and have been a heavily requested feature in Rerun for a while, so 0.32 adds the new GridMap archetype and visualizer, paired with built-in support for the corresponding ROS 2 messages.
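For readers new to occupancy grids: a ROS 2 `nav_msgs/msg/OccupancyGrid` stores cells as row-major `int8` values, with -1 meaning unknown and 0 to 100 the occupancy probability, plus a `resolution` (meters per cell) and an `origin` pose that places cell (0, 0) in the world. The sketch below converts such data into grayscale rows and maps a cell to world coordinates. It ignores any rotation in the origin pose, the gray levels are an arbitrary display choice, and it is not how Rerun's GridMap visualizer is implemented.

```python
from typing import List, Sequence, Tuple


def occupancy_to_gray(data: Sequence[int], width: int, height: int) -> List[List[int]]:
    """Convert row-major OccupancyGrid data into rows of 8-bit gray pixels.

    -1 (unknown) -> 205 (gray), 0 (free) -> 255 (white), 100 (occupied) -> 0 (black).
    """
    if len(data) != width * height:
        raise ValueError("len(data) must equal width * height")
    rows = []
    for r in range(height):
        cells = data[r * width:(r + 1) * width]
        rows.append([205 if v < 0 else round(255 * (100 - v) / 100) for v in cells])
    # Grid row 0 sits at the map origin (bottom); flip so image row 0 is the top.
    rows.reverse()
    return rows


def cell_center(col: int, row: int, resolution: float,
                origin_x: float, origin_y: float) -> Tuple[float, float]:
    """World (x, y) of a cell center, assuming the origin pose has no rotation."""
    return (origin_x + (col + 0.5) * resolution,
            origin_y + (row + 0.5) * resolution)


# A 2x2 map. Grid row 0: free, occupied. Grid row 1: unknown, 50% occupied.
print(occupancy_to_gray([0, 100, -1, 50], width=2, height=2))  # [[205, 128], [255, 0]]
print(cell_center(1, 0, resolution=0.05, origin_x=-1.0, origin_y=-2.0))
```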
Processing physical data can be a drag. We're here to change that!
We added an example demonstrating Rerun 0.32's new chunk processing API in a robotics use case. It shows how you can build a processing pipeline that reads, modifies, and augments robot data spread across multiple files, then merges them into a coherent recording. The pipeline is non-trivial, but the API allows you to express it in fairly compact code.
🔸 read, convert and fix MCAP data
🔸 add data from custom files
🔸 compute forward kinematics using URDF and custom joint states from MCAP
🔸 modify and add URDF data
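Forward kinematics here means composing each joint's fixed URDF origin transform with the rotation given by its joint state, from the root link outward. The sketch below assumes revolute joints about the z axis with translation-only origins (no `rpy`) and uses hypothetical joint names; a real pipeline would parse the URDF and read the joint states from the MCAP file.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def translate(x: float, y: float, z: float) -> Matrix:
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]


def rot_z(q: float) -> Matrix:
    c, s = math.cos(q), math.sin(q)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def forward_kinematics(chain: List[Tuple[str, Tuple[float, float, float]]],
                       joint_positions: Dict[str, float]) -> Dict[str, Matrix]:
    """Root-to-child-link transform for each joint in a serial chain.

    Each joint contributes translate(origin) x rot_z(q); joints without a
    position (e.g. fixed joints) use q = 0.
    """
    T = translate(0, 0, 0)  # identity
    poses = {}
    for name, (x, y, z) in chain:
        q = joint_positions.get(name, 0.0)
        T = matmul(matmul(T, translate(x, y, z)), rot_z(q))
        poses[name] = T
    return poses


# Hypothetical two-link arm: 1 m links, fixed "tool" joint at the end.
chain = [("shoulder", (0.0, 0.0, 0.0)), ("elbow", (1.0, 0.0, 0.0)), ("tool", (1.0, 0.0, 0.0))]
tool = forward_kinematics(chain, {"shoulder": math.pi / 2, "elbow": 0.0})["tool"]
print([round(tool[r][3], 6) for r in range(3)])  # tool position, approximately [0, 2, 0]
```

The per-joint transforms returned here are what you would then log as the link poses for each timestamp of the joint-state stream.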
0.32 has shipped, and it's a massive release from @rerundotio. There are a ton of cool new features, and I wanted to highlight two in particular:
1. OSS Server streaming from disk
2. Dataset review
I walk you through them in the video, so take a look. I'll have a much longer blog post next week about the entire pipeline. With 0.32, much of the foundation is in place for a unified data layer for physical data, and I'll get into the details using everything I've built over the past year. It will cover:
1. Raw Data Collection
2. Data Ingestion
3. Catalog Registration
4. Query and Review
5. Post Process
6. Training
So, lots to share!
We just shipped our biggest @rerundotio open source release ever, and our commercial product Rerun Hub is now available as a private preview. I’m deeply proud of what the team has done here and very excited to share more publicly what we’ve been working on for the last year and a half.
Weโre building a new data layer for robot learning