The original design of the Categorical type came under strain with our streaming engine. Every data chunk carried its own mapping between categories and their underlying physical values, forcing constant re-encoding. The global StringCache we built to solve this caused lock contention and wasn't designed for a distributed architecture.
The new Categories object, released in 1.31, solves this and gives you:
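To make the re-encoding cost concrete, here is a minimal pure-Python sketch of the per-chunk mapping problem. It is an illustration only, not Polars internals: `encode_chunk`, `reencode`, and the `shared` dictionary are hypothetical names invented for this example.

```python
# Hypothetical illustration of per-chunk category mappings (not Polars internals).

def encode_chunk(values):
    """Encode one chunk using its own local category -> code mapping."""
    mapping = {}
    codes = []
    for v in values:
        # Assign the next free code the first time a category is seen.
        codes.append(mapping.setdefault(v, len(mapping)))
    categories = list(mapping)  # position i holds the category for code i
    return codes, categories


def reencode(codes, categories, shared):
    """Translate chunk-local codes into codes from a shared mapping."""
    lookup = [shared.setdefault(cat, len(shared)) for cat in categories]
    return [lookup[c] for c in codes]


chunk_a = encode_chunk(["red", "blue", "red"])
chunk_b = encode_chunk(["blue", "green", "blue"])

# Both chunks produce the same physical codes, but the codes mean different things.
print(chunk_a)  # ([0, 1, 0], ['red', 'blue'])
print(chunk_b)  # ([0, 1, 0], ['blue', 'green'])

# Combining the chunks requires rewriting every code against a shared mapping.
shared = {}
codes_a = reencode(*chunk_a, shared)
codes_b = reencode(*chunk_b, shared)
print(codes_a, codes_b, shared)
```

Every chunk has to pass through something like `reencode` before it can be compared or concatenated with another chunk, which is the repeated work described above.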
• Control over the physical type (UInt8/16/32)
• Named categories with namespaces
• Parallel updates without locks
• Automatic garbage collection
When you know the categories up front, you can use Enums. Enums are faster because they are immutable, and they let you define the sort order of values.
The StringCache is now a no-op, but existing code keeps working as before (using global Categories). You can also migrate by replacing it with explicit Categories where needed.
The result is a Categorical data type that works well on the streaming engine without performance degradation, and is compatible with a distributed architecture.
Read the full deep dive: https://t.co/kkikdqxrER
pandas 3 has been released and marks the most significant evolution of #pandas in over ten years.
No more `copy()` everywhere, and no more `lambda` gymnastics.
Want examples? Read this hands-on article with the main changes: https://t.co/vnzMbCks0l
We're happy to announce the release of #pandas 2.3.0. You can install it with `pip install pandas` or `conda install -c conda-forge pandas`. Thanks to all contributors and sponsors who made this release possible! The release notes can be found at: https://t.co/y9w1kBiqtP
Today we are launching the first open Crash Course training sessions with a limited time discount. These instructor-led sessions are open to everyone looking to get up and running with Polars.
Find a date and sign up via our Academy: https://t.co/2hVIPQDLMO
I've written a blog post about Tonbo's research on async Rust and io_uring:
https://t.co/oRT1DrCaKM
We need to be careful to avoid the cancellation problem when using async Rust and io_uring together.