you are not prepared @vancluse - Twitter Profile

vancluse retweeted

8 days ago

Run Polars' distributed engine on your own infrastructure. Deploy a distributed Polars cluster on any Kubernetes setup (EKS, AKS, GKE, or minikube) and get a query dashboard with past queries, advanced query profiling, OpenLineage support, and more. Sign up and install with a single Helm command. Connect via `ClusterContext` and run distributed queries. Read all about it at https://t.co/PnKfBF2wZ2

0

36

4

13

7K

vancluse retweeted

polars data

@DataPolars

16 days ago

We've released Python Polars 1.41. Some of the highlights:

• Faster Parquet metadata decoding: Parquet metadata is now decoded with a hand-written, specialized Thrift parser instead of the generic auto-generated one. Speedup scales with table width: 1.6× for 100-column tables, up to 3.3× for 10,000-column tables.
• Nested common subplan elimination: The query optimizer now eliminates duplicate subplans at all nesting depths.
• LazyFrame.gather(): Row selection by integer index is now available in lazy mode, without collecting first.

Blog post: https://t.co/r5vnzQ4HkJ

2

41

10

7

4K

vancluse retweeted

polars data

@DataPolars

about 2 months ago

We've released Python Polars 1.40. Some of the highlights:

• Streaming grouped AsOf join: AsOf joins with a `by` argument are now supported in the streaming engine, extending last release's streaming AsOf support to grouped time-series joins.
• Basic over() in the streaming engine: Elementwise window expressions using over() can now run in the streaming engine.
• More expressions lowered to streaming: cov(), corr(), interpolate(), skew(), kurtosis(), and entropy() are now natively supported in the streaming engine.

Link to the complete changelog: https://t.co/P7pkxZrNuk

2

40

4

6

3K

vancluse retweeted

polars data

@DataPolars

3 months ago

We've released Python Polars 1.39. Some of the highlights:

• Streaming AsOf join: join_asof() is now supported in the streaming engine, enabling memory-efficient time-series joins.
• sink_iceberg() for writing to Iceberg tables: A new LazyFrame sink that writes directly to Apache Iceberg tables. Combined with the existing scan_iceberg(), Polars now supports full read/write workflows for Iceberg-based data lakehouses.
• Streaming cloud downloads: scan_csv(), scan_ndjson(), and scan_lines() can now stream data directly from cloud storage instead of downloading the full file first.

Link to the complete changelog: https://t.co/62Mx2ZJWVh

2

181

24

53

9K

vancluse retweeted

3 months ago

pl.from_repr() constructs a DataFrame or Series directly from its printed string representation. This can be useful in unit tests: instead of rebuilding expected DataFrames through dictionaries with typecasting, the schema is encoded in the header and the values are right there in the table. You can see at a glance what the test is asserting.

2

45

5

13

3K

you are not prepared @vancluse

4 months ago

Rust programming survey https://t.co/XgjGsBYR2w

0

1

0

48

vancluse retweeted

Charlie Marsh

@charliermarsh

4 months ago

Increasingly feel the need to build faster Rust tooling

43

566

8

13

33K

vancluse retweeted

polars data

@DataPolars

4 months ago

We've released Python Polars 1.38. Some of the highlights:

• (De)compression support on text-based sources and sinks: zstd and gzip are now supported for write_csv(), sink_csv(), scan_ndjson(), and sink_ndjson().
• scan_lines() to read text files: This new function constructs a LazyFrame by scanning lines from a file into a string column. This is particularly useful for working with (compressed) log files.
• Merge join in the streaming engine: When join columns are sorted in both DataFrames, we now use a merge join, which can improve performance 2-4x and in some cases even up to 10x. To unlock these performance gains, use the Lazy API and apply set_sorted(col) to let Polars know the data is sorted.

Link to the complete changelog: https://t.co/wRdWzdc7iT

3

61

7

8

4K

vancluse retweeted

polars data

@DataPolars

4 months ago

The early design decisions for the Categorical type were under strain because of our streaming engine. Every data chunk carried its own mapping between the categories and their underlying physical values, forcing constant re-encoding. The global StringCache we built to solve it caused lock contention and wasn't designed for a distributed architecture.

The new Categories object, released in 1.31, solves this, and gives you:
• Control over the physical type (UInt8/16/32)
• Named categories with namespaces
• Parallel updates without locks
• Automatic garbage collection

When you know the categories up front you can use Enums. They're faster because of their immutability and allow you to define the sorting order of values.

The StringCache is now a no-op, but the code will keep working how it used to (with global Categories). You can also migrate by replacing it with explicit Categories where needed.

The result is a Categoricals data type that works well on the streaming engine without performance degradation, and is compatible with a distributed architecture.

Read the full deep dive: https://t.co/kkikdqxrER

1

65

9

11

6K

vancluse retweeted

Python Hub

@PythonHub

4 months ago

Learn NumPy in 40 Minutes The video introduces the core concepts of NumPy and shows how its array operations form the foundation of numerical computing in Python. It emphasizes why NumPy is a must-learn tool for data science, AI, machine learning, and scientific workflows. https://t.co/0gj2URh0tw

0

14

4

9

2K

vancluse retweeted

polars data

@DataPolars

6 months ago

We've just released 1.36.0 with a couple of big features. Here are the highlights:

Highlights:
🧩 Extension Types: Allows for custom data types within the Polars ecosystem. You can see an example in the image below.
🛟 Float16 Support: First-class support for model parameters and half-precision floating point data.
↪️ Lazy Pivot: LazyFrame.pivot() is finally here, allowing for query optimization on reshape operations.
👀 show(): Easily preview the first rows of a DataFrame or LazyFrame.
🗄️ SQL Parity: Added Window functions (ROW_NUMBER, RANK, DENSE_RANK) and CROSS JOIN UNNEST to the SQL API.

Performance:
⏱️ Parquet writer improvement: 2.2x runtime improvement with a 20% peak memory usage reduction, which is even 39% for partitioned sinks (on a synthetic benchmark).
🚀 Support for group_by_dynamic and Sorted Group-By on the streaming engine.

Find the full release notes here: https://t.co/cKr7V8purg

2

94

11

18

5K

vancluse retweeted

polars data

@DataPolars

6 months ago

It’s been a year since the last Polars in Aggregate. Since then, we've shipped 37 releases, merged over 2,300 PRs, and built two new engines.

Here is what you need to know:

☁️ Polars Cloud is Live: Write code once, run it anywhere. Whether processing hundreds of rows locally or billions of records distributed, the API remains the same, and data stays within your cloud.

🚀 Next-Gen Streaming: Our rewritten streaming engine is showing performance gains of 3x-7x in benchmarks compared to the default in-memory engine.

🔢 Stable Decimals & Int128: For financial and scientific contexts where 0.1 + 0.2 must exactly equal 0.3, we now offer full precision control and a massive integer range.

We also cover the Categorical overhaul, collect_batches() for generators, and the new Common Subplan Elimination optimizer.

Read the full breakdown here: https://t.co/kVGBD9VUPl

0

62

4

14

4K

vancluse retweeted

JPX J-Quants【公式】 @jpx_JQuants

7 months ago

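[New service] "DataCube" has been added to J-Quants! It is a one-time-purchase data service that lets you easily download just the amount you need of high-quality stock price data, company information, and more. https://t.co/ngS9CuQ1Jm Release details: https://t.co/mD7aV6nH4t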

0

61

17

35

26K

you are not prepared @vancluse

8 months ago

The AI coding trap https://t.co/0LNuZ0nGmf via @chrisloy

0

57

vancluse retweeted

polars data

@DataPolars

12 months ago

We've updated our benchmark run. It has been more than a year since we last ran it. Since then we've designed and implemented a completely novel streaming engine that can deal with Polars' data model. The future of Polars looks bright and very, very fast! https://t.co/li9LYERkGe

0

27

4

9

3K

you are not prepared @vancluse

about 2 years ago

@tjCjrBYNai9 Thank you for sharing such good information ^^

0

8

vancluse retweeted

Carl Carrie (@🏠) @carlcarrie

over 2 years ago

#AMM Paper with some embedded Python source:

Market making model analysis in High Frequency Trading for the north American stock market

A simple approach without performance analysis, but still a good read for the #hft and #amm uninitiated.

https://t.co/jDOEEnqKyd

0

76

6

89

7K

vancluse retweeted

Valeriy M., PhD, MBA, CQF

@predict_addict

over 2 years ago

RIP Monte Carlo @GoogleDeepMind releases the code for Conformal Monte Carlo. https://t.co/1c1jj1AXRo #conformalprediction

8

1K

249

1K

203K

vancluse retweeted

Ralph Sueppel

@macro_synergy

over 2 years ago

"A Formalized Approach to Validation of Parametric Quantitative Trading Models": "Parametric trading models represent mathematical operators that act upon time series of features and a set of parameters to generate trading signals." https://t.co/Z21lTfABS8

2

77

11

76

9K

vancluse retweeted

Ralph Sueppel

@macro_synergy

over 2 years ago

Post & Python: "Wavelet transform provides a unique lens to analyze stock market data, balancing both time and frequency insights... capturing both its large trends and minute fluctuations." https://t.co/hfG3gI08MX

0

68

7

71

9K
