Want to speed up your Pandas code by 10-1000x?
With no code change?
The folks from @nvidia have created cuDF pandas accelerator mode. By using this line in Jupyter, you automatically leverage your GPU to run Pandas code:
%load_ext cudf.pandas
From the command line:
python -m cudf.pandas https://t.co/E9XbowBV0h
(It also accelerates 3rd party libraries that leverage Pandas.)
Check out this demo notebook: https://t.co/mS5cTK9Rve
I've had a chance to use the pre-release version and am very impressed!
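A minimal sketch of wrapping the command-line form above so the same unmodified script runs with the accelerator when cuDF is installed, and with plain Python otherwise. The helper name `build_command` and the script name `analysis.py` are hypothetical placeholders, not part of cuDF.

```python
import importlib.util
import sys


def build_command(script, accelerated=None):
    """Return the argv list for running `script`, optionally via cudf.pandas."""
    if accelerated is None:
        # Assumption: use the accelerator only if the cudf package can be found.
        accelerated = importlib.util.find_spec("cudf") is not None
    if accelerated:
        # Same form as the command above: python -m cudf.pandas <script>
        return [sys.executable, "-m", "cudf.pandas", script]
    # Fallback: run the script with ordinary CPU pandas.
    return [sys.executable, script]


if __name__ == "__main__":
    # "analysis.py" is a placeholder for an unmodified pandas script.
    print(build_command("analysis.py"))
```

The returned list can be passed straight to `subprocess.run`, so the pandas script itself stays untouched either way.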
@scottedgar They installed one of these on the bike route near my house 3 weeks ago - and it's already seriously beat up, looks like it's been hit by cars multiple times.
Better the barrier than a kid on a bike though, and I’m glad it’s there.
In #recsys you might want to predict different types of user events (click, like, share, purchase).
In this post we introduce multi-task learning (MTL) for ranking models and how to easily build and train such models using #Merlin and #tensorflow.
https://t.co/pMMNRt3Ig6
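One common way to frame multi-task learning for the event types above is a weighted sum of per-task losses. The sketch below is generic and assumes binary labels per event; it is not the Merlin API or necessarily the method in the linked post, and the function names and task weights are illustrative only.

```python
import math


def binary_cross_entropy(label, prob, eps=1e-12):
    """Log loss for one binary label and one predicted probability."""
    # Clamp the probability so log() never receives 0.
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multi_task_loss(labels, probs, weights):
    """Weighted sum of per-task losses, keyed by task name (e.g. 'click')."""
    return sum(
        weights[task] * binary_cross_entropy(labels[task], probs[task])
        for task in labels
    )


if __name__ == "__main__":
    # Hypothetical example: one impression that was clicked but not purchased.
    labels = {"click": 1, "purchase": 0}
    probs = {"click": 0.8, "purchase": 0.1}
    weights = {"click": 1.0, "purchase": 2.0}
    print(multi_task_loss(labels, probs, weights))
```

In a real ranking model the per-task predictions would typically come from separate output heads on a shared network, and the weights control how much each event type influences training.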
Love to see the new RAPIDS release 23.02 out now! Awesome improvements for anyone who wants their data science code to run faster in Python (https://t.co/xWZ5t28J8q). We also overhauled the https://t.co/tCqOh7ONC2 site. Hopefully it's much easier to find things now!
An in-depth comparison of using
• Petastorm
• TensorFlow
• Merlin Dataloader
for loading data on Databricks!
Merlin is 10x - 1000x faster 🙂
Thank you so much @Andompesta90 for this fascinating study!
Read more here: https://t.co/p3tL2WhWSo
For all my 🇺🇸 friends getting over their 🦃hangovers, here's a Black Friday blog for you. Ever wanted to scale your recommender system to the terabyte level? Check out this amazing work by Hao Wu and Deya Fu on the Merlin Distributed Embeddings library: https://t.co/QdC378diz9
Merlin Dataloader is 119x faster than my own PyTorch Dataset + Dataloader combo!
This is revolutionary for tabular data 🥳
Let's take a closer look at what is going on.
I am launching a new blog -- TabularMusings 🥳
Here is the first blog post: https://t.co/J8EaO5uXUM
And here is the technology I am using and the reasons for starting the blog:
1/ Expensive A100 GPUs being underutilized due to CPU bottlenecks? 💰
TensorRT speedup being held back by a busy Python thread? 🐍
Learn more about our journey getting a 20-40% boost by removing the CPU as a bottleneck when applying LLMs to millions of pages in our web index. 🧵
Before I start a new project or build a large feature I like to spend some time outlining the announcement blog post/talk abstract.
This helps me figure out exactly which bits I'm excited to tell folks about so I can focus my efforts there!
https://t.co/4pF30h0ZTO