Andrea Soria Jimenez @andrejanysa - Twitter Profile

andrejanysa retweeted

merve

@mervenoyann

12 months ago

Dataset Viewer for PDFs just landed on @huggingface 🤗 check all the document datasets on Hub🤝

5

136

11

49

7K

Andrea Soria Jimenez @andrejanysa

12 months ago

📄 New on Hugging Face Hub: native PDF dataset support!
You can now render PDFs directly in the Dataset Viewer, with thumbnails, in-browser previews, and full integration with datasets + pdfplumber.
Perfect for document-based ML workflows → https://t.co/BXQygtnHVZ https://t.co/YkRIgH4Gy7

1

6

3

1

841

Andrea Soria Jimenez @andrejanysa

over 1 year ago

@huggingface Link to the library: https://t.co/WLEAT1WP5r

0

2

0

2

60

Andrea Soria Jimenez @andrejanysa

over 1 year ago

🚀 Synthetic data is revolutionizing AI & ML!
DataDreamer, an open-source Python library, makes generating synthetic data seamless & integrates effortlessly with @huggingface. Easily push datasets to the Hub and share them with the community.
🔍 Learn how: https://t.co/oyZ6bpqXU1 https://t.co/GTslG0M4EP

1

28

9

24

2K

andrejanysa retweeted

Daniel van Strien

@vanstriendaniel

over 1 year ago

You only need a few extra lines to write generated datasets directly to the @huggingface Hub.

8

42

4

24

21K

andrejanysa retweeted

Quentin Lhoest 🤗 @lhoestq

over 1 year ago

Hugging Face is now officially in the pandas Ecosystem page 🎉 Let me know what you'd like to see next for HF + pandas

3

194

23

45

20K

Andrea Soria Jimenez @andrejanysa

over 1 year ago

@huggingface 📚 Quick Tutorial: https://t.co/czytFuUI6h

0

10

1

4

238

Andrea Soria Jimenez @andrejanysa

over 1 year ago

Synthetic data generation has never been easier! 🎉
Generate structured output effortlessly with #fastdata and @huggingface 🚀
Steps:
1️⃣ Define your schema 📝
2️⃣ Add a generation prompt 💡
3️⃣ Input your data 🔄
4️⃣ Share it freely on Hugging Face 🌍 https://t.co/3lykpxbBh8

3

123

15

92

6K

andrejanysa retweeted

Quentin Lhoest 🤗 @lhoestq

over 1 year ago

Damn this is cool
Semantic operations for pandas dataframes using open models from @huggingface. Brought to you by @lianapatel_ and the LOTUS team at Stanford and Berkeley.
Semantic search, group by topic, top-K semantic sorting, etc. with Llama 3.3 70B https://t.co/3B0vEDInev

4

32

9

15

4K

andrejanysa retweeted

Quentin Lhoest 🤗 @lhoestq

over 1 year ago

🤗 Datasets 3.2 is out!
With faster Parquet streaming (up to +100% speed) and faster filtering via predicate pushdown ⚡
Example: fast streaming of recent FineWeb-2 data from @huggingface https://t.co/yLgK6hjtkN

2

88

11

23

5K

andrejanysa retweeted

Quentin Lhoest 🤗 @lhoestq

over 1 year ago

Things are getting interesting 🤗✨👀

0

4

1

0

209

Andrea Soria Jimenez @andrejanysa

over 1 year ago

@huggingface has released a new feature that makes interacting with datasets even easier. 🌟 Introducing the #Text2SQL feature for the SQL Console – now you can talk to your dataset like never before! 🗣️💻

0

1

0

70

andrejanysa retweeted

Caleb

@calebfahlgren

over 1 year ago

The amazing, new Qwen2.5-Coder 32B model can now write SQL for any @huggingface dataset ✨

9

191

38

98

30K

Andrea Soria Jimenez @andrejanysa

over 1 year ago

Link to the repo: https://t.co/69aGvZvuVQ

0

11

Andrea Soria Jimenez @andrejanysa

over 1 year ago

🚀 Fastdata (by @answerdotai) + @huggingface: Synthetic Data Made Simple! 🤖📊
Generate data for deep learning 📜🛠️🎯 and push it directly to Hugging Face Hub 🌐. With Incremental Uploads, fastdata handles large-scale projects effortlessly! https://t.co/l2k6WVGerg

1

17

5

7

1K

Andrea Soria Jimenez @andrejanysa

over 1 year ago

✨ How it works:
1️⃣ Define your output schema 📜
2️⃣ Craft your data generation prompt 🛠️
3️⃣ Prepare your inputs 🎯
4️⃣ Generate and push to Hugging Face Hub directly 🚀

2

0

40

Andrea Soria Jimenez @andrejanysa

over 1 year ago

💡 Pro Tip: With Incremental Uploads, fastdata can automatically push updates to the Hub every N minutes, making it perfect for large-scale synthetic data projects.

0

2

0

36

andrejanysa retweeted

Quentin Lhoest 🤗 @lhoestq

over 1 year ago

My new app is out!! ✨ The Common Crawl Pipeline Creator ✨
Create your pipeline easily:
✔ Run Text Extraction ✂️
✔ Define Language Filters 🌐
✔ Customize text quality 💯
✔ See Live Results 👀
✔ Get Python code 🐍
Based on famous LLM research like Gopher, C4 or FineWeb

5

105

23

73

15K

andrejanysa retweeted

SomosNLP @SomosNLP_

over 1 year ago

🔥 Introducing #LaLeadeboard, the first open-source leaderboard to automatically evaluate #LLM models in the varieties of Spanish and the official languages of Spain and LATAM. https://t.co/EklRbCex8m

6

240

75

59

54K

andrejanysa retweeted

Clémentine Fourrier 🍊 is off till Dec 2026 (🪂) @clefourrier

over 1 year ago

There is now an LLM Leaderboard for one of the most spoken languages worldwide: Spanish! 🚀 (+ Catalan, Basque and Galician) Congrats to @mariagrandury for setting it up, and to @SomosNLP_ for gathering super high quality datasets from many partners! https://t.co/UCZQo1S2gl

1

80

20

15

20K
