"Metadata marts could play a key role in making video data more accessible and structured for model training and analysis" - Simon Thelin (@synthesiaIO, creator of the DataPains blog) reviewed DataChain 👇
A small DataChain video on processing audio data from @huggingface with 🤗 models. We need more tools for ETL, analytics, governance, and preparation of unstructured data at scale!
- stream files from tar or wds archives! 🤯
- enrich, prepare, version, publish datasets 🚀
- bonus! 🤗 is natively integrated like a storage provider!
1/N DataChain hit 2000 stars ⭐ on GitHub a week ago. Thanks for your interest and support 🤗 It was built to address the needs and pain points we saw in the DVC community when people had to deal with millions of files (e.g. images, PDFs, audio).
DataChain is a modern Pythonic data-frame library to efficiently organize unstructured data.
I haven't tested it, but it looks really interesting, especially because it supports multimodal data and cares about efficiency.
Datasets + LLMs + Pydantic = DataChain
...now with @huggingface! 💛
DataChain by @DVCorg just added @huggingface support! Create, load, and transform HF Datasets with LLMs easily.
- Pydantic for dataset schema
- Use your own or public HF Datasets
- Run your own or public HF Models
🔬 LLM Project: Process PDFs at scale w/ DataChain & @UnstructuredIO
✂ Extract & parse text
⚙️ Create vector embeddings
🚀 Scale processing
🔄 Version datasets
All in <70 lines of code! 🤯
Perfect if you're working w/ docs.
🎥 https://t.co/IDKWFZi4HZ
🦉 Today we launch the DVC Extension for @code on @ProductHunt!
Join us in celebrating a year's worth of improvements since the original release of the extension, which turns your IDE into your own personal ML experimentation platform!
https://t.co/sXQYcQSp4u
🧵1/5
DVC 3.0 goes beyond the command line!
Introducing the DVC Stack! This release improves DVC's core versioning and experiments functionality and enables new workflows like model registry and cloud experiments, improving the end-to-end model development journey!
🧵1/7
@hafelg @paper_li @TwtTimes @boldakov hey, Twitter disabled their API (made it a paid and expensive feature). Since the service was free, we are unfortunately no longer able to maintain and run it.
Woah! Been here? Is deep learning model training going horribly wrong? 🙋🏽‍♂️
Iterative Studio makes this easy to see so you don't waste time and resources!
🧵 1/7
I love how all these new tools and technologies come nicely together. @flydotio, @streamlit, and @modal_labs (and our open source MLEM to package and deploy a model) ...
... all of them being extremely easy to use + serverless, even GPUs.
The future is here 🚀🚀🚀