Observers: A Lightweight SDK for AI Observability
TL;DR:
- Track and record interactions with AI models
- Store observations in multiple backends: @huggingface, @argilla_io, @duckdb
- Query and analyse your AI interactions with ease
GitHub:
https://t.co/F2F6aoAgRe
We're turning @huggingface Hub's files into content-defined chunks to speed up your workflows! ⚡️
This means:
- We store your file as deduplicated chunks
- ⏩ You only upload changed chunks when iterating!
- Pulling changes? Only download changed chunks!
@huggingface In our benchmarks, we found that using content-defined chunking (CDC) to store iterative model and dataset versions led to transfer speedups of ~2x - we'd love to learn about more real-world examples to see how we perform!
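The chunking idea in the thread above can be sketched in a few lines of Python. This is not the Hub's implementation: the gear-style rolling hash, the mask, the size limits, and the helper names (`cdc_chunks`, `chunks_to_upload`) are all illustrative assumptions.

```python
import hashlib
import random

# Fixed pseudo-random table: one 32-bit value per possible byte (gear hash).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_chunks(data, mask=0x3FF, min_size=256, max_size=8192):
    """Split data at boundaries chosen by its content, not by fixed offsets."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Rolling hash: older bytes are shifted out of the 32-bit window.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        # Cut when the low bits of the hash are zero, or the chunk gets too big.
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunks_to_upload(old, new):
    """Return only the chunks of `new` whose hash is not already stored for `old`."""
    known = {hashlib.sha256(c).hexdigest() for c in cdc_chunks(old)}
    return [c for c in cdc_chunks(new)
            if hashlib.sha256(c).hexdigest() not in known]
```

Because boundaries depend on nearby bytes rather than absolute offsets, inserting a few bytes in the middle of a file leaves the chunks before the edit untouched, and chunking typically falls back into step with the old version shortly after it. With fixed-size blocks, the same insertion would shift every later block.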
Super excited to introduce Halo: A beginner's guide to DIY health tracking with wearables! ✨
Using an $11 smart ring, I'll show you how to build your own private health monitoring app. From basic metrics to advanced features like:
- Activity tracking
- Heart rate monitoring
- Sleep analysis
- and more!
The @huggingface SQL Console now has Embeds!
- Nice URL to Share / Save your Query Results
- 🖼️ Embed Results into Web Pages via IFrame
In this example, I use handy DuckDB regex functions to find the Code Feedback conversations with the most markdown code blocks
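The original query is not shown here, so the following is only a sketch of the counting idea in plain Python rather than DuckDB SQL. The function names and the input shape (a dict of conversation id to text) are assumptions made for illustration.

```python
import re

# Three backticks, built indirectly so the marker can appear inside this example.
FENCE = "`" * 3

# A code block is a fence, any text (including newlines), then a closing fence.
CODE_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)


def count_code_blocks(text):
    """Count complete fenced code blocks in a piece of markdown."""
    return len(CODE_BLOCK.findall(text))


def top_conversations(conversations, k=3):
    """Rank conversations (id -> text) by number of code blocks, highest first."""
    counts = {cid: count_code_blocks(text) for cid, text in conversations.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:k]
```

In SQL the same ranking would be an aggregate over a regex match count followed by an `ORDER BY ... DESC LIMIT k`.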
How do you release an impactful dataset on the @huggingface Hub?
We're enhancing how we track dataset downloads on the Hub, so I wanted to share some common themes I've noticed for datasets with high downloads. 🧵
Glad to see the @OpenSourceOrg release their OSAI definition process after an extensive collaborative process, and especially happy to see the role of training data enshrined!
Head over to the OSI HF org page if you want to discuss the definition on @huggingface
1/2 🧵
@ExceptionAtAll @lvwerra @EugeneVinitsky We're updating the Hugging Face backend right now to address Git LFS limits!
If you have patterns you'd like to see addressed, reach out or start a discussion on our team page:
https://t.co/myfPmebFXW
safehttpx: a new open-source library from the Gradio team
This library is a product of our collaboration with @TrailOfBits and allows you to make asynchronous GET requests while avoiding Server Side Request Forgery.
A 🧵 on why this is important!
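To make the risk concrete, here is a minimal sketch of the kind of check an SSRF-aware client performs before fetching a user-supplied URL. It uses only the Python standard library and is not the API of the library announced above; `is_public_ip` and `resolve_public_ips` are hypothetical helper names.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_public_ip(ip):
    """True only for globally routable addresses (not loopback, private, link-local)."""
    return ipaddress.ip_address(ip).is_global


def resolve_public_ips(url):
    """Resolve the URL's host and refuse it if any resolved address is non-public."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    default_port = 443 if parsed.scheme == "https" else 80
    infos = socket.getaddrinfo(
        parsed.hostname, parsed.port or default_port, type=socket.SOCK_STREAM
    )
    ips = sorted({info[4][0] for info in infos})
    blocked = [ip for ip in ips if not is_public_ip(ip)]
    if blocked:
        raise ValueError(f"host resolves to non-public address(es): {blocked}")
    return ips
```

A check like this only helps if the request is then sent to one of the validated addresses. Resolving the hostname a second time at connect time would reopen the hole, since the DNS answer can change between the check and the request.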
The source code for HFChat macOS is now fully open source and accepting PRs! Looking forward to seeing what folks will build.
You'll also find some hidden features that never made it to the release: https://t.co/804uBWU1Nm
Did you know that you can load the newest checkpoints (like Llama 3.2) into Keras directly from the original Hugging Face release (safetensors)?
I tried - and lived to tell the tale: https://t.co/o3yvGYpXjq
What a great day for Open Science! @AIatMeta released models, datasets, and code for many of its research artefacts!
> Meta Segment Anything Model 2.1: An updated checkpoint with improved results on visually similar objects, small objects and occlusion handling. A new developer suite will be added to make it easier for developers to build with SAM 2.
Model checkpoints: https://t.co/FWx9qs4nwR
> Layer Skip: Inference code and fine-tuned checkpoints demonstrating a new method for enhancing LLM performance.
Model checkpoints: https://t.co/x1k7gcZxgX
> SALSA: New code enables researchers to benchmark AI-based attacks to validate security for post-quantum cryptography.
Repo: https://t.co/AKqUjCVZrT
> Meta Lingua: A lightweight and self-contained codebase designed to train language models at scale.
Repo: https://t.co/BIw2zMeaw4
> Meta Open Materials: New open source models and the largest dataset to accelerate AI-driven discovery of new inorganic materials.
Model checkpoints: https://t.co/KS3eVdRmTn
> MEXMA: A new research paper and code for our novel pre-trained cross-lingual sentence encoder covering 80 languages.
Model checkpoint: https://t.co/3IFeupCXJM
> Self-Taught Evaluator: a new method for generating synthetic preference data to train reward models without relying on human annotations.
Model checkpoint: https://t.co/j09g0COe7r
> Meta Spirit LM: An open-source language model for seamless speech and text integration.
Repo: https://t.co/hoERsWT7FD
@huggingface Views like this help us understand real-world access patterns so we can architect a more efficient, geo-distributed system for the Hub's storage backend. What else should we be looking at?
Did you know that @Huggingface Hub holds over 29 PB of Git LFS files across datasets, models, and spaces?
That's the equivalent of 64 @CommonCrawl downloads - and it's growing every day. So what's inside? 🧵