🎉 Excited to release Selectra (Spanish Electra), a new set of models on the @huggingface Hub
3-5x smaller than current SOTA Spanish models while achieving competitive results
🧵 Overview below (1/4)
Thanks @GoogleAI TPU RC for their support
#python #opensource #nlproc
🥳 We're extremely excited to announce we're now Argilla
Please don't forget to follow us @argilla_io.
There are many more exciting things coming up!
Read more at: https://t.co/JiCU7iSOJb
#python #opensource #nlproc
Get started with NLP with custom datasets
Create and label datasets for text classification, token classification and text generation
https://t.co/VlMC8wkqHB
#python #opensource #datascience
Don't have a lot of time to annotate data?
SetFit + Rubrix, few-shot classification with custom data 🤓
https://t.co/U1FEM5zn9j
#nlproc #datascience #opensource
⚡ New release 0.18.0
> Better token classification validation
> Delete records by id & query for better dataset management
> New tutorials!
Thanks to our community contributors @AnkushChander, Tom Aarsen, & others
https://t.co/FmBpyHdGKk
#python #nlproc #opensource
SetFit: Efficient few-shot learning with Sentence Transformers
So exciting!
Train robust models from very few examples, with fast training, fast inference, and results comparable to or better than LLMs and prompt-based methods.
https://t.co/sbFh3EcSK3
#python #opensource #NLProc
Active learning for text classification with @rubrixml and the wonderful small-text library by @webis_de
Learn how to build a custom active learning loop and teach a 🤗 transformers model
https://t.co/iux9l0rp7M
#python #opensource #NLProc
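A rough idea of the query step in an active learning loop, assuming a least-confidence strategy (one common choice, not necessarily the one used in the tutorial). The function name and the toy probabilities are hypothetical and no small-text or transformers API is used here:

```python
def least_confident(probabilities, batch_size):
    """Return indices of the examples whose top class probability is lowest."""
    # Confidence of a prediction = probability of its most likely class.
    confidence = [max(p) for p in probabilities]
    ranked = sorted(range(len(probabilities)), key=lambda i: confidence[i])
    return ranked[:batch_size]


# Hypothetical class probabilities from a binary classifier on unlabeled texts.
probs = [[0.95, 0.05], [0.52, 0.48], [0.70, 0.30], [0.60, 0.40]]

# These are the examples a human would be asked to label next.
query = least_confident(probs, batch_size=2)
print(query)
```

In a full loop, the newly labeled examples are added to the training set, the model is retrained, and the query step runs again.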
Want to analyze prediction explanations from your Transformer models? At the dataset level?
A new tutorial using SHAP and Transformers Interpret!
https://t.co/3wP3B0MaCK
#python #opensource #xai
humap: Hierarchical Uniform Manifold Approximation and Projection
A very cool method and library by @EstecioJunior
Reduces visual burden when exploring clusters in large datasets and enables drill-down with hierarchical levels
https://t.co/zx0LNUCOTh
#python #opensource #umap
Rubrix: the open-source framework for data-centric NLP
Build human-in-the-loop workflows for data annotation, monitoring, and review.
https://t.co/DOrtloj95f
Follow @rubrixml for updates
#python #nlp #opensource
What can we learn from model predictions vs. training data labels?
* Ambiguous examples
* (Some) wrong labels
* Model improvement patterns
A reproducible example using the @stanfordnlp sentiment treebank dataset & @rubrixml
https://t.co/fMRiIw2MB9
#python #opensource #NLProc
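One way to surface ambiguous examples and possible wrong labels is to look at records where a confident prediction disagrees with the training label. A minimal sketch in plain Python; the `Record` type, the `min_score` threshold, and the sample data are made up for illustration and are not the Rubrix API:

```python
from typing import NamedTuple


class Record(NamedTuple):
    text: str
    label: str       # label from the training data
    predicted: str   # label predicted by the model
    score: float     # model confidence for the prediction


def suspicious(records, min_score=0.9):
    """Return records where a confident prediction disagrees with the label."""
    hits = [r for r in records if r.predicted != r.label and r.score >= min_score]
    # Most confident disagreements first: these are the best candidates for review.
    return sorted(hits, key=lambda r: r.score, reverse=True)


records = [
    Record("A wonderful film", "negative", "positive", 0.98),
    Record("Dull and lifeless", "negative", "negative", 0.95),
    Record("Not bad, not great", "positive", "negative", 0.55),
    Record("Dreadful pacing", "positive", "negative", 0.93),
]

flagged = suspicious(records)
for r in flagged:
    print(f"{r.score:.2f}  label={r.label}  predicted={r.predicted}  {r.text}")
```

Low-confidence disagreements (like the third record) tend to be ambiguous examples rather than labeling mistakes, so they are worth a separate pass.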
Weak supervision for multilabel text classification.
Get instant statistics about heuristics' coverage and precision with the @rubrixml UI
Define rules programmatically with Python
Tutorial:
https://t.co/ovPVLLcyMT
#opensource #datacentricai #python
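To make coverage and precision concrete, here is a small plain-Python sketch of keyword rules scored against a few multilabel examples. The `contains` helper, the rule names, and the sample data are hypothetical; this does not use the Rubrix rules API:

```python
def contains(keyword, label):
    """Build a rule that votes for `label` when `keyword` appears in the text."""
    def rule(text):
        return label if keyword in text.lower() else None
    rule.__name__ = f"contains_{keyword}"
    return rule


def rule_stats(rule, data):
    """Coverage = share of texts the rule fires on; precision = share of correct votes."""
    fired = [(vote, gold) for text, gold in data
             if (vote := rule(text)) is not None]
    coverage = len(fired) / len(data)
    # Multilabel: a vote counts as correct if it is among the gold labels.
    precision = sum(vote in gold for vote, gold in fired) / len(fired) if fired else None
    return coverage, precision


rules = [contains("refund", "billing"), contains("crash", "bug"), contains("invoice", "billing")]

# Each text has a set of gold labels (multilabel).
data = [
    ("I want a refund for the crash", {"billing", "bug"}),
    ("The app will crash on login", {"bug"}),
    ("Where is my invoice?", {"billing"}),
    ("Refund policy looks great", {"praise"}),
    ("Love the new design", {"praise"}),
]

stats = {rule.__name__: rule_stats(rule, data) for rule in rules}
for name, (coverage, precision) in stats.items():
    print(name, coverage, precision)
```

A rule with decent coverage but low precision (like the refund rule here) is a signal to tighten the heuristic before using it to label data.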
Every good model starts with good quality datasets.
Iteration and collaboration are key ingredients to achieve this.
Here's how you can iterate on data and models using the Hugging Face Hub.
https://t.co/agDomR7bnM
#nlproc #datascience #opensource
Fine-tuning a sentiment classifier with @rubrixml, starting from no labeled data
https://t.co/mBFZrqA8jw
Follow @rubrixml for more resources like this one
If you love NLP & open source, join our friendly community:
https://t.co/O0jE08KGdX
#python #opensource #nlp #transformers
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
1️⃣ Noisy labels using Wikidata and gazetteers (distant labels)
2️⃣ Fine-tune RoBERTa for NER with distant labels
3️⃣ Self-training
https://t.co/q17JEAWcLl
https://t.co/Cp0Kfba9WN
#python #NLProc
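Step 1️⃣ above (distant labels) can be pictured as dictionary matching. A toy sketch, assuming greedy longest-match against a small gazetteer and BIO tags; the function, the gazetteer entries, and `max_len` are illustrative and not taken from the BOND code:

```python
def distant_labels(tokens, gazetteer, max_len=3):
    """Tag tokens with BIO labels by greedy longest match against a gazetteer."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n]).lower()
            if span in gazetteer:
                label = gazetteer[span]
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                break
        else:
            # No gazetteer entry starts at this token.
            i += 1
    return tags


# Hypothetical gazetteer: lowercased surface form -> entity type.
gazetteer = {"new york": "LOC", "york": "LOC", "stanford university": "ORG"}
tokens = ["She", "left", "New", "York", "for", "Stanford", "University"]

tags = distant_labels(tokens, gazetteer)
print(list(zip(tokens, tags)))
```

Labels produced this way are noisy and incomplete (anything missing from the gazetteer stays "O"), which is why the fine-tuning and self-training steps follow.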
Stanza by @stanfordnlp is powerful for NER
Want to see how well it performs with your data? 👇
https://t.co/fhSzDk90HP
New to @rubrixml?
https://t.co/DOrtloiBfH
Join the community:
https://t.co/O0jE08KGdX
#python #nlp #opensource #datascience