With Haystack it is very easy to use the power of @OpenAI GPT-3 over any document base - and to compare its performance with open source models available from @huggingface by changing just one line. Read more on how to set up such a pipeline 👇
https://t.co/vKzrXaL1zq
🧵 Generative models have taken the world of NLP by storm. But LLMs do not know about your personal data. This makes personal assistants, enterprise knowledge management and many other applications challenging. Retrieval-augmented pipelines are the answer 👇
#nlp #llm
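To make the idea concrete, here is a minimal sketch of a retrieval-augmented pipeline in plain Python. It is not the Haystack API: `tokenize`, `retrieve` and `build_prompt` are hypothetical helpers, the word-overlap ranking stands in for a real retriever, and the example documents are made up. The resulting prompt would be sent to whichever generative model you use.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, top_k=2):
    """Toy retriever: rank documents by how many distinct query words they share."""
    query_terms = set(tokenize(query))
    scored = [(len(query_terms & set(tokenize(doc))), doc) for doc in documents]
    # Highest overlap first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop documents that share no word with the query.
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(query, documents, top_k=2):
    """Put the retrieved documents in front of the question as context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Made-up "personal data" that a pretrained LLM could not know about.
documents = [
    "The office wifi password is rotated every Monday.",
    "Expense reports are due on the fifth of each month.",
    "The cafeteria serves lunch from noon to two.",
]
query = "When are expense reports due?"
prompt = build_prompt(query, documents)
print(prompt)
```

The generative model only ever sees the retrieved passages, so swapping the model does not change the retrieval step.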
Big news 🎉 Announcing deepset Cloud early access! But also - a funding round led by @GVteam! More here https://t.co/P5pUwolA04 Thanks to everyone who helped us achieve this - our community, our customers, our partners, friends and families🖖 #machinelearning #opensource #nlproc
@srchvrs @stefan_it_ @BramVanroy @deepset_ai Bram and I had trouble with both training from scratch and fine-tuning on downstream tasks. But there are of course ALBERT models that are fine-tuned on downstream tasks and work well... For fine-tuning on downstream tasks you can of course iterate much faster over different hyperparameters.
@saitej786 @deepset_ai @philipvollet Hey Sai, I guess you mean a tutorial for the extraction of tables from PDF?
We don't yet have a dedicated tutorial for this, but this test shows you how to use the TableExtractor Parsr (we also have a connector to Azure in the test above): https://t.co/S83wxx9Yd6
1/3 Big news!! After more than 750 pull requests from 89 contributors and 19 months since our first release in May 2020, we are happy to announce the 1.0 release of #Haystack! 🎉🎁🖖 https://t.co/gUEVZISoko (release notes here: https://t.co/UJCsHD1Wgb)
love seeing the innovation of deepset on top of @elastic, it's already very exciting, still more work left on our end to make Elasticsearch and Lucene even better for such use cases
@Nils_Reimers @huggingface Congratulations Nils! Looking forward to seeing some well-performing sentence-transformers with native HF transformers support. And much more on the IR front!
@OS101Series @rusic_milos @deepset_ai Proud to have Milos share some of our insights into open source. We are excited to combine what is good for everybody (access to the latest technology) with a profitable business model for ourselves. Win-win
We're excited to have Milos Rusic (@rusic_milos), CEO of @deepset_ai, presenting "Building a Machine Learning Company Around an Open Source Project - Insights into Strategy, Culture and Process" at #OS101! https://t.co/dvHuUp9QLz
Optimist: AI has achieved human-level performance!
Realist: “AI” is a collection of brittle hacks that, under very specific circumstances, mimic the surface appearance of intelligence.
Pessimist: AI has achieved human-level performance.
Happy to announce our new SOTA German BERT and ELECTRA language models! Trained together with Stefan Schweter and already available on @huggingface's model hub:
👉 deepset/gbert-base
👉 deepset/gbert-large
👉 deepset/gelectra-base
👉 deepset/gelectra-large
(1/2)
Thrilled to release Haystack 0.3.0!
- Dense Passage Retrieval
- Evaluation of the whole Retriever-Reader-Pipeline
- Indexing of PDF / Docx
- Better integration with transformers & @huggingface model hub
- More #QA Models
...
👉 https://t.co/z3COprIKwc
#NLP #QuestionAnswering
Excited to release #Haystack incl. core features for a practical #QA system:
📈 Scalable backend (Elasticsearch)
🚀 Fast Retrievers (BM25, Embeddings ...)
👓 Flexible Readers (@huggingface's Transformers / FARM)
🔄 API for Inference & Feedback
👉🏻 Code: https://t.co/bwBzWKUmvd
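For readers new to BM25, the scoring behind that kind of retriever can be sketched in a few lines of plain Python. This is an illustration only, not Haystack's or Elasticsearch's implementation: `tokenize` and `bm25_scores` are hypothetical helpers, the defaults `k1=1.2` and `b=0.75` are common choices assumed here, the IDF uses the Lucene-style variant, and the documents are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Return one BM25 score per document (assumes a non-empty document list)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            freq = term_freq[term]
            if freq == 0:
                continue  # term absent from this document, contributes nothing
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for document length.
            norm = freq + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


documents = [
    "Haystack is a framework for question answering",
    "Elasticsearch is a search engine",
    "Transformers provide pretrained language models",
]
scores = bm25_scores("question answering framework", documents)
best = documents[scores.index(max(scores))]
print(best)
```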
@seb_ruder - Arabic - SOQAL: https://t.co/IYJAuVdTqb
- MLQA also open sourced their automatically translated train and test sets: https://t.co/WakyLjHSY3
We are currently working on an overview of non-English QA datasets. Will let you know once it is finished.
@dl_weekly Hey, we would love to see our framework FARM featured in your newsletter. The framework makes Transfer Learning in NLP easy - you can find it here: https://t.co/zvAY6S3UfD
FARM lets non-NLP experts create PoCs with their own data and showcase them to their colleagues.
@javifreemind @seb_ruder @huggingface @explosion_ai @deepset_ai Our model already converged to decent performance after 50k steps and batch size 1024 - we observed a loss curve flattening out. You could try training locally for a week and see if the same happens to your Spanish model's loss. Good luck :)
@javifreemind @seb_ruder @huggingface @explosion_ai @deepset_ai I think the major difficulty will be the batch size with "only" 12GB of RAM in your 1080 Ti. The Google AI BERT code for training from scratch doesn't support gradient accumulation (aggregating errors over several batches). You should try the highest batch size possible with max_seq_length 128.
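The aggregation over several batches mentioned above is usually called gradient accumulation: gradients from several small micro-batches are summed before one weight update, which mimics a batch too large for GPU memory. A toy sketch in plain Python, assuming a one-parameter model y ≈ w * x with mean squared error; `mse_gradient` and `accumulated_gradient` are hypothetical helpers and have nothing to do with the BERT training code:

```python
def mse_gradient(w, batch):
    """Gradient of the mean squared error of y ≈ w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, batch, micro_batch_size):
    """Process the batch in small chunks and accumulate their gradients."""
    total = 0.0
    for start in range(0, len(batch), micro_batch_size):
        micro = batch[start:start + micro_batch_size]
        # Weight each micro-batch by its share of the full batch so the
        # accumulated value equals the full-batch mean gradient.
        total += mse_gradient(w, micro) * len(micro) / len(batch)
    return total


# Eight made-up examples of y = 3x; only two fit "in memory" at a time.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w = 0.0
full = mse_gradient(w, data)
accumulated = accumulated_gradient(w, data, micro_batch_size=2)
print(full, accumulated)
```

Both values match, so the weight update is the same as with the large batch. The price is more forward and backward passes per update.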