Markus Stoll @DocBrownMS - Twitter Profile

over 2 years ago

Visualize RAG Data with Renumics-RAG — A Q&A GUI with Interactive Data Exploration by @DocBrownMS at #ITNEXT. #largelanguagemodels #machinelearning #retrievalaugmented #datascience #python https://t.co/k5ezTKRJtR

0

3

2

0

161

DocBrownMS retweeted

Towards Data Science

@TDataScience

over 2 years ago

How to use UMAP dimensionality reduction for Embeddings to show multiple evaluation Questions and their relationships to source documents with Ragas, OpenAI, Langchain and ChromaDB by @DocBrownMS https://t.co/QhmQzTof2g

0

25

6

10

4K

DocBrownMS retweeted

Towards Data Science

@TDataScience

over 2 years ago

Visualize your RAG Data — Evaluate your Retrieval-Augmented Generation System with Ragas by @DocBrownMS https://t.co/QhmQzTof2g

0

126

29

70

13K

DocBrownMS retweeted

Leonie

@helloiamleonie

over 2 years ago

Advanced Retrieval-Augmented Generation (RAG) techniques address the limitations of naive RAG pipelines.

A recent survey on RAG classifies advanced RAG techniques into pre-retrieval, retrieval, and post-retrieval optimizations.
🔗 Paper: https://t.co/dWkf0Uc587

My latest article gives an overview of advanced RAG techniques:

🦙 Pre-retrieval includes techniques like sliding windows, enhancing data granularity, adding metadata, or optimizing index structures, such as sentence window retrieval.

🦙 Retrieval includes optimizing the embedding models (e.g., fine-tuning) or advanced retrieval techniques like hybrid search.

🦙 Post-retrieval includes reranking or prompt compression.

We also implement a naive RAG pipeline using @llama_index and then enhance it to an advanced RAG pipeline using the following:
• Sentence window retrieval (as a pre-retrieval optimization)
• Hybrid search (as a retrieval optimization)
• Re-ranking (as a post-retrieval optimization)

💻 Jupyter Notebooks: https://t.co/MFiz00RQHb

Read more on @TDataScience: https://t.co/zgD02G1Rn7
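A rough sketch of the sentence window idea named in the tweet: match the query against single sentences, then hand back the best sentence together with its neighbors as context. This is not the LlamaIndex implementation from the linked notebooks. The bag-of-words scoring is a toy stand-in for a real embedding model, and the function names and the `window` parameter are made up for illustration.

```python
import re
from collections import Counter
from math import sqrt


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bow(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def sentence_window_retrieve(text, query, window=1):
    """Score each sentence on its own, return the best index plus its window."""
    sentences = split_sentences(text)
    q = bow(query)
    scores = [cosine(bow(s), q) for s in sentences]
    best = max(range(len(sentences)), key=scores.__getitem__)
    lo, hi = max(0, best - window), min(len(sentences), best + window + 1)
    return best, " ".join(sentences[lo:hi])


doc = (
    "The track opened in 1927. It is located in Germany. "
    "Racing events are held every year. Tickets sell out fast."
)
best_index, context = sentence_window_retrieve(doc, "Where is it located?")
```

Matching on single sentences keeps the retrieval step precise, while the surrounding window gives the language model enough context to answer.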

12

750

130

744

117K

DocBrownMS retweeted

Lior Alexander

@LiorOnAI

over 2 years ago

Game changer. You can now visualize your RAG Data. See how questions, answers, and sources are related. The animation below shows the UMAP of the embeddings of document snippets, colored by their relevance to the question "Who built the Nürburgring?" UMAP is a dimensionality reduction technique that transforms complex, high-dimensional data into a clear and interactive 2D map. It can also be used for debugging and improving the performance of your RAG models.
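The tweet does not say how "relevance to the question" is computed. A minimal sketch, assuming it is the cosine similarity between the question embedding and each snippet embedding, rescaled to the range 0 to 1 so it can drive a color scale. The vectors below are made-up 3-dimensional examples; the 2D positions themselves would come from UMAP, which is not shown here.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def relevance_to_question(question_vec, snippet_vecs):
    """One score per snippet, min-max rescaled to [0, 1] for use as a color value."""
    sims = [cosine_similarity(question_vec, v) for v in snippet_vecs]
    lo, hi = min(sims), max(sims)
    span = hi - lo
    return [(s - lo) / span if span else 0.0 for s in sims]


# Hypothetical embeddings; real ones have hundreds of dimensions.
question = [1.0, 0.0, 0.0]
snippets = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
colors = relevance_to_question(question, snippets)
```

Each value in `colors` can then be mapped onto the point for the matching snippet in the 2D plot.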

7

661

127

618

83K

DocBrownMS retweeted

ITNEXT @ITNEXT_io

over 2 years ago

Visualize your RAG Data — EDA for Retrieval-Augmented Generation by @DocBrownMS at #ITNEXT. #artificialintelligence #machinelearning #largelanguagemodels #nlp #chatgpt https://t.co/tAG2xRDEuF

0

5

2

0

144

DocBrownMS retweeted

Philipp Schmid

@_philschmid

over 2 years ago

Use big models to specialize small models! That's the way. 💫 🚀

7B Text-to-SQL model outperforms @OpenAI GPT-4 (Turbo)! @defogdata released a new fine-tuned @AIatMeta Code Llama 7B model outperforming the latest GPT-4 & GPT-4 Turbo Models!
The team boosted the 7B performance by leveraging distillation from a fine-tuned 70B Code Llama model! 👨‍🏫

Models can be commercially used (CC-by-SA-4.0) and are available on @huggingface
👉 https://t.co/O0zglaYd4t

@rishdotblog mentioned they are now pushing for improved performance on joins and group-bys! 🤌🏻 Big Shoutout to Defog for pushing open Code Models! 🧑🏻‍💻

4

182

45

116

27K

DocBrownMS retweeted

ITNEXT @ITNEXT_io

over 2 years ago

How to Explore and Visualize ML-Data for Object Detection in Images by @DocBrownMS at #ITNEXT. #datascience #artificialintelligence #machinelearning #ai #datavisualization https://t.co/UxfGRF0to1

0

3

2

0

135

DocBrownMS retweeted

ITNEXT @ITNEXT_io

over 2 years ago

How to build an interactive Hugging Face Space for an Image Dataset by @DocBrownMS at #ITNEXT. #artificialintelligence #datavisualization #datascience #python #machinelearning https://t.co/XUNU185zhb

0

3

1

0

107

DocBrownMS retweeted

renumics @renumics

over 2 years ago

Spotlight, Spotlight Pro, and Spotlight API Docs 1.5.5 have been released at https://t.co/qW49pbV0SF

Features:
- Transmit errors via websockets and display them.
- Disable caching of frontend files.
- Rouge score lens.
- Toggle between continuous and discrete coloring.
- Rebuild old-style H5 datasets using dataset.rebuild().
- Apply filters on the confusion matrix widget.
- Utilize Mel scale for spectrogram visualization.

Bug Fixes:
- Ensure browser always opens on localhost.
- Adjust color scaling for spectrogram decibel levels.
- Prevent failure when no simple converter is available.
- Explicitly fail when data source is not supported.
- Improve appearance of ns-datetimes.
- Avoid reusing the viewer when a new port is assigned.

0

4

1

0

109

DocBrownMS retweeted

ITNEXT @ITNEXT_io

over 2 years ago

Hacktoberfest Machine Learning Projects for JS/TS Developers by @DocBrownMS at #ITNEXT. #typescript #machinelearning #datascience #javascript #opensource https://t.co/fLt2L1lOWa

1

3

0

294

DocBrownMS retweeted

ITNEXT @ITNEXT_io

over 2 years ago

Fine-tuning image classification models from image search by @DocBrownMS at #ITNEXT. #datascience #python #datavisualization #startup #machinelearning https://t.co/lIJ8xTxT7D

0

2

3

0

220

Markus Stoll @DocBrownMS

over 2 years ago

@skeptrune @renumics @huggingface @CleanlabAI With PCA from the scikit-learn package and an additional Procrustes Analysis [3] from the SciPy package. This enables smoother transitions in the animation. More details and source code in https://t.co/wNNkCgKxpb
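A minimal sketch of the idea in this reply: project each animation frame to 2D with PCA, then use an orthogonal Procrustes fit to rotate or reflect the new frame onto the previous one so points do not jump between frames. This is not the author's code from the linked article. It is a NumPy-only stand-in for scikit-learn's PCA and SciPy's Procrustes routines, and the function names and the simulated data are assumptions.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T


def align_to(frame, reference):
    """Rotate/reflect `frame` to best match `reference` (orthogonal Procrustes).

    Solves min_R ||frame @ R - reference||_F over orthogonal R via SVD.
    """
    u, _, vt = np.linalg.svd(frame.T @ reference)
    return frame @ (u @ vt)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))  # hypothetical embeddings
prev = pca_2d(X)

# Simulate the next frame: slightly changed data whose projection comes out
# mirrored, as can happen because the sign of a principal axis is arbitrary.
curr = pca_2d(X + 0.01 * rng.normal(size=X.shape)) * np.array([-1.0, 1.0])
curr_aligned = align_to(curr, prev)
```

Without the alignment step, a sign flip or rotation between two consecutive projections would show up as a sudden jump in the animation even though the data barely changed.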

1

0

79

DocBrownMS retweeted

ITNEXT @ITNEXT_io

almost 3 years ago

How to navigate issues in your machine learning image data by @DocBrownMS at #ITNEXT. #artificialintelligence #datavisualization #datascience #computervision #machinelearning https://t.co/Qj5dWpxctR

0

2

1

0

159

DocBrownMS retweeted

AI News Clips by Morris Lee: News to help your R&D @morris_phd

almost 3 years ago

Twitter https://t.co/i81NtdQAjP Renumics Spotlight interactively explore dataframe. GitHub https://t.co/MSgypQhfzS Newsletter https://t.co/lLfwtmvXkM More https://t.co/yFb3Ds4tXm LinkedIn https://t.co/FC5hpfOlxr #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning #Pandas