A Visual Guide On How To Choose The Right Database
Choosing the right data store can be confusing with so many options around. This diagram shows how to select a data store based on the use case (SQL vs NoSQL).
Data can be structured (SQL table schema), semi-structured (JSON, XML, etc.), or unstructured (Blob). Stores for structured data can be relational or columnar, while for semi-structured data there is a wide range of possibilities, from key-value to graph.
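A minimal sketch of that first decision, going from the shape of the data to the store families named above. The names `DataShape`, `STORE_FAMILIES` and `candidate_stores` are made up for illustration, and the mapping lists only the families this post mentions, not the full diagram.

```python
from enum import Enum


class DataShape(Enum):
    STRUCTURED = "structured"            # e.g. SQL table schema
    SEMI_STRUCTURED = "semi-structured"  # e.g. JSON, XML
    UNSTRUCTURED = "unstructured"        # e.g. blobs


# Only the store families named in the post. For semi-structured data the post
# describes a range "from key-value to graph", so only the two ends are listed.
STORE_FAMILIES = {
    DataShape.STRUCTURED: ["relational", "columnar"],
    DataShape.SEMI_STRUCTURED: ["key-value", "graph"],
    DataShape.UNSTRUCTURED: ["blob"],
}


def candidate_stores(shape):
    """Return the store families to consider for a given data shape."""
    return STORE_FAMILIES[shape]


print(candidate_stores(DataShape.STRUCTURED))
```

The actual choice within a family still depends on the workload, which is what the rest of the diagram covers.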
Credit: Satish Chandra Gupta.
Back to you: which database have you used for which workload?
Check the full source in the comments.
_______
If you like my posts, please follow me, @milan_milanovic, and hit the 🔔 on my profile to get a notification for all my new posts.
Learn something new every day!
#SQL #NoSQL #Data #Database #AWS
How do you build an LLM based Chatbot to query your Private Knowledge Base?
Let's find out.
The first step is to store the knowledge from your internal documents in a format that is suitable for querying. We do so by embedding it using an embedding model:
1: Split the text corpus of the entire knowledge base into chunks. Each chunk represents a single piece of context available to be queried. Data of interest can come from multiple sources, e.g. documentation in Confluence supplemented by PDF reports.
2: Use the Embedding Model to transform each of the chunks into a vector embedding.
3: Store all vector embeddings in a Vector Database.
4: Separately, save the text that each embedding represents, together with a pointer to the embedding (we will need this later).
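Steps 1 to 4 can be sketched in plain Python. This is only an illustration under loud assumptions: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, the two dictionaries stand in for a vector database and a text store, and every function name here is hypothetical.

```python
import hashlib
import math


def split_into_chunks(text, max_words=50):
    """Step 1: split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text, dim=64):
    """Step 2 (stand-in): hashed bag-of-words vector, L2-normalised.

    A real system would call an embedding model here instead.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(documents, max_words=50):
    """Steps 3 and 4: keep embeddings and chunk text under a shared id."""
    vectors = {}  # id -> embedding (stand-in for the vector database)
    chunks = {}   # id -> chunk text (the id is the pointer to the embedding)
    next_id = 0
    for doc in documents:
        for chunk in split_into_chunks(doc, max_words):
            vectors[next_id] = toy_embed(chunk)
            chunks[next_id] = chunk
            next_id += 1
    return vectors, chunks


vectors, chunks = build_index(["a b c d e", "f g"], max_words=2)
print(chunks)
```

Keeping the chunk text under the same id as its embedding is what makes the later lookup from a retrieved vector back to its text possible.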
Next we can start constructing the answer to a question/query of interest:
5: Embed the question/query you want to ask using the same Embedding Model that was used to embed the knowledge base itself.
6: Use the resulting Vector Embedding to run a query against the index in the Vector Database. Choose how many vectors you want to retrieve from the Vector Database; this determines the amount of context you will retrieve and eventually use to answer the question.
7: The Vector DB performs an Approximate Nearest Neighbour (ANN) search for the provided vector embedding against the index and returns the previously chosen number of context vectors. The procedure returns the vectors that are most similar in the given Embedding/Latent space.
8: Map the returned Vector Embeddings to the text chunks that represent them.
9: Pass the question together with the retrieved context text chunks to the LLM via a prompt. Instruct the LLM to use only the provided context to answer the given question. This does not mean that no Prompt Engineering will be needed: you will want to ensure that the answers returned by the LLM fall within expected boundaries, e.g. if the retrieved context contains no usable data, make sure that no made-up answer is provided.
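The query side (steps 5 to 9) can be sketched the same way. Assumptions are again loud: `toy_embed` is a hashed bag-of-words stand-in for the real embedding model, the search is an exact brute-force cosine ranking rather than the ANN search a vector database would run, and the prompt wording and all function names are made up for illustration. The final call to the LLM is left out.

```python
import hashlib
import math


def toy_embed(text, dim=64):
    """Stand-in for the embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def top_k(query_vec, vectors, k):
    """Steps 6 and 7 (exact, not approximate): ids of the k most similar vectors.

    Vectors are unit length, so the dot product equals cosine similarity.
    """
    ranked = sorted(
        vectors.items(),
        key=lambda item: sum(q * v for q, v in zip(query_vec, item[1])),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Step 9: restrict the LLM to the retrieved context."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say that you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def retrieve_and_prompt(question, vectors, chunks, k=2):
    query_vec = toy_embed(question)           # step 5: same model as for indexing
    ids = top_k(query_vec, vectors, k)        # steps 6 and 7
    context = [chunks[i] for i in ids]        # step 8: ids back to text
    return build_prompt(question, context)    # step 9: prompt for the LLM


chunks = {0: "vacation policy allows twenty days", 1: "servers run on linux"}
vectors = {i: toy_embed(text) for i, text in chunks.items()}
print(retrieve_and_prompt("servers run on linux", vectors, chunks, k=1))
```

The "say that you do not know" instruction is one simple way to keep answers inside the expected boundaries when the retrieved context has nothing usable.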
To make it a real chatbot, front the entire application with a web UI that exposes a text input box to act as a chat interface. After running the provided question through steps 1 to 9, return and display the generated answer. This is how most chatbots based on one or more internal knowledge base sources are actually built nowadays.
We will build such a chatbot in an upcoming hands-on SwirlAI Newsletter series, so stay tuned!
--------
Follow me to upskill in #MLOps, #MachineLearning, #DataEngineering, #DataScience and overall #Data space.
Also hit 🔔 to stay notified about new content.
Don't forget to like, share and comment!
Join a growing community of Data Professionals by subscribing to my Newsletter: https://t.co/qgNCnGtF4A
AWS Config can:
🔥 Record all of the configuration data that runs through the system.
🔥 Build rules to help us ensure compliance.
As a bare minimum, here are 12 recommended Config rules courtesy of cloud architect and security engineer @DonMagee. https://t.co/4fakx6TQRI
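To give a feel for what a compliance rule does, here is a minimal sketch of the evaluation logic a custom rule might run. It is an assumption-heavy illustration, not one of the 12 linked rules: the function names are hypothetical, the event layout (`invokingEvent` holding a JSON string with a `configurationItem`) and the `encrypted` field on an EBS volume are assumed, and a real rule would also have to report its verdict back to AWS Config, which is omitted here.

```python
import json


def evaluate_volume_encryption(configuration_item):
    """Return a compliance verdict for one recorded configuration item.

    Assumed item layout: 'resourceType' plus a 'configuration' dict that
    carries an 'encrypted' flag for EBS volumes.
    """
    if configuration_item.get("resourceType") != "AWS::EC2::Volume":
        return "NOT_APPLICABLE"
    encrypted = configuration_item.get("configuration", {}).get("encrypted", False)
    return "COMPLIANT" if encrypted else "NON_COMPLIANT"


def handler_sketch(event):
    """Hypothetical handler: unpack the event and evaluate the item.

    Reporting the result back to AWS Config is intentionally left out.
    """
    item = json.loads(event["invokingEvent"])["configurationItem"]
    return {
        "ComplianceResourceType": item["resourceType"],
        "ComplianceResourceId": item["resourceId"],
        "ComplianceType": evaluate_volume_encryption(item),
    }


example_event = {
    "invokingEvent": json.dumps({
        "configurationItem": {
            "resourceType": "AWS::EC2::Volume",
            "resourceId": "vol-example",  # made-up id
            "configuration": {"encrypted": False},
        }
    })
}
print(handler_sketch(example_event))
```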