Neum RAG analytics provide visibility into the state of files within the pipeline as well as auditability of actions like searches. Check out a quick walkthrough: https://t.co/YnTislJsgN
Observability and auditing over your RAG pipeline are crucial. We have added several APIs to provide this visibility. We will be including all these metrics as part of the Neum Dashboard shortly, but in the meantime, you can play with them here: https://t.co/EHkkAnuc8q
Neum AI compatibility with Langchain and LlamaIndex
To provide a wide range of options to our developers, @neum_ai is adding helpers that bridge the gap between the document transformers provided by @langchain and @llama_index and Neum AI. These helpers translate the Document objects output by document transformers in both frameworks into NeumDocument objects, which are used within the Neum AI framework.
We want to give developers the flexibility to leverage the many existing data connectors built by the community across both frameworks, while taking advantage of the scalable architecture provided by the Neum AI framework for generating and storing vector embeddings.
How does it work?
To leverage the helpers, we first need to install the neumai-tools package (https://t.co/0dLcqndvaU).
Then we can leverage existing document transformers and translate their results.
For example, this works with the Langchain TextLoader or the LlamaIndex SimpleDirectoryReader; the documentation linked under Resources covers each.
The document_transformer helpers can also be used with outputs from chunkers across both frameworks.
Once we have a NeumDocument object, we can connect it to Neum AI embed and sink connectors to parallelize the generation and storage of vector embeddings.
We continue to recommend using Neum Data Connectors, Loaders and Chunkers where possible to get the best experience, including configuration validations, full and delta sync support, and rich metadata tracking along the way. But we understand that developers might need data sources that we do not yet support.
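To make the translation step concrete, here is a minimal sketch of what a helper of this kind does conceptually. It does not use the neumai-tools API: FrameworkDocument, NeumDocumentSketch and translate_documents are hypothetical stand-ins written for illustration, and the real NeumDocument fields may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FrameworkDocument:
    """Hypothetical stand-in for a framework Document (text plus metadata)."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class NeumDocumentSketch:
    """Hypothetical stand-in; the real NeumDocument may have other fields."""
    id: str
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def translate_documents(docs: List[FrameworkDocument]) -> List[NeumDocumentSketch]:
    """Map each framework document onto the sketch document type."""
    translated = []
    for index, doc in enumerate(docs):
        # Build an id from the source (if any) and the position in the list.
        source = str(doc.metadata.get("source", "unknown"))
        translated.append(
            NeumDocumentSketch(
                id=f"{source}-{index}",
                content=doc.page_content,
                metadata=dict(doc.metadata),  # copy so the original stays untouched
            )
        )
    return translated


if __name__ == "__main__":
    loaded = [FrameworkDocument("hello world", {"source": "notes.txt"})]
    for neum_doc in translate_documents(loaded):
        print(neum_doc.id, neum_doc.content)
```

For the actual helper names and signatures, refer to the neumai-tools documentation.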
Caveats
- document_transformer helpers are currently in beta.
- They are not currently supported by the Neum cloud offering. (We hope to add this support soon!)
- Some Neum AI constructs, like Pipelines and SourceConnectors, don't support Langchain and LlamaIndex loaders.
Resources
If you want to read more, check out our documentation:
- Repo: https://t.co/dT0NNJ1qmn
- Langchain: https://t.co/HIzo2asNfD
- LlamaIndex: https://t.co/RnPUawRBXJ
@neum_ai is built from the ground up to scale. We recently open-sourced a sample architecture that uses the Neum framework in conjunction with Celery and @Redisinc to parallelize workloads and scale the throughput.
https://t.co/Dkssgn2ymo
Evaluating RAG pipelines is crucial to build production ready applications. Today, we are showcasing tools built into @neum_ai to run evaluations.
https://t.co/u6xmGQdyta
How-to open source a project - Python SDK
As @ddematheu and I worked on @neum_ai during our @ycombinator S23 Batch, we knew we had to create an amazing developer experience in this new AI world. We wanted to make it seamless and easy to use. At first, we tried different ideas and floated the idea of open sourcing, but never actually got to it. “Oh what will they say about our code”, “unit tests? lol”, “What’s the process of doing it, seems complicated”. And so we went with our UX and “closed-source” platform.
While we got some customers that helped shape the platform and drove feature requests, we knew we were missing something. We weren’t targeting the developers and getting more of the feedback we wanted. We needed to change something.
Enter open source. Though we've used countless open-source projects in the past, we never really jumped in and created one because of all of the questions above and more, a sort of impostor syndrome. Until two weeks ago, when we sat down and did it. Immediately people asked us “how did you guys do it”, “what’s the best way to open-source a project”, “how can I show it to the world”. And, as trite as it may sound, the answer boils down to just doing it.
In this post I’d like to show how easy it is, point out things to watch out for when building a Python SDK, and encourage you to do it!
--
- Mise en place -
Mise en place, or “putting in place”, is a French term used by chefs when they refer to the setup that is needed before cooking. When developing a Python SDK that will be open source we also need to gather and put in place some requirements beforehand. While these are obvious, it’s good to list them.
- @github account - duh
- @pypi account - (pip install… anyone?)
- General structure of the project, reviewing other open-source Python SDKs for best practices. You can throw in some spaghetti code, but it would be good to have a sense of how to best model your code for ease of use. Remember, it’s not only you who will use it but lots of other people, so it needs to be self-explanatory.
- Preparation -
The simplest thing you can do is create a Github repository, throw in some code and mark it as public! People can clone your repo and modify as needed, maybe even making suggestions by contributing back. Congratulations, you’ve open-sourced a project!
As mentioned, I wanted to focus on building a Python SDK specifically. So, go ahead and write some Python code and structure it in a way that’s easy to understand.
For us at @neum_ai, we have basically divided our models into three main components, plus a set of shared components:
1. Data Connectors
2. Embed Connectors
3. Sink Connectors
4. Shared components
Opening up those subpackages will yield specific classes, each pertaining to the right component. The single responsibility principle can come in handy when building SDKs!
I won’t go into too much detail on the components, as they are RAG-framework specific and the point of this post is to encourage you to open source your project by just doing it.
Once you have your Python code nicely structured, it’s time to package it to be used as an SDK.
(btw, you might think it’s nicely structured, but we developers are very opinionated and will always find something to improve. So, another important thing is to be very receptive to feedback, especially if you post on @hackernews)
Python packages are mainly stored in @pypi - the Python Package Index - which is "a repository of software for the Python Programming Language".
@pypi's great because it stores the versions, the changelog, the release history, links to the @github repo, and more.
Simply create an account and link your repo. In your repo you will have a pyproject.toml file that essentially contains the specification and details of this package, like the version, the name, the dependencies, and more.
Once you are ready, you can build your package by running poetry build in the folder where the .toml file lives, and deploy it by running poetry publish. (Make sure you have poetry installed and your PyPI credentials at hand; at the time of writing, PyPI expects an API token for uploads rather than an account password.)
That’s it. It will show up on your package’s PyPI page, ready to be installed!
- Main Course -
Similar to how chefs make final touches before serving the food, so should you.
Make sure to install your package and test it out locally. Running pip install <yourpackagename> will install it on your machine, and you will be able to use it by writing a quick Python test program.
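A quick test program can be as small as the sketch below, which only checks that the package is installed and reports its version using the standard importlib.metadata module. The name yourpackagename is the placeholder from the text above, and installed_version is a helper written just for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    name = "yourpackagename"  # placeholder: replace with your package's name
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```

From there, import one or two of your public classes and run them against a small input to confirm the published build behaves like your local code.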
Now it’s time to serve your dish!
Here are some outlets where you can share your open source project:
1. Developer friends/coworkers are great to give you some immediate feedback. Thank YOU to everyone who gave us feedback!
2. @hackernews - Hard to get to the front page, but if you do, expect a lot of traffic. Also, be very receptive to feedback and don’t take things personally.
3. @X - A lot of developers wander around in @X these days. Whether it’s Crypto, AI, or others, developers are on the lookout for what’s new and what’s hot.
4. @LinkedIn - Leverage your connections to amplify the reach!
5. @Reddit - This can be a huge boost if you post in the right channels. For the general open source community, check out r/opensource. However, if you can find a community that’s specifically tailored to your project, make sure to post there.
- Dessert -
Almost done. Time for a couple of follow ups!
Other languages
While we focused on building a Python package, the same mentality and structure apply whether you are building in JavaScript, Go, React, or any other language or framework: Just do it. Don’t focus on the negative questions that pop up. Tackle them by building, shipping, and announcing. Stay busy building, not thinking.
License
An open-source license is a way to show and disclose what kind of permissions and policies are allowed when using, modifying and sharing your software. The @OpenSourceOrg maintains a list of all the available licenses. Some common ones are MIT and Apache.
Readme and Contributor’s guide
A lot of people decide whether to look into your project on GitHub based on the Readme page.
The Readme is the entry-point for anyone coming into your project. You want to make sure it’s clean, organized, and has a clear call to action on how to use your package/service/code.
Take a look at @neum_ai's! We also like @langchain's and the one built by the great engineers @thewizardlucas and @commadangle over at @GetErgomake's Layerform repo.
Similarly, make sure you have a contributor's guide for instructions on how to suggest changes to your project!
Fancy stuff to add as you grow
- Specifically for Python, we wanted to incorporate best practices and the latest libraries people are using, to make the SDK feel familiar. That is why we actively use @pydantic for our models, which structures the code in a cleaner and more readable way. @pydantic is great for those coming from a typed-language and object-oriented background.
- Requiring contributors to fork your repository before creating a pull request helps add structure to contributions made by the community.
- Enabling Continuous Integration and Continuous Deployment (CI-CD) for your package. For example, every time there is something merged to master, automatically deploy a new version of your package.
- Documenting well. Consider having a well-organized documentation page. It can be in the repository itself or as a separate page. We at Neum ❤️@mintlify.
- More that I probably missed adding or don’t know!
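To give a flavor of the @pydantic point in the list above, here is a minimal sketch of a typed, validated model. SinkConfig and try_build are hypothetical names made up for this example, not actual Neum AI classes, and the code assumes pydantic is installed.

```python
from typing import Dict, Optional

from pydantic import BaseModel, Field, ValidationError


class SinkConfig(BaseModel):
    """Hypothetical configuration model; not an actual Neum AI class."""
    name: str
    batch_size: int = Field(default=100, gt=0)  # must be a positive integer
    settings: Dict[str, str] = Field(default_factory=dict)


def try_build(**kwargs) -> Optional[SinkConfig]:
    """Return a validated SinkConfig, or None if the input is invalid."""
    try:
        return SinkConfig(**kwargs)
    except ValidationError:
        return None


if __name__ == "__main__":
    print(try_build(name="vector-store"))
    print(try_build(name="vector-store", batch_size=0))  # fails validation
```

Declaring fields with types and constraints means invalid configuration is rejected when the object is created, instead of surfacing later inside the pipeline.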
- Conclusion -
Food’s served!!
While it’s a long post, it really boils down to having the mindset of just putting in the code and opening up a @github repository publicly.
Iterating is part of the general software development lifecycle, and there will always be places to improve and bugs to fix. You just have to start.
Thanks for reading this far! If you enjoyed this, drop a comment and follow for more!
It's Blog Tuesday! (we just made that up) Dive into the details of how to semantically search over your data by creating an application using the Neum AI SDK!
https://t.co/W74LzqoAJj
For this one we used @weaviate_io as the vector database powering the vector search!
New day, new blog post!🚀
In this one, we share how to use @neum_ai to build a Real time Retrieval Augmented Generation pipeline.
With @neum_ai's latest integration with @supabase, we show how to get the most up-to-date context in your application
https://t.co/J117Mm0a6Y
#RAG #LLM #AI #Supabase #postgres #realtime
Happy Thanksgiving 🦃! Sharing part 2/n of our series for building large scale RAG applications with Neum AI. In this blog, we go deeper into leveraging built-in parallelization capabilities alongside @Celery and @Redisinc.
Blog: https://t.co/NhMmHYbx4R
Full code: https://t.co/ffPFesK82B
Releasing Part 1/n of a blog series showcasing the use of the @neum_ai open-source framework to build large scale RAG data pipelines.
Part 1: Setting up a pipeline configuration using components from @OpenAI , @weaviate_io , @langchain and @UnstructuredIO
https://t.co/vHpHMsYO0D