How to Use Cohere Embeddings and Rerank Modules with MongoDB Atlas
A daunting task that developers face when building solutions on the retrieval-augmented generation (RAG) framework is choosing the retrieval mechanism. Augmenting the large language model (LLM) prompt with relevant and exhaustive information creates better responses from such systems. For semantic similarity search, you must choose the most appropriate embedding model. For full-text search, you have to be thorough about your implementation to achieve high precision and recall in your results. Sometimes, a solution requires a combined implementation that benefits from both retrieval mechanisms.
If your current full-text search scoring workflow is leaving things to be desired, or if you find yourself spending too much time writing numerous lines of code to get semantic search functionality working within your applications, then Cohere and MongoDB can help. To prevent these issues from holding you back from leveraging powerful AI search functionality or machine learning within your application, Cohere and MongoDB offer easy-to-use and fully managed solutions.
- The Cohere Embed model is a powerful tool for embedding natural language in your projects. It helps you represent your content as embeddings so that search results are more accurate, relevant, and engaging. Cohere also offers a simple and intuitive API that allows you to easily integrate it with your existing workflows and platforms.
- The Cohere Rerank module is a component of the Cohere natural language processing system that helps to select the best output from a set of candidates. The module uses a neural network to score each candidate based on its relevance, semantic similarity, theme, and style. The module then ranks the candidates according to their scores and returns the top N as the final output.
MongoDB Atlas is a fully managed developer data platform service that provides scalable, secure, and reliable data storage and access for your applications. One of the key features of MongoDB Atlas is the ability to perform vector search and full-text search on your data, which can enhance the capabilities of your AI/ML-driven applications. MongoDB Atlas can help you build powerful and flexible AI/ML-powered applications that can leverage both structured and unstructured data. You can easily create and manage search indexes, perform queries, and analyze results using MongoDB Atlas's intuitive interface, APIs, and drivers. MongoDB Atlas Vector Search provides a unique feature — pre-filtering and post-filtering on vector search queries — that helps users control the behavior of their vector search results, thereby improving the accuracy and retrieval performance, and saving money at the same time.
With Cohere and MongoDB Atlas, you can power a semantic search capability on your private dataset with very few lines of code. Additionally, you can enhance the existing ranking of your full-text search retrieval systems using the Cohere Rerank module. Both techniques are highly beneficial for building more complex GenAI applications, such as RAG- or LLM-powered summarization or data augmentation.
- Use the Cohere Embed Jobs to generate vector embeddings for the first time on large datasets in an asynchronous and scheduled manner.
- Add vector embeddings into MongoDB Atlas, which can store and index these vector embeddings alongside your other operational/metadata.
- Finally, prepare the indexes for both vector embeddings and full-text search on our private dataset.
- Write a simple Python function to accept search terms/phrases and pass them through the Cohere embed API again to get a query vector.
- Take the resulting query vector and perform a vector search query using the $vectorSearch operator in the MongoDB Aggregation Pipeline.
- Pre-filter documents using meta information to narrow the search across your dataset, thereby speeding up the performance of vector search results while retaining accuracy.
- The retrieved semantically similar documents can be post-filtered (relevancy score) to demonstrate a higher degree of control over the semantic search behaviour.
- Write a simple Python function to accept search terms/phrases and prepare a query using the $search operator and MongoDB Aggregation Pipeline.
- Take these resultant documents and perform a reranking operation of the retrieved documents to achieve higher accuracy with full-text search results using the Cohere rerank module.
This will be a hands-on tutorial that will introduce you to how you can set up MongoDB with sample_movies dataset (the link to the file is in the code snippets). You’ll learn how to use the Cohere embedding jobs API to schedule a job to process all the documents as a batch job and update the dataset to add a new field by the name embedding that is stored alongside the other metadata/operational data. We will use this field to create a vector search index programmatically using the MongoDB Python drivers. Once we have created this index, we can then demonstrate how to query using the vector embedding as well as perform full-text search using the expressive and composable MongoDB Aggregation Pipeline (Query API).
- pandas: Helps with data preprocessing and handling
- cohere: For the embedding model and rerank module
- pymongo: For the MongoDB Atlas vector store and full-text search
- s3fs: To load files directly from an S3 bucket
The following line of code is to be run on Jupyter Notebook to install the required packages.
If you have not yet created an API key on the Cohere platform, sign up for a Cohere account and generate one there.
Also, if you have not created a MongoDB Atlas instance for yourself, you can follow the tutorial to create one. This will provide you with your MONGODB_CONNECTION_STR.
Run the following lines of code in Jupyter Notebook to initialize the Cohere secret or API key and MongoDB Atlas connection string.
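A minimal sketch of this initialization, assuming both secrets are supplied as environment variables rather than typed into the notebook. The variable name COHERE_API_KEY and the helper load_credentials are illustrative assumptions. Only MONGODB_CONNECTION_STR comes from the text above.

```python
import os


def load_credentials(env=None):
    """Return (cohere_api_key, mongodb_connection_str) read from the environment.

    COHERE_API_KEY is an assumed variable name chosen for this sketch;
    MONGODB_CONNECTION_STR is the name used in this tutorial.
    """
    env = os.environ if env is None else env
    required = ("COHERE_API_KEY", "MONGODB_CONNECTION_STR")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early with a clear message instead of a confusing auth error later.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["COHERE_API_KEY"], env["MONGODB_CONNECTION_STR"]
```

Reading the secrets from the environment keeps them out of the notebook itself, which matters if the notebook is later shared or committed.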
Run the following lines of code in Jupyter Notebook to read data from an AWS S3 bucket directly to a pandas dataframe.
Here we will create a movies dataset in Cohere by uploading the sample movies dataset that we fetched from the S3 bucket and stored locally. Once we have created the dataset, we can use the Cohere embed jobs API to schedule a batch job to embed the entire dataset.
You can run the following lines of code in your Jupyter Notebook to upload your dataset to Cohere and schedule an embedding job.
Now that we have created the vector embeddings for our sample movies dataset, we can initialize the MongoDB client and insert the documents into our collection of choice by running the following lines of code in the Jupyter Notebook.
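The insertion step could look roughly like the sketch below. It assumes the embedding job output has already been loaded into a list of vectors in the same order as the dataset rows. The helper names (attach_embeddings, insert_documents) and the batch size are hypothetical. The only driver call relied on is insert_many, which PyMongo collections provide.

```python
from typing import Any, Dict, Iterator, List, Sequence


def attach_embeddings(records: Sequence[Dict[str, Any]],
                      embeddings: Sequence[Sequence[float]],
                      field: str = "embedding") -> List[Dict[str, Any]]:
    """Return copies of the records with each vector stored under `field`."""
    if len(records) != len(embeddings):
        raise ValueError("records and embeddings must have the same length")
    return [{**rec, field: list(vec)} for rec, vec in zip(records, embeddings)]


def batched(docs: Sequence[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield list(docs[start:start + size])


def insert_documents(collection, docs: Sequence[Dict[str, Any]], batch_size: int = 500) -> int:
    """Insert documents in batches; `collection` is expected to be a PyMongo collection."""
    inserted = 0
    for batch in batched(docs, batch_size):
        collection.insert_many(batch)
        inserted += len(batch)
    return inserted
```

With PyMongo, the collection would typically come from pymongo.MongoClient(MONGODB_CONNECTION_STR) followed by a database and collection lookup. The database and collection names are yours to choose.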
With the latest update to the PyMongo Python package, you can now create your vector search index as well as full-text search indexes from the Python client itself. You can also create vector indexes using the MongoDB Atlas UI or mongosh.
Run the following lines of code in your Jupyter Notebook to create search and vector search indexes on your new collection.
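As a hedged sketch of what the two index definitions might contain, the helpers below build them as plain dictionaries. The default of 1024 dimensions, the cosine similarity, and the index names in the trailing comment are assumptions. The dimension count must match the embedding model you actually used. The year filter field is included because the tutorial pre-filters on it.

```python
from typing import Any, Dict, Sequence


def vector_index_definition(path: str = "embedding",
                            num_dimensions: int = 1024,
                            similarity: str = "cosine",
                            filter_paths: Sequence[str] = ("year",)) -> Dict[str, Any]:
    """Definition for an Atlas Vector Search index on `path`.

    Fields listed in `filter_paths` are indexed as filter fields so they
    can be used for pre-filtering in $vectorSearch.
    """
    fields = [{
        "type": "vector",
        "path": path,
        "numDimensions": num_dimensions,
        "similarity": similarity,
    }]
    fields += [{"type": "filter", "path": p} for p in filter_paths]
    return {"fields": fields}


def text_index_definition(dynamic: bool = True) -> Dict[str, Any]:
    """Definition for an Atlas Search (full-text) index with dynamic mappings."""
    return {"mappings": {"dynamic": dynamic}}


# With a recent PyMongo release, these could be passed to the driver, e.g.:
#   from pymongo.operations import SearchIndexModel
#   collection.create_search_index(SearchIndexModel(
#       definition=vector_index_definition(), name="vector_index", type="vectorSearch"))
#   collection.create_search_index(SearchIndexModel(
#       definition=text_index_definition(), name="default"))
# The index names above are placeholders, and the `type` argument is not
# available in older PyMongo versions.
```

Dynamic mappings index every supported field automatically, which is convenient for a demo. A production index would more likely list only the fields that are searched.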
MongoDB Atlas brings the flexibility of using vector search alongside full-text search filters. Additionally, you can apply range, string, and numeric filters using the aggregation pipeline. This allows the end user to control the behavior of the semantic search response from the search engine. The below lines of code will demonstrate how you can perform vector search along with pre-filtering on the year field to get movies earlier than 1990. Plus, you have better control over the relevance of returned results, so you can perform post-filtering on the response using the MongoDB Query API. In this demo, we are filtering on the score field generated as a result of performing the vector similarity between the query and respective documents, using a heuristic to retain only the accurate results.
Run the below lines of code in Jupyter Notebook to initialize a function that can help you achieve vector search + pre-filter + post-filter.
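One way to express vector search with a pre-filter and a post-filter is the aggregation pipeline sketched below. The index name, the title field, the 0.8 score threshold, and the candidate counts are illustrative assumptions, not values from the original notebook. The query vector is expected to come from the Cohere embed API, as described earlier.

```python
from typing import Any, Dict, List, Sequence


def build_vector_search_pipeline(query_vector: Sequence[float],
                                 index_name: str = "vector_index",
                                 path: str = "embedding",
                                 year_before: int = 1990,
                                 min_score: float = 0.8,
                                 limit: int = 10,
                                 num_candidates: int = 100) -> List[Dict[str, Any]]:
    """Build a $vectorSearch pipeline with a pre-filter on year and a post-filter on score."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be greater than or equal to limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
                # Pre-filter: only consider movies released before `year_before`.
                "filter": {"year": {"$lt": year_before}},
            }
        },
        # Expose the similarity score so later stages can use it.
        {"$project": {"_id": 0, "title": 1, "year": 1,
                      "score": {"$meta": "vectorSearchScore"}}},
        # Post-filter: keep only results above the heuristic score threshold.
        {"$match": {"score": {"$gte": min_score}}},
    ]
```

The pipeline can then be run with list(collection.aggregate(pipeline)). The pre-filter only works if the year field was declared as a filter field in the vector search index.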
Run the below lines of code in Jupyter Notebook cell and you can see the following results.
Cohere Rerank is a module in the Cohere suite of offerings that enhances the quality of search results by leveraging semantic search. This helps elevate the traditional search engine performance, which relies solely on keywords. Rerank goes a step further by ranking results retrieved from the search engine based on their semantic relevance to the input query. This pass of re-ranking search results helps achieve more appropriate and contextually similar search results.
To see how the Rerank module can be leveraged with MongoDB Atlas full-text search, run the following lines of code in your Jupyter Notebook.
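The two halves of this step can be sketched as follows: a $search pipeline that retrieves candidates, and a helper that reorders them using scores from a reranker. The fullplot and title fields, the index name, and the rerank_score field are assumptions made for illustration. The reranker output is modeled as plain (index, score) pairs so the sketch does not depend on a particular SDK version.

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_text_search_pipeline(query: str,
                               index_name: str = "default",
                               path: str = "fullplot",
                               limit: int = 20) -> List[Dict[str, Any]]:
    """Build a full-text $search pipeline returning the top `limit` candidates."""
    return [
        {"$search": {"index": index_name, "text": {"query": query, "path": path}}},
        {"$limit": limit},
        {"$project": {"_id": 0, "title": 1, path: 1,
                      "score": {"$meta": "searchScore"}}},
    ]


def apply_rerank(docs: Sequence[Dict[str, Any]],
                 ranking: Sequence[Tuple[int, float]],
                 top_n: int) -> List[Dict[str, Any]]:
    """Reorder `docs` by reranker relevance.

    `ranking` holds (index into docs, relevance score) pairs, for example
    collected from the results of a Cohere rerank call.
    """
    ordered = sorted(ranking, key=lambda pair: pair[1], reverse=True)[:top_n]
    return [{**docs[i], "rerank_score": score} for i, score in ordered]
```

With the Cohere SDK, the ranking would typically be built from the response of co.rerank(query=..., documents=[d["fullplot"] for d in docs], top_n=..., model=...), assuming each result exposes an index and a relevance_score as in recent SDK versions.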
Output post reranking the full-text search results:
In this tutorial, we were able to demonstrate the following:
- Using the Cohere embedding along with MongoDB Vector Search, we were able to show how easy it is to achieve semantic search functionality alongside your operational data functions.
- With Cohere Rerank, we were able to search results using full-text search capabilities in MongoDB and then rank them by semantic relevance, thereby delivering richer, more relevant results without replacing your existing search architecture setup.
- The implementations were achieved with minimal lines of code, showcasing ease of use.
- Leveraging Cohere Embeddings and Rerank does not require a team of ML experts to develop and maintain, which keeps monthly maintenance costs to a minimum.
- Both solutions are cloud-agnostic and, hence, can be set up on any cloud platform.
The same code is available in a notebook, which will help reduce the time and effort of following the steps in this blog.
To learn more about how MongoDB Atlas is helping build application-side ML integration in real-world applications, you can visit the MongoDB for AI page.