In this tutorial, we’ll demonstrate how to use Upstash Vector for semantic search. We will upload several documents and perform a search query to find the most semantically similar documents using embeddings generated automatically by Upstash.

Installation and Setup

First, we need to create a Vector Index in the Upstash Console. Once we have our index, we will copy the UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN and paste them to our .env file. To learn more about index creation, you can check out this page.

Add the following content to your .env file (replace with your actual URL and token):

UPSTASH_VECTOR_REST_URL=your_upstash_url
UPSTASH_VECTOR_REST_TOKEN=your_upstash_token

We now need to install the upstash-vector library via PyPI. Additionally, we will install python-dotenv to load environment variables from the .env file.

pip install upstash-vector python-dotenv

Code

Create a Python script (e.g., main.py) and add the following code to perform semantic search using Upstash Vector:

main.py
from upstash_vector import Index
from dotenv import load_dotenv
import time

# Load environment variables from a .env file
load_dotenv()

# Initialize the index from environment variables (URL and token)
index = Index.from_env()

# Example documents to be indexed
documents = [
    {"id": "1", "text": "Python is a popular programming language."},
    {"id": "2", "text": "Machine learning enables computers to learn from data."},
    {"id": "3", "text": "Upstash provides low-latency database solutions."},
    {"id": "4", "text": "Semantic search is a technique for understanding the meaning of queries."},
    {"id": "5", "text": "Cloud computing allows for scalable and flexible resource management."}
]

# Reset the index to remove previous data
index.reset()

# Upsert documents into Upstash (embeddings are generated automatically)
for doc in documents:
    index.upsert(
        vectors=[
            (doc["id"], doc["text"], {"text": doc["text"]})
        ]
    )
    print(f"Document {doc['id']} inserted.")

# Wait for the documents to be indexed
time.sleep(1)

# Search for documents similar to the query
query = "What is Python?"
results = index.query(data=query, top_k=3, include_metadata=True)

# Display search results
print("Search Results:")
for result in results:
    print(f"ID: {result.id}")
    print(f"Score: {result.score:.4f}")
    print(f"Metadata: {result.metadata}")
    print("-" * 40)  # Separator line between results

Running the Code

To run the code, execute the following command in your terminal:

python main.py

Here is an example output for the search query “What is Python?”:

Document 1 inserted.
Document 2 inserted.
Document 3 inserted.
Document 4 inserted.
Document 5 inserted.
Search Results:
ID: 1
Score: 0.9080
Metadata: {'text': 'Python is a popular programming language.'}
----------------------------------------
ID: 2
Score: 0.7592
Metadata: {'text': 'Machine learning enables computers to learn from data.'}
----------------------------------------
ID: 4
Score: 0.7388
Metadata: {'text': 'Semantic search is a technique for understanding the meaning of queries.'}
----------------------------------------

Code Breakdown

  1. Environment Setup: We use python-dotenv to load our environment variables and use the Index.from_env() method to initialize the index client.

  2. Document Insertion: We define a list of documents, each with a unique ID and text content. The upsert() function inserts these documents into our index. These documents are automatically converted into embeddings. To learn more about Upstash Embedding Models, you can check out this page.

  3. Index Reset: Before inserting documents, the reset() function clears any existing data in the index.

  4. Search Query: After inserting the documents, we perform semantic search. The query() function returns the top_k most similar documents to the query along with their metadata if include_metadata is set to True.