In this tutorial, we’ll learn how to parse a document using LlamaParse and then query it using an LLM with Upstash Vector.

We’ll split this guide into two parts: parsing a document and then querying the parsed document.

Installation and Setup

To get started, we need to set up our environment. You can install the necessary libraries using the following command in your terminal:

pip install llama-index upstash-vector llama-index-vector-stores-upstash python-dotenv

We also need to create a Vector Index in the Upstash Console. Make sure to set the index dimensions to 1536 and the distance metric to Cosine. To learn more about index creation, you can check out our getting started page.

Once we have our index, we will copy the UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN and paste them into our .env file.

Environment Variables

Create a .env file in your project directory and add the following content:

UPSTASH_VECTOR_REST_URL=your_upstash_url
UPSTASH_VECTOR_REST_TOKEN=your_upstash_token
OPENAI_API_KEY=your_openai_api_key
LLAMA_CLOUD_API_KEY=your_llama_cloud_api_key

To get your LLAMA_CLOUD_API_KEY, you can follow the instructions in the LlamaCloud documentation.

Part 1: Parsing a Document

We can now move on to parsing a document. In this example, we’ll parse a file named global_warming.txt.

from llama_parse import LlamaParse
from llama_index.core import SimpleDirectoryReader

# Initialize the LlamaParse parser with the desired result format
parser = LlamaParse(result_type="markdown")  # "markdown" and "text" are available

# Parse the document using the parser
file_extractor = {".txt": parser}
documents = SimpleDirectoryReader(input_files=["./documents/global_warming.txt"], file_extractor=file_extractor).load_data()

If you are using Jupyter Notebook, you need to allow nested event loops to parse the document.

You can do this by adding the following code snippet to your file:

import nest_asyncio
nest_asyncio.apply()

Now that we have our parsed data, we can query it.

Part 2: Querying the Parsed Document with an LLM

In this part, we’ll use the UpstashVectorStore to create an index, and query the content. We’ll use OpenAI as the language model to interpret the data and respond to questions based on the document. You can use other LLMs that are supported by LlamaIndex as well.

from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.upstash import UpstashVectorStore
from llama_index.core import StorageContext
import openai

# Load environment variables for API keys and Upstash configuration
from dotenv import load_dotenv
import os
load_dotenv()

# Set up OpenAI API key
openai.api_key = os.getenv("OPENAI_API_KEY")

# Set up Upstash Vector Store
upstash_vector_store = UpstashVectorStore(
    url=os.getenv("UPSTASH_VECTOR_REST_URL"),
    token=os.getenv("UPSTASH_VECTOR_REST_TOKEN"),
)

# Create a storage context for Upstash Vector and index the parsed document
storage_context = StorageContext.from_defaults(vector_store=upstash_vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Create a query engine for the index and perform a query
query_engine = index.as_query_engine()
query = "What are the main points discussed in the document?"
response = query_engine.query(query)
print(response)

Here’s the code output:

The main points discussed in the document include the impact of global warming on agriculture 
and food production systems, the importance of adopting sustainable food practices to mitigate 
these effects, the role of agriculture in contributing to global warming through GHG emissions, 
deforestation, and the use of synthetic fertilizers, and the need for sustainable food systems 
to address environmental challenges and ensure food security for future generations.

Conclusion

With the ability to parse and query documents, you can efficiently summarize content, extract essential information, and answer questions based on the document’s details.

To learn more about LlamaIndex and its integration with Upstash Vector, you can visit the LlamaIndex documentation.