RAG with a PDF using LlamaIndex and SimpleVectorStore on Vertex AI


LlamaIndex and Vertex AI

Previously, I showed how to do RAG with a PDF using LangChain and Annoy Vector Store and RAG with a PDF using LangChain and Firestore Vector Store. Both used a PDF as the RAG backend and LangChain as the LLM framework to orchestrate ingestion and retrieval.

LlamaIndex is another popular LLM framework. I wondered how to set up the same PDF-based RAG pipeline with LlamaIndex and Vertex AI, but I didn't find a good sample. I put one together, and in this short post, I walk through it.

Ingestion

Before you can use a PDF in a RAG pipeline, you first need to load the PDF into documents with SimpleDirectoryReader. Each page in the PDF becomes a separate document:

print("Read PDF into documents")
documents = SimpleDirectoryReader(
    input_files=["./cymbal-starlight-2024.pdf"]
).load_data()
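
Each document keeps the page text plus some metadata about where it came from. As a quick, optional sanity check (the exact metadata keys depend on the underlying PDF reader), you can print the number of documents and the metadata of the first one:

print(f"Loaded {len(documents)} documents")
print(documents[0].metadata)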

Next, you need to split the documents into smaller chunks, then create and store vector embeddings for each chunk.

First, you need an embedding model:

print("Initialize embedding model")
credentials, project_id = google.auth.default()
embed_model = VertexTextEmbedding(
    credentials=credentials,
    model_name= "text-embedding-005",
)
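
To verify that the credentials and model work before indexing everything, you can embed a short test string; get_text_embedding is the generic embedding call, and the printed dimension depends on the model:

vector = embed_model.get_text_embedding("Cymbal Starlight cargo capacity")
print(f"Embedding dimension: {len(vector)}")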

You also need a SentenceSplitter to split documents into smaller chunks:

from llama_index.core.node_parser import SentenceSplitter
text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
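
If you're curious what the chunks look like, you can run the splitter on its own; this is purely for inspection, since the next step applies the splitter for you:

nodes = text_splitter.get_nodes_from_documents(documents)
print(f"Split into {len(nodes)} chunks")
print(nodes[0].get_content()[:200])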

Finally, use VectorStoreIndex.from_documents to split the documents into smaller chunks and turn them into embeddings:

print("Index documents")
index = VectorStoreIndex.from_documents(
    documents, embed_model=embed_model, transformations=[text_splitter]
)
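
By default, VectorStoreIndex.from_documents keeps the embeddings in an in-memory SimpleVectorStore. If you prefer to be explicit about that (for example, to make it easier to swap in another vector store later), you can pass your own StorageContext; here's a sketch using the same documents, embed_model, and text_splitter as above:

from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.vector_stores import SimpleVectorStore
vector_store = SimpleVectorStore()
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
    transformations=[text_splitter],
    storage_context=storage_context,
)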

Querying

Now, we’re ready to query.

Create a query engine from the index, using a Vertex AI model:

print("Initialize the query engine with the model")
llm = Vertex(model="gemini-2.0-flash-001")
query_engine = index.as_query_engine(llm=llm)

Start asking the query engine questions:

question = "What is the cargo capacity of Cymbal Starlight?"
response = query_engine.query(question)
print(f"Question: {question}")
print(f"Response: {str(response)}")

Run the full sample in main.py:

Read PDF into documents
Initialize embedding model
Index documents
Initialize the query engine with the model
Question: What is the cargo capacity of Cymbal Starlight?
Response: The cargo capacity of the Cymbal Starlight 2024 is 13.5 cubic feet.

As you can see, you get responses based on the PDF.

Storage

Once the index is built, you can store it to disk as follows:

print("Store index locally")
index.storage_context.persist(persist_dir="index")

Next time, you can simply load the index:

print("Reload index locally")
storage_context = StorageContext.from_defaults(persist_dir="index")
index = load_index_from_storage(storage_context, embed_model=embed_model)
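
The reloaded index behaves the same as the freshly built one. For example, assuming the llm from earlier is still around, you can recreate the query engine and ask the same question without re-reading or re-embedding the PDF:

query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What is the cargo capacity of Cymbal Starlight?")
print(str(response))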

Conclusion

In this short blog post, I showed how to set up a PDF-based RAG pipeline with LlamaIndex and Vertex AI. For more information, check out the resources in the See also section below.


See also