9. Chat with PDFs#

In this tutorial I'd like to introduce you to the APIs for LLMs, and at the end we will build something useful for ourselves: a conversational agent that we can chat with about a given paper.

9.1. Set up#

9.1.1. OpenAI_API_KEY#

We need an API key from OpenAI to get access to their models; you can create one here:

https://platform.openai.com/account/api-keys

%pip install openai
%pip install langchain
%pip install pypdf
%pip install chromadb
%pip install tiktoken
import os
import openai
os.environ["OPENAI_API_KEY"] = "" # Openai API
openai.api_key = os.environ["OPENAI_API_KEY"]
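
If you'd rather not hard-code the key into the notebook, a small alternative sketch (using Python's built-in getpass; the prompt string is my own) reads it interactively so the key never ends up in version control:

import os
from getpass import getpass
import openai

# Ask for the key at runtime instead of storing it in the source
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
openai.api_key = os.environ["OPENAI_API_KEY"]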

9.1.2. Install and Import Libraries#

import openai
import textwrap
from os.path import join
import pickle as pkl

9.2. OpenAI LLM APIs#

A useful link for checking prices and costs is OpenAI's pricing page: https://openai.com/pricing

Sample text from https://arxiv.org/abs/2304.02643

We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive – often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at this https URL to foster research into foundation models for computer vision.

text = """ We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and corresponding dataset (SA-1B) of 1B masks and 11M images at this https URL to foster research into foundation models for computer vision. """

9.2.1. OpenAI text embedding#

OpenAI provides an API that embeds a chunk of text into a single vector. These vector embeddings can then be used to retrieve the corresponding text chunks by similarity.

response = openai.Embedding.create(
  model="text-embedding-ada-002",
  input=text
)

embeddings = response['data'][0]['embedding']
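
To get a feel for how these embeddings support retrieval, here is a minimal sketch that compares texts by cosine similarity (the embed helper, the sample strings, and the use of numpy are my additions, not part of the pipeline below):

import numpy as np

def embed(text):
    # Wraps the same Embedding endpoint used above
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp['data'][0]['embedding'])

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

v1 = embed("The model segments objects in images.")
v2 = embed("An approach to image segmentation.")
v3 = embed("The stock market closed higher today.")
print(cosine(v1, v2), cosine(v1, v3))  # the related pair should score higher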

9.2.2. OpenAI text completion#

This is the API for the GPT-3 models: we send them a prompt and they complete the text. Note that the task is just text completion, so the model may not answer your question directly. To make it do so, prompt engineering is important.

See official examples in https://platform.openai.com/examples/default-chat

Generally, for these prompts:

  • Prepending a system message before the question conditions the GPT model and makes the answer more helpful.

  • Adding one or a few examples of the type and format of output you expect (few-shot prompting) also helps.

prompt = "I'd like to continue generating a paper's introduction paragraph based on the abstract. \n\nAbstract\n"+text+"\nMain Text\nIntroduction\n"

complete_resp = openai.Completion.create(
  # model="text-davinci-003", # more powerful and expensive
  model="text-curie:001", # cheaper
  prompt=prompt,
  max_tokens=256,
  temperature=0.5
)
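
The generated continuation sits in the response object; the same access pattern applies to the other Completion calls below:

print(complete_resp["choices"][0]["text"])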

Prompt it to answer questions

pre_prompt = "I am a highly intelligent question answering bot. If you ask me a question that is rooted in truth, I will give you the answer. If you ask me a question that is nonsense, trickery, or has no clear answer, I will respond with \"Unknown\".\n\nQ: What is human life expectancy in the United States?\nA: Human life expectancy in the United States is 78 years.\n\nQ: Who was president of the United States in 1955?\nA: Dwight D. Eisenhower was president of the United States in 1955.\n\nQ: Which party did he belong to?\nA: He belonged to the Republican Party.\n\nQ: What is the square root of banana?\nA: Unknown\n\nQ: How does a telescope work?\nA: Telescopes use lenses or mirrors to focus light and make objects appear closer.\n\nQ: Where were the 1992 Olympics held?\nA: The 1992 Olympics were held in Barcelona, Spain.\n\nQ: How many squigs are in a bonk?\nA: Unknown\n\n"
question = "Q: When is the Eiffel Tower built?"
response = openai.Completion.create(
  model="text-davinci-003",
  prompt=pre_prompt + question + "\nA:",
  temperature=0,
  max_tokens=256,
  top_p=1,
  frequency_penalty=0.0,
  presence_penalty=0.0,
  stop=["\n"]
)

Prompt it to chat with you as a friendly AI

pre_prompt = "The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.\n\n"
prompt = """
Human: Hello, who are you?
AI: I am an AI created by OpenAI. How can I help you today?
Human: I'd like to cancel my subscription.
AI:
"""

response = openai.Completion.create(
  model="text-davinci-003",
  prompt=pre_prompt + prompt.strip(), # the prompt already ends with "AI:", so nothing needs to be appended
  temperature=0,
  max_tokens=256,
  top_p=1,
  frequency_penalty=0.0,
  presence_penalty=0.0,
  stop=["\n"]
)

9.2.3. OpenAI ChatGPT#

This is the API for ChatGPT; we can ask questions or have a conversation with it directly, without much prompt tuning.

completion = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[
    {"role": "user", "content": "Hello! Can you tell me a really funny joke to light my day up?"}
  ]
)

print(completion.choices[0].message)
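
The chat endpoint is stateless between calls, so to hold a multi-turn conversation you resend the accumulated messages each time. A minimal sketch (the system message and the follow-up turn are my own illustrative additions):

messages = [
    {"role": "system", "content": "You are a cheerful assistant who answers concisely."},
    {"role": "user", "content": "Hello! Can you tell me a really funny joke to light my day up?"},
]
completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
# Append the assistant's reply plus the next user turn, then call the API again
messages.append({"role": "assistant", "content": completion.choices[0].message["content"]})
messages.append({"role": "user", "content": "Another one, please!"})
completion = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(completion.choices[0].message["content"])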

9.3. Chat with PDF#

The idea is as follows:

  1. First, divide the paper / document into many chunks.

  2. Embed each chunk into a vector via the OpenAI embedding API.

  3. Store these vectors in a vector database (chromadb).

  4. When a question comes in, embed it and retrieve the most similar text chunks.

  5. Send these chunks and the question to the ChatGPT API, and let it answer the question.

This is a complex pipeline, so langchain works as the glue that connects all the parts.

(Figure: the Chat with PDF pipeline)

Required libraries

  • openai provides the API to OpenAI's GPT models.

  • langchain, a recent framework that chains GPT models together with other tools to build applications.

  • pypdf, a PDF parser in Python.

  • chromadb, a vector database that manages the embedding vectors and handles the information retrieval.

9.3.1. Tool Chaining with LangChain#

from langchain.document_loaders import PyPDFLoader # for loading the pdf
from langchain.embeddings import OpenAIEmbeddings # for creating embeddings
from langchain.vectorstores import Chroma # for the vectorization part
from langchain.chains import ChatVectorDBChain # older chain for chatting with the pdf
from langchain.chains import ConversationalRetrievalChain # its successor, used below
from langchain.llms import OpenAI # the LLM model we'll use (ChatGPT)
from langchain.chat_models import ChatOpenAI

9.3.2. Parse and Split the PDF#

Download the PDF and parse it with a Python PDF reader:

import requests
def download_pdf(arxiv_id, save_root=""):
    url = f"https://arxiv.org/pdf/{arxiv_id}.pdf"
    r = requests.get(url, allow_redirects=True)
    with open(join(save_root, f"{arxiv_id}.pdf"), 'wb') as f:
        f.write(r.content)
arxiv_id = "1706.03762"  # "Attention Is All You Need", https://arxiv.org/abs/1706.03762
# arxiv_id = "2304.02643"  # the Segment Anything paper from Meta, https://arxiv.org/abs/2304.02643
# change to something you are curious about!

download_pdf(arxiv_id)
pdf_path = f"{arxiv_id}.pdf"
loader = PyPDFLoader(pdf_path)
pages = loader.load_and_split()

Let’s look at what lies within the pages

print(pages[0].metadata)
print(pages[0].page_content)
{'source': '1706.03762.pdf', 'page': 0}
Attention Is All You Need
Ashish Vaswani
Google Brain
avaswani@google.comNoam Shazeer
Google Brain
noam@google.comNiki Parmar
Google Research
nikip@google.comJakob Uszkoreit
Google Research
usz@google.com
Llion Jones
Google Research
llion@google.comAidan N. Gomezy
University of Toronto
aidan@cs.toronto.eduŁukasz Kaiser
Google Brain
lukaszkaiser@google.com
Illia Polosukhinz
illia.polosukhin@gmail.com
Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these models to
be superior in quality while being more parallelizable and requiring significantly
less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-
to-German translation task, improving over the existing best results, including
ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task,
our model establishes a new single-model state-of-the-art BLEU score of 41.8 after
training for 3.5 days on eight GPUs, a small fraction of the training costs of the
best models from the literature. We show that the Transformer generalizes well to
other tasks by applying it successfully to English constituency parsing both with
large and limited training data.
1 Introduction
Recurrent neural networks, long short-term memory [ 13] and gated recurrent [ 7] neural networks
in particular, have been firmly established as state of the art approaches in sequence modeling and
Equal contribution. Listing order is random. Jakob proposed replacing RNNs with self-attention and started
the effort to evaluate this idea. Ashish, with Illia, designed and implemented the first Transformer models and
has been crucially involved in every aspect of this work. Noam proposed scaled dot-product attention, multi-head
attention and the parameter-free position representation and became the other person involved in nearly every
detail. Niki designed, implemented, tuned and evaluated countless model variants in our original codebase and
tensor2tensor. Llion also experimented with novel model variants, was responsible for our initial codebase, and
efficient inference and visualizations. Lukasz and Aidan spent countless long days designing various parts of and
implementing tensor2tensor, replacing our earlier codebase, greatly improving results and massively accelerating
our research.
yWork performed while at Google Brain.
zWork performed while at Google Research.
31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.arXiv:1706.03762v5  [cs.CL]  6 Dec 2017

For other document types that are loadable via langchain, see https://python.langchain.com/en/latest/modules/indexes/document_loaders.html

Including

  • Notion

  • Markdown

  • Word

  • PowerPoint

  • EPUB

  • WhatsApp

  • Telegram

9.3.3. Embedding Database#

Create the embedding function (nothing is computed yet; this is just a placeholder), then embed the pages and save the vectors in a database.

Chroma is the database library that stores the embeddings and retrieves content by vector similarity.

embeddings = OpenAIEmbeddings() # GPT3-Ada-Embedding
vectordb = Chroma.from_documents(pages, embedding=embeddings,
              persist_directory=pdf_path.replace(".pdf", ""), )
vectordb.persist()
WARNING:chromadb:Using embedded DuckDB with persistence: data will be stored in: 1706.03762
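
Before wiring up the full chain, you can sanity-check retrieval directly on the vector store; the query string here is just an illustrative example:

docs = vectordb.similarity_search("What is multi-head attention?", k=2)
for doc in docs:
    print(doc.metadata)
    print(doc.page_content[:100])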

9.3.4. Retrieval Chain#

Create a conversational retrieval chain:

pdf_qa = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(temperature=0.9, model_name="gpt-3.5-turbo"), # ChatGPT API
    vectordb.as_retriever(),
    return_source_documents=True,
    max_tokens_limit=4097)

9.3.5. Saving function#

This is a utility function that appends each new question and answer to a Markdown file and a pickle file, so it's easier to review the chat history post hoc.

def save_qa_history(query, result, qa_path,):
    uid = 0
    while os.path.exists(join(qa_path, f"QA{uid:05d}.pkl")):
        uid += 1
    pkl.dump((query, result), open(join(qa_path, f"QA{uid:05d}.pkl"), "wb"))

    pkl_path = join(qa_path, "chat_history.pkl")
    if os.path.exists(pkl_path):
        chat_history = pkl.load(open(pkl_path, "rb"))
    else:
        chat_history = []
    chat_history.append((query, result))
    pkl.dump(chat_history, open(pkl_path, "wb"))

    # print to a markdown file with formatting for reading
    with open(os.path.join(qa_path, "QA.md"), "a", encoding="utf-8") as f:
        f.write("\n**Question:**\n\n")
        f.write(query)
        f.write("\n\n**Answer:**\n\n")
        f.write(result["answer"])
        f.write("\n\nReferences:\n\n")
        for doc in result["source_documents"]:
            f.write("> ")
            f.write(doc.page_content[:250])
            f.write("\n\n")
        f.write("-------------------------\n\n")

9.3.6. Demo: Q&A#

qa_path = pdf_path.replace(".pdf", "") + "_qa_history"
os.makedirs(qa_path, exist_ok=True)
query = "what are the limitations of this architecture?"
# query = "How is this method related to previous segmentation methods e.g. Mask RCNN"
# More questions?

result = pdf_qa({"question": query, "chat_history": []})

print("Answer:")
print(textwrap.fill(result["answer"], 80))
print("\nReferences")
for doc in result["source_documents"]:
    print(doc.page_content[:100])
    print("\n")

save_qa_history(query, result, qa_path)
Answer:
While the Transformer has shown to be effective in various natural language
processing tasks, it still has some limitations. One of the main limitations is
that it requires significant computational resources for training, which can
make it challenging to train on large datasets. Additionally, it may not perform
as well as recurrent models on tasks that require modeling of long-term
dependencies in the input sequence. Finally, the attention mechanism used in the
Transformer can be sensitive to the order of the input sequence, which can be a
problem in tasks where the order of the input is important.

References
Table 3: Variations on the Transformer architecture. Unlisted values are identical to those of the b


constraints and is significantly longer than the input. Furthermore, RNN sequence-to-sequence
models 


Input-Input Layer5
The
Law
will
never
be
perfect
,
but
its
application
should
be
just
-
this
is
what


transduction problems such as language modeling and machine translation [ 35,2,5]. Numerous
efforts
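
Since the chain is conversational, a follow-up question can reuse the earlier exchange by passing it as a list of (question, answer) pairs; a quick sketch (the follow-up question is my own):

chat_history = [(query, result["answer"])]
followup = "Does the paper propose any remedy for those limitations?"
result2 = pdf_qa({"question": followup, "chat_history": chat_history})
print(textwrap.fill(result2["answer"], 80))
save_qa_history(followup, result2, qa_path)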