Introduction
Semantic search aims to understand the intent and contextual meaning of a search phrase, rather than just matching individual keywords.
What is it? Vector search is a key method for implementing RAG (Retrieval-Augmented Generation) to enhance LLMs, where RAG is regarded as a 'grounding' solution that helps an LLM obtain knowledge outside of its training data. Since we can encode text into embeddings, we can search at the embedding level: the query itself becomes an embedding vector, and we use measures like vector similarity to run the query (very similar to an index query).
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, & Douwe Kiela. (2021). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.
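To make "searching in embedding space" concrete, here is a tiny self-contained sketch (my own illustration, not from the paper above): once texts are encoded as vectors, ranking candidates reduces to computing a similarity measure such as cosine similarity against the query vector.
import numpy as np

def cosine_similarity(a, b):
    # cosine similarity = dot product of the two vectors, normalized by their lengths
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy 4-dimensional "embeddings"; real ones have hundreds or thousands of dimensions
query = np.array([0.1, 0.9, 0.2, 0.0])
docs = {
    "doc_a": np.array([0.2, 0.8, 0.1, 0.1]),
    "doc_b": np.array([0.9, 0.1, 0.0, 0.3]),
}
# rank documents by similarity to the query, best match first
ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked)  # ['doc_a', 'doc_b']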
How does it work with an LLM? The basic idea is to retrieve extra data using vectors and semantic search, then reshape the prompt by merging the retrieval results with the user's question. For example, a prompt that wraps the retrieval results might look like this:
general_system_template = """
Use the following pieces of context to answer the question at the end.
The context contains question-answer pairs and their links from Stackoverflow.
You should prefer information from accepted or more upvoted answers.
Make sure to rely on information from the answers and not on questions to provide accurate responses.
When you find particular answer in the context useful, make sure to cite it in the answer using the link.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----
{summaries}
----
Each answer you generate should contain a section at the end of links to
Stackoverflow questions and answers you found useful, which are described under Source value.
You can only use links to StackOverflow questions that are present in the context and always
add links to the end of the answer in the style of citations.
Generate concise answers with references sources section of links to
relevant StackOverflow questions only at the end of the answer.
"""
And the user question will be attached during the conversation.
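To make the merging step concrete, here is a rough sketch (the retrieved_docs structure is hypothetical, just for illustration) of how retrieval results could be substituted into the {summaries} slot of the template above:
# hypothetical retrieval results; in practice these come from the vector search shown later
retrieved_docs = [
    {"text": "Q: How do I ...? A: Use ...", "link": "https://stackoverflow.com/q/123"},
    {"text": "Q: Why does ...? A: Because ...", "link": "https://stackoverflow.com/q/456"},
]
summaries = "\n\n".join(f"{d['text']}\nSource: {d['link']}" for d in retrieved_docs)
# fill the {summaries} placeholder; the user question is attached as a separate chat message
system_message = general_system_template.format(summaries=summaries)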
Notes: the prompt template above is from the code of neo4j's GenAI Stack project. There is also research on dynamic / selective / interactive merging of retrieval results into prompts; for more info please refer to my paper memo.
Vector Search Implementation, a Quick Demo
To accomplish the RAG process, you need to:
- Get your knowledge database ready. You need to build a database that stores vectors and supports vector queries. I will show an example using neo4j, which allows us to create a graph structure, save vector embeddings for each node, and conduct vector search.
- Implement your search. You need a method that converts the query sentence into a vector, conducts the vector search, and returns the corresponding results (objects that can be indexed from vectors as search results).
- Reshape your prompt and your conversation with the LLM (mostly we use LangChain to do this).
1. Get Database Ready
The following code is from https://graphacademy.neo4j.com/courses/llm-fundamentals/
Suppose you have :Movie nodes that you can already query:
MATCH (m:Movie {title: "Toy Story"})
RETURN m.title AS title, m.plot AS plot
You first need to create an index. The index is the additional data structure that supports vector search, even though it looks like you are just adding another key-value pair to each node.
CALL db.index.vector.createNodeIndex(
indexName :: STRING,
label :: STRING,
propertyKey :: STRING,
vectorDimension :: INTEGER,
vectorSimilarityFunction :: STRING)
Following the signature above, create an index named moviePlots for nodes with the label :Movie. It indexes an embedding property on each node, which should have 1536 dimensions (this depends on your embedding service; the text-to-embedding step only needs to be done once), and will use cosine similarity for search:
CALL db.index.vector.createNodeIndex(
'moviePlots',
'Movie',
'embedding',
1536,
'cosine'
)
To set the embedding values, you need to compute them in advance (using your local LLM, the OpenAI API, or sentence_transformers) and load them via CSV.
Notes: you CANNOT just assign a value to embedding, because the vectors are actually maintained by the vector index. You will see the embedding value if you query the nodes, but it only visually behaves like a node property.
LOAD CSV WITH HEADERS
FROM 'https://data.neo4j.com/llm-fundamentals/openai-embeddings.csv'
AS row
MATCH (m:Movie {movieId: row.movieId})
CALL db.create.setNodeVectorProperty(m, 'embedding', apoc.convert.fromJsonList(row.embedding))
RETURN count(*)
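How you produce that CSV is up to you; a rough sketch using LangChain's OpenAIEmbeddings (column names chosen to match the LOAD CSV query above, file name is arbitrary) could be:
import csv
import json
from langchain.embeddings.openai import OpenAIEmbeddings

embedding_provider = OpenAIEmbeddings(openai_api_key="sk-...")
# (movieId, plot) pairs; in practice you would export these from the database
movies = [("1", "A cowboy doll is profoundly threatened by a new spaceman toy.")]

vectors = embedding_provider.embed_documents([plot for _, plot in movies])
with open("openai-embeddings.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["movieId", "embedding"])  # headers expected by LOAD CSV WITH HEADERS
    for (movie_id, _), vector in zip(movies, vectors):
        # serialize the vector as a JSON list so apoc.convert.fromJsonList can parse it
        writer.writerow([movie_id, json.dumps(vector)])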
2. Implement Query
To conduct a vector search:
CALL db.index.vector.queryNodes(
indexName :: STRING,
numberOfNearestNeighbours :: INTEGER,
query :: LIST<FLOAT>
) YIELD node, score
Notes: notice that node is yielded, which means you can combine the search result with other graph queries.
MATCH (m:Movie {title: 'Toy Story'})
WITH m LIMIT 1
CALL db.index.vector.queryNodes('moviePlots', 6, m.embedding)
YIELD node, score
RETURN node.title AS title, node.plot AS plot, score
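The same search also works from Python via the official neo4j driver; a minimal sketch (connection details are placeholders) that first embeds the search phrase, then passes the vector to the index:
from neo4j import GraphDatabase
from langchain.embeddings.openai import OpenAIEmbeddings

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "pleaseletmein"))
embedding_provider = OpenAIEmbeddings(openai_api_key="sk-...")

# encode the search phrase into the same embedding space the index was built with
query_vector = embedding_provider.embed_query("aliens land and attack earth")

with driver.session() as session:
    result = session.run(
        """
        CALL db.index.vector.queryNodes('moviePlots', 6, $queryVector)
        YIELD node, score
        RETURN node.title AS title, score
        """,
        queryVector=query_vector,
    )
    for record in result:
        print(record["title"], record["score"])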
Notes: RAG in real life will be much more complex, since you need to deal with (i) insufficient retrieval, (ii) conflicting retrieved information, and (iii) misunderstanding of retrieved information. For more info, please refer to:
Cunxiang Wang, Xiaoze Liu, Yuanhao Yue, Xiangru Tang, Tianhang Zhang, Cheng Jiayang, Yunzhi Yao, Wenyang Gao, Xuming Hu, Zehan Qi, Yidong Wang, Linyi Yang, Jindong Wang, Xing Xie, Zheng Zhang, & Yue Zhang. (2023). Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity.
Jiawei Chen, Hongyu Lin, Xianpei Han, & Le Sun. (2023). Benchmarking Large Language Models in Retrieval-Augmented Generation. arXiv:2309.01431.
3. Reshape the LLM Conversation
A typical RAG chain will look like the following (we will then move to a much simpler example based on the movie data mentioned above):
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.vectorstores.neo4j_vector import Neo4jVector

def configure_qa_rag_chain(llm, embeddings, embeddings_store_url, username, password):
    # RAG response
    general_system_template = """
    Use the following pieces of context to answer the question at the end.
    The context contains question-answer pairs and their links from Stackoverflow.
    You should prefer information from accepted or more upvoted answers.
    Make sure to rely on information from the answers and not on questions to provide accurate responses.
    When you find particular answer in the context useful, make sure to cite it in the answer using the link.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    ----
    {summaries}
    ----
    Each answer you generate should contain a section at the end of links to
    Stackoverflow questions and answers you found useful, which are described under Source value.
    You can only use links to StackOverflow questions that are present in the context and always
    add links to the end of the answer in the style of citations.
    Generate concise answers with references sources section of links to
    relevant StackOverflow questions only at the end of the answer.
    """
    general_user_template = "Question:```{question}```"
    messages = [
        SystemMessagePromptTemplate.from_template(general_system_template),
        HumanMessagePromptTemplate.from_template(general_user_template),
    ]
    qa_prompt = ChatPromptTemplate.from_messages(messages)
    qa_chain = load_qa_with_sources_chain(
        llm,
        chain_type="stuff",
        prompt=qa_prompt,
    )
    # Vector + Knowledge Graph response
    kg = Neo4jVector.from_existing_index(
        embedding=embeddings,
        url=embeddings_store_url,
        username=username,
        password=password,
        database="neo4j",  # neo4j by default
        index_name="your_index_name",  # vector by default
        text_node_property="body",  # text by default
        # your custom retrieval query
        retrieval_query="""
        MATCH <Your Query>
        RETURN result AS text, similarity AS score, {source: question.link} AS metadata
        ORDER BY similarity ASC // so that best answers are the last
        """,
    )
    kg_qa = RetrievalQAWithSourcesChain(
        combine_documents_chain=qa_chain,
        retriever=kg.as_retriever(search_kwargs={"k": 2}),
        reduce_k_below_max_tokens=False,
        max_tokens_limit=3375,
    )
    return kg_qa
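A hypothetical call of this function (credentials and the question are placeholders) would then look like:
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings

llm = ChatOpenAI(openai_api_key="sk-...")
embeddings = OpenAIEmbeddings(openai_api_key="sk-...")
kg_qa = configure_qa_rag_chain(
    llm, embeddings,
    embeddings_store_url="bolt://localhost:7687",
    username="neo4j",
    password="pleaseletmein",
)
# RetrievalQAWithSourcesChain expects a dict with a "question" key
print(kg_qa({"question": "How do I create a vector index in Neo4j?"}))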
Here is a much simpler case; the following code conducts a vector search via LangChain.
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.neo4j_vector import Neo4jVector
embedding_provider = OpenAIEmbeddings(openai_api_key="sk-...")
movie_plot_vector = Neo4jVector.from_existing_index(
embedding_provider,
url="bolt://localhost:7687",
username="neo4j",
password="pleaseletmein",
index_name="moviePlots",
embedding_node_property="embedding",
text_node_property="plot",
)
r = movie_plot_vector.similarity_search("A movie where aliens land and attack earth.", k=4)  # specify the number of documents to return via k
print(r)
Moreover, LangChain allows you to build a retrieval QA chain via:
from langchain.chains import RetrievalQA
from langchain.chat_models.openai import ChatOpenAI

chat_llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY)
retriever = movie_plot_vector.as_retriever()
retrievalQA = RetrievalQA.from_llm(
    llm=chat_llm,
    retriever=retriever,
)
r = retrievalQA("A mission to the moon goes wrong")
print(r)
Notes: the code above uses a pre-defined RAG prompt schema provided by LangChain. It first conducts a vector search with your query 'A mission to the moon goes wrong', matching semantically related content such as movies about moon missions, then combines your query with these search results and sends everything to the LLM to wait for the answer. The quality highly depends on your database (e.g. maybe you only provide movie titles for the query). Therefore, you need to pay attention to how much information ends up in the search results.
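One simple way to keep that under control (my own sketch, not from the course material; the 0.8 threshold is arbitrary and depends on your embedding model and similarity function) is to retrieve scores along with the documents and drop weak matches before they reach the LLM:
# retrieve documents together with their similarity scores
results = movie_plot_vector.similarity_search_with_score(
    "A mission to the moon goes wrong", k=10
)
# keep only reasonably strong matches; tune the threshold for your data
filtered = [doc for doc, score in results if score > 0.8]
print(f"kept {len(filtered)} of {len(results)} documents")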
The whole code is:
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain, RetrievalQA
from langchain.chains.conversation.memory import ConversationBufferMemory
from langchain.agents import AgentType, initialize_agent
from langchain.tools import Tool, YouTubeSearchTool
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.neo4j_vector import Neo4jVector
OPENAI_API_KEY = "sk-..."
llm = ChatOpenAI(
openai_api_key=OPENAI_API_KEY
)
youtube = YouTubeSearchTool()
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
prompt = PromptTemplate(
template="""
You are a movie expert. You find movies from a genre or plot.
ChatHistory:{chat_history}
Question:{input}
""",
input_variables=["chat_history", "input"]
)
chat_chain = LLMChain(llm=llm, prompt=prompt, memory=memory, verbose=True)
embedding_provider = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
movie_plot_vector = Neo4jVector.from_existing_index(
embedding_provider,
url="bolt://localhost:7687",
username="neo4j",
password="pleaseletmein",
index_name="moviePlots",
embedding_node_property="embedding",
text_node_property="plot",
)
retrievalQA = RetrievalQA.from_llm(
llm=llm,
retriever=movie_plot_vector.as_retriever(),
verbose=True,
return_source_documents=True
)
def run_retriever(query):
    results = retrievalQA({"query": query})
    return str(results)
tools = [
Tool.from_function(
name="ChatOpenAI",
description="For when you need to chat about movies, genres or plots. The question will be a string. Return a string.",
func=chat_chain.run,
return_direct=True
),
Tool.from_function(
name="YouTubeSearchTool",
description="For when you need a link to a movie trailer. The question will be a string. Return a link to a YouTube video.",
func=youtube.run,
return_direct=True
),
Tool.from_function(
name="PlotRetrieval",
description="For when you need to compare a plot to a movie. The question will be a string. Return a string.",
func=run_retriever,
return_direct=True
)
]
agent = initialize_agent(
tools, llm, memory=memory,
agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
verbose=True,
handle_parsing_errors=True,
)
while True:
    q = input(">")
    print(agent.run(q))
Graph Search Implementation
One-Shot Demo
Keep in mind that since we are using neo4j, we can also conduct graph queries, and we can use the LLM to generate Cypher for this.
Notes: LLM output is not 100% accurate, depending on the model you choose, which means the output may not follow your format instructions completely. You MUST implement an error-handling mechanism, e.g. using Few-Shot Prompting (covered in the next section), BUT be aware that there are more methods to do this.
from langchain.chat_models import ChatOpenAI
from langchain.graphs import Neo4jGraph
from langchain.chains import GraphCypherQAChain
from langchain.prompts import PromptTemplate
llm = ChatOpenAI(
openai_api_key="sk-..."
)
graph = Neo4jGraph(
url="bolt://localhost:7687",
username="neo4j",
password="pleaseletmein",
)
CYPHER_GENERATION_TEMPLATE = """
You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations.
Convert the user's question based on the schema.
Schema: {schema}
Question: {question}
"""
cypher_generation_prompt = PromptTemplate(
template=CYPHER_GENERATION_TEMPLATE,
input_variables=["schema", "question"],
)
cypher_chain = GraphCypherQAChain.from_llm(
llm,
graph=graph,
cypher_prompt=cypher_generation_prompt,
verbose=True
)
cypher_chain.run("What role did Tom Hanks play in Toy Story?")
In the code above, we ask the LLM to output in the following format:
Schema: {schema}
Question: {question}
The cypher_generation_prompt will be a wrapped prompt template that instructs the LLM about the output format above, and if we are LUCKY, the following GraphCypherQAChain will extract the Cypher from the output and run the neo4j query. GraphCypherQAChain is the most important piece here. The output will look like:
Generated Cypher:
MATCH (a:Actor)-[r:ACTED_IN]->(m:Movie {title: 'Toy Story'})
WHERE a.name = 'Tom Hanks'
RETURN r.role
Full Context:
[{'r.role': 'Woody (voice)'}]
And the GraphCypherQAChain requires:
- Base LLM
- Neo4j Instance
- Prompt Template
- A User Query for each call
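As for the error handling warned about earlier, a bare-bones pattern (my own sketch; the few-shot prompting below is the complementary, prompt-side fix) is to catch the exception raised when the generated Cypher is invalid and simply retry:
def ask_with_retry(chain, question, max_retries=3):
    # invalid generated Cypher typically surfaces as an exception from the chain
    for attempt in range(max_retries):
        try:
            return chain.run(question)
        except Exception as e:
            print(f"attempt {attempt + 1} failed: {e}")
    return "Sorry, no valid query could be generated for this question."

print(ask_with_retry(cypher_chain, "What role did Tom Hanks play in Toy Story?"))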
Few-shot
To enhance the LLM (teach it to output better queries that follow the target format more closely), we can show some examples, i.e. few-shot prompting. The simplest way to do this is to 'HARDCODE' your examples:
CYPHER_GENERATION_TEMPLATE = """
You are an expert Neo4j Developer translating user questions into Cypher to answer questions about movies and provide recommendations.
Convert the user's question based on the schema.
Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.
For movie titles that begin with "The", move "the" to the end, For example "The 39 Steps" becomes "39 Steps, The" or "The Matrix" becomes "Matrix, The".
If no data is returned, do not attempt to answer the question.
Only respond to questions that require you to construct a Cypher statement.
Do not include any explanations or apologies in your responses.
Examples:
Find movies and genres:
MATCH (m:Movie)-[:IN_GENRE]->(g)
RETURN m.title, g.name
Schema: {schema}
Question: {question}
"""
But keep in mind that:
- This provides only a limited boost.
- The examples you pick will affect the LLM, since you inevitably bring in extra 'noise'; e.g. if your example does not actually match the task, it will have a negative effect.