### Feature request
The provided examples that leverage LangChain to create a representation all make use of `langchain.chains.question_answering.load_qa_chain`, and the implementation is not very transparent to the user, leading to inconsistencies and making it difficult to understand how to provide custom chains.
### Motivation
Some of the issues in detail:

- `langchain.chains.question_answering.load_qa_chain` is now deprecated and will be removed at some point.
- The current LangChain integration is not very clear because:
  - A `prompt` can be specified in the constructor of the `LangChain` class. However, this is not a prompt but rather a custom instruction that is passed to the provided chain through the `question` key.
  - In the case of `langchain.chains.question_answering.load_qa_chain` (which is the provided example), this `question` key is added as part of a larger, hard-coded prompt that is not transparent to a casual user.
  - If a user wants to fully customize the instructions used to create the representation, it would be best not to use the `langchain.chains.question_answering.load_qa_chain` chain at all, to avoid this hard-coded prompt (this is currently not clearly documented). In addition, if that specific chain is not used, the use of a `question` key can be confusing.
  - The approach to adding keywords to the prompt (inserting `"[KEYWORDS]"` into `self.prompt` and then performing some string manipulation) is confusing.
- Some imports from LangChain are outdated (e.g. `Documents`, `OpenAI`).
**Example of workaround in the current implementation**

With the current implementation, a user wanting to use a custom LangChain prompt in a custom LCEL chain and add keywords to that prompt would have to do something like the following (ignoring that documents are passed as `Document` objects and not formatted into a `str`):

```python
from bertopic.representation import LangChain
from langchain_core.prompts import ChatPromptTemplate

custom_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Custom instructions."),
        ("human", "Documents: {input_documents}, Keywords: {question}"),
    ]
)

chain = some_custom_chain_with_above_prompt

representation_model = LangChain(chain, prompt="[KEYWORDS]")
```
Related issues:

- The approach I propose to fix this would include an example of specifying a system prompt: Add system prompt for OpenAI representation model #2146
### Your contribution
I propose several changes, which I have started working on in a branch (I made a PR to make the diff easy to review).
- Update the examples so that `langchain.chains.question_answering.load_qa_chain` is replaced by `langchain.chains.combine_documents.stuff.create_stuff_documents_chain`, as recommended in the migration guide. This new approach still takes care of formatting the `Document` objects into the prompt, but the prompt must now be specified explicitly (instead of the implicit, hard-coded prompt of `langchain.chains.question_answering.load_qa_chain`).
- Remove the ability to provide a prompt in the constructor of `LangChain`, as the prompt must now be explicitly created together with the chain object.
- Rename the keys for consistency to `documents`, `keywords`, and `representation` (note that `langchain.chains.combine_documents.stuff.create_stuff_documents_chain` does not have an `output_text` output key, so the `representation` key must be added).
- Make it so that the `keywords` key is always provided to the chain (but it is up to the user to include a placeholder for it in their prompt).
Questions:

- Should we provide a new example prompt to replace `DEFAULT_PROMPT`? For example, `EXAMPLE_PROMPT = "What are these documents about? {documents}. Here are some keywords about them {keywords} Please give a single label."`; however, it could only be used directly in `langchain.chains.combine_documents.stuff.create_stuff_documents_chain`, which takes care of formatting the documents.