Update LangChain Support #2187

@Skar0

Description

Feature request

The provided examples that leverage LangChain to create a representation all make use of langchain.chains.question_answering.load_qa_chain, and the implementation is not very transparent to the user, leading to inconsistencies and making it difficult to understand how to provide custom chains.

Motivation

Some of the issues in detail

  • langchain.chains.question_answering.load_qa_chain is now deprecated and will be removed at some point.
  • The current LangChain integration is not very clear because
    • a prompt can be specified in the constructor of the LangChain class. However, this is not a prompt but rather a custom instruction that is passed to the provided chain through the question key.
    • in the case of langchain.chains.question_answering.load_qa_chain (the provided example), this question key is inserted into a larger, hard-coded prompt that is not transparent to a casual user.
    • if a user wants to fully customize the instructions used to create the representation, it is best not to use langchain.chains.question_answering.load_qa_chain at all, so as to avoid this hard-coded prompt (this is currently not clearly documented). In addition, if that specific chain is not used, the question key can be confusing.
    • the approach for adding keywords to the prompt (placing "[KEYWORDS]" in self.prompt and then performing string manipulation) is confusing.
  • Some imports from LangChain are outdated (e.g. Documents, OpenAI).
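
To make the last point concrete, here is an approximate sketch (an assumption, not the actual BERTopic source) of the string manipulation described above: the user-supplied prompt is scanned for a literal "[KEYWORDS]" placeholder, which is replaced with the topic keywords before being passed to the chain under the question key.

```python
# Approximate sketch (an assumption, not the actual BERTopic source) of the
# "[KEYWORDS]" string manipulation: the user-supplied prompt is scanned for
# a literal "[KEYWORDS]" placeholder, which is replaced with the topic
# keywords before being passed to the chain under the question key.
def build_question(prompt: str, keywords: list[str]) -> str:
    """Replace the "[KEYWORDS]" placeholder with comma-separated keywords."""
    return prompt.replace("[KEYWORDS]", ", ".join(keywords))

question = build_question("[KEYWORDS]", ["topic", "model", "cluster"])
# question == "topic, model, cluster"
```

The confusion is that "prompt" here is really just a fragment that gets spliced into the chain's own (hidden) prompt.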

Example of a workaround in the current implementation

With the current implementation, a user wanting to use a custom LangChain prompt in a custom LCEL chain, and to add keywords to that prompt, would have to do something like the following (ignoring that documents are passed as Document objects and not formatted into a str):

from bertopic.representation import LangChain
from langchain_core.prompts import ChatPromptTemplate

custom_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Custom instructions."),
        ("human", "Documents: {input_documents}, Keywords: {question}"),
    ]
)

# An LCEL chain built around the prompt above, e.g. custom_prompt | llm
chain = some_custom_chain_with_above_prompt

representation_model = LangChain(chain, prompt="[KEYWORDS]")

Related issues:

Your contribution

I propose several changes, which I have started working on in a branch (made a PR to make the diff easy to see).

  • Update the examples so that langchain.chains.question_answering.load_qa_chain is replaced by langchain.chains.combine_documents.stuff.create_stuff_documents_chain as recommended in the migration guide.
  • This new approach still takes care of formatting the Document objects into the prompt, but the prompt must now be specified explicitly (instead of the implicit, hard-coded prompt of langchain.chains.question_answering.load_qa_chain).
  • Remove the ability to provide a prompt in the constructor of LangChain as the prompt must now be explicitly created with the chain object.
  • Rename the keys for consistency to documents, keywords, and representation (note that langchain.chains.combine_documents.stuff.create_stuff_documents_chain does not have an output_text output key, so the representation key must be added).
  • Make it so that the keywords key is always provided to the chain (but it's up to the user to include a placeholder for it in their prompt).
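
The last two points can be sketched as follows; this is a hypothetical illustration of the proposed key handling, with invoke_representation_chain and fake_invoke as made-up stand-ins rather than the actual implementation:

```python
# Hypothetical sketch of the proposed key handling; the function and
# variable names here are illustrative, not the actual implementation.
def invoke_representation_chain(chain_invoke, documents, keywords):
    # The keywords key is always provided; whether the prompt actually
    # uses it is up to the user's template.
    inputs = {"documents": documents, "keywords": ", ".join(keywords)}
    # create_stuff_documents_chain returns a plain string (no output_text
    # key), so the "representation" output key is added explicitly here.
    return {"representation": chain_invoke(inputs)}

# Stand-in for chain.invoke(); a real chain formats the prompt and calls an LLM.
fake_invoke = lambda inputs: f"label covering {len(inputs['documents'])} documents"

out = invoke_representation_chain(fake_invoke, ["doc a", "doc b"], ["alpha", "beta"])
# out == {"representation": "label covering 2 documents"}
```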

Questions:

  • Should we provide a new example prompt to replace DEFAULT_PROMPT? For example
    EXAMPLE_PROMPT = "What are these documents about? {documents}. Here are some keywords about them {keywords} Please give a single label."
    however, it could only be used directly with langchain.chains.combine_documents.stuff.create_stuff_documents_chain, which takes care of formatting the documents.
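
The proposed EXAMPLE_PROMPT can be exercised with plain str.format() just to check its placeholders; note that inside create_stuff_documents_chain the {documents} variable would be filled in from Document objects automatically, so the hand-formatted string below is only a stand-in:

```python
# The proposed EXAMPLE_PROMPT, exercised with plain str.format() to check
# its placeholders; inside create_stuff_documents_chain the {documents}
# variable would be filled in from Document objects automatically.
EXAMPLE_PROMPT = (
    "What are these documents about? {documents}. "
    "Here are some keywords about them {keywords} Please give a single label."
)
filled = EXAMPLE_PROMPT.format(
    documents="doc one\ndoc two",  # stand-in for the chain-formatted documents
    keywords="alpha, beta",
)
```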
