I want to use streaming together with chat history. Here is my current code:
# Import the Llama class of llama-cpp-python and the LlamaCppPythonProvider of llama-cpp-agent
from llama_cpp import Llama
from llama_cpp_agent import LlamaCppAgent, MessagesFormatterType
from llama_cpp_agent.providers import LlamaCppPythonProvider
from llama_cpp_agent.chat_history import BasicChatHistory, BasicChatMessageStore, BasicChatHistoryStrategy

# Create an instance of the Llama class and load the model.
# n_ctx must be large enough to hold the k=7000 history tokens plus the new prompt and response.
llama_model = Llama("gemma-2-2b-it-IQ3_M.gguf", n_ctx=8192, n_batch=1024, n_threads=10, n_gpu_layers=0)
# llama_model = Llama("gemma-2-9b-it-IQ2_M.gguf", n_ctx=8192, n_batch=1024, n_threads=10, n_gpu_layers=40)

# Create the provider by passing the Llama instance to the LlamaCppPythonProvider class
provider = LlamaCppPythonProvider(llama_model)

# Create a message store for the chat history
chat_history_store = BasicChatMessageStore()

# Create the actual chat history by passing the desired strategy; it can be last_k_messages
# or last_k_tokens (the default keeps the last 20 messages). Here we use last_k_tokens,
# which includes the last k tokens in the chat history; this strategy needs the provider
# so it can count tokens.
chat_history = BasicChatHistory(message_store=chat_history_store,
                                chat_history_strategy=BasicChatHistoryStrategy.last_k_tokens,
                                k=7000,
                                llm_provider=provider)

# Pass the provider and the chat history to LlamaCppAgent, and define the system prompt
# and message formatter. Gemma 2 models use the GEMMA_2 prompt format, not CHATML.
agent = LlamaCppAgent(provider,
                      system_prompt="You are a helpful assistant.",
                      predefined_messages_formatter_type=MessagesFormatterType.GEMMA_2,
                      chat_history=chat_history)

settings = provider.get_provider_default_settings()
settings.stream = True
settings.temperature = 0.1

# Request a streaming generator instead of a finished string, then print chunks as they arrive.
# The prompt is Turkish for "what can you do?".
response_stream = agent.get_chat_response("neler yapabiliyorsun",
                                          llm_sampling_settings=settings,
                                          returns_streaming_generator=True,
                                          print_output=False)
for chunk in response_stream:
    print(chunk, end="", flush=True)
print()
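For reference, the consumption pattern for a streaming response looks like the sketch below. The `fake_stream` generator is a stand-in for whatever `get_chat_response` returns in streaming mode; the assumption that it yields plain text chunks should be checked against the library version you have installed.

```python
from typing import Iterator

def fake_stream() -> Iterator[str]:
    """Stand-in for a streaming LLM response: yields text chunks in order."""
    for chunk in ["Hello", ", ", "world", "!"]:
        yield chunk

# Consume the stream chunk by chunk, printing as chunks arrive,
# while also accumulating the full response for later use
# (e.g. to store it in the chat history yourself if needed).
pieces = []
for chunk in fake_stream():
    print(chunk, end="", flush=True)
    pieces.append(chunk)
full_response = "".join(pieces)
print()  # final newline after the streamed output
```

The same loop works unchanged when the stand-in generator is replaced by the real streaming generator.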
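To illustrate what a last-k-tokens strategy does conceptually, here is a minimal, library-free sketch. Whitespace splitting stands in for a real tokenizer (llama-cpp-agent would count tokens via the provider), and the message format is a plain dict, both assumptions for illustration only.

```python
def trim_to_last_k_tokens(messages, k, count_tokens=lambda text: len(text.split())):
    """Keep the most recent messages whose combined token count fits within k.

    Walks the history newest-first, adding messages until the budget
    is exhausted, then restores chronological order.
    """
    kept = []
    budget = k
    for msg in reversed(messages):          # newest first
        cost = count_tokens(msg["content"])
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))             # restore chronological order

history = [
    {"role": "user", "content": "one two three four"},   # 4 "tokens"
    {"role": "assistant", "content": "five six"},        # 2 "tokens"
    {"role": "user", "content": "seven eight nine"},     # 3 "tokens"
]
trimmed = trim_to_last_k_tokens(history, k=5)
# Only the last two messages (2 + 3 = 5 tokens) fit the budget.
```

With k=7000 and a real tokenizer, the library applies the same idea: older turns silently drop out of the prompt once the token budget is full.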