Checked other resources
- I added a very descriptive title to this issue.
- I searched the LangChain documentation with the integrated search.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
schema_to_be_extracted = {
    "type": "object",
    "items": {
        "type": "object",
        "required": [],
        "properties": {
            "title": {
                "type": "string",
                "description": "item title"
            },
            "due_date": {
                "type": "string",
                "description": "item due date, could be as closing date, due date, open until or any other format that indicates the end date of that item"
            }
        },
        "description": "foo"
    },
    "description": "A list of data."
}
from typing import Callable, Dict, Any, List
from langchain_core.tools import tool

def get_extraction_tools(schema_to_be_extracted: Dict[str, Any]) -> List[Callable]:
    """
    Get the extraction tools for extracting structured data.

    Returns:
        A list of LangChain tool functions for extraction
    """
    @tool(args_schema=schema_to_be_extracted)
    def extract_data(extracted_data: Dict[str, Any]) -> Dict[str, Any]:
        """
        Extract structured data from the provided content according to the JSON schema.

        Args:
            extracted_data: Dictionary containing the extracted structured data
        """
        return extracted_data

    return [extract_data]

system_prompt = """
You are a helpful assistant that extracts structured data from a given content.
"""
content = """
# Task List
| Title | Start Date | Due Date |
|-----------------|------------|------------|
| Project Alpha | 2025-03-01 | 2025-03-15 |
| Design Update | 2025-03-05 | 2025-03-20 |
| Code Review | 2025-03-10 | 2025-03-17 |
| Marketing Plan | 2025-03-12 | 2025-03-25 |
| Final Testing | 2025-03-18 | 2025-03-28 |
"""

from langchain.chat_models import init_chat_model
from langchain_core.messages import SystemMessage, HumanMessage
from langgraph.func import entrypoint

@entrypoint()
async def extract_data(
    input: Dict[str, Any]
) -> Any:
    """
    Extract structured data from content using LangChain Anthropic

    Args:
        input: Dictionary whose "schema_to_be_extracted" key holds the JSON schema
            defining the structure of data to be extracted
    """
    schema_to_be_extracted: Dict[str, Any] = input.get("schema_to_be_extracted")

    # Get the appropriate LLM based on the model name
    language_model = init_chat_model(model="claude-3-5-sonnet-20241022")

    # Get the extraction tools
    extraction_tools = get_extraction_tools(schema_to_be_extracted)

    # Language model with tools
    language_model_with_tools = language_model.bind_tools(
        tools=extraction_tools,
        tool_choice="any"
    )

    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=content)
    ]

    # Invoke the model with the messages
    response = await language_model_with_tools.ainvoke(messages)
    return response

# Top-level await: run this in a notebook or another async context
extraction_result = await extract_data.ainvoke(input={"schema_to_be_extracted": schema_to_be_extracted})
print(extraction_result.content[0]['input']['items'])
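The final `print` indexes `content[0]` directly, which assumes the first content block is the tool call. As a minimal sketch, assuming the response content is a list of dict blocks in which the tool call has `"type": "tool_use"` and an `"input"` dict (the shape indexed above), a more defensive lookup could look like the following. The helper name `items_from_tool_use` and the sample blocks are hypothetical and only illustrate the lookup; they are not part of LangChain.

```python
from typing import Any, Dict, List


def items_from_tool_use(content_blocks: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the "items" list from the first tool_use block, or [] if none is found."""
    for block in content_blocks:
        # Skip text blocks and anything that is not a dict-shaped tool call.
        if isinstance(block, dict) and block.get("type") == "tool_use":
            return block.get("input", {}).get("items", [])
    return []


# Hypothetical content, shaped like the blocks indexed in the report.
sample_content = [
    {"type": "text", "text": "Extracting the task list."},
    {
        "type": "tool_use",
        "name": "extract_data",
        "input": {"items": [{"due_date": "2025-03-15"}]},
    },
]

print(items_from_tool_use(sample_content))
```

With the reproduction above, this would be called as `items_from_tool_use(extraction_result.content)` in place of the direct index.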
Error Message and Stack Trace (if applicable)
Here is the LangSmith trace.
Description
Actual Results
[
{"due_date": "2025-03-15"},
{"due_date": "2025-03-20"},
{"due_date": "2025-03-17"},
{"due_date": "2025-03-25"},
{"due_date": "2025-03-28"}
]
Expected Results
[
{"due_date": "2025-03-15", "title": "Project Alpha"},
{"due_date": "2025-03-20", "title": "Design Update"},
{"due_date": "2025-03-17", "title": "Code Review"},
{"due_date": "2025-03-25", "title": "Marketing Plan"},
{"due_date": "2025-03-28", "title": "Final Testing"}
]
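To make the gap between the two lists explicit, here is a small sketch in plain Python that reports which schema properties are absent from each returned item. The helper name `missing_fields` is hypothetical; the data is copied from the actual results above, and the expected field names are the two properties declared in the schema.

```python
from typing import Any, Dict, List, Set


def missing_fields(items: List[Dict[str, Any]], expected: Set[str]) -> List[Set[str]]:
    """For each item, return the set of expected field names it does not contain."""
    return [expected - set(item) for item in items]


# Property names declared under "properties" in the schema from the report.
expected_fields = {"title", "due_date"}

# Actual results as reported above.
actual_results = [
    {"due_date": "2025-03-15"},
    {"due_date": "2025-03-20"},
    {"due_date": "2025-03-17"},
    {"due_date": "2025-03-25"},
    {"due_date": "2025-03-28"},
]

missing = missing_fields(actual_results, expected_fields)
print(missing)  # every item lacks "title"
```

Running the same check on the expected results yields an empty set for every item.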
System Info
Package Information
langchain_core: 0.3.45
langchain: 0.3.20
langsmith: 0.3.15
langchain_anthropic: 0.3.9
langchain_google_genai: 2.0.10
langchain_openai: 0.3.9
langchain_text_splitters: 0.3.7
langgraph_sdk: 0.1.57