Skip to content

[Maxim PY] Fixes media type handling in Langchain#113

Open
roroghost17 wants to merge 1 commit into
mainfrom
12-17-_maxim_py_fixes_media_type_handling_in_langchain
Open

[Maxim PY] Fixes media type handling in Langchain#113
roroghost17 wants to merge 1 commit into
mainfrom
12-17-_maxim_py_fixes_media_type_handling_in_langchain

Conversation

@roroghost17

@roroghost17 roroghost17 commented Dec 17, 2025

Copy link
Copy Markdown
Contributor

TL;DR

Fixed Langchain media type support for attachments in the Maxim logger.

What changed?

  • Added support for the media type in the parse_attachments_from_messages function
  • Updated tests to include examples of using the media type for attachments
  • Added test cases for image and audio file attachments using raw bytes
  • Updated model names in tests to use the latest available models
  • Bumped version from 3.14.2 to 3.14.3

How to test?

  1. Run the new test cases that demonstrate attachment handling:

    • test_human_message_with_image_file_attachment
    • test_human_message_with_image_url_attachment
    • test_human_message_with_multiple_image_attachments
    • test_human_message_with_audio_file_attachment
  2. Verify that attachments with media type are properly processed and included in the trace.

Why make this change?

Langchain uses the media type for certain attachments, but our logger wasn't handling this type correctly. This fix ensures that all attachment types from Langchain are properly captured and included in traces, improving compatibility with the Langchain ecosystem.

Copy link
Copy Markdown
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@coderabbitai

coderabbitai Bot commented Dec 17, 2025

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Enhanced support for media attachments including image and audio files
    • Improved image URL handling with automatic format conversion
  • Tests

    • Added comprehensive test coverage for multimodal message attachments
  • Chores

    • Updated LangChain dependency

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

The changes extend attachment processing in the logger utility to support media-type items with raw data and URL conversions, adds four new multimodal test cases covering image and audio attachment handling in LangChain, and updates the langchain dev dependency version.

Changes

Cohort / File(s) Summary
Media Attachment Handling
maxim/logger/components/utils.py
Enhanced attachment processing: introduces media_types set {"media", "file"} for identifying items, adds support for raw data attachments via item.get("data") with corresponding mime_type, converts non-data URLs to UrlAttachment (default image/png), preserves base64 data URI handling, consolidates attachment creation paths
Multimodal Test Coverage
maxim/tests/test_logger_langchain_03x.py
Added four new test methods for image/audio attachments (test_human_message_with_image_file_attachment, test_human_message_with_image_url_attachment, test_human_message_with_multiple_image_attachments, test_human_message_with_audio_file_attachment); refactored logger config from LoggerConfig(id=...) to dict form {"id": ...}; includes test file assets (png_image.png, wav_audio.wav); adds Maxim singleton cleanup in setUp; uses updated model variants (gpt-4o-mini, gpt-4o-audio-preview)
Dependencies
pyproject.toml
Bumps langchain dev dependency from 0.3.25 to 0.3.26

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hop hop, the attachments now flow,
With media types and data below,
Images, audio, URLs too—
New tests confirm what the logger can do!
A fluffy advance in the multimodal crew! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.75% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding media type handling support to the Langchain integration in the Maxim logger utility.
Description check ✅ Passed The description is directly related to the changeset, clearly explaining the media type support addition, test cases, and compatibility improvements with Langchain.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 12-17-_maxim_py_fixes_media_type_handling_in_langchain

Comment @coderabbitai help to get the list of available commands and usage tips.

@roroghost17 roroghost17 marked this pull request as ready for review December 17, 2025 12:03

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa387d2 and a51b666.

📒 Files selected for processing (4)
  • README.md (1 hunks)
  • maxim/logger/components/utils.py (2 hunks)
  • maxim/tests/test_logger_langchain_03x.py (27 hunks)
  • pyproject.toml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • README.md
  • maxim/logger/components/utils.py
  • pyproject.toml
  • maxim/tests/test_logger_langchain_03x.py
🧬 Code graph analysis (2)
maxim/logger/components/utils.py (1)
maxim/logger/components/attachment.py (1)
  • FileDataAttachment (41-74)
maxim/tests/test_logger_langchain_03x.py (2)
maxim/maxim.py (2)
  • Maxim (128-1076)
  • logger (928-973)
maxim/logger/langchain/tracer.py (1)
  • MaximLangchainTracer (51-964)
🪛 Ruff (0.14.8)
maxim/tests/test_logger_langchain_03x.py

16-16: Redefinition of unused ChatPromptTemplate from line 14

(F811)


1074-1074: Local variable image_url is assigned to but never used

Remove assignment to unused variable image_url

(F841)

🔇 Additional comments (9)
pyproject.toml (1)

7-7: LGTM!

The version bump to 3.14.3 and the LangChain dependency update to 0.3.26 are appropriate for this bug fix release.

Also applies to: 41-41

README.md (1)

185-187: LGTM!

The changelog entry clearly documents the fix for Langchain media type attachment handling.

maxim/logger/components/utils.py (2)

55-55: LGTM!

The addition of media_types set is consistent with the existing image_types pattern.


105-107: LGTM!

The removal logic for media items is consistent with the existing image handling pattern and appropriately preserves tool messages.

maxim/tests/test_logger_langchain_03x.py (5)

8-8: LGTM!

The new imports for Path and HumanMessage are necessary for the multimodal attachment test cases.

Also applies to: 22-22


64-65: LGTM!

The setUp method properly ensures a clean Maxim singleton state between tests, which is good practice for test isolation.


69-73: LGTM!

The switch from LoggerConfig(id=repoId) to the dictionary form {"id": repoId} is valid and results in cleaner, more Pythonic code.

Also applies to: 86-90, 118-122, 134-138, 160-164


1019-1054: LGTM!

This test correctly demonstrates the standard LangChain image_url format with a URL-based image.


1099-1147: LGTM!

This test correctly demonstrates audio file attachment using the media type with an audio-capable model.

Comment on lines +99 to +104
attachment = FileDataAttachment(
data=item.get("data", ""),
mime_type=item.get("mime_type", ""),
tags={"attach-to": "input"},
)
attachments.append(attachment)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix type mismatch in default values.

Line 100 uses an empty string "" as the default for data, but FileDataAttachment expects data: bytes. This type mismatch could cause runtime errors if the data field is missing or invalid.

Apply this diff to use proper types and add validation:

-            attachment = FileDataAttachment(
-                data=item.get("data", ""),
-                mime_type=item.get("mime_type", ""),
-                tags={"attach-to": "input"},
-            )
-            attachments.append(attachment)
+            data = item.get("data")
+            mime_type = item.get("mime_type")
+            if data and isinstance(data, bytes):
+                attachment = FileDataAttachment(
+                    data=data,
+                    mime_type=mime_type if mime_type else None,
+                    tags={"attach-to": "input"},
+                )
+                attachments.append(attachment)
+            else:
+                scribe().debug(
+                    f"[MaximSDK] Error while parsing attachment: Invalid or missing data for media type"
+                )
+                continue

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In maxim/logger/components/utils.py around lines 99 to 104, the default value
for item.get("data", "") is an empty string but FileDataAttachment.data expects
bytes; change the default to b"" and ensure the value passed is bytes: if the
item value is already bytes use it, if it's a str attempt to base64-decode or at
minimum .encode("utf-8") (choose base64 decoding if data is encoded), and if it
is neither raise a TypeError; update the code to perform this
validation/conversion before constructing FileDataAttachment so attachments
always receive bytes.

Comment thread maxim/tests/test_logger_langchain_03x.py Outdated
Comment on lines +989 to +1006
message_with_image = HumanMessage(
content=[
{
"type": "text",
"text": "What do you see in this image? Describe it briefly.",
},
{
"type": "media",
"mime_type": "image/png",
"data": file_bytes, # Raw bytes for LangChain
},
{
"type": "image",
"mime_type": "image/png",
"data": file_bytes, # Raw bytes for LangChain
},
]
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "test_logger_langchain_03x.py" -type f 2>/dev/null | head -5

Repository: maximhq/maxim-py

Length of output: 102


🏁 Script executed:

sed -n '985,1010p' ./maxim/tests/test_logger_langchain_03x.py

Repository: maximhq/maxim-py

Length of output: 978


🏁 Script executed:

sed -n '989,1006p' ./maxim/tests/test_logger_langchain_03x.py

Repository: maximhq/maxim-py

Length of output: 678


🏁 Script executed:

sed -n '969,1018p' ./maxim/tests/test_logger_langchain_03x.py

Repository: maximhq/maxim-py

Length of output: 1774


Remove the duplicate "image" type block—the test explicitly targets "media" type only.

The test's docstring states it demonstrates "LangChain's support for raw bytes using the 'media' content type," yet the code includes both "media" and "image" types with identical bytes. LangChain's ImageContentBlock uses type "image", but it expects a url field, not a data field. The "image" type with a raw data field is non-standard and contradicts the test's stated purpose.

Remove the redundant block:

                 {
                     "type": "media",
                     "mime_type": "image/png",
                     "data": file_bytes,  # Raw bytes for LangChain
                 },
-                {
-                    "type": "image",
-                    "mime_type": "image/png",
-                    "data": file_bytes,  # Raw bytes for LangChain
-                },
             ]
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
message_with_image = HumanMessage(
content=[
{
"type": "text",
"text": "What do you see in this image? Describe it briefly.",
},
{
"type": "media",
"mime_type": "image/png",
"data": file_bytes, # Raw bytes for LangChain
},
{
"type": "image",
"mime_type": "image/png",
"data": file_bytes, # Raw bytes for LangChain
},
]
)
message_with_image = HumanMessage(
content=[
{
"type": "text",
"text": "What do you see in this image? Describe it briefly.",
},
{
"type": "media",
"mime_type": "image/png",
"data": file_bytes, # Raw bytes for LangChain
},
]
)
🤖 Prompt for AI Agents
In maxim/tests/test_logger_langchain_03x.py around lines 989 to 1006, the
message_with_image payload includes a redundant "image" type block with raw data
that contradicts the test's purpose and LangChain expectations; remove the third
dict (the {"type":"image",...} entry) so only the "media" block with raw bytes
remains, keeping the "text" and "media" entries intact and preserving the test
docstring that demonstrates raw bytes via the "media" content type.

Comment on lines +1055 to +1098
def test_human_message_with_multiple_image_attachments(self):
"""Test LangChain chat model with multiple images attached in human message.

This test demonstrates mixing raw bytes (local file) with URL-based images.
"""
logger = self.maxim.logger(
{
"id": repoId,
}
)

# Get the path to the test image file
test_files_dir = Path(__file__).parent / "files"
image_path = test_files_dir / "png_image.png"

# Read the file as raw bytes
file_bytes = image_path.read_bytes()

# Use a public image URL as second image
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/300px-PNG_transparency_demonstration_1.png"

# Create a multimodal message with text and multiple images
# Mix raw bytes (local file) with URL-based image
message_with_images = HumanMessage(
content=[
{
"type": "media",
"mime_type": "image/png",
"data": file_bytes, # Raw bytes for local file
},
]
)

model = ChatOpenAI(
callbacks=[MaximLangchainTracer(logger)],
api_key=openAIKey,
model="gpt-4o-mini",
)

result = model.invoke([message_with_images])
print(f"Result with multiple image attachments: {result}")

logger.flush()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test implementation incomplete.

The test is named test_human_message_with_multiple_image_attachments and includes a comment stating it will "Mix raw bytes (local file) with URL-based image", but:

  1. Only one image is included in the content array (lines 1078-1086)
  2. The image_url variable defined on line 1074 is never used
  3. The test doesn't actually demonstrate multiple attachments

Apply this diff to complete the test as intended:

         # Create a multimodal message with text and multiple images
         # Mix raw bytes (local file) with URL-based image
         message_with_images = HumanMessage(
             content=[
+                {
+                    "type": "text",
+                    "text": "Compare these two images. What are the differences?",
+                },
                 {
                     "type": "media",
                     "mime_type": "image/png",
                     "data": file_bytes,  # Raw bytes for local file
                 },
+                {
+                    "type": "image_url",
+                    "image_url": {"url": image_url},
+                },
             ]
         )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_human_message_with_multiple_image_attachments(self):
"""Test LangChain chat model with multiple images attached in human message.
This test demonstrates mixing raw bytes (local file) with URL-based images.
"""
logger = self.maxim.logger(
{
"id": repoId,
}
)
# Get the path to the test image file
test_files_dir = Path(__file__).parent / "files"
image_path = test_files_dir / "png_image.png"
# Read the file as raw bytes
file_bytes = image_path.read_bytes()
# Use a public image URL as second image
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/300px-PNG_transparency_demonstration_1.png"
# Create a multimodal message with text and multiple images
# Mix raw bytes (local file) with URL-based image
message_with_images = HumanMessage(
content=[
{
"type": "media",
"mime_type": "image/png",
"data": file_bytes, # Raw bytes for local file
},
]
)
model = ChatOpenAI(
callbacks=[MaximLangchainTracer(logger)],
api_key=openAIKey,
model="gpt-4o-mini",
)
result = model.invoke([message_with_images])
print(f"Result with multiple image attachments: {result}")
logger.flush()
def test_human_message_with_multiple_image_attachments(self):
"""Test LangChain chat model with multiple images attached in human message.
This test demonstrates mixing raw bytes (local file) with URL-based images.
"""
logger = self.maxim.logger(
{
"id": repoId,
}
)
# Get the path to the test image file
test_files_dir = Path(__file__).parent / "files"
image_path = test_files_dir / "png_image.png"
# Read the file as raw bytes
file_bytes = image_path.read_bytes()
# Use a public image URL as second image
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/300px-PNG_transparency_demonstration_1.png"
# Create a multimodal message with text and multiple images
# Mix raw bytes (local file) with URL-based image
message_with_images = HumanMessage(
content=[
{
"type": "text",
"text": "Compare these two images. What are the differences?",
},
{
"type": "media",
"mime_type": "image/png",
"data": file_bytes, # Raw bytes for local file
},
{
"type": "image_url",
"image_url": {"url": image_url},
},
]
)
model = ChatOpenAI(
callbacks=[MaximLangchainTracer(logger)],
api_key=openAIKey,
model="gpt-4o-mini",
)
result = model.invoke([message_with_images])
print(f"Result with multiple image attachments: {result}")
logger.flush()
🧰 Tools
🪛 Ruff (0.14.8)

1074-1074: Local variable image_url is assigned to but never used

Remove assignment to unused variable image_url

(F841)

🤖 Prompt for AI Agents
In maxim/tests/test_logger_langchain_03x.py around lines 1055 to 1098, the test
claims to mix raw bytes with a URL-based image but only includes one media item
and never uses image_url; update the HumanMessage content to include two media
entries (first with the existing raw bytes and second with the image_url
string), keeping appropriate mime_type for the URL entry (e.g., "image/png"), so
the content list demonstrates multiple attachments and image_url is actually
used before invoking the model.

Comment thread maxim/tests/test_logger_langchain_03x.py
@roroghost17 roroghost17 force-pushed the 12-17-_maxim_py_fixes_media_type_handling_in_langchain branch from a51b666 to a6e52ed Compare December 18, 2025 02:54

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
maxim/tests/test_logger_langchain_03x.py (1)

936-964: Remove unused logger variable.

Line 936 creates a logger instance that is never used in test_multi_node_langchain_chain. The chain invocation at line 961 doesn't pass any callbacks, so the logger serves no purpose.

🔎 Apply this diff to remove the unused logger:
     def test_multi_node_langchain_chain(self):
-        logger = self.maxim.logger(
-            {
-                "id": repoId,
-            }
-        )
         llm = ChatOpenAI(temperature=0.7, model="gpt-4o", n=3, api_key=openAIKey)
 
         # Define a simple function to use in the chain

As per static analysis hints, the logger is assigned but never used.

♻️ Duplicate comments (4)
maxim/tests/test_logger_langchain_03x.py (2)

1133-1133: Inconsistent logger initialization style.

This test uses LoggerConfigDict(id=repoId) while all other tests in this file have been updated to use the simpler dict form {"id": repoId}.

🔎 Apply this diff for consistency:
-        logger = self.maxim.logger(LoggerConfigDict(id=repoId))
+        logger = self.maxim.logger({"id": repoId})

Based on past review comments, this inconsistency was previously flagged.


1038-1081: Test implementation incomplete—unused image_url variable.

The test is named test_human_message_with_multiple_image_attachments and includes a comment about mixing raw bytes with URL-based images (line 1060), but:

  1. Only one image attachment is included in the message content (lines 1064-1067)
  2. The image_url variable defined on line 1057 is never used
🔎 Complete the test by adding the URL-based image:
         # Create a multimodal message with text and multiple images
         # Mix raw bytes (local file) with URL-based image
         message_with_images = HumanMessage(
             content=[
+                {
+                    "type": "text",
+                    "text": "Compare these two images. What are the differences?",
+                },
                 {
                     "type": "media",
                     "mime_type": "image/png",
                     "data": file_bytes,  # Raw bytes for local file
                 },
+                {
+                    "type": "image_url",
+                    "image_url": {"url": image_url},
+                },
             ]
         )

As per static analysis hints and past review comments, image_url is assigned but never used.

maxim/logger/components/utils.py (2)

94-100: Fix type mismatch in default value for data.

Line 96 uses item.get("data", "") which returns an empty string as the default, but FileDataAttachment.data expects bytes. This will cause a runtime type error if the data field is missing or None.

🔎 Apply this diff to validate and convert data to bytes:
-            elif item.get("data") is not None:
-                attachment = FileDataAttachment(
-                    data=item.get("data", ""),
-                    mime_type=item.get("mime_type", ""),
-                    tags={"attach-to": "input"},
-                )
-                attachments.append(attachment)
+            elif item.get("data") is not None:
+                data = item.get("data")
+                if isinstance(data, bytes):
+                    attachment = FileDataAttachment(
+                        data=data,
+                        mime_type=item.get("mime_type") or None,
+                        tags={"attach-to": "input"},
+                    )
+                    attachments.append(attachment)
+                else:
+                    scribe().debug(
+                        f"[MaximSDK] Error while parsing attachment: Expected bytes for data field, got {type(data).__name__}"
+                    )

Based on past review comments, this issue was previously flagged for lines 104-109 but also applies here.


101-109: Eliminate duplicate code and fix type mismatch.

This block duplicates the logic from lines 95-100, creating identical FileDataAttachment objects. Additionally, line 105 has the same type mismatch issue: item.get("data", "") returns a string, but FileDataAttachment.data requires bytes.

🔎 Consolidate the two branches:

Since both image_types items with data fields and media_types items are handled identically, merge the conditions:

-                elif item.get("data") is not None:
-                    attachment = FileDataAttachment(
-                        data=item.get("data", ""),
-                        mime_type=item.get("mime_type", ""),
-                        tags={"attach-to": "input"},
-                    )
-                    attachments.append(attachment)
-            elif isinstance(item, dict) and (
-                item.get("type") in media_types
-            ):
-                attachment = FileDataAttachment(
-                    data=item.get("data", ""),
-                    mime_type=item.get("mime_type", ""),
-                    tags={"attach-to": "input"},
-                )
-                attachments.append(attachment)
+                elif item.get("data") is not None:
+                    data = item.get("data")
+                    if isinstance(data, bytes):
+                        attachment = FileDataAttachment(
+                            data=data,
+                            mime_type=item.get("mime_type") or None,
+                            tags={"attach-to": "input"},
+                        )
+                        attachments.append(attachment)
+                    else:
+                        scribe().debug(
+                            f"[MaximSDK] Error while parsing attachment: Expected bytes for data field, got {type(data).__name__}"
+                        )
+            elif isinstance(item, dict) and item.get("type") in media_types:
+                data = item.get("data")
+                if isinstance(data, bytes):
+                    attachment = FileDataAttachment(
+                        data=data,
+                        mime_type=item.get("mime_type") or None,
+                        tags={"attach-to": "input"},
+                    )
+                    attachments.append(attachment)
+                else:
+                    scribe().debug(
+                        f"[MaximSDK] Error while parsing attachment: Expected bytes for data field, got {type(data).__name__}"
+                    )

Based on past review comments, the type mismatch was previously flagged and remains unresolved.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a51b666 and a6e52ed.

📒 Files selected for processing (4)
  • README.md (1 hunks)
  • maxim/logger/components/utils.py (3 hunks)
  • maxim/tests/test_logger_langchain_03x.py (27 hunks)
  • pyproject.toml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • README.md
  • pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • maxim/tests/test_logger_langchain_03x.py
  • maxim/logger/components/utils.py
🧬 Code graph analysis (2)
maxim/tests/test_logger_langchain_03x.py (2)
maxim/maxim.py (1)
  • logger (928-973)
maxim/logger/logger.py (2)
  • LoggerConfigDict (79-82)
  • flush (931-935)
maxim/logger/components/utils.py (2)
maxim/logger/components/attachment.py (2)
  • UrlAttachment (78-111)
  • FileDataAttachment (41-74)
maxim/logger/components/base.py (1)
  • data (355-370)
🪛 Ruff (0.14.8)
maxim/tests/test_logger_langchain_03x.py

936-936: Local variable logger is assigned to but never used

Remove assignment to unused variable logger

(F841)


1057-1057: Local variable image_url is assigned to but never used

Remove assignment to unused variable image_url

(F841)

🔇 Additional comments (7)
maxim/logger/components/utils.py (1)

55-55: LGTM: Media types support added.

The addition of media_types set appropriately extends attachment handling to include LangChain's media and file content types.

maxim/tests/test_logger_langchain_03x.py (6)

5-28: LGTM: Imports updated for LangChain 0.3.x and new functionality.

The imports have been appropriately updated to include base64, Path, HumanMessage, RunnableLambda, and consolidated prompt imports, aligning with the new multimodal test requirements and LangChain 0.3.x API.


61-62: LGTM: Singleton reset ensures test isolation.

Clearing the Maxim singleton instance in setUp prevents state leakage between tests, which is essential for reliable test execution.


66-70: LGTM: Logger initialization migrated to dict form.

The migration from LoggerConfig class to dict-based configuration (e.g., {"id": repoId}) is consistent with the updated API pattern and improves code simplicity.

Also applies to: 83-87, 115-119, 131-135, 157-161


966-1001: LGTM: Image file attachment test validates media type handling.

This test correctly demonstrates raw bytes attachment using the media content type, which validates the new functionality added to parse_attachments_from_messages. The test reads a PNG file as bytes and attaches it directly without base64 encoding, matching LangChain's expected format.


1002-1037: LGTM: Image URL attachment test covers URL-based attachments.

This test validates URL-based image attachments using the image_url content type, ensuring both file-based and URL-based attachment flows are covered.


1082-1130: LGTM: Audio file attachment test extends media type coverage.

This test validates audio attachment handling using the media content type with raw WAV bytes, demonstrating that the new media type support works for non-image formats. The use of gpt-4o-audio-preview is appropriate for audio input processing.

@roroghost17 roroghost17 force-pushed the 12-17-_maxim_py_fixes_media_type_handling_in_langchain branch from a6e52ed to 4bfe813 Compare January 5, 2026 06:43

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @maxim/tests/test_logger_langchain_03x.py:
- Line 464: Replace the invalid Anthropic model identifier "claude-4-sonnet"
used in the test(s) with a valid Claude Sonnet 4 identifier (e.g. use the alias
"claude-sonnet-4-0" or the snapshot "claude-sonnet-4-20250514"); update every
occurrence of the literal "claude-4-sonnet" (the model_name value) so the
instantiated API clients use a supported model id and the tests stop failing
when calling Anthropic.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
maxim/tests/test_logger_langchain_03x.py (1)

935-965: Unused logger: integrate with MaximLangchainTracer or remove.

The logger is created on line 936-940 but never used. Line 961 invokes the chain without passing the logger to callbacks. Either add config={"callbacks": [MaximLangchainTracer(logger)]} to the chain invocation, or remove the logger initialization.

🔎 Proposed fix to use the logger
         # Run the chain with input
         input_text = "This is a test of a multi-node Langchain."
         result = chain.invoke(
-            input_text 
+            input_text,
+            config={"callbacks": [MaximLangchainTracer(logger)]},
         )
♻️ Duplicate comments (4)
maxim/tests/test_logger_langchain_03x.py (2)

1133-1133: Inconsistent logger initialization style.

Line 1133 uses LoggerConfigDict(id=repoId) while all other tests in this file use the dictionary form {"id": repoId}. Update for consistency.

🔎 Proposed fix
-        logger = self.maxim.logger(LoggerConfigDict(id=repoId))
+        logger = self.maxim.logger({"id": repoId})

Based on past review comments.


1038-1081: Test incomplete: add the second image to demonstrate multiple attachments.

The test name and docstring indicate mixing raw bytes with URL-based images, but the image_url variable (line 1057) is never used. The content array only contains one media item. Add the second image to fulfill the test's stated purpose.

🔎 Proposed fix
         # Create a multimodal message with text and multiple images
         # Mix raw bytes (local file) with URL-based image
         message_with_images = HumanMessage(
             content=[
+                {
+                    "type": "text",
+                    "text": "Compare these two images. What are the differences?",
+                },
                 {
                     "type": "media",
                     "mime_type": "image/png",
                     "data": file_bytes,  # Raw bytes for local file
                 },
+                {
+                    "type": "image_url",
+                    "image_url": {"url": image_url},
+                },
             ]
         )

Based on past review comments.

maxim/logger/components/utils.py (2)

94-100: Fix type mismatch: data field expects bytes, not string.

Line 96 uses item.get("data", "") which defaults to an empty string, but FileDataAttachment expects data: bytes. This type mismatch will cause runtime errors when the data field is missing or invalid.

🔎 Proposed fix with type validation
-                elif item.get("data") is not None:
-                    attachment = FileDataAttachment(
-                        data=item.get("data", ""),
-                        mime_type=item.get("mime_type", ""),
-                        tags={"attach-to": "input"},
-                    )
-                    attachments.append(attachment)
+                elif item.get("data") is not None:
+                    data = item.get("data")
+                    mime_type = item.get("mime_type")
+                    if data and isinstance(data, bytes):
+                        attachment = FileDataAttachment(
+                            data=data,
+                            mime_type=mime_type if mime_type else None,
+                            tags={"attach-to": "input"},
+                        )
+                        attachments.append(attachment)
+                    else:
+                        scribe().debug(
+                            f"[MaximSDK] Error while parsing attachment: Invalid or missing data for image type"
+                        )

Based on past review comments, this same issue exists at lines 104-109.


101-109: Fix type mismatch in media_types handling.

Lines 105-106 use item.get("data", "") and item.get("mime_type", "") with string defaults, but FileDataAttachment expects data: bytes. This is the same critical issue flagged in the past review comment.

🔎 Proposed fix with type validation
             elif isinstance(item, dict) and (
                 item.get("type") in media_types
             ):
-                attachment = FileDataAttachment(
-                    data=item.get("data", ""),
-                    mime_type=item.get("mime_type", ""),
-                    tags={"attach-to": "input"},
-                )
-                attachments.append(attachment)
+                data = item.get("data")
+                mime_type = item.get("mime_type")
+                if data and isinstance(data, bytes):
+                    attachment = FileDataAttachment(
+                        data=data,
+                        mime_type=mime_type if mime_type else None,
+                        tags={"attach-to": "input"},
+                    )
+                    attachments.append(attachment)
+                else:
+                    scribe().debug(
+                        f"[MaximSDK] Error while parsing attachment: Invalid or missing data for media type"
+                    )

Based on past review comments.

🧹 Nitpick comments (3)
maxim/logger/components/utils.py (2)

92-93: Consider making the mime_type more flexible for URL attachments.

Line 92 hardcodes mime_type="image/png" for all non-data-URI image URLs. This assumption may not hold for JPEGs, WebP, GIFs, or other formats. Consider using None or attempting to infer the type from the URL extension.

💡 Optional refactor to infer mime_type from URL
                    else:
-                        attachment = UrlAttachment(url=url, mime_type="image/png", tags={"attach-to": "input"})
+                        # Try to infer mime_type from URL extension
+                        import mimetypes
+                        mime_type = mimetypes.guess_type(url)[0] or "image/png"
+                        attachment = UrlAttachment(url=url, mime_type=mime_type, tags={"attach-to": "input"})
                        attachments.append(attachment)

94-109: Consolidate duplicate attachment creation logic.

Lines 94-100 and 101-109 contain nearly identical logic for creating FileDataAttachment from raw data. Consider extracting this into a helper function or consolidating the branches to reduce duplication.

🔎 Proposed consolidation
+                # Handle raw data attachments for both image_types and media_types
                 if url:
                     # Check if its base64 encoded data uri
                     if url.startswith("data:image"):
                         # Extract base64 data from data URI
                         match = re.match(
                             r"data:image/(?P<ext>\w+);base64,(?P<data>.+)",
                             url,
                         )
                         if match:
                             ext = match.group("ext")
                             data = match.group("data")
                             try:
                                 file_data = base64.b64decode(data)
                             except Exception as e:
                                 scribe().error(
                                     f"[MaximSDK] Error while parsing attachment: {str(e)}"
                                 )
                                 continue
                             attachment = FileDataAttachment(
                                 data=file_data,
                                 mime_type=f"image/{ext}",
                                 tags={"attach-to": "input"},
                             )
                             attachments.append(attachment)
                     else:
                         attachment = UrlAttachment(url=url, mime_type="image/png", tags={"attach-to": "input"})
                         attachments.append(attachment)
                 elif item.get("data") is not None:
+                    data = item.get("data")
+                    mime_type = item.get("mime_type")
+                    if data and isinstance(data, bytes):
+                        attachment = FileDataAttachment(
+                            data=data,
+                            mime_type=mime_type if mime_type else None,
+                            tags={"attach-to": "input"},
+                        )
+                        attachments.append(attachment)
+                    else:
+                        scribe().debug(
+                            f"[MaximSDK] Error while parsing attachment: Invalid or missing data"
+                        )
-                    attachment = FileDataAttachment(
-                        data=item.get("data", ""),
-                        mime_type=item.get("mime_type", ""),
-                        tags={"attach-to": "input"},
-                    )
-                    attachments.append(attachment)
             elif isinstance(item, dict) and (
                 item.get("type") in media_types
             ):
+                data = item.get("data")
+                mime_type = item.get("mime_type")
+                if data and isinstance(data, bytes):
+                    attachment = FileDataAttachment(
+                        data=data,
+                        mime_type=mime_type if mime_type else None,
+                        tags={"attach-to": "input"},
+                    )
+                    attachments.append(attachment)
+                else:
+                    scribe().debug(
+                        f"[MaximSDK] Error while parsing attachment: Invalid or missing data for media type"
+                    )
-                attachment = FileDataAttachment(
-                    data=item.get("data", ""),
-                    mime_type=item.get("mime_type", ""),
-                    tags={"attach-to": "input"},
-                )
-                attachments.append(attachment)
maxim/tests/test_logger_langchain_03x.py (1)

966-1001: Add Maxim integration to verify attachment tracing.

This test verifies LangChain's media type support but doesn't integrate with Maxim's logger. Since the PR objective is to ensure media-type attachments are captured in traces, consider adding MaximLangchainTracer(logger) to verify end-to-end functionality.

💡 Suggested enhancement
     def test_human_message_with_image_file_attachment(self):
         """Test LangChain chat model with image file attached in human message using raw bytes.
 
         This test demonstrates LangChain's support for raw bytes using the 'media' content type.
         No base64 encoding is required - just read the file as bytes and pass directly.
         """
+        logger = self.maxim.logger(
+            {
+                "id": repoId,
+            }
+        )
+
         # Get the path to the test image file
         test_files_dir = Path(__file__).parent / "files"
         image_path = test_files_dir / "png_image.png"
 
         # Read the file as raw bytes - no base64 encoding needed!
         file_bytes = image_path.read_bytes()
 
         # Create a multimodal message with text and image using raw bytes
         message_with_image = HumanMessage(
             content=[
                 {
                     "type": "text",
                     "text": "What do you see in this image? Describe it briefly.",
                 },
                 {
                     "type": "media",
                     "mime_type": "image/png",
                     "data": file_bytes,  # Raw bytes for LangChain
                 },
             ]
         )
 
         model = ChatOpenAI(
+            callbacks=[MaximLangchainTracer(logger)],
             api_key=openAIKey,
             model="gpt-4o-mini",
         )
 
         result = model.invoke([message_with_image])
         print(f"Result with image attachment: {result}")
+
+        logger.flush()
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6e52ed and 4bfe813.

📒 Files selected for processing (3)
  • maxim/logger/components/utils.py
  • maxim/tests/test_logger_langchain_03x.py
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

always check the stack if there is one for the current PR. do not give localized reviews for the PR, always see all changes in the light of the whole stack of PRs (if there is a stack, if there is no stack you can continue to make localized suggestions/reviews)

Files:

  • maxim/logger/components/utils.py
  • maxim/tests/test_logger_langchain_03x.py
🧬 Code graph analysis (1)
maxim/tests/test_logger_langchain_03x.py (2)
maxim/maxim.py (1)
  • logger (928-973)
maxim/logger/logger.py (2)
  • LoggerConfigDict (79-82)
  • flush (956-960)
🪛 Ruff (0.14.10)
maxim/tests/test_logger_langchain_03x.py

936-936: Local variable logger is assigned to but never used

Remove assignment to unused variable logger

(F841)


1057-1057: Local variable image_url is assigned to but never used

Remove assignment to unused variable image_url

(F841)

model = AnthropicLLM(
api_key=anthropicApiKey,
model_name="claude-3-5-sonnet-20240620",
model_name="claude-4-sonnet",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Anthropic Claude 4 Sonnet model identifier API 2025 2026

💡 Result:

Claude Sonnet 4 API model identifiers (2025–2026):

  • Specific snapshot: claude-sonnet-4-20250514. [1][2]
  • Alias: claude-sonnet-4-0 (alias pointing to the snapshot). [2]

(You can list/confirm available models via GET /v1/models in Anthropic's API; example responses show the snapshot id above.) [1]

Sources: Anthropic docs — Models list / Models overview / Migrating to Claude 4. [1][2][3]


🏁 Script executed:

# Check the test file to understand context and all usages of model names
cd maxim/tests && head -n 50 test_logger_langchain_03x.py

Repository: maximhq/maxim-py

Length of output: 1768


🏁 Script executed:

# Search for all occurrences of model_name in the test file
rg -n "model_name" maxim/tests/test_logger_langchain_03x.py | head -20

Repository: maximhq/maxim-py

Length of output: 343


🏁 Script executed:

# Check if these are mocked tests or real API calls
rg -B5 -A5 "claude-4-sonnet" maxim/tests/test_logger_langchain_03x.py | head -40

Repository: maximhq/maxim-py

Length of output: 1032


Fix invalid Anthropic model identifier "claude-4-sonnet".

The model name "claude-4-sonnet" (lines 464, 487, 511, 613, 638) is not a valid Anthropic API identifier. According to Anthropic's API documentation, Claude Sonnet 4 uses either "claude-sonnet-4-20250514" (specific snapshot) or "claude-sonnet-4-0" (alias). These tests instantiate real API clients and will fail when attempting to call the API with an invalid model identifier.

Also applies to: 487-487, 511-511, 613-613, 638-638

🤖 Prompt for AI Agents
In @maxim/tests/test_logger_langchain_03x.py around line 464, Replace the
invalid Anthropic model identifier "claude-4-sonnet" used in the test(s) with a
valid Claude Sonnet 4 identifier (e.g. use the alias "claude-sonnet-4-0" or the
snapshot "claude-sonnet-4-20250514"); update every occurrence of the literal
"claude-4-sonnet" (the model_name value) so the instantiated API clients use a
supported model id and the tests stop failing when calling Anthropic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant