🐛 Regression in v1.2.4: Multimodal input_audio in HumanMessage is flattened into text #9811

@bing6

Checked other resources

  • This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Reproduction

import { HumanMessage } from '@langchain/core/messages';

// `castAudioContent` is the reporter's audio source object;
// `data` is assumed to hold base64-encoded WAV audio.
const userInput = new HumanMessage({
  content: [
    { type: 'text', text: 'a' },
    {
      type: 'input_audio',
      input_audio: {
        data: castAudioContent.data, // base64
        format: 'wav',
      },
    },
    { type: 'text', text: 'b' },
    {
      type: 'input_audio',
      input_audio: {
        data: castAudioContent.data, // base64
        format: 'wav',
      },
    },
  ],
});
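
For context, a minimal sketch of how the flattening can be observed follows. The handler wiring is an illustration only; `model` stands in for whatever chat model instance the application uses and is not part of the original report.

import { BaseCallbackHandler } from '@langchain/core/callbacks/base';

// Logs whatever handleLLMStart receives as the prompt.
class PromptLogger extends BaseCallbackHandler {
  name = 'prompt-logger';

  handleLLMStart(_llm: unknown, prompts: string[]) {
    // Since v1.2.4 this logs the flattened text ("ab");
    // before v1.2.4 the structured content was still visible here.
    console.log(prompts);
  }
}

await model.invoke([userInput], { callbacks: [new PromptLogger()] });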

Error Message and Stack Trace (if applicable)

Behavior before v1.2.4

Observed in handleLLMStart:

human: [
  { type: 'text', text: 'a' },
  {
    type: 'input_audio',
    input_audio: {
      data: 'base64...',
      format: 'wav',
    },
  },
  { type: 'text', text: 'b' },
  {
    type: 'input_audio',
    input_audio: {
      data: 'base64...',
      format: 'wav',
    },
  },
]

The multimodal structure is preserved and audio input works as expected.


Behavior since v1.2.4

Observed in handleLLMStart:

human: "ab"

All input_audio segments are dropped, the content is flattened into
plain text, and audio input no longer works.

Description

Summary

Starting from LangChain v1.2.4, multimodal HumanMessage.content
(for example, mixing text and input_audio) is flattened into a plain
text string when observed in handleLLMStart.

As a result, input_audio segments are dropped entirely and audio input
no longer works.

This behavior is different from versions before v1.2.4, where the
original structured content array was preserved.

Expected Behavior

  • HumanMessage.content should preserve its original multimodal
    structure in handleLLMStart
  • Alternatively, a new hook or documented mechanism should be provided
    to access raw multimodal prompts (a possible interim workaround is
    sketched after this list)
  • If this change is intentional, an explicit migration strategy should
    be documented
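
As a possible interim workaround (untested against the affected versions, so treat this as an assumption), handleChatModelStart receives the structured BaseMessage objects rather than formatted string prompts, and may therefore still expose the original content array:

import { BaseCallbackHandler } from '@langchain/core/callbacks/base';
import type { BaseMessage } from '@langchain/core/messages';

// Sketch: inspect structured messages instead of flattened string prompts.
class MultimodalInspector extends BaseCallbackHandler {
  name = 'multimodal-inspector';

  handleChatModelStart(_llm: unknown, messages: BaseMessage[][]) {
    for (const batch of messages) {
      for (const message of batch) {
        // message.content should still be the structured array here,
        // assuming the flattening happens only on the string-prompt path.
        console.log(message.content);
      }
    }
  }
}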

Impact

  • Multimodal input using HumanMessage.content[] is broken in
    versions >= v1.2.4
  • handleLLMStart can no longer be used to inspect or intercept
    multimodal prompts
  • This appears to be a breaking change, but no clear migration path or
    changelog entry was found

Additional Notes

This behavior suggests a prompt normalization or serialization step
introduced in v1.2.4 that concatenates text segments and ignores
non-text segments such as input_audio.
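
For illustration only, a normalization step of roughly the following shape would produce the observed "ab". This is a hypothetical sketch, not the actual LangChain.js source:

type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'input_audio'; input_audio: { data: string; format: string } };

// Hypothetical: concatenate text parts and silently drop everything else.
function flattenContent(parts: ContentPart[]): string {
  return parts
    .filter((part): part is Extract<ContentPart, { type: 'text' }> => part.type === 'text')
    .map((part) => part.text)
    .join('');
}

// For the reproduction above, this yields "ab": both input_audio parts vanish.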

If this is an intentional design change, clarification and documentation
would be appreciated.

System Info

Environment

  • LangChain version: >= v1.2.4
  • Runtime: Node.js
  • Use case: Audio / multimodal input via HumanMessage
