[Bug]: Discord voice messages silently dropped — diagnostic logging needed #817

@dmitriikeler

Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • I am using the latest version of Moltis
  • If this happened during a chat session, I included as much full session context as possible and redacted secrets

What happened?

Discord voice messages sent to a Moltis bot are systematically swallowed: the agent never sees them, never transcribes them, and produces no log entries confirming the event reached the handler. Regular text messages in the same channel (immediately before and after the voice message) log and process normally.

Reproducible pattern: user sends voice message → agent gets no signal → user follows up asking about the message → agent either ignores or hallucinates a transcription ("I saw '?'"). Happens 100% of the time in our testing across multiple voice message lengths (4s, 14s).

This has happened to us consistently since Discord voice support was added in 20260412.01 (commit 8ecf99a3 — "feat(discord): handle inbound voice and image attachments"). Voice messages have never actually worked for us. We initially thought it was rate-dependent; it is not — both short and long voice messages are dropped.

Expected behavior

When a Discord user sends a voice message to the bot, the handler at crates/discord/src/handler/implementation.rs should:

  1. Receive the MESSAGE_CREATE event
  2. Log "discord inbound message received" at line 708
  3. Route the attachment through select_media_attachment → STT → transcription
  4. Inject transcribed text into the agent context with voice provenance marker

The agent should be able to distinguish "user sent a voice message" from "user typed text", which requires the voice path to actually fire.
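
The voice/text distinction described above can be sketched as a small provenance type. This is a minimal illustration, not Moltis code: `Provenance`, `InboundText`, `render_for_agent`, and the marker format are all hypothetical names chosen for this example.

```rust
/// How the text reached the agent. Hypothetical type, not from the Moltis codebase.
#[derive(Debug, Clone, PartialEq)]
pub enum Provenance {
    /// The user typed the text directly.
    Typed,
    /// The text was transcribed from a voice message of the given duration.
    Voice { duration_secs: f64 },
}

/// Text plus a record of where it came from.
#[derive(Debug, Clone, PartialEq)]
pub struct InboundText {
    pub text: String,
    pub provenance: Provenance,
}

/// Render what the agent would see. The marker format is an assumption
/// made for illustration only.
pub fn render_for_agent(msg: &InboundText) -> String {
    match &msg.provenance {
        Provenance::Typed => msg.text.clone(),
        Provenance::Voice { duration_secs } => {
            format!("[voice message, {:.0}s, transcribed] {}", duration_secs, msg.text)
        }
    }
}
```

With a marker like this, the agent in the session below would have had no reason to describe the typed "?" as a transcription.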

Steps to reproduce

  1. Set up a MOLTIS bot on Discord with STT configured (OpenAI Whisper / Groq / etc.)
  2. From Discord, send a voice message to the bot (via the native Discord voice-record UI, NOT uploading an audio file)
  3. Observe: agent receives nothing, logs show no inbound event for the voice message

Did this happen during a chat session?

Yes

Chat session context (if applicable)

User: sends 4-second voice message
Agent: no response — agent had no signal
User: "?"
Agent: "Voice messages come in as transcribed text. I read the transcription, write my response as text, and it gets converted back to audio for you to hear."
User: "What was my last message voice"
Agent: "Yeah, your last message was voice. The system transcribed it to text ('?') before I saw it." ← hallucination — the "?" was the user's next text message, not a transcription
User: sends 14-second voice message with content "the secret number is 12345"
Agent: (no signal from voice, but user also sent text "What about now? What's the secret number?") "This one's text. You typed it. There's no secret number."

Error messages / logs

The dispositive evidence is the absence of log events for the voice messages:

08:01:14Z moltis_discord::handler: discord inbound message received ... text_len=30
                                   ← 14-second voice message sent here — ZERO log events
08:02:04Z moltis_discord::handler: discord inbound message received ... text_len=41

Two adjacent text messages both log the standard "discord inbound message received" info line at implementation.rs:708. The voice message between them produces no log output anywhere — not info, not warn, not debug, not error. Same pattern for earlier voice messages in the same session.

Is this a regression?

No, this never worked

Moltis version

20260420.02 (also verified against main / 20260421.02)

Component

Channels (Telegram, Discord, etc.)

Install method

Docker

Operating system

Other Linux

Additional context

We do not yet know the root mechanism. Deeper investigation is needed — this is why we're filing with a request for diagnostic logging, not a proposed fix.

Verified facts:

  • Single impl EventHandler for Handler at crates/discord/src/handler/implementation.rs:656 implementing only message, ready, interaction_create. No message_update handler exists.
  • Gateway intents include MESSAGE_CONTENT (line 42) — privileged intent that Discord requires for reading message content/attachments. This is enabled.
  • Serenity 0.12.5 Attachment struct has duration_secs: Option<f64> and waveform: Option<Vec<u8>>. Both Option, so voice-message fields shouldn't cause deserialize failures.
  • Early returns in fn message before the info! log line: (1) msg.author.bot, (2) accounts.get(...) miss (logs a warn, which we don't see either), (3) text.is_empty() && msg.attachments.is_empty() at line 701 (genuinely silent, no log)
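
The three early returns listed above can be modeled as a pure function, which makes it easy to see which path a voice message with no text would take. This is a sketch under assumptions: `Precheck`, `EarlyReturn`, and `early_return_reason` are hypothetical names, and the check order simply follows the order given in this issue.

```rust
/// Why `fn message` would return before the "discord inbound message received" log.
/// Hypothetical enum for illustration; not part of the Moltis codebase.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EarlyReturn {
    AuthorIsBot,
    AccountMissing,
    EmptyTextAndNoAttachments,
}

/// Inputs to the three checks, extracted from the message (hypothetical struct).
pub struct Precheck {
    pub author_is_bot: bool,
    pub account_found: bool,
    pub text_is_empty: bool,
    pub attachment_count: usize,
}

/// Returns the reason the handler would bail out, or `None` if the message
/// would reach the info! log line.
pub fn early_return_reason(p: &Precheck) -> Option<EarlyReturn> {
    if p.author_is_bot {
        return Some(EarlyReturn::AuthorIsBot);
    }
    if !p.account_found {
        return Some(EarlyReturn::AccountMissing);
    }
    if p.text_is_empty && p.attachment_count == 0 {
        return Some(EarlyReturn::EmptyTextAndNoAttachments);
    }
    None
}
```

A voice message has no text, so it survives the third check only if its attachment list is non-empty when the handler runs.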

Ruled out:

  • MESSAGE_CONTENT intent (declared and active — text works, so intent is live)
  • Voice-specific event routing (Serenity has no such event — voice messages ride regular MESSAGE_CREATE with flags & 8192)
  • Account lookup miss (would log warn)
  • Bot auth/scope (would block text too)

Remaining hypotheses (indistinguishable without diagnostic logging):

  1. Gateway-level drop: serenity's shard deserializer silently drops the MESSAGE_CREATE before it reaches EventHandler::message. Serenity has a history of such issues (see serenity-rs/serenity#1431, "Some event fails to deserialize, obtaining a DateTime<Utc> string rather than an integer").
  2. Line-701 silent return — handler is reached but msg.attachments is empty for voice messages (despite Discord docs saying voice attachments are populated at MESSAGE_CREATE time).

Without diagnostic logging in the handler, we cannot distinguish these. The Discord docs state that voice messages ride a regular MESSAGE_CREATE with the flags & 8192 bit set and one audio attachment. If that is what we're receiving, hypothesis 2 should not apply. If we're NOT receiving it, hypothesis 1 is likely.
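
For reference, the `flags & 8192` test mentioned above is a single-bit check (8192 is `1 << 13`). A minimal sketch on a raw integer follows; the constant and function names are chosen for this example, and a real handler would read the bits from the message's flags field rather than a bare `u64`.

```rust
/// Bit 13 of the Discord message flags (1 << 13 = 8192), the voice-message bit
/// referred to in this issue.
pub const IS_VOICE_MESSAGE: u64 = 1 << 13;

/// True if the raw flags value has the voice-message bit set,
/// regardless of any other bits.
pub fn is_voice_message(flags: u64) -> bool {
    (flags & IS_VOICE_MESSAGE) != 0
}
```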

Suggested diagnostic patch (not a fix — a way for you and us to collect evidence):

// Before the silent return at line 701:
if text.is_empty() && msg.attachments.is_empty() {
    warn!(
        account_id = %self.account_id,
        message_id = ?msg.id,
        flags = ?msg.flags,
        kind = ?msg.kind,
        "discord: dropping message with empty text and empty attachments"
    );
    return;
}

// Also add a catch-all diagnostic info! BEFORE any filtering, at start of fn message:
info!(
    account_id = %self.account_id,
    message_id = ?msg.id,
    author_bot = msg.author.bot,
    content_len = msg.content.len(),
    attachment_count = msg.attachments.len(),
    flags = ?msg.flags,
    "discord: raw message event (diagnostic)"
);

// Add a message_update handler to catch late attachments if Discord ever sends them:
async fn message_update(
    &self,
    _ctx: Context,
    _old: Option<Message>,
    _new: Option<Message>,
    event: serenity::model::event::MessageUpdateEvent,
) {
    info!(
        account_id = %self.account_id,
        message_id = %event.id,
        attachment_count = event.attachments.as_ref().map(|a| a.len()),
        flags = ?event.flags,
        "discord: message_update (diagnostic)"
    );
}

With this patch deployed (plus RUST_LOG=moltis_discord=debug,serenity::gateway=debug), anyone hitting the issue can report concrete evidence: either (a) the raw message event log fires with attachment_count=0 and flags=8192, confirming hypothesis 2, (b) the raw message event log never fires, confirming the gateway-level drop (hypothesis 1), or (c) message_update fires with late attachments, revealing a latency story nobody documented.
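
The three outcomes (a), (b), and (c) can be written down as a small decision function, which may help when triaging logs from several reporters. This is only a sketch: `Evidence`, `Diagnosis`, and `diagnose` are hypothetical names, and treating late attachments as taking priority over the other two cases is an assumption of this example.

```rust
/// Which explanation the collected logs point to. Hypothetical enum for illustration.
#[derive(Debug, PartialEq, Eq)]
pub enum Diagnosis {
    /// (a) Handler reached, voice flag set, no attachments: the line-701 return.
    EmptyAttachmentsAtCreate,
    /// (b) No raw message event at all: dropped before EventHandler::message.
    GatewayLevelDrop,
    /// (c) Attachments arrived later through message_update.
    LateAttachments,
    /// The logs fit none of the three cases.
    Inconclusive,
}

/// What the diagnostic logs showed for one voice message (hypothetical struct).
pub struct Evidence {
    /// `Some((flags, attachment_count))` if the raw message event log fired.
    pub raw_event: Option<(u64, usize)>,
    /// Attachment count from a later message_update log line, if one fired.
    pub update_attachments: Option<usize>,
}

pub fn diagnose(e: &Evidence) -> Diagnosis {
    const VOICE: u64 = 1 << 13; // 8192, the voice-message flag bit
    match (e.raw_event, e.update_attachments) {
        // Assumption: any late attachment is treated as case (c) first.
        (_, Some(n)) if n > 0 => Diagnosis::LateAttachments,
        (None, _) => Diagnosis::GatewayLevelDrop,
        (Some((flags, 0)), _) if (flags & VOICE) != 0 => Diagnosis::EmptyAttachmentsAtCreate,
        _ => Diagnosis::Inconclusive,
    }
}
```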

Our offer: we're running this on production Fly.io with real Discord traffic and would gladly apply such a patch and report the evidence. We're not in a position to contribute Rust code directly, but we can run experiments and share logs.

The immediate ask is the diagnostic logging. Once the mechanism is known, the correct fix follows naturally (probably message_update handler, but we want to verify first).
