Gmail attachments get: unpadded base64url data breaks standard decoders #774

@njt

Description

gws gmail users messages attachments get returns the data field from the Gmail API as-is — URL-safe base64 without padding. Google's Gmail API intentionally omits = padding from attachment data (this is per spec for base64url). However, many standard base64 decoders (Python's base64.urlsafe_b64decode, etc.) require proper padding and fail on strings whose length mod 4 is not 0.

Steps to Reproduce

gws gmail users messages attachments get --params '{"userId": "me", "messageId": "MESSAGE_ID", "id": "ATTACHMENT_ID"}'

The returned JSON data field has a length that is not a multiple of 4 (e.g., 50038 characters, which is 4n+2).

Attempting to decode in Python:

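import base64
import json

# `output` holds the JSON text printed by the gws command above.
result = json.loads(output)
decoded = base64.urlsafe_b64decode(result["data"])
# Raises: binascii.Error: Invalid base64-encoded string number of data characters
#         cannot be 1 more than a multiple of 4
# Note: Python reports this message for lengths of the form 4n+1; for 4n+2 or
# 4n+3 it raises "binascii.Error: Incorrect padding" instead.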

Expected Behavior

The data field should be directly decodable by standard base64 libraries. Either:

  1. Re-pad the base64url string before emitting it in the JSON response (data += "=" * (-len(data) % 4)), or
  2. Document that the data field uses unpadded base64url encoding (RFC 4648 §5) and consumers must re-pad

Workarounds

  • Use the -o flag to have gws decode and write the binary directly: gws gmail users messages attachments get --params '...' -o /tmp/file.pdf
  • Manually re-pad before decoding:
    data = response["data"]
    data += "=" * (-len(data) % 4)
    decoded = base64.urlsafe_b64decode(data)
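
The re-padding workaround above can be wrapped in a small helper that accepts both padded and unpadded input. This is a minimal sketch, not part of gws: the name decode_base64url is hypothetical, and it assumes the data field has already been extracted from the parsed JSON as a str.

```python
import base64


def decode_base64url(data: str) -> bytes:
    """Decode URL-safe base64 that may or may not carry '=' padding.

    Hypothetical helper for illustration; not part of gws or the Gmail API.
    """
    # -len(data) % 4 is the number of '=' characters needed to reach a
    # multiple of 4 (0 when no padding is missing).
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded)


# Round trip with a 13-byte payload, whose unpadded encoding has
# 18 characters (a 4n+2 length, like the example in this report).
raw = b"\xfb\xff\x00attachment"
unpadded = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
assert len(unpadded) % 4 == 2
assert decode_base64url(unpadded) == raw
```

Because the pad count is computed from the current length, calling the helper on a string that is already padded appends nothing, so it is safe to use whether or not a future gws release starts emitting padded output.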

Environment

  • gws version: 0.4.1
  • OS: Ubuntu 24.04 (Linux)
  • Consumer: Python 3.12 base64.urlsafe_b64decode
