Consider addition of endpoint to get direct upload URLs #28

@thclark

Feature request

Current state

When using BlobField to upload blobs to GCS, the upload is made to a temporary file with a fixed content type (application/octet-stream). Then, on successful commit of the transaction (i.e. once the corresponding row is saved in the database), the temporary blob is assigned its metadata and moved to its ultimate destination.
This is good because:

  • The naming callback can access any and all model fields
  • The naming callback can be deterministic, because the edge case of uploading a file and then failing the database transaction does not leave an orphaned file at the final destination
  • Any orphaned files end up in the _tmp/ directory (or another bucket entirely), so they are easily cleaned up later.

However, this mechanism limits you to uploading files using BlobField.
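
The temporary-upload flow described above can be sketched in plain Python. This is only an illustration, not django-gcp's actual implementation: `FakeBucket`, `save_with_blob`, the `commit` callable and the `_tmp/<id>` naming are all hypothetical stand-ins for the real bucket, the BlobField save logic and the database transaction.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBucket:
    """In-memory stand-in for a GCS bucket: object path -> (content_type, data)."""

    objects: dict = field(default_factory=dict)

    def upload(self, path, data, content_type="application/octet-stream"):
        self.objects[path] = (content_type, data)

    def move(self, src, dst, content_type):
        # Re-home the object and attach its final content type
        _, data = self.objects.pop(src)
        self.objects[dst] = (content_type, data)


def save_with_blob(bucket, row, data, content_type, name_callback, commit):
    """Upload to _tmp/, commit the row, then move the blob to its final name."""
    tmp_path = f"_tmp/{row['id']}"
    bucket.upload(tmp_path, data)  # fixed content type while temporary
    if not commit(row):
        # Transaction failed: the blob stays under _tmp/ for later cleanup
        return None
    final_path = name_callback(row)  # can use any field of the saved row
    bucket.move(tmp_path, final_path, content_type)
    return final_path
```

With a `commit` callable that succeeds, the blob ends up at the name produced by the callback; with one that fails, the only trace is the object under `_tmp/`.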

Use Case

I want to upload files from another service directly to GCS, using django-gcp as the permissions manager to sign URLs, but without registering the files in a BlobField.

Proposed Solution

Create an endpoint to sign URLs that's accessible by the frontend, given a signing token. Thus the frontend can request a signed URL on demand and upload directly to GCS.

Add a view like the following to storage/views.py:

import datetime
import random
import string
import time

import django.core.signing
from django.http import HttpResponseBadRequest, JsonResponse
from django.utils import baseconv, timezone  # NOTE: baseconv is deprecated since Django 4.0 and removed in 5.0
from django.views.decorators.http import require_POST
from google.cloud.storage import Blob, Bucket

from .bucket_registry import _bucket_registry


URLSAFE_CHARACTERS = string.ascii_letters + string.digits + "-._~"
REQUIRED_PARAMS = ["token", "filename", "content_type"]

signer = django.core.signing.Signer()


@require_POST
def get_direct_upload_url(request):
    """Responds with a pre-signed URL enabling the client to upload an object to the bucket"""

    # Reject requests with missing or empty parameters
    for p in REQUIRED_PARAMS:
        if not request.POST.get(p):
            return HttpResponseBadRequest(f"'{p}' is a required parameter.")

    # Verify the token was issued by this server and has not been tampered with
    try:
        token: str = signer.unsign(request.POST["token"])
    except django.core.signing.BadSignature:
        return HttpResponseBadRequest("Invalid token.")

    # Token payload is "<gs://bucket/prefix>:<timestamp flag>:<base62 expiry>"
    bucket_and_path, include_timestamp_indicator, exptime = token.rsplit(":", 2)
    if time.time() > baseconv.base62.decode(exptime):
        return HttpResponseBadRequest("Timeout expired.")

    # Strip the leading "gs://" then split the bucket name from the path prefix
    bucketname, path_prefix = bucket_and_path[5:].split("/", 1)
    bucket: Bucket = _bucket_registry.get("gs://" + bucketname)
    if not bucket:
        return HttpResponseBadRequest(f"Unknown bucket identifier 'gs://{bucketname}'.")

    filename: str = request.POST["filename"]
    content_type: str = request.POST["content_type"]

    # A random path segment keeps uploads with the same filename from colliding
    timestring: str = f"{timezone.now():%Y-%m-%d_%H-%M-%S/}" if include_timestamp_indicator == "1" else ""
    randomstring: str = "".join(random.choices(URLSAFE_CHARACTERS, k=24))
    path: str = f"{path_prefix}{timestring}{randomstring}/{filename}"
    blob: Blob = bucket.blob(path)

    # JsonResponse serialises the dict and sets the application/json content type
    return JsonResponse(
        {
            "url": blob.generate_signed_url(
                expiration=timezone.now() + datetime.timedelta(minutes=60),
                method="PUT",
                content_type=content_type,
            ),
            "path": path,
        }
    )

Then use this code snippet to generate the token and URL enabling the frontend to call the signing endpoint (in storage/utils.py):

import time

from django.core.signing import Signer
from django.urls import reverse
from django.utils import baseconv


signer = Signer()


def get_signing_token_and_url(bucket_name, path_prefix, include_timestamp=False, submit_timeout=3600):
    """Returns a signed token and the URL of the signing endpoint to pass it to"""
    bucket_identifier = f"gs://{bucket_name}"

    # Get signing url and a token to pass to it, allows the frontend to sign on demand
    # NOTE: These are currently not used but are taken from the DDCU library and could be
    # include_timestamp and submit_timeout (seconds) are arguments here; their defaults are placeholder assumptions
    include_timestamp_indicator = "1" if include_timestamp else "0"
    valid_until = baseconv.base62.encode(int(time.time()) + submit_timeout)
    # Joined with "/" explicitly, since os.path.join is platform-dependent
    signing_path = f"{bucket_identifier}/{path_prefix}"
    to_sign = f"{signing_path}:{include_timestamp_indicator}:{valid_until}"

    signing_token = signer.sign(to_sign)
    signing_url = reverse("gcp-storage-get-direct-upload-url")

    return signing_token, signing_url
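
To make the token format concrete, here is a dependency-free sketch of the round trip between issuing the token and parsing it in the view. It is an illustration only: the HMAC-based `sign`/`unsign` helpers and the local base62 functions are hypothetical stand-ins for Django's `Signer` and `baseconv`, and `make_token`/`parse_token` are names invented for this example.

```python
import hashlib
import hmac
import time

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def b62_encode(n):
    """Encode a non-negative integer in base62."""
    if n == 0:
        return "0"
    out = ""
    while n:
        n, r = divmod(n, 62)
        out = BASE62[r] + out
    return out


def b62_decode(s):
    """Decode a base62 string back to an integer."""
    n = 0
    for ch in s:
        n = n * 62 + BASE62.index(ch)
    return n


def sign(value, key):
    """Append an HMAC-SHA256 signature (stand-in for Signer.sign)."""
    sig = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}:{sig}"


def unsign(token, key):
    """Check the signature and return the payload (stand-in for Signer.unsign)."""
    value, sig = token.rsplit(":", 1)
    expected = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    return value


def make_token(bucket_name, path_prefix, include_timestamp, timeout, key, now=None):
    """Build "<gs://bucket/prefix>:<flag>:<base62 expiry>" and sign it."""
    now = int(time.time()) if now is None else now
    flag = "1" if include_timestamp else "0"
    return sign(f"gs://{bucket_name}/{path_prefix}:{flag}:{b62_encode(now + timeout)}", key)


def parse_token(token, key, now=None):
    """Verify, check expiry, and split into (bucket, prefix, include_timestamp)."""
    now = time.time() if now is None else now
    bucket_and_path, flag, exptime = unsign(token, key).rsplit(":", 2)
    if now > b62_decode(exptime):
        raise ValueError("expired")
    bucket_name, path_prefix = bucket_and_path[len("gs://"):].split("/", 1)
    return bucket_name, path_prefix, flag == "1"
```

Splitting from the right with `rsplit(":", 2)` is what lets the payload start with `gs://`, which itself contains a colon. It also means the path prefix must not contain a colon, and should end with `/` if the generated object path is meant to sit inside a folder.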

Finally, add the corresponding URL (urls.py):

from django.urls import path

from django_gcp.storage.views import get_direct_upload_url
# ...

urlpatterns = [
    # ...
    path("storage/get-direct-upload-url", get_direct_upload_url, name="gcp-storage-get-direct-upload-url"),
]
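
For completeness, here is a rough sketch of how a client (the frontend, or another service) might use the endpoint: POST the token, filename and content type to obtain a signed URL, then PUT the file to that URL. The helper names are hypothetical, and the sketch assumes the endpoint is reachable without a CSRF token; with Django's CSRF middleware enabled, the POST would also need a CSRF token, or the view an exemption.

```python
import json
import urllib.parse
import urllib.request


def build_signing_request(signing_url, token, filename, content_type):
    """Form-encoded POST carrying the parameters the view requires."""
    body = urllib.parse.urlencode(
        {"token": token, "filename": filename, "content_type": content_type}
    ).encode()
    return urllib.request.Request(signing_url, data=body, method="POST")


def build_upload_request(signed_url, data, content_type):
    """PUT to the signed URL; the Content-Type must match the one that was signed."""
    return urllib.request.Request(
        signed_url, data=data, method="PUT", headers={"Content-Type": content_type}
    )


def upload(signing_url, token, filename, content_type, data):
    """Ask the backend for a signed URL, upload to it, and return the object path."""
    request = build_signing_request(signing_url, token, filename, content_type)
    with urllib.request.urlopen(request) as response:
        signed = json.loads(response.read())
    with urllib.request.urlopen(build_upload_request(signed["url"], data, content_type)):
        pass
    return signed["path"]
```

The returned path is what the calling service would store if it needs to refer to the uploaded object later.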

Metadata

Labels: decision needed (a decision is required, e.g. on UX or company policy), feature (a new feature of the app)

Status: Priority 1 (Low)