A read-only Django database backend for DuckDB. Query Parquet files on
S3/GCS, local .duckdb files, or an in-memory DuckDB instance through the
Django ORM — .filter(), .get(), .order_by(), .values(), aggregations,
the whole familiar surface — without standing up a Postgres serving layer.
Writes are blocked at the cursor level: INSERT, UPDATE, DELETE, CREATE,
ALTER, DROP all raise NotSupportedError. The point of this backend is to
treat a DuckDB store as a read-only analytical replica that the Django ORM
can query directly.
If your "warehouse" is already Parquet on object storage, you probably don't need a separate transactional database just to serve it. Pipelines write Parquet, DuckDB reads it (with predicate pushdown and column pruning), the ORM treats DuckDB like any other database. You keep Postgres for the parts that actually mutate — users, sessions, billing — and let the analytical read path skip a sync step entirely.
This is the Django side of the DuckDB + Postgres hybrid pattern that projects like GlitchTip and EthicalAds use. The Django ecosystem didn't have a maintained backend for it; this is one.
pip install django-duckdb-readonlyRequires Python 3.10+, Django 4.2+, DuckDB 1.0+.
# settings.py
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql",
"NAME": "app",
# ...your real DB for users, sessions, billing
},
"warehouse": {
"ENGINE": "django_duckdb_readonly",
"NAME": "/var/data/analytics.duckdb", # or ":memory:"
"OPTIONS": {
"duckdb_read_only": True,
"extensions": ["httpfs"],
"init_sql": [
# Expose remote Parquet as a SQL view. Django models then
# map to "listings" via Meta.db_table.
"CREATE VIEW listings AS "
"SELECT * FROM read_parquet('gs://my-bucket/marts/listings/*.parquet')",
],
},
},
}
DATABASE_ROUTERS = ["myapp.routers.WarehouseRouter"]# myapp/models.py
from django.db import models
class Listing(models.Model):
id = models.IntegerField(primary_key=True)
name = models.CharField(max_length=120)
state = models.CharField(max_length=2)
score = models.FloatField()
class Meta:
managed = False # never migrate this table
db_table = "listings" # matches the view name in init_sql
app_label = "warehouse_app"# myapp/views.py
from django.shortcuts import render
from .models import Listing
def top_listings(request, state):
qs = (
Listing.objects
.filter(state__iexact=state, score__gte=0.5)
.order_by("-score")[:50]
)
return render(request, "listings/top.html", {"listings": qs})That's it. The ORM emits SELECT … WHERE UPPER(state) = UPPER(?) AND score >= ? ORDER BY score DESC LIMIT 50, DuckDB pushes the predicates into the Parquet scan, and you skip the entire mart → Postgres sync.
All options live under DATABASES[<alias>]['OPTIONS']:
| Option | Type | Default | Purpose |
|---|---|---|---|
duckdb_read_only |
bool |
False |
Open the DuckDB file in read-only mode. Use this for .duckdb files that should never be mutated. Note: the Django ORM layer is always read-only — this flag controls the lower-level DuckDB file lock. |
duckdb_config |
dict[str, Any] |
{} |
Passed to duckdb.connect(config=…). Useful for s3_region, memory_limit, threads, etc. |
extensions |
list[str] |
[] |
Names of DuckDB extensions to INSTALL and LOAD on connect (e.g. ["httpfs", "spatial", "h3"]). |
init_sql |
list[str] |
[] |
Raw SQL run once on each new connection, bypassing the read-only check. This is the hook for CREATE VIEW … AS SELECT * FROM read_parquet(…), attaching other DuckDB databases, or seeding fixture data for tests. |
The standard NAME field accepts:
":memory:"— ephemeral in-process DuckDB (great for tests +init_sql)- A filesystem path — opens or creates a
.duckdbfile - Any other path DuckDB's
connect()accepts
DuckDB's httpfs extension reads remote Parquet directly. Two common shapes:
GCS via service-account JSON (matching gcloud auth application-default login):
"OPTIONS": {
"extensions": ["httpfs"],
"duckdb_config": {
# DuckDB picks up GOOGLE_APPLICATION_CREDENTIALS automatically;
# or set the credential here:
# "gcs_credentials": "/path/to/service-account.json",
},
"init_sql": [
"CREATE VIEW listings AS "
"SELECT * FROM read_parquet('gs://my-bucket/marts/listings/*.parquet')",
],
}S3 with explicit credentials:
"OPTIONS": {
"extensions": ["httpfs"],
"duckdb_config": {
"s3_region": "us-east-1",
"s3_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
"s3_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
},
"init_sql": [
"CREATE VIEW events AS "
"SELECT * FROM read_parquet('s3://my-bucket/events/*/*.parquet')",
],
}See examples/parquet_on_gcs.py for a fully
runnable example against a tiny local Parquet file (no cloud setup required).
Use a standard Django database router. The typical pattern is "one app per analytical schema":
class WarehouseRouter:
warehouse_app = "warehouse_app"
def db_for_read(self, model, **hints):
if model._meta.app_label == self.warehouse_app:
return "warehouse"
return None
db_for_write = db_for_read
def allow_migrate(self, db, app_label, model_name=None, **hints):
if app_label == self.warehouse_app or db == "warehouse":
return False
return NoneEvery model in warehouse_app should set Meta.managed = False — the
backend will hard-error on any DDL.
| Feature | Status |
|---|---|
.all(), .filter(), .exclude(), .get() |
✅ |
Field lookups (exact, iexact, in, gt, gte, lt, lte, isnull, contains, icontains, startswith, endswith, regex, iregex) |
✅ |
.order_by(), slicing, .distinct() |
✅ |
.values(), .values_list(), .only(), .defer() |
✅ |
Aggregations (Count, Sum, Avg, Min, Max) |
✅ |
Q() and F() expressions |
✅ |
Meta.managed = False models |
✅ (recommended) |
manage.py inspectdb <alias> against the DuckDB store |
✅ |
| Multi-database routers | ✅ |
| Writes through the ORM | 🚫 hard-errors (by design) |
migrate against the warehouse |
🚫 hard-errors (by design) |
select_related() / prefetch_related() across mart tables |
|
| DuckDB-native types (UUID, JSON, INTERVAL, STRUCT, LIST, MAP) | dict/list and you handle them in app code |
DuckDBPyConnection is not thread-safe by itself, but connection.cursor()
returns an independent cursor per call — that's the unit of parallelism
DuckDB documents ("Using Connections in Parallel Python Programs"). This
backend leans on that: every DatabaseWrapper instance (one per thread, per
Django's standard model) hands back the same physical duckdb connection for
a given alias, and create_cursor() issues a fresh DuckDB cursor on every
call.
The practical reason this matters: connection cold-start (INSTALL httpfs
- remote-bucket auth +
CREATE VIEW … read_parquet(…)) can be multi-second. A per-thread connection model undergunicorn --workers 1 --threads Nwould pay that cost N times. Sharing keeps it to once per process per alias.
A few consequences:
init_sqlandconnection_hookrun exactly once per process per alias, not once per thread. Don't put per-request state in them.DatabaseWrapper.close()is a no-op on the underlying connection — other threads may be mid-query. Use the classmethodDatabaseWrapper.close_shared(alias)to actually close the shared connection (test tearDown, credential rotation, etc.).is_usable() → Falsedoes not auto-recover. If the shared connection goes bad, callclose_shared(alias)and let the next request re-open.
Two layers, intentionally independent:
- ORM layer (always on): every statement emitted by Django's compilers
is checked against an allow-list of read keywords (
SELECT,WITH,EXPLAIN,PRAGMA, etc.). Anything else raisesNotSupportedErrorbefore it reaches DuckDB. This catches accidental.save(),.create(),.update(),.delete(), and any raw DDL. - DuckDB connection layer (opt-in via
duckdb_read_only=True): tells DuckDB itself to refuse writes at the file-lock level. Belt-and-braces; use it on production where the warehouse file must never be mutated.
init_sql runs through the raw connection, not the ORM cursor, so
CREATE VIEW works there — that's the configured bootstrap path.
0.1.x— bug fixes, more lookups, broader Django/Python version matrix0.2— async cursor support (async defviews), Django 6 compatibility polish0.3— optionalmotherduck://support (MotherDuck cloud-hosted DuckDB)
This is alpha software, but it's used in production for the read path of Sunnyplans — bug reports and PRs welcome.
git clone https://github.com/eracle/django-duckdb-readonly.git
cd django-duckdb-readonly
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
python runtests.pyMIT. See LICENSE.