Add packages/content-fetch-go: a Go rewrite of packages/content-fetch that is fully API-compatible (same HTTP endpoints, env vars, Redis key schema, and BullMQ v5 job format) but produces a significantly smaller Docker image (Alpine + Chromium only, no Node.js runtime).
Internal packages:
- bullmq: BullMQ v5-compatible producer and consumer (Lua moveToActive)
- queue/worker: concurrent job worker (4 goroutines, 500 ms poll)
- fetch: Chromium page fetch via chromedp, replacing puppeteer-parse
- handler: processFetchContentJob logic (cache, GCS upload, job queuing)
- gcs: Google Cloud Storage upload
- analytics: PostHog failure event capture
- server: HTTP endpoints (/_ah/health, /metrics, /lifecycle/prestop, /)
- redisutil: dual Redis connections (cache + BullMQ MQ)
- config: all env var loading
Also adds CLAUDE.md with build/test/lint commands and an architecture overview.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hand-rolled Prometheus text output in /metrics with the official prometheus/client_golang library. Adds an internal/metrics package that registers five GaugeVec collectors (active, failed, completed, prioritized, oldest_job_age_seconds) and refreshes them from Redis on each request via a thin wrapper around promhttp.Handler. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace short variable/field/parameter names with self-documenting ones:
- rds → redisDS (RedisDataSource)
- br → browser
- cfg → config
- rdb → redisClient
- w → worker (in New() signatures; http.ResponseWriter stays as w)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests spin up a Redis container via testcontainers-go and cover:
- HTTP endpoints: health, metrics, token auth, method validation, 404
- Handler pipeline: cache hit, multi-user jobs, save-page job enqueueing
- Domain blocking: hardcoded list (weibo.com) and failure-count threshold
- BullMQ primitives: AddBulk/PopJob round-trip, priority ordering, complete/fail
- Full end-to-end: worker consumes from the content-fetch queue and produces to the backend queue
- HTTP POST: valid token + cached result → save-page job in Redis
No PostgreSQL required; the service only depends on Redis. Run with: go test -v -timeout 120s ./...
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces internal/gcs (cloud.google.com/go/storage) with a new
internal/storage package backed by gocloud.dev/blob, giving self-hosted
users a choice of object storage backend via a single env var:
BLOB_STORAGE_URL=gs://bucket → GCS (unchanged behaviour)
BLOB_STORAGE_URL=s3://bucket?region=... → AWS S3
BLOB_STORAGE_URL=s3://bucket?endpoint=http://minio:9000&use_path_style=true&disable_https=true&region=us-east-1
→ MinIO
Backward compatibility: when BLOB_STORAGE_URL is not set, a gs:// URL is
constructed from the existing GCS_UPLOAD_BUCKET env var, so existing GCS
deployments require no config changes.
Changes:
- internal/gcs/gcs.go deleted
- internal/storage/storage.go created (gocloud.dev/blob, gcsblob, s3blob)
- internal/storage/storage_test.go created (6 memblob unit tests, no Docker)
- config.go: BlobStorageURL field + BlobURL() fallback method
- handler.go: swaps gcs import for storage, bridges GCS key-file via
GOOGLE_APPLICATION_CREDENTIALS for the gcsblob URL opener
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replaces the Node.js content-fetch container with the Go implementation
in both self-hosted compose files, and enables MinIO-backed original
content uploads via BLOB_STORAGE_URL.
Changes:
- self-hosting/docker-compose/docker-compose.yml
- content-fetch image: sh-content-fetch → sh-content-fetch-go
- Remove USE_FIREFOX (Go service uses Chromium, no Firefox needed)
- Add dependency on createbuckets (bucket must exist before uploads)
- self-hosting/docker-compose/self-build/docker-compose.yml
- content-fetch build: packages/content-fetch → packages/content-fetch-go
- Remove USE_FIREFOX
- Add dependency on createbuckets
- self-hosting/docker-compose/.env.example
- Remove SKIP_UPLOAD_ORIGINAL=true (uploads now work via MinIO)
- Add BLOB_STORAGE_URL pointing to the MinIO container
- Consolidate AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY entries
(were split between two comment blocks, now in one place)
- packages/content-fetch-go/Dockerfile
- Build stage: golang:1.24-alpine → golang:1.25-alpine (matches go.mod)
MinIO URL used: s3://omnivore?endpoint=http%3A%2F%2Fminio%3A9000&use_path_style=true&disable_https=true&region=us-east-1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The queue-processor sends labels as an array of objects matching
TypeScript's CreateLabelInput interface, e.g.:
labels: [{"name":"RSS"}]
The Go struct had this declared as []string, causing an unmarshal error
whenever an RSS feed job arrived:
json: cannot unmarshal object into Go struct field JobData.labels of type string
Fix both JobData (incoming jobs) and savePageJobData (outgoing jobs) to
use []LabelInput{Name, Color, Description}, matching the TS types exactly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
handler_test.go (unit, no Docker):
- TestJobData_UnmarshalLabelsAsObjects — exact payload from refreshFeed.ts
- TestJobData_UnmarshalLabelsWithColor — optional color field
- TestJobData_UnmarshalNoLabels — absent labels field is valid
- TestJobData_UnmarshalMultipleLabels — multiple label objects with all fields
- TestSavePageJobData_MarshalLabels — outgoing jobs serialise labels as objects
integration_test.go (Redis container):
- TestIntegration_RSSJobWithLabelObjects — enqueues a job with labels:[{"name":"RSS"}]
through the full worker pipeline and asserts the resulting save-page job carries
the label object through to the backend queue
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…re binary)
- Add src-go/ with cobra CLI: omnivore server content-fetcher
- Support GCS, S3, and MinIO via gocloud.dev/blob (BLOB_STORAGE_URL)
- Fix labels type mismatch ([]string → []LabelInput) for BullMQ jobs
- Wire the Go content-fetcher into the self-hosted Docker Compose setup
- Add docker/content-fetcher.Dockerfile built from the repo root
- Add Makefile targets: content_fetch_go, docker_build/push_content_fetcher
- Add integration tests (testcontainers-go) and unit tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This change allows creating a user without registration, so registration can be disabled on a self-hosted instance.