
Tx#37

Open
Josephrp wants to merge 12 commits into dev from tx

@Josephrp Josephrp commented Mar 15, 2026

Greptile Summary

This PR (titled "Tx") covers a broad set of hardening and refactoring changes across the backend, configuration, and frontend layers. Key areas of change include: parameterizing Docker Compose Postgres credentials/ports, tightening JWT secret validation with a production guard, improving emergency approval error handling (proper 503s instead of misleading ok:true responses), adding authentication to the previously-open publish_inbound bus endpoint, removing the VITE_RADIOSHAQ_TOKEN build-time env var in favour of runtime login, fixing a React useEffect dependency on clearMarkers, and dropping the loguru stub in favour of the real package.

Notable changes:

  • docker-compose.yml: Postgres credentials and ports are now fully overridable via POSTGRES_* env vars with defaults preserved — good DX improvement, but the hindsight service's fallback HINDSIGHT_API_DATABASE_URL is not updated to reference these vars, creating a silent mismatch when credentials are overridden
  • twilio.py: STOP/opt-out handling now returns 503/500 when the DB is unavailable instead of logging a warning and confirming. Because Twilio does not auto-retry SMS webhooks on 5xx, opt-out requests will be silently dropped (no DB record, no TwiML acknowledgment to the user) under any transient DB outage — a TCPA/CASL compliance risk
  • bus.py: Adding get_current_user auth to publish_inbound is a security improvement but is a breaking change for any unauthenticated Lambda or service caller
  • config/schema.py: Production insecure JWT secret guard and improved validator are good additions; the create_directories validator conflates directory creation with security enforcement, which can leave directories uncreated on validation failure
  • postgres_gis.py: New count_pending_coordination_events is clean and efficient
  • Frontend token handling and OperatorMap dependency fix are correct

Confidence Score: 3/5

  • Mergeable with caution — the Twilio opt-out compliance gap and the Hindsight credential mismatch should be addressed before production use
  • The PR improves security (JWT guard, auth on bus endpoint, proper error codes) and DX (parameterized Docker Compose, runtime token login) in meaningful ways. However, the change to raise 503 on opt-out DB failure introduces a TCPA/CASL compliance risk where STOP messages can be silently dropped without user acknowledgment. The Hindsight service DB credential mismatch is a correctness gap when credential overrides are used. Both issues could cause real-world failures in production.
  • radioshaq/radioshaq/api/routes/twilio.py (opt-out compliance) and radioshaq/infrastructure/local/docker-compose.yml (Hindsight credential mismatch)

Important Files Changed

| Filename | Overview |
| --- | --- |
| radioshaq/radioshaq/api/routes/twilio.py | STOP opt-out now raises 503/500 on DB failure; Twilio won't auto-retry, so the opt-out is silently dropped with no user acknowledgment. This is a TCPA/CASL compliance risk. |
| radioshaq/infrastructure/local/docker-compose.yml | Postgres credentials and ports are now fully parameterizable via env vars; healthcheck references updated. Hindsight service DB URL default is not updated to reference the new POSTGRES_* variables. |
| radioshaq/radioshaq/api/routes/bus.py | publish_inbound endpoint now requires JWT auth. Security improvement, but a breaking change for any existing unauthenticated callers (e.g. Lambda). |
| radioshaq/radioshaq/api/routes/emergency.py | Approval failures now raise HTTP 503 instead of returning ok:true/sent:false; count_pending_coordination_events preferred with legacy fallback. Breaking change for API clients. |
| radioshaq/radioshaq/config/schema.py | JWT validator improved: strips whitespace, rejects empty keys. Production insecure-secret guard added in model validator. Minor: directory creation and security check are conflated in create_directories. |
| radioshaq/radioshaq/database/postgres_gis.py | New count_pending_coordination_events method correctly uses func.count() with select_from() so the count is computed in the database instead of fetching every row. Clean implementation. |
| radioshaq/web-interface/src/services/radioshaqApi.ts | VITE_RADIOSHAQ_TOKEN env var support removed; all token access now routed through getRuntimeToken(). Consistent cleanup across all auth header usages. |

Sequence Diagram

sequenceDiagram
    participant U as User (SMS)
    participant T as Twilio
    participant API as RadioShaq API
    participant DB as PostgreSQL

    Note over API: Inbound STOP message flow (new behavior)
    U->>T: Sends "STOP"
    T->>API: POST /twilio/sms (webhook)
    API->>DB: record_opt_out_by_phone()
    alt DB available
        DB-->>API: success
        API-->>T: 200 TwiML "You have been opted out"
        T-->>U: Confirmation SMS
    else DB unavailable
        DB-->>API: exception / None
        API-->>T: 500 / 503 HTTP error
        Note over T: Twilio logs error,<br/>no auto-retry for SMS
        T--xU: No confirmation sent
        Note over DB: Opt-out record LOST
    end

    Note over API: Emergency approval flow (new behavior)
    participant Op as Operator
    participant MB as Message Bus
    Op->>API: POST /emergency/events/{id}/approve
    API->>DB: claim_emergency_event_pending()
    DB-->>API: claimed
    API->>MB: publish_outbound()
    alt Bus available & queue not full
        MB-->>API: ok=True
        API->>DB: update status=approved
        API-->>Op: 200 {ok: true, queued: true}
    else Bus unavailable or queue full
        MB-->>API: error / ok=False
        API->>DB: rollback status=pending
        API-->>Op: 503 HTTPException
    end
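
The emergency approval branch of the diagram above can be sketched in a few lines of framework-free Python. This is only an illustration of the claim, publish, roll-back ordering: `EventStore`, `ServiceUnavailable`, and `approve` are hypothetical names, a `queue.Queue` stands in for the message bus, and the real route presumably raises FastAPI's `HTTPException` instead.

```python
import queue
from dataclasses import dataclass, field


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for an HTTP 503 response."""

    status_code = 503


@dataclass
class EventStore:
    """In-memory stand-in for the database (assumption, not the real schema)."""

    status: dict[str, str] = field(default_factory=dict)

    def claim_pending(self, event_id: str) -> bool:
        # Only a pending event can be claimed for approval.
        if self.status.get(event_id) != "pending":
            return False
        self.status[event_id] = "claimed"
        return True


def approve(store: EventStore, bus: queue.Queue, event_id: str) -> dict:
    """Claim the event, publish it, and roll the claim back if publishing fails."""
    if not store.claim_pending(event_id):
        raise LookupError(f"event {event_id} is not pending")
    try:
        bus.put_nowait({"event_id": event_id})
    except queue.Full as exc:
        # Publishing failed: restore the pending status and surface a 503
        # rather than reporting success.
        store.status[event_id] = "pending"
        raise ServiceUnavailable("message bus queue is full") from exc
    store.status[event_id] = "approved"
    return {"ok": True, "queued": True}
```

Because the status is restored before the error is raised, an operator can retry the approval once the bus has capacity again.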

Comments Outside Diff (4)

  1. radioshaq/web-interface/src/locales/es.json, line 25-46 (link)

    Missing i18n keys in Spanish locale

    The en.json locale adds map.setLocation, map.latInvalid, map.lngInvalid, emergency.mapTitle, and emergency.viewOnMap keys, but these are missing from es.json. The CallsignsPage "Set location" dialog and the EmergencyPage map section use these keys with ?? 'fallback' fallbacks, so they'll show English text to Spanish users instead of translated text.

    Add the missing keys to the map section:

        "setLocation": "Establecer ubicación",
        "latInvalid": "La latitud debe estar entre -90 y 90.",
        "lngInvalid": "La longitud debe estar entre -180 y 180."
    

    And to the emergency section:

        "mapTitle": "Eventos en el mapa",
        "viewOnMap": "Ver en mapa"
    
  2. radioshaq/web-interface/src/locales/fr.json, line 25-46 (link)

    Missing i18n keys in French locale

    Same issue as es.json — the fr.json locale is missing map.setLocation, map.latInvalid, map.lngInvalid, emergency.mapTitle, and emergency.viewOnMap keys that were added to en.json. French users will see English fallback text for the "Set location" dialog and the emergency map section.

  3. radioshaq/web-interface/src/components/maps/OperatorMap.tsx, line 133 (link)

    Missing clearMarkers in useEffect dependency array

    The useEffect at line 92 uses clearMarkers (defined via useCallback) and the centerRef/zoomRef refs, but its dependency array is []. React's exhaustive-deps rule would flag this. Refs do not cause stale-closure issues, but clearMarkers is a genuine dependency and should be listed. In practice this is unlikely to cause bugs here (clearMarkers is stable), but it is best to include it in the dependency array for correctness.

  4. radioshaq/infrastructure/local/docker-compose.yml, line 166 (link)

    Hindsight default DB URL not parameterized with POSTGRES_* overrides

    This PR parameterizes the main Postgres service credentials via POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB, but the Hindsight service's fallback HINDSIGHT_API_DATABASE_URL still hard-codes the original default credentials. If a user overrides POSTGRES_USER or POSTGRES_PASSWORD, the Hindsight service will fail to authenticate against Postgres at startup, because its default connection URL still points to the original values.

    Older Docker Compose releases do not support nested variable interpolation within a single value (current Compose v2 releases accept forms such as ${VAR:-${OTHER:-default}}), so this may not be a portable single-line fix. The recommended mitigation is to document explicitly in configuration.md (and in the docker-compose comments) that users who override POSTGRES_* credentials must also set HINDSIGHT_API_DATABASE_URL to match; otherwise the Hindsight service will fail to connect.
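
One way to keep the two settings in sync, sketched here under assumptions, is to derive HINDSIGHT_API_DATABASE_URL from the same POSTGRES_* variables in a small helper script. The function name, the `postgres` host name, port 5432, and the fallback values below are placeholders; the compose file's real defaults are not shown in this review.

```python
import os
from urllib.parse import quote


def hindsight_database_url(env=None, host="postgres", port=5432) -> str:
    """Build a Postgres URL from POSTGRES_* variables (hypothetical helper).

    The fallback values are placeholders, not the project's actual defaults.
    """
    env = os.environ if env is None else env
    user = env.get("POSTGRES_USER", "postgres")
    password = env.get("POSTGRES_PASSWORD", "postgres")
    database = env.get("POSTGRES_DB", "postgres")
    # Percent-encode each part so characters such as "@" or "/" in a
    # password cannot corrupt the URL structure.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )
```

Printing the result as `HINDSIGHT_API_DATABASE_URL=...` into the env file used by Compose would give both services the same credentials without relying on interpolation support.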

Last reviewed commit: c1d8ceb

Greptile also left 3 inline comments on this PR.

Comment thread radioshaq/radioshaq/config/schema.py
@Josephrp
Owner Author

@greptileai comprehensively update your review based on the changes above :

Comment on lines +148 to +165
 # Opt-out handling (STOP): record in DB first; only acknowledge if persisted (TCPA/CASL).
 if body.upper() in _OPTOUT_KEYWORDS:
     db = getattr(request.app.state, "db", None)
-    if db is not None and hasattr(db, "record_opt_out_by_phone"):
-        try:
-            await db.record_opt_out_by_phone(from_phone, channel)
-        except Exception as e:
-            logger.warning("Opt-out record failed (channel={} phone={}): {}", channel, from_phone, e)
+    if db is None or not hasattr(db, "record_opt_out_by_phone"):
+        raise HTTPException(
+            status_code=503,
+            detail="Opt-out not available; database not configured",
+        )
+    try:
+        await db.record_opt_out_by_phone(from_phone, channel)
+    except Exception as e:
+        logger.error(
+            "Opt-out record failed (channel={} phone={}): {}", channel, from_phone, e
+        )
+        raise HTTPException(
+            status_code=500,
+            detail="Opt-out could not be recorded; please try again",
+        ) from e
STOP opt-out silently lost on DB failure

Returning a 503 when the database is unavailable causes the STOP request to be silently dropped. Twilio does not automatically retry SMS webhooks by default on 5xx responses — it logs an error and moves on (a fallback URL must be explicitly configured in the Twilio console for retries). This means:

  1. User sends STOP
  2. DB is temporarily down → Twilio receives 503
  3. Twilio does not retry; the opt-out is never recorded
  4. User receives no TwiML acknowledgment
  5. User continues receiving messages, unaware the opt-out failed

This is arguably a TCPA/CASL regression from the previous code, which at least sent the user a confirmation message (even though the opt-out wasn't persisted). With the current approach, when the DB is degraded, there is no user-facing acknowledgment whatsoever and the opt-out record is lost.

Consider implementing a durable retry mechanism (e.g., writing to a dead-letter queue or a fallback store) so opt-out requests survive transient DB outages, and always return a TwiML 200 to Twilio once the opt-out has been enqueued for reliable delivery.
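
A minimal sketch of the fallback-store idea, assuming a local append-only JSONL spool file is an acceptable durable store. `handle_opt_out`, `replay_spool`, the spool path, and the TwiML wording are all hypothetical; `record` stands in for `db.record_opt_out_by_phone`.

```python
import json
from pathlib import Path
from typing import Awaitable, Callable

Recorder = Callable[[str, str], Awaitable[None]]

TWIML_OPTED_OUT = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<Response><Message>You have been opted out.</Message></Response>"
)


async def handle_opt_out(
    record: Recorder, spool: Path, phone: str, channel: str
) -> tuple[int, str]:
    """Persist an opt-out, spooling it locally if the primary store fails."""
    try:
        await record(phone, channel)
    except Exception:
        # Primary store is down: append to the spool so the request can be
        # replayed once the database recovers.
        with spool.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"phone": phone, "channel": channel}) + "\n")
    # The opt-out is durable either way, so acknowledge it with TwiML.
    return 200, TWIML_OPTED_OUT


async def replay_spool(record: Recorder, spool: Path) -> int:
    """Replay spooled opt-outs; entries that still fail stay in the spool."""
    if not spool.exists():
        return 0
    lines = spool.read_text(encoding="utf-8").splitlines()
    pending = [json.loads(line) for line in lines if line]
    failed, replayed = [], 0
    for item in pending:
        try:
            await record(item["phone"], item["channel"])
            replayed += 1
        except Exception:
            failed.append(item)
    spool.write_text(
        "".join(json.dumps(item) + "\n" for item in failed), encoding="utf-8"
    )
    return replayed
```

For this to close the compliance gap, outbound sending would also need to treat spooled numbers as opted out until the replay succeeds; that check is not shown here.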

Comment on lines 25 to 27
async def publish_inbound(
    request: Request,
    body: dict[str, Any] = Body(..., embed=False),
Breaking change: publish_inbound now requires JWT auth

Adding _user: TokenPayload = Depends(get_current_user) to this endpoint is a good security improvement, but it is a breaking change for any Lambda function or external service that currently calls POST /bus/inbound (or wherever this router is mounted) without a JWT. Those callers will start receiving 401 Unauthorized as soon as this change is deployed, with no warning beforehand.

It would be worth documenting (in the PR description or changelog):

  • Which callers are affected (e.g. AWS Lambda integrations)
  • The migration path (how to obtain and pass a valid JWT)
  • Whether a service-account token should be pre-generated and distributed to existing callers
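
As a rough illustration of the migration path for a caller, the sketch below mints a short-lived HS256 token with the standard library and builds the request headers. Everything here is an assumption: the signing algorithm, the claim names, and the helper names `mint_service_token` and `auth_headers` are not taken from the project, and the API may require tokens issued by its own login endpoint instead.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def mint_service_token(secret: str, subject: str, ttl_seconds: int = 300) -> str:
    """Create an HS256-signed JWT (hypothetical claims: sub, iat, exp)."""
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def auth_headers(token: str) -> dict[str, str]:
    """Headers a formerly unauthenticated caller would now need to send."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
```

A short TTL keeps a leaked token from being useful for long, at the cost of the caller having to mint or fetch a new one regularly.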

Comment on lines 861 to 877

 @model_validator(mode="after")
 def create_directories(self) -> Config:
-    """Ensure workspace and data directories exist."""
+    """Ensure directories exist and block insecure runtime defaults."""
+    secret = (self.jwt.secret_key or "").strip()
+    if secret in INSECURE_JWT_SECRETS:
+        if self.debug:
+            logger.warning(
+                "Using insecure jwt.secret_key in debug mode; set RADIOSHAQ_JWT__SECRET_KEY before production"
+            )
+        else:
+            raise ValueError(
+                "Insecure jwt.secret_key configured. Set RADIOSHAQ_JWT__SECRET_KEY to a strong secret or enable debug for local development."
+            )
     self.workspace_dir.mkdir(parents=True, exist_ok=True)
     self.data_dir.mkdir(parents=True, exist_ok=True)
     return self
INSECURE_JWT_SECRETS check in create_directories conflates two concerns

create_directories now both creates directories and enforces a security policy on the JWT secret. Because the secret check runs first, a ValueError from it means the directories are never created, leaving workspace_dir and data_dir potentially absent. Consider reordering so directory creation happens first (or moves to a separate validator) and the insecure-secret check raises afterwards, or split the logic into two @model_validator methods with clearer names.

Additionally, the validate_secret field validator already strips and validates the key, so (self.jwt.secret_key or "").strip() here is redundant — self.jwt.secret_key is already stripped at that point.
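
The separation suggested above can be sketched without Pydantic, using a plain dataclass so the ordering is easy to see. `Settings`, `ensure_directories`, and `check_jwt_secret` are hypothetical names, and the contents of `INSECURE_JWT_SECRETS` below are placeholders rather than the project's real list; in the actual Config these would be two `@model_validator` methods.

```python
import logging
from dataclasses import dataclass
from pathlib import Path

logger = logging.getLogger(__name__)

# Placeholder values; the project's actual INSECURE_JWT_SECRETS is not shown.
INSECURE_JWT_SECRETS = frozenset({"", "changeme"})


@dataclass
class Settings:
    workspace_dir: Path
    data_dir: Path
    jwt_secret_key: str
    debug: bool = False

    def ensure_directories(self) -> None:
        """Create the directories; no security logic lives here."""
        self.workspace_dir.mkdir(parents=True, exist_ok=True)
        self.data_dir.mkdir(parents=True, exist_ok=True)

    def check_jwt_secret(self) -> None:
        """Reject known-insecure secrets unless debug mode is enabled."""
        if self.jwt_secret_key not in INSECURE_JWT_SECRETS:
            return
        if not self.debug:
            raise ValueError("Insecure JWT secret configured")
        logger.warning("Using insecure JWT secret in debug mode")

    def __post_init__(self) -> None:
        # Directories first, so a rejected secret does not leave them absent.
        self.ensure_directories()
        self.check_jwt_secret()
```

With this ordering, a configuration that fails the secret check still leaves the workspace and data directories in place, which is the behaviour the comment asks for.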
