[Iris] Replace txn_log audit table with structured log lines#5082

Merged
rjpower merged 2 commits into main from iris-structured-audit-logging
Apr 22, 2026

Conversation

Collaborator

@rjpower rjpower commented Apr 22, 2026

Drop the SQLite txn_log/txn_actions tables and their RPC/dashboard surface. Every state-changing transition now emits a single logger.info line of the form event=<action> entity=<id> trigger=<parent> k=v..., which the log server already captures. Cascades carry trigger= naming the parent event. Also instruments api_key create/revoke, which previously had no audit trail. Migration 0037 drops the tables.

Fixes #4895
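
For illustration, a helper along these lines could produce the line format described above. This is a minimal sketch, not the PR's code: the logger name, the `format_event` helper, the `-` placeholder for a missing trigger, and the sorted field order are all assumptions. Only the `event= entity= trigger= k=v` shape comes from the description.

```python
import logging

logger = logging.getLogger("iris.audit")  # hypothetical logger name


def format_event(action: str, entity_id: str, trigger: str = "-", **fields: object) -> str:
    """Render one audit line: event=<action> entity=<id> trigger=<parent> k=v ..."""
    parts = [f"event={action}", f"entity={entity_id}", f"trigger={trigger}"]
    # Sort the extra fields so the same event always renders identically.
    parts.extend(f"{key}={value}" for key, value in sorted(fields.items()))
    return " ".join(parts)


def log_event(action: str, entity_id: str, trigger: str = "-", **fields: object) -> None:
    """Emit exactly one INFO line per state-changing transition."""
    logger.info(format_event(action, entity_id, trigger, **fields))
```

Under these assumptions, a cascade would pass the parent event's name as `trigger`, for example `log_event("task_terminated", "task-1", trigger="worker_heartbeat_failed")`, where `task_terminated` is a made-up event name. Values containing spaces would need quoting or escaping, which this sketch does not attempt.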

Drop the SQLite txn_log/txn_actions tables and their RPC/dashboard surface.
Every state-changing transition now emits a single logger.info line of the
form event=<action> entity=<id> trigger=<parent> k=v..., which the log
server already captures. Cascades carry trigger= naming the parent event
(e.g. task->job terminations). Also instruments api_key create/revoke,
which previously had no audit trail. Migration 0037 drops the tables.

Fixes #4895

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
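
As a rough picture of what a table-dropping migration such as 0037 amounts to, the sketch below runs against an in-memory SQLite database. The column definitions and the `drop_audit_tables` and `table_names` helpers are hypothetical; only the table names `txn_log`, `txn_actions`, and `meta` appear in this PR, and the real migration may be written quite differently.

```python
import sqlite3


def drop_audit_tables(conn: sqlite3.Connection) -> None:
    """Drop the audit tables; IF EXISTS keeps the step safe to re-run."""
    # Drop txn_actions first in case it references txn_log (an assumption).
    conn.execute("DROP TABLE IF EXISTS txn_actions")
    conn.execute("DROP TABLE IF EXISTS txn_log")
    conn.commit()


def table_names(conn: sqlite3.Connection) -> set[str]:
    """List the user tables currently present in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return {name for (name,) in rows}


# Stand-in schema; the real column layout is not shown in the PR.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn_log (id INTEGER PRIMARY KEY, kind TEXT)")
conn.execute("CREATE TABLE txn_actions (id INTEGER PRIMARY KEY, txn_id INTEGER)")
conn.execute("CREATE TABLE meta (key TEXT PRIMARY KEY, value TEXT)")

drop_audit_tables(conn)
remaining = table_names(conn)
```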
@rjpower rjpower added the agent-generated Created by automation/agent label Apr 22, 2026
Contributor

claude Bot commented Apr 22, 2026

Claude finished @rjpower's task in 8m 12s


Code review

  • Check PR state and whether review is needed
  • Gather relevant CLAUDE.md / AGENTS.md files
  • Summarize the PR changes
  • Run parallel review agents (2 compliance + 2 bug)
  • Validate all flagged issues
  • Post review summary and inline comments

Two validated issues, both posted inline:

  1. lib/iris/scripts/benchmark_db_queries.py — broken import/call to deleted _transaction_actions (line 70 and line 442). The PR touched this file (removed txn_log/txn_actions from the table list) but missed these two references. Script now raises ImportError at load. Inline comment.

  2. lib/iris/OPS.md — the documented grammar event=<kind> action=<action> entity=<id> ... and example grep 'action=worker_heartbeat_failed' don't match what log_event emits (event=<action> entity=<entity_id> trigger=<trigger> k=v ...). There is no action= token, so the example greps nothing. Inline comment with suggested fix.

No other bugs or CLAUDE.md/AGENTS.md violations found. The table/RPC/dashboard deletions, migration 0037, and auth instrumentation are consistent; deleted symbols have no other lingering references in lib/.
· branch iris-structured-audit-logging
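
Issue 2 above comes down to which key carries the event name. The sketch below parses an audit line into key/value pairs and checks the `event` field exactly, which also avoids accidental substring hits. The sample line, the `parse_event` and `matches_event` helpers, and the `address` value are illustrative assumptions, not output captured from Iris.

```python
def parse_event(line: str) -> dict[str, str]:
    """Split an audit line into key/value pairs; tokens without '=' are ignored."""
    pairs: dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            pairs[key] = value
    return pairs


def matches_event(line: str, action: str) -> bool:
    """True only when the line's event field equals the action exactly."""
    return parse_event(line).get("event") == action


# Hypothetical line in the event=/entity=/trigger= format.
sample = "event=worker_heartbeat_failed entity=worker-7 trigger=- address=10.0.0.5:9000"
fields = parse_event(sample)
```

With this sample, the substring `action=worker_heartbeat_failed` does not occur in the line, while an exact check on `fields["event"]` succeeds. A line that only mentions the name in `trigger=` is not counted as that event.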

@rjpower rjpower requested a review from ravwojdyla April 22, 2026 21:28
@rjpower rjpower enabled auto-merge (squash) April 22, 2026 21:28

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 171c7fcf21

return list(reversed(actions))


def _live_user_stats(db: ControllerDB) -> list[UserStats]:

P1: Remove stale _transaction_actions benchmark dependency

This change deletes the _transaction_actions helper from service.py, but lib/iris/scripts/benchmark_db_queries.py in the same commit still imports and benchmarks _transaction_actions (import at line 70 and call at line 442), so running the benchmark script now raises an import-time failure instead of executing. Either keep a compatibility helper here or remove/replace the benchmark path in the script in the same change.


Comment thread lib/iris/OPS.md
-- What the controller has been doing
SELECT kind, count(*) FROM txn_log GROUP BY kind ORDER BY count(*) DESC LIMIT 10;
```bash
iris process logs --since 24h | grep 'action=worker_heartbeat_failed'

P2: Grep audit events using the event= key

The new audit logging format emitted by controller transitions uses event=<action> (for example event=worker_heartbeat_failed), but this OPS example filters on action=worker_heartbeat_failed; operators following it will miss the intended audit entries during debugging. Update the filter/example to match the emitted field name.


Comment thread lib/iris/OPS.md
Comment on lines +123 to +128
Controller audit events (`event=<kind> action=<action> entity=<id> ...`) are
emitted as structured `logger.info` lines — query them through
`iris process logs` with a substring filter, not via SQL. Example:

-- What the controller has been doing
SELECT kind, count(*) FROM txn_log GROUP BY kind ORDER BY count(*) DESC LIMIT 10;
```bash
iris process logs --since 24h | grep 'action=worker_heartbeat_failed'
Contributor


The grammar description and grep example don't match what log_event actually emits. log_event in transitions.py emits event=<action> entity=<entity_id> trigger=<trigger> k=v ... and has no separate action= token, so the documented grep 'action=worker_heartbeat_failed' will match nothing. The grammar line should drop the phantom action=<action>, and the example should be grep 'event=worker_heartbeat_failed'.

Suggested change
- Controller audit events (`event=<kind> action=<action> entity=<id> ...`) are
- emitted as structured `logger.info` lines — query them through
- `iris process logs` with a substring filter, not via SQL. Example:
- -- What the controller has been doing
- SELECT kind, count(*) FROM txn_log GROUP BY kind ORDER BY count(*) DESC LIMIT 10;
- ```bash
- iris process logs --since 24h | grep 'action=worker_heartbeat_failed'
+ Controller audit events (`event=<action> entity=<id> trigger=<parent> k=v ...`)
+ are emitted as structured `logger.info` lines — query them through
+ `iris process logs` with a substring filter, not via SQL. Example:
+ ```bash
+ iris process logs --since 24h | grep 'event=worker_heartbeat_failed'
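
The removed SQL query grouped `txn_log` rows by `kind` and listed the ten most frequent. A comparable tally over log lines might look like the sketch below. It assumes each audit line contains a whitespace-delimited `event=<action>` token, possibly after a timestamp and level prefix. The `top_events` helper and the sample lines are illustrative, not part of Iris.

```python
import re
from collections import Counter

# Match event=<action> only where it starts a token, so a key that merely
# ends in "event" (e.g. a hypothetical parent_event=) is not counted.
EVENT_RE = re.compile(r"(?<!\S)event=(\S+)")


def top_events(lines, limit=10):
    """Count audit events by name, most frequent first."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(limit)


# Hypothetical log lines; the prefix format is an assumption.
lines = [
    "2026-04-22 INFO event=worker_heartbeat_failed entity=worker-7 trigger=-",
    "2026-04-22 INFO event=task_terminated entity=task-1 trigger=worker_heartbeat_failed",
    "2026-04-22 INFO event=worker_heartbeat_failed entity=worker-9 trigger=-",
    "2026-04-22 INFO unrelated message",
]
summary = top_events(lines)
```

To run it against real logs, one option is to feed `sys.stdin` to `top_events` and pipe in the output of `iris process logs --since 24h`.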

"reservation_claims",
"txn_log",
"txn_actions",
"meta",
Contributor


This file has two more dangling references to the deleted _transaction_actions helper that this PR missed:

  • line 70 — still imported in the from iris.cluster.controller.service import (...) block
  • line 442 — still called via bench("_transaction_actions", lambda: _transaction_actions(db))

Since _transaction_actions was removed from service.py, running the script now raises ImportError: cannot import name '_transaction_actions' from 'iris.cluster.controller.service' at module load. Drop both the import entry and the bench(...) call.
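
A stale import like this can be caught without executing the script. The sketch below walks a script's syntax tree and reports any name imported from a given module that is missing from a set of known exports. The `missing_imports` helper and the sample script text are illustrative assumptions. In practice the export set could come from `dir(importlib.import_module(module))`.

```python
import ast


def missing_imports(source: str, module: str, exported: set[str]) -> list[str]:
    """Return names imported from `module` in `source` that are not in `exported`."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module == module:
            for alias in node.names:
                # A star import names nothing specific, so skip it.
                if alias.name != "*" and alias.name not in exported:
                    missing.append(alias.name)
    return missing


# Illustrative stand-in for the benchmark script's import block.
script = '''
from iris.cluster.controller.service import (
    _live_user_stats,
    _transaction_actions,
)
'''
stale = missing_imports(script, "iris.cluster.controller.service", {"_live_user_stats"})
```

A check of this kind only covers `from ... import` names. The separate `bench(...)` call would surface as a `NameError` once the import entry is dropped, so both references still need removing together.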

Wire log_event into scheduling pass completion, StartTasks RPC
dispatch failure, worker-rejected task acks, worker termination
(with cause trigger), sibling worker failover, reservation claim
mutations, and the provider-sync health summary. Enriches
worker_heartbeat_failed with address/age/rpc_action context and
drops the unstructured duplicate warning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@ravwojdyla ravwojdyla left a comment

:shipit:

@rjpower rjpower merged commit 26ac780 into main Apr 22, 2026
41 checks passed
@rjpower rjpower deleted the iris-structured-audit-logging branch April 22, 2026 21:46
Labels

agent-generated Created by automation/agent

Development

Successfully merging this pull request may close these issues.

iris - controller audit logging

2 participants