9 changes: 7 additions & 2 deletions lib/iris/OPS.md
@@ -118,9 +118,14 @@ SELECT slice_id, lifecycle, scale_group, worker_ids FROM slices WHERE lifecycle=
-- Task attempt history (debugging retries)
SELECT task_id, attempt_id, state, exit_code, error FROM task_attempts
WHERE task_id LIKE '%<job_fragment>%' ORDER BY attempt_id;
```

Controller audit events (`event=<kind> action=<action> entity=<id> ...`) are
emitted as structured `logger.info` lines — query them through
`iris process logs` with a substring filter, not via SQL. Example:

-- What the controller has been doing
SELECT kind, count(*) FROM txn_log GROUP BY kind ORDER BY count(*) DESC LIMIT 10;
```bash
iris process logs --since 24h | grep 'action=worker_heartbeat_failed'

P2: Grep audit events using the `event=` key

The new audit logging format emitted by controller transitions uses `event=<action>` (for example, `event=worker_heartbeat_failed`), but this OPS example filters on `action=worker_heartbeat_failed`. Operators following it will miss the intended audit entries during debugging. Update the filter and the example to match the emitted field name.

Comment on lines +123 to +128

The grammar description and grep example don't match what `log_event` actually emits. `log_event` in `transitions.py` emits `event=<action> entity=<entity_id> trigger=<trigger> k=v ...`; there is no separate `action=` token, so the documented `grep 'action=worker_heartbeat_failed'` will match nothing. The grammar line should drop the phantom `action=<action>`, and the example should be `grep 'event=worker_heartbeat_failed'`.

Suggested change
Controller audit events (`event=<kind> action=<action> entity=<id> ...`) are
emitted as structured `logger.info` lines — query them through
`iris process logs` with a substring filter, not via SQL. Example:
-- What the controller has been doing
SELECT kind, count(*) FROM txn_log GROUP BY kind ORDER BY count(*) DESC LIMIT 10;
```bash
iris process logs --since 24h | grep 'action=worker_heartbeat_failed'
Controller audit events (`event=<action> entity=<id> trigger=<parent> k=v ...`)
are emitted as structured `logger.info` lines — query them through
`iris process logs` with a substring filter, not via SQL. Example:
```bash
iris process logs --since 24h | grep 'event=worker_heartbeat_failed'

```

Full table list: `iris query "SELECT name FROM sqlite_master WHERE type='table'"`.
3 changes: 0 additions & 3 deletions lib/iris/dashboard/src/App.vue
@@ -20,7 +20,6 @@ const WORKER_TABS: Tab[] = [
{ key: 'fleet', label: 'Workers', to: '/fleet' },
{ key: 'endpoints', label: 'Endpoints', to: '/endpoints' },
{ key: 'autoscaler', label: 'Autoscaler', to: '/autoscaler' },
{ key: 'transactions', label: 'Transactions', to: '/transactions' },
{ key: 'account', label: 'Account', to: '/account' },
{ key: 'status', label: 'Status', to: '/status' },
]
@@ -30,7 +29,6 @@ const KUBERNETES_TABS: Tab[] = [
{ key: 'scheduler', label: 'Scheduler', to: '/scheduler' },
{ key: 'cluster', label: 'Cluster', to: '/cluster' },
{ key: 'endpoints', label: 'Endpoints', to: '/endpoints' },
{ key: 'transactions', label: 'Transactions', to: '/transactions' },
{ key: 'account', label: 'Account', to: '/account' },
{ key: 'status', label: 'Status', to: '/status' },
]
@@ -46,7 +44,6 @@ const PATH_TO_TAB: Record<string, string> = {
'/cluster': 'cluster',
'/endpoints': 'endpoints',
'/autoscaler': 'autoscaler',
'/transactions': 'transactions',
'/account': 'account',
'/status': 'status',
}
102 changes: 0 additions & 102 deletions lib/iris/dashboard/src/components/controller/TransactionsTab.vue

This file was deleted.

4 changes: 0 additions & 4 deletions lib/iris/dashboard/src/router.ts
@@ -30,10 +30,6 @@ const routes = [
path: '/status',
component: () => import('./components/controller/StatusTab.vue'),
},
{
path: '/transactions',
component: () => import('./components/controller/TransactionsTab.vue'),
},
{
path: '/scheduler',
component: () => import('./components/controller/SchedulerTab.vue'),
13 changes: 0 additions & 13 deletions lib/iris/dashboard/src/types/rpc.ts
@@ -458,19 +458,6 @@ export interface GetProcessStatusResponse {
logEntries?: LogEntry[]
}

// -- Transactions --

export interface TransactionAction {
timestamp?: ProtoTimestamp
action?: string
entityId?: string
details?: string
}

export interface GetTransactionsResponse {
actions: TransactionAction[]
}

// -- Task State Counts (used in job summaries and user summaries) --

/** Mapping from lowercase state name to count, e.g. { running: 2, pending: 5 } */
2 changes: 0 additions & 2 deletions lib/iris/scripts/benchmark_db_queries.py
@@ -102,8 +102,6 @@
"task_resource_history",
"endpoints",
"reservation_claims",
"txn_log",
"txn_actions",
"meta",

This file has two more dangling references to the deleted `_transaction_actions` helper that this PR missed:

  • line 70: still imported in the `from iris.cluster.controller.service import (...)` block
  • line 442: still called via `bench("_transaction_actions", lambda: _transaction_actions(db))`

Since `_transaction_actions` was removed from `service.py`, running the script now raises `ImportError: cannot import name '_transaction_actions' from 'iris.cluster.controller.service'` at module load. Drop both the import entry and the `bench(...)` call.

"schema_migrations",
]
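
Leftover references like the two flagged in the review comment above can be found mechanically before a helper is deleted. The following is a rough sketch using Python's standard `ast` module. `find_references` is a hypothetical helper written for illustration, not something in the Iris repository, and it only catches `from ... import` entries and bare name uses.

```python
import ast


def find_references(source: str, name: str) -> list[int]:
    """Return sorted line numbers where `name` is imported or used in `source`."""
    hits: set[int] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name == name:
                    # alias.lineno exists on Python 3.10+; otherwise fall back
                    # to the line of the enclosing import statement.
                    hits.add(getattr(alias, "lineno", node.lineno))
        elif isinstance(node, ast.Name) and node.id == name:
            hits.add(node.lineno)
    return sorted(hits)
```

Running it over the text of `benchmark_db_queries.py` with `name="_transaction_actions"` should list one line for the import entry and one for the `bench(...)` call, assuming the file still looks the way the comment describes.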
17 changes: 16 additions & 1 deletion lib/iris/src/iris/cluster/controller/auth.py
@@ -56,6 +56,13 @@ def create_api_key(
"VALUES (?, ?, ?, ?, ?, ?, ?)",
(key_id, key_hash, key_prefix, user_id, name, now.epoch_ms(), expires_at.epoch_ms() if expires_at else None),
)
logger.info(
    "event=api_key_created entity=%s trigger=- user=%s name=%s expires_at_ms=%s",
    key_id,
    user_id,
    name,
    expires_at.epoch_ms() if expires_at else "-",
)


def lookup_api_key_by_hash(db: ControllerDB, key_hash: str) -> ApiKeyRow | None:
@@ -81,7 +88,10 @@ def revoke_api_key(db: ControllerDB, key_id: str, now: Timestamp) -> bool:
f"UPDATE {db.api_keys_table} SET revoked_at_ms = ? WHERE key_id = ? AND revoked_at_ms IS NULL",
(now.epoch_ms(), key_id),
)
return cur._cursor.rowcount > 0
revoked = cur._cursor.rowcount > 0
if revoked:
    logger.info("event=api_key_revoked entity=%s trigger=-", key_id)
return revoked


def list_api_keys(db: ControllerDB, user_id: str | None = None) -> list[ApiKeyRow]:
@@ -111,6 +121,11 @@ def revoke_login_keys_for_user(db: ControllerDB, user_id: str, now: Timestamp) -
" WHERE user_id = ? AND name LIKE 'login-%' AND revoked_at_ms IS NULL",
(now.epoch_ms(), user_id),
)
logger.info(
    "event=login_keys_revoked entity=%s trigger=- count=%d",
    user_id,
    len(revoked_ids),
)
return revoked_ids
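
To see what the `api_key_created` call above puts on the wire, the sketch below sends the same format string through a throwaway standard-library logger and captures the rendered line. `render_api_key_created` and the `audit_sketch` logger name are hypothetical, and the function takes an already-computed `expires_at_ms` rather than the `Timestamp` object used in `auth.py`.

```python
import io
import logging


def render_api_key_created(key_id, user_id, name, expires_at_ms=None) -> str:
    """Render the api_key_created audit line and return it as text."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    # Message only, so the captured text is exactly the audit payload.
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger = logging.getLogger("audit_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        logger.info(
            "event=api_key_created entity=%s trigger=- user=%s name=%s expires_at_ms=%s",
            key_id,
            user_id,
            name,
            # Mirrors the diff: "-" stands in for a missing expiry.
            expires_at_ms if expires_at_ms is not None else "-",
        )
    finally:
        logger.removeHandler(handler)
    return stream.getvalue().rstrip("\n")
```

Because `name` is interpolated unquoted, a key name containing spaces would spread across several whitespace-separated tokens in the rendered line. That is an observation about this sketch; whether Iris restricts key names is not stated in the diff.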

