Skip to content

[OPIK-5191] [BE] feat: add user identity to ClickHouse log_comment#5778

Open
Nimrod007 wants to merge 5 commits intomainfrom
nimrodlahav/OPIK-5191-add-username-to-log-comment
Open

[OPIK-5191] [BE] feat: add user identity to ClickHouse log_comment#5778
Nimrod007 wants to merge 5 commits intomainfrom
nimrodlahav/OPIK-5191-add-username-to-log-comment

Conversation

@Nimrod007
Copy link
Copy Markdown
Collaborator

@Nimrod007 Nimrod007 commented Mar 22, 2026

Details

Extends the existing ClickHouse log_comment SETTINGS format from query_name:workspace_id:details to query_name:workspace_id:user_name:details so slow query investigations can identify which user triggered each query.

  • Adds userName parameter to DatabaseUtils.getSTWithLogComment() and getLogComment()
  • Updates ~100 call sites across 11 DAO files — userName was already in scope at every site (first param of makeMonoContextAware lambda)
  • Motivated by a prod analysis showing 30 slow queries (>1s) in 60 min including a 209s OOM — user attribution would significantly improve investigation

Change checklist

  • User facing
  • Documentation update

Issues

  • OPIK-5191

AI-WATERMARK

AI-WATERMARK: yes

  • If yes:
    • Tools: Claude Code
    • Model(s): Claude Opus 4.6
    • Scope: Full implementation
    • Human verification: Build compilation verified, code review of all changes

Testing

  • Verified build compiles: mvn compile passes with zero errors
  • No existing tests reference log_comment or DatabaseUtils — consistent with prior log_comment changes (OPIK-5050, OPIK-4380) which also had no tests
  • Post-deployment verification query:
SELECT
    extractAll(log_comment, ':([^:]+)')[3] AS user_name,
    count() AS query_count,
    avg(query_duration_ms) AS avg_duration_ms
FROM system.query_log
WHERE event_date = today() AND type = 'QueryFinish' AND log_comment != ''
GROUP BY user_name
ORDER BY query_count DESC

Documentation

No documentation changes needed — internal observability improvement only.

…r query attribution

Extend the existing log_comment SETTINGS format from
`query_name:workspace_id:details` to `query_name:workspace_id:user_name:details`
so slow query investigations can identify which user triggered each query.

The userName is already available in scope at every call site (first parameter
of the makeMonoContextAware lambda) — this change simply passes it through to
the log_comment string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Nimrod007 Nimrod007 requested a review from a team as a code owner March 22, 2026 19:47
@github-actions github-actions bot added java Pull requests that update Java code Backend labels Mar 22, 2026
@github-actions
Copy link
Copy Markdown
Contributor

📋 PR Linter Failed

Missing Section. The description is missing the ## Details section.


Missing Section. The description is missing the ## Change checklist section.


Missing Section. The description is missing the ## Issues section.


Missing Section. The description is missing the ## Testing section.


Missing Section. The description is missing the ## Documentation section.

2 similar comments
@github-actions
Copy link
Copy Markdown
Contributor

📋 PR Linter Failed

Missing Section. The description is missing the ## Details section.


Missing Section. The description is missing the ## Change checklist section.


Missing Section. The description is missing the ## Issues section.


Missing Section. The description is missing the ## Testing section.


Missing Section. The description is missing the ## Documentation section.

@github-actions
Copy link
Copy Markdown
Contributor

📋 PR Linter Failed

Missing Section. The description is missing the ## Details section.


Missing Section. The description is missing the ## Change checklist section.


Missing Section. The description is missing the ## Issues section.


Missing Section. The description is missing the ## Testing section.


Missing Section. The description is missing the ## Documentation section.

Comment on lines +174 to +185
public static ST getSTWithLogComment(String query, String queryName, String workspaceId, String userName,
Object details) {
var logComment = getLogComment(queryName, workspaceId, userName, details);
return TemplateUtils.newST(query)
.add("log_comment", logComment);
}

public static String getLogComment(String queryName, String workspaceId, Object details) {
public static String getLogComment(String queryName, String workspaceId, String userName, Object details) {
return TemplateUtils.newST(LOG_COMMENT)
.add("query_name", queryName != null ? queryName.replace("'", "''") : null)
.add("workspace_id", workspaceId != null ? workspaceId.replace("'", "''") : null)
.add("user_name", userName != null ? userName.replace("'", "''") : null)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getLogComment logs RequestContext.USER_NAME into ClickHouse (which can include emails/PII) and violates .agents/skills/opik-backend/SKILL.md; should we restrict log_comment to a non-PII identifier like workspace_id/opaque user ID or mask the user string?

Finding type: AI Coding Guidelines | Severity: 🔴 High


Want Baz to fix this for you? Activate Fixer

Other fix methods

Fix in Cursor

Prompt for AI Agents:

Before applying, verify this suggestion against the current code. In
apps/opik-backend/src/main/java/com/comet/opik/infrastructure/DatabaseUtils.java around
lines 174 to 185, the getLogComment method currently adds user_name directly into
log_comment which can contain PII (emails/display names). Change getLogComment to NOT
log raw user_name: either remove the .add("user_name", ...) entry entirely or replace it
with a sanitized non-PII value (for example: look up an opaque user id from
RequestContext or compute a short deterministic hash of the user_name and render it as
"user_hash:<first8chars>" or mask emails by replacing the local-part with '***'). Make
the change in getLogComment and update getSTWithLogComment signature/usage if needed to
accept the sanitized value; preserve the existing single-quote escaping behavior for the
final rendered string. Add a short unit test ensuring that log_comment never contains an
'@' or full email and that workspace_id continues to be logged as before.

Copy link
Copy Markdown
Member

@andrescrz andrescrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a future suggestion.


public static ST getSTWithLogComment(String query, String queryName, String workspaceId, Object details) {
var logComment = getLogComment(queryName, workspaceId, details);
public static ST getSTWithLogComment(String query, String queryName, String workspaceId, String userName,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: at some point we will need to create a Java record (with a builder) for the whole payload containing the query_name, workspace_id etc. fields for the log comment, instead of so many params.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Backend java Pull requests that update Java code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants