Skip to content

fix(codeql): Resolve static analysis security alerts#100

Merged
berntpopp merged 5 commits into
mainfrom
fix/codeql-security-alerts
Nov 30, 2025
Merged

fix(codeql): Resolve static analysis security alerts#100
berntpopp merged 5 commits into
mainfrom
fix/codeql-security-alerts

Conversation

@berntpopp

Copy link
Copy Markdown
Owner

Summary

  • Fix all ERROR-level CodeQL alerts (uninitialized local variables)
  • Fix all WARNING-level CodeQL alerts (unreachable code, procedure return value)
  • Fix multiple NOTE-level code quality issues (imports, unused variables, context managers)

Changes Made

ERROR Fixes (Critical)

  • tests/e2e/test_docker_health.py: Initialize data: dict = {} before try blocks to prevent uninitialized variable access

WARNING Fixes

  • phentrieve/evaluation/semantic_metrics.py: Remove dead code block (if False and ...)
  • phentrieve/visualization/plot_utils.py: Remove impossible condition (if num_models==1 inside if num_models>1 block)
  • tests/unit/cli/test_init.py: Don't assign procedure return value

NOTE Fixes (Code Quality)

  • api/main.py: Use imported config values (CORS settings, LOG_LEVEL)
  • phentrieve/cli/text_commands.py: Remove redundant logging import
  • tests/unit/cli/test_similarity_commands.py: Fix import-and-import-from issues
  • tests/unit/api/test_dependencies_model_loading.py: Properly assign reload result
  • tests/unit/data_processing/test_hpo_database.py: Use context managers for DB fixtures
  • tests/unit/retrieval/test_dense_retriever_real.py: Add descriptive alias for mocking import

Not Fixed (Informational)

  • CIS-DI-0005: Docker content trust is a deployment configuration, not a Dockerfile issue
  • CIS-DI-0010: Already addressed - both Dockerfiles use non-root users

Test plan

  • make check - Format and lint checks pass
  • make typecheck-fast - Type checking passes (0 errors)
  • Unit tests pass (555 passed, 1 skipped, 3 deselected)
  • No regressions introduced

Comment thread tests/unit/retrieval/test_dense_retriever_real.py Fixed

@github-advanced-security github-advanced-security AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR systematically addresses CodeQL static analysis security alerts across ERROR, WARNING, and NOTE severity levels. The changes focus on fixing uninitialized variables, removing dead code, eliminating unused imports/variables, and standardizing logging patterns to use lazy formatting.

Key Changes

  • Fixed critical uninitialized variable issues in test files by adding explicit initialization
  • Removed unreachable dead code blocks in visualization and evaluation modules
  • Standardized logging across the codebase to use %s placeholder formatting instead of f-strings for better performance
  • Applied proper context manager patterns and cleaned up unused variable assignments in tests
  • Updated dependencies (urllib3 for security, kubernetes downgrade)

Reviewed changes

Copilot reviewed 28 out of 29 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
uv.lock Dependency version updates for urllib3 (2.5.0) and kubernetes (33.1.0)
pyproject.toml Added explicit urllib3>=2.5.0 requirement for CVE fixes
tests/e2e/test_docker_health.py Initialize dict variables before try blocks to satisfy static analysis
tests/unit/cli/test_init.py Removed unused return value assignment from procedure callback
tests/unit/cli/test_similarity_commands.py Fixed import-and-import-from pattern by using module imports
tests/unit/retrieval/test_api_helpers.py Removed unused result variable, improved comment clarity
tests/unit/data_processing/test_hpo_database.py Applied context manager pattern for database fixtures
tests/unit/retrieval/test_dense_retriever_real.py Added descriptive alias for chromadb import used in mocking
tests/e2e/test_docker_security.py Simplified PIDs limit assertion logic
tests/unit/cli/test_index_commands.py Removed unused mock variables
tests/unit/cli/test_data_commands.py Removed unused mock variables
tests/unit/cli/test_cli_performance.py Made string concatenation explicit with parentheses
tests/unit/api/test_dependencies_model_loading.py Properly captured reload module result
api/main.py Used imported CORS and logging config values instead of hardcoded literals
api/dependencies.py Converted logging to use lazy %s formatting
api/routers/*.py Converted logging to use lazy %s formatting throughout
phentrieve/cli/text_commands.py Removed redundant logging import inside function
phentrieve/visualization/plot_utils.py Removed unreachable condition (num_models==1 inside num_models>1 block) and converted logging to %s format
phentrieve/evaluation/semantic_metrics.py Removed dead code block with if False and ... condition, converted logging to %s format
phentrieve/evaluation/metrics.py Converted logging to use lazy %s formatting
phentrieve/retrieval/*.py Converted logging to use lazy %s formatting
phentrieve/text_processing/*.py Converted logging to use lazy %s formatting
frontend/vite-icon-optimizer.js Removed redundant variable assignment
frontend/src/services/logService.js Changed to proper re-export pattern for useLogStore

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

# (dense_retriever imports it locally inside functions)
import chromadb # noqa: F401
# The import is used for mocking PersistentClient in tests below
import chromadb as _chromadb_for_mocking # noqa: F401

Copilot AI Nov 30, 2025

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of '_chromadb_for_mocking' is not used.

Copilot uses AI. Check for mistakes.
## Log Injection Fixes (py/log-injection)
- Convert all f-string logging to %-style formatting across 15+ files
- Fixed in: api/dependencies.py, api/routers/*.py, phentrieve/retrieval/*.py,
  phentrieve/text_processing/*.py, phentrieve/evaluation/*.py

## Code Quality Fixes
- Fix py/redundant-comparison in plot_utils.py (4) and test_docker_security.py (1)
- Fix py/implicit-string-concatenation-in-list in test_cli_performance.py
- Fix py/unused-local-variable in test files (removed unused mock assignments)
- Fix py/unused-import in test_dense_retriever_real.py (clarify chromadb import)
- Fix py/uninitialized-local-variable in test_docker_health.py

## JavaScript Fixes
- Fix js/useless-assignment-to-local in vite-icon-optimizer.js
- Fix js/unused-local-variable in logService.js (convert to re-export)

## Security Updates
- Update urllib3>=2.5.0 for CVE fixes (GHSA-48p4-8xcf-vxj5, GHSA-pq67-6m6q-mj2v)

All tests pass (580 passed), 0 lint errors, 0 type errors.
@berntpopp berntpopp force-pushed the fix/codeql-security-alerts branch from b65f415 to 6f9814e Compare November 30, 2025 18:55
Implement OWASP-compliant log sanitization across all modules to address
CodeQL py/log-injection alerts. This prevents CWE-117 log forging attacks.

Changes:
- Add sanitize_log_value() utility in phentrieve/utils.py that:
  - Removes CRLF characters (\r\n, \r, \n)
  - Replaces tabs with spaces for readability
  - Removes non-printable control characters (< ASCII 32)
  - Truncates to max 500 chars to prevent log flooding

- Apply sanitization to user-provided values in log statements across:
  - api/dependencies.py (29 statements)
  - api/routers/*.py (34 statements)
  - phentrieve/retrieval/*.py (5 files)
  - phentrieve/text_processing/*.py (4 files)
  - phentrieve/evaluation/*.py (2 files)

References:
- https://owasp.org/www-community/attacks/Log_Injection
- https://codeql.github.com/codeql-query-help/python/py-log-injection/
Comment thread api/routers/similarity_router.py Fixed
Comment thread api/routers/text_processing_router.py Fixed
Comment thread api/routers/text_processing_router.py Fixed
Comment thread api/routers/text_processing_router.py Fixed
Comment thread api/routers/text_processing_router.py Fixed
Comment thread tests/unit/retrieval/test_dense_retriever_real.py Fixed
- test_hpo_database.py: Use context managers for HPODatabase (py/should-use-with)
- test_docker_health.py: Add explicit raise after pytest.fail (py/uninitialized-local-variable)
- test_similarity_commands.py: Use consistent import style (py/import-and-import-from)

These fixes address CodeQL alerts related to:
- CWE-404: Resource leak prevention with context managers
- Static analysis flow control clarity
- Import hygiene
- similarity_router.py: Sanitize error_detail containing user HPO terms
- text_processing_router.py: Sanitize numeric config values in debug logs
- test_dense_retriever_real.py: Add explicit chromadb reference for static analysis

This fixes the 6 NEW alerts identified by CodeQL in this PR:
- 5 high-severity log injection vulnerabilities
- 1 unused import warning
Configure ALLOWED_ORIGINS environment variable for phentrieve_api in
docker-compose.dev.yml to enable frontend-API communication during
local Docker development (ports 8080, 8001, 5734, 8734).
@berntpopp berntpopp merged commit b7d92d0 into main Nov 30, 2025
17 checks passed
@berntpopp berntpopp deleted the fix/codeql-security-alerts branch December 9, 2025 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants