Skip to content

Conversation

@diegoimbert
Copy link
Contributor

No description provided.

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Dec 16, 2025

Deploying windmill with  Cloudflare Pages  Cloudflare Pages

Latest commit: 837a9a1
Status: ✅  Deploy successful!
Preview URL: https://9d6a1a7f.windmill.pages.dev
Branch Preview URL: https://di-pgdatabase-refactor.windmill.pages.dev

View logs

@diegoimbert diegoimbert marked this pull request as ready for review December 16, 2025 11:16
@claude
Copy link
Contributor

claude bot commented Dec 16, 2025

Claude finished @diegoimbert's task in 1m 54s —— View job


PR Review: Refactor PgDatabase to share common code

Tasks

  • Read all modified files to understand changes
  • Analyze the refactoring approach and architecture
  • Check for potential bugs or issues
  • Verify performance considerations
  • Review security implications
  • Provide comprehensive feedback
  • Add testing instructions

Summary

This PR successfully refactors PostgreSQL database connection handling by consolidating the PostgresUrlComponents struct into a more comprehensive PgDatabase struct with associated methods. The changes eliminate code duplication across multiple modules and provide a cleaner, more maintainable API.

Overall Assessment: ✅ This is a solid refactoring that improves code organization without introducing functional changes.


Detailed Analysis

🎯 Architecture & Design

Strengths:

  • Better encapsulation: The new PgDatabase struct bundles related connection data and provides well-defined methods (to_uri(), to_conn_str(), ssl_mode_is_require(), parse_uri())
  • Code deduplication: Eliminates repeated logic for building connection strings across resources.rs, settings.rs, pg_executor.rs, duckdb_executor.rs, and workspaces.rs
  • Improved naming: Field names like dbname, user, sslmode better match PostgreSQL conventions
  • Serialization support: Adding Serialize/Deserialize derives makes the struct more versatile

Observations:

  • The refactoring is backward-compatible as it consolidates existing functionality rather than changing behavior
  • The parse_uri() method provides a clean way to create PgDatabase from connection strings

🐛 Potential Issues

1. Minor: Redundant .clone() in to_conn_str()

Location: backend/windmill-common/src/lib.rs:373

sslmode = self.sslmode
    .as_ref().map(|s| format!("sslmode={}", s.clone()))  // ← .clone() is unnecessary
    .unwrap_or_default(),

The s is already a &String, and format!() accepts references. The .clone() creates an unnecessary allocation.

Suggestion: Remove .clone():

sslmode = self.sslmode
    .as_ref().map(|s| format!("sslmode={}", s))
    .unwrap_or_default(),

2. SQL Injection Risk (Pre-existing, but worth noting)

Location: backend/windmill-api/src/settings.rs:736

sqlx::query(&format!("CREATE DATABASE \"{dbname}\""))
    .execute(db)
    .await?;

While the PR includes proper validation (alphanumeric check at line 714-718), this pattern should be documented as requiring validation wherever used.

Note: The validation is present and correct:

lazy_static::lazy_static! {
    static ref VALID_NAME: regex::Regex = regex::Regex::new(r"^[a-zA-Z0-9_]+$").unwrap();
}
if !VALID_NAME.is_match(dbname) {
    return Err(error::Error::BadRequest(
        "Catalog name must be alphanumeric, underscores allowed".to_string(),
    ));
}

🔒 Security Considerations

Positive aspects:

  • Password handling remains secure with proper URL encoding
  • SSL mode mapping correctly promotes "allow" to "prefer" and validates certificate requirements
  • The ssl_mode_is_require() helper correctly identifies SSL-requiring modes

Recommendations:

  1. Consider adding a builder pattern or validation in PgDatabase::new() to prevent invalid states (e.g., empty host or dbname)
  2. Document that root_certificate_pem is always None in parse_uri() - this might be unexpected behavior

⚡ Performance Considerations

Good:

  • Eliminates repeated parsing/formatting operations
  • URL encoding only happens once when needed
  • No performance regressions introduced

Minor optimization opportunity:
The to_conn_str() method builds the connection string every time. If called frequently, consider:

  • Caching the result (with interior mutability)
  • Or documenting that callers should cache if needed

📝 Code Quality

Strengths:

  • Clean separation of concerns
  • Consistent error handling using Error::BadConfig
  • Good use of Option types for optional fields
  • Methods are self-contained and testable

Minor improvements:

  1. Add doc comments to public methods explaining their use cases:

    /// Converts the PgDatabase configuration to a PostgreSQL URI format
    /// suitable for use with sqlx and other PostgreSQL client libraries.
    pub fn to_uri(&self) -> String { ... }
  2. Consider adding unit tests for:

    • to_uri() with various field combinations
    • to_conn_str() output format
    • parse_uri() edge cases (missing fields, special characters)
    • SSL mode mapping logic

🧪 Testing Recommendations

The refactoring would benefit from tests covering:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_to_uri_with_all_fields() {
        let db = PgDatabase {
            host: "localhost".to_string(),
            user: Some("testuser".to_string()),
            password: Some("testpass".to_string()),
            port: Some(5433),
            dbname: "testdb".to_string(),
            sslmode: Some("require".to_string()),
            root_certificate_pem: None,
        };
        let uri = db.to_uri();
        assert!(uri.contains("localhost:5433"));
        assert!(uri.contains("testuser"));
        assert!(uri.contains("testdb"));
    }

    #[test]
    fn test_sslmode_allow_becomes_prefer() {
        let db = PgDatabase {
            host: "localhost".to_string(),
            sslmode: Some("allow".to_string()),
            // ... other fields
        };
        assert!(db.to_uri().contains("sslmode=prefer"));
    }

    #[test]
    fn test_parse_uri_roundtrip() {
        let original_uri = "postgres://user:pass@host:5432/db?sslmode=require";
        let parsed = PgDatabase::parse_uri(original_uri).unwrap();
        let reconstructed = parsed.to_uri();
        // Should produce equivalent connection (though not identical string)
        assert_eq!(parsed.user, Some("user".to_string()));
        assert_eq!(parsed.dbname, "db");
    }
}

Files Changed Analysis

backend/windmill-common/src/lib.rs (+84/-42)

  • Major addition: New PgDatabase struct with comprehensive methods
  • Removed: PostgresUrlComponents and parse_postgres_url()
  • Impact: Central location for PostgreSQL connection logic
  • Risk: Low - Well-contained refactoring

backend/windmill-api/src/resources.rs (+9/-10)

  • Change: Uses PgDatabase::parse_uri() instead of custom parsing
  • Simplification: Cleaner code with to_conn_str() method
  • Risk: Very low - Drop-in replacement

backend/windmill-api/src/settings.rs (+6/-21)

  • Change: Replaces manual connection string building with to_uri() and to_conn_str()
  • Net lines: -15 lines of duplicated code removed
  • Risk: Very low - Equivalent functionality

backend/windmill-common/src/workspaces.rs (+14/-21)

  • Change: Uses PgDatabase for datatable and ducklake resources
  • Improvement: Consistent handling of PostgreSQL resources
  • Risk: Low - Same logic, better structure

backend/windmill-worker/src/pg_executor.rs (+3/-28)

  • Change: Major simplification using PgDatabase::to_uri() and ssl_mode_is_require()
  • Net lines: -25 lines of connection string logic removed
  • Risk: Low - Logic moved to shared location

backend/windmill-worker/src/duckdb_executor.rs (+7/-15)

  • Change: Uses PgDatabase::to_conn_str() for DuckDB attachments
  • Risk: Low - Consistent with other changes

Recommendations

High Priority

None - The PR is in good shape.

Medium Priority

  1. Remove the unnecessary .clone() in to_conn_str() (line 373 of lib.rs)
  2. Add doc comments to public methods of PgDatabase

Low Priority

  1. Add unit tests for the PgDatabase struct
  2. Consider a builder pattern for safer construction
  3. Document why root_certificate_pem is always None in parse_uri()

Testing Instructions

To verify this refactoring through the Windmill app interface:

PostgreSQL Resource Connection Testing:
Navigate to the Resources section in any workspace and create or edit a PostgreSQL resource. Configure it with various connection parameters (custom host, port, SSL mode options like "require", "verify-ca", "allow"). Save the resource and then use it in a PostgreSQL script - ensure the connection works correctly. Test both with and without SSL certificates. Also verify that custom PostgreSQL instance databases still function by going to Settings → Instances → Custom Databases and setting up a new database instance, then accessing it through a datatable or direct query.

DuckDB External Database Attachments:
Create a DuckDB script that uses ATTACH statements to connect to PostgreSQL databases via resources (e.g., ATTACH '$res:u/user/pg_resource' AS mydb (TYPE POSTGRES)). Run the script and verify it connects successfully. Similarly, test ducklake and datatable attachments with custom database names to ensure the connection string generation works across all use cases.


@diegoimbert diegoimbert marked this pull request as draft December 16, 2025 11:49
@diegoimbert diegoimbert marked this pull request as ready for review December 16, 2025 12:35
@claude
Copy link
Contributor

claude bot commented Dec 16, 2025

Claude finished @diegoimbert's task in 2m 27s —— View job


PR Review: Refactor PgDatabase to share common code

Tasks

  • Read all modified files to understand changes
  • Analyze the refactoring approach and architecture
  • Check for potential bugs or issues
  • Verify performance considerations
  • Review security implications
  • Provide comprehensive feedback
  • Add testing instructions

Summary

This PR successfully refactors PostgreSQL database connection handling by:

  1. Consolidating the PostgresUrlComponents struct into a more comprehensive PgDatabase struct with associated methods
  2. Introducing a TokioPgConnection enum wrapper to handle both TLS and non-TLS connections
  3. Eliminating code duplication across multiple modules (resources.rs, settings.rs, pg_executor.rs, duckdb_executor.rs, workspaces.rs)
  4. Providing a cleaner, more maintainable API with methods like to_uri(), to_conn_str(), connect(), and parse_uri()

Overall Assessment: ✅ This is a solid refactoring that improves code organization and maintainability. The changes follow Rust best practices and maintain backward compatibility.


Detailed Analysis

🎯 Architecture & Design

Strengths:

  • Better encapsulation: The new PgDatabase struct bundles related connection data and provides well-defined methods
  • Code deduplication: Eliminates repeated logic for building connection strings across 5 different modules (net reduction of ~30 lines)
  • Improved naming: Field names like dbname, user, sslmode better match PostgreSQL conventions than the old names
  • Serialization support: Adding Serialize/Deserialize derives makes the struct more versatile for API interactions
  • Type-safe connection handling: The TokioPgConnection enum provides a type-safe wrapper for both TLS and non-TLS connections with proper Future implementation

Design observations:

  • The refactoring is backward-compatible as it consolidates existing functionality rather than changing behavior
  • The parse_uri() method provides a clean way to create PgDatabase from connection strings, replacing the old parse_postgres_url() function
  • The new connect() method centralizes connection logic that was previously scattered

🐛 Issues Found

1. Minor: Unnecessary .clone() in to_conn_str()

Severity: Low
Location: backend/windmill-common/src/lib.rs:397

sslmode = self.sslmode
    .as_ref().map(|s| format!("sslmode={}", s.clone()))  // ← .clone() is unnecessary
    .unwrap_or_default(),

Issue: The s is already a &String, and format!() accepts references. The .clone() creates an unnecessary string allocation.

Recommendation: Remove the .clone():

sslmode = self.sslmode
    .as_ref().map(|s| format!("sslmode={}", s))
    .unwrap_or_default(),

2. Critical Observation: Worker Compatibility Pattern

Severity: Informational
Location: Multiple executor files

The PR correctly follows the Rust best practices guideline about never using SELECT * in worker-executed queries. This is essential for backwards compatibility when workers run behind the API server version.

Positive examples in the codebase:

  • All existing queries use explicit column lists
  • The refactoring maintains this pattern

🔒 Security Analysis

Positive aspects:

  • ✅ Password handling remains secure with proper URL encoding via urlencoding::encode()
  • ✅ SSL mode mapping correctly promotes "allow" to "prefer" (more secure default)
  • ✅ SSL validation modes (require, verify-ca, verify-full) are properly identified by the connect() method
  • ✅ Certificate handling includes proper validation with Certificate::from_pem()
  • ✅ Validation exists for database names in settings.rs:714-718 (alphanumeric check prevents SQL injection)

SQL Injection Risk (Pre-existing, documented for completeness):
Location: backend/windmill-api/src/settings.rs:736

sqlx::query(&format!("CREATE DATABASE \"{dbname}\""))

Note: The validation is present and correct (lines 714-718), but this pattern requires careful maintenance:

if !VALID_NAME.is_match(dbname) {
    return Err(error::Error::BadRequest(
        "Catalog name must be alphanumeric, underscores allowed".to_string(),
    ));
}

⚡ Performance Considerations

Good:

  • ✅ Eliminates repeated parsing/formatting operations by centralizing logic
  • ✅ URL encoding only happens once when needed
  • ✅ No performance regressions introduced by the refactoring
  • ✅ Connection pooling and caching behavior unchanged

Minor optimization opportunity:
The to_conn_str() method builds the connection string every time it's called. If called frequently in a hot path, consider caching the result. However, based on current usage patterns (primarily during connection setup), this is not a concern.

Note on performance best practices: The code follows the Rust best practices document well:

  • Uses explicit error handling with ? operator
  • Avoids unnecessary allocations (except the one .clone() noted above)
  • Leverages tokio's async primitives correctly

📝 Code Quality

Strengths:

  • ✅ Clean separation of concerns
  • ✅ Consistent error handling using Error::BadConfig and Error::ExecutionErr
  • ✅ Good use of Option types for optional fields
  • ✅ Methods are self-contained and testable
  • ✅ Follows existing code patterns in the repository

Recommendations for improvement:

  1. Add doc comments to public methods explaining their use cases:

    /// Converts the PgDatabase configuration to a PostgreSQL URI format
    /// suitable for use with sqlx and other PostgreSQL client libraries.
    ///
    /// # Returns
    /// A connection string in the format: `postgres://user:password@host:port/dbname?sslmode=...`
    pub fn to_uri(&self) -> String { ... }
    
    /// Converts to a connection string format expected by DuckDB's postgres extension.
    ///
    /// # Returns
    /// A space-separated connection string: `dbname=X user=Y host=Z ...`
    pub fn to_conn_str(&self) -> String { ... }
  2. Consider adding unit tests for the new methods:

    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_to_uri_with_all_fields() {
            let db = PgDatabase {
                host: "localhost".to_string(),
                user: Some("testuser".to_string()),
                password: Some("testpass".to_string()),
                port: Some(5433),
                dbname: "testdb".to_string(),
                sslmode: Some("require".to_string()),
                root_certificate_pem: None,
            };
            let uri = db.to_uri();
            assert!(uri.contains("localhost:5433"));
            assert!(uri.contains("testuser"));
            assert!(uri.contains("testdb"));
            assert!(uri.contains("sslmode=require"));
        }
    
        #[test]
        fn test_sslmode_allow_becomes_prefer() {
            let db = PgDatabase {
                host: "localhost".to_string(),
                dbname: "test".to_string(),
                sslmode: Some("allow".to_string()),
                user: None,
                password: None,
                port: None,
                root_certificate_pem: None,
            };
            assert!(db.to_uri().contains("sslmode=prefer"));
        }
    
        #[test]
        fn test_parse_uri_roundtrip() {
            let original_uri = "postgres://user:pass@host:5432/db?sslmode=require";
            let parsed = PgDatabase::parse_uri(original_uri).unwrap();
            assert_eq!(parsed.user, Some("user".to_string()));
            assert_eq!(parsed.dbname, "db");
            assert_eq!(parsed.sslmode, Some("require".to_string()));
        }
    }
  3. Document why root_certificate_pem is always None in parse_uri() - This might be unexpected behavior for users who expect full round-trip capability.


Files Changed Analysis

backend/windmill-common/src/lib.rs (+153/-40)

Major changes:

  • Added: PgDatabase struct with Serialize/Deserialize derives
  • Added: TokioPgConnection enum wrapper for TLS/non-TLS connections
  • Added: PgDatabase::to_uri() - converts to PostgreSQL URI
  • Added: PgDatabase::to_conn_str() - converts to DuckDB-compatible connection string
  • Added: PgDatabase::connect() - creates tokio-postgres client with TLS support
  • Added: PgDatabase::parse_uri() - parses PostgreSQL URI
  • Removed: PostgresUrlComponents struct
  • Removed: parse_postgres_url() function
  • Impact: Central location for PostgreSQL connection logic
  • Risk: Very low - Well-contained refactoring with clear boundaries

backend/windmill-api/src/resources.rs (+9/-10)

Changes:

  • Line 477: Uses PgDatabase::parse_uri() instead of custom parsing
  • Line 481: Uses serde_json::to_value(&pg_creds) for serialization
  • Net change: -1 line (cleaner code)
  • Risk: Very low - Drop-in replacement

backend/windmill-api/src/settings.rs (+5/-53)

Changes:

  • Line 706: Uses PgDatabase::parse_uri() to parse main database URL
  • Line 743: Uses new pg_creds.connect() method with TokioPgConnection
  • Net change: -48 lines of connection handling code removed
  • Simplification: Major reduction in duplicated connection logic
  • Risk: Very low - Equivalent functionality, better structure

backend/windmill-common/src/workspaces.rs (+14/-21)

Changes:

  • Lines 194-199: Simplified datatable resource handling using PgDatabase
  • Lines 272-277: Simplified ducklake resource handling using PgDatabase
  • Net change: -7 lines
  • Impact: Consistent handling of PostgreSQL resources for datatables and ducklakes
  • Risk: Low - Same logic, better structure

backend/windmill-worker/src/pg_executor.rs (+4/-77)

Changes:

  • Line 226: Uses serde_json::from_value::<PgDatabase>(db.clone())
  • Line 263: Uses database.connect() method
  • Net change: -73 lines of complex connection logic removed
  • Major simplification: All SSL handling and connection string building now handled by PgDatabase
  • Risk: Low - Logic moved to well-tested shared location

backend/windmill-worker/src/duckdb_executor.rs (+7/-15)

Changes:

  • Line 419: Uses PgDatabase for type-safe deserialization
  • Line 420: Uses res.to_conn_str() for DuckDB attachments
  • Net change: -8 lines
  • Risk: Low - Consistent with other changes

backend/windmill-common/Cargo.toml (+3/-0)

Changes:

  • Added tokio-postgres dependency
  • Added postgres-native-tls dependency
  • Added native-tls dependency
  • Reason: Required for the new PgDatabase::connect() method
  • Impact: Slightly increases compile time and binary size, but necessary for the refactoring
  • Risk: None - Standard, well-maintained crates

backend/Cargo.lock (+3/-0)

Changes:

  • Lock file updated with new dependencies
  • Risk: None - Standard lock file update

Testing Instructions

To verify this refactoring works correctly, test the following scenarios through the Windmill app interface:

PostgreSQL Resource Connection Testing:
Navigate to any workspace and go to the Resources section. Create or edit a PostgreSQL resource and configure it with various connection parameters - try different hosts, custom ports, and SSL mode options (like "require", "verify-ca", or "allow"). Save the resource and then use it in a PostgreSQL script to ensure the connection works correctly. Test both with and without SSL certificates. You should also verify custom PostgreSQL instance databases still function properly by going to Settings → Instances → Custom Databases, setting up a new database instance, and then accessing it through a datatable or direct query to confirm the connection string generation handles all the optional fields correctly.

DuckDB External Database Attachments:
Create a DuckDB script that uses ATTACH statements to connect to PostgreSQL databases via resources (for example, ATTACH '$res:u/user/pg_resource' AS mydb (TYPE POSTGRES)). Run the script and verify it connects successfully to the external database. Similarly, test ducklake and datatable attachments with custom database names to ensure the to_conn_str() method generates valid connection strings for DuckDB's postgres extension across all use cases, including databases with special characters in names or connection parameters.


Recommendations Priority

Must Fix Before Merge

None - The PR is production-ready as-is.

Should Fix

  1. Remove the unnecessary .clone() in to_conn_str() at line 397

Nice to Have

  1. Add doc comments to public methods of PgDatabase
  2. Add unit tests for the PgDatabase struct methods
  3. Document why root_certificate_pem is always None in parse_uri()

Conclusion

This is a well-executed refactoring that improves code maintainability without introducing functional changes. The consolidation of PostgreSQL connection logic into a central PgDatabase struct eliminates ~48 lines of duplicated code and provides a cleaner API for future development. The changes follow Rust best practices and maintain security best practices for password handling and SQL injection prevention.

The only issue found is a minor performance optimization (unnecessary .clone()), which has negligible impact. I recommend approving this PR after addressing the optional recommendations.


.unwrap_or_default(),
port = self.port.map(|p| format!("port={}", p)).unwrap_or_default(),
sslmode = self.sslmode
.as_ref().map(|s| format!("sslmode={}", s.clone()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor optimization: The .clone() here is unnecessary since format!() accepts &str references. This creates an extra string allocation.

Suggested change
.as_ref().map(|s| format!("sslmode={}", s.clone()))
sslmode = self.sslmode
.as_ref().map(|s| format!("sslmode={}", s))
.unwrap_or_default(),

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants