Skip to content

Add baseline infrastructure for emulating materialized views #56

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

amotl
Copy link
Member

@amotl amotl commented Oct 10, 2023

About

Emulated materialized views may cover 80% of the most common use cases already.

Backlog

  • CLI
  • Documentation

Comment on lines 52 to 59
# FIXME: SQLParseException[Target table name must not include a schema]
sql_ddl = f"ALTER TABLE {mview.staging_table_fullname} RENAME TO {mview.table_name}"
logger.info(f"Activating materialized view: {sql_ddl}")
self.store.execute(sa.text(sql_ddl))
Copy link
Member Author

@amotl amotl Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a ticket in crate/crate: Currently, you can't rename a table "into a different schema", i.e. it will always be doc.

Copy link
Member Author

@amotl amotl Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like resolving this issue might not be too difficult. If someone has the capacity to file a corresponding issue at crate/crate, I will appreciate it. /cc @hammerhead, @hlcianfagna 🌻

// we do not support renaming to a different schema, thus the target table identifier must not include a schema
// this is an artificial limitation, technically it can be done
List<String> newIdentParts = node.newName().getParts();
if (newIdentParts.size() > 1) {
    throw new IllegalArgumentException("Target table name must not include a schema");
}

-- https://github.com/crate/crate/blob/5.4.3/server/src/main/java/io/crate/analyze/AlterTableAnalyzer.java#L96C1-L102C10

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hlcianfagna suggested to use ALTER CLUSTER SWAP TABLE as a workaround at crate/crate#14833 (comment). Thank you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used ALTER CLUSTER SWAP TABLE now, as you recommended. Thanks!

However, we added an inline comment that we'd prefer to use ALTER TABLE ... RENAME TO ..., because it wouldn't need the target table to exist.

Use ALTER TABLE ... RENAME TO ... after resolving crate/crate#14833.

SQLParseException[Target table name must not include a schema]

CrateDB does not support renaming to a different schema, thus the target table identifier must not include a schema. This is an artificial limitation, technically it can be done.
https://github.com/crate/crate/blob/5.3.3/server/src/main/java/io/crate/analyze/AlterTableAnalyzer.java#L97-L102

Comment on lines 41 to 42
# TODO: IF NOT EXISTS
sql_ddl = f"CREATE TABLE {mview.staging_table_fullname} AS (\n{mview.sql}\n)"
logger.info(f"Creating materialized view (staging): {sql_ddl}")
self.store.execute(sa.text(sql_ddl))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another spot where I discovered a shortcoming in SQL syntax: CREATE TABLE ... AS ... does not support the IF NOT EXISTS infix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We filed a database issue about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CREATE TABLE IF NOT EXISTS ... AS ... now works as expected. Thanks @matriv!

Copy link

coderabbitai bot commented May 6, 2025

Warning

Rate limit exceeded

@amotl has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 17 minutes and 17 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 6c462b9 and 6cdcaa8.

📒 Files selected for processing (12)
  • CHANGES.md (1 hunks)
  • cratedb_toolkit/materialized/core.py (1 hunks)
  • cratedb_toolkit/materialized/model.py (1 hunks)
  • cratedb_toolkit/materialized/readme.md (1 hunks)
  • cratedb_toolkit/materialized/schema.py (1 hunks)
  • cratedb_toolkit/materialized/schema.sql (1 hunks)
  • cratedb_toolkit/materialized/store.py (1 hunks)
  • examples/materialized_view.py (1 hunks)
  • tests/conftest.py (2 hunks)
  • tests/materialized/conftest.py (1 hunks)
  • tests/materialized/test_core.py (1 hunks)
  • tests/materialized/test_examples.py (1 hunks)

Walkthrough

This update introduces a baseline infrastructure for emulating materialized views in CrateDB. It adds core classes, models, schema setup, and store logic to define, manage, and refresh materialized views. The changes include a new example program, documentation, and comprehensive tests to validate the materialized view lifecycle and integration.

Changes

File(s) Change Summary
CHANGES.md Updated unreleased changelog to mention the new materialized views infrastructure.
cratedb_toolkit/materialized/core.py Added MaterializedViewManager class with methods to refresh synthetic materialized views by executing SQL operations and swapping tables.
cratedb_toolkit/materialized/model.py Introduced MaterializedView and MaterializedViewSettings dataclasses, and a helper for default table address.
cratedb_toolkit/materialized/schema.py, cratedb_toolkit/materialized/schema.sql Added function and SQL schema for setting up the materialized view management table.
cratedb_toolkit/materialized/store.py Implemented MaterializedViewStore class to create, retrieve, delete, and synchronize materialized view records using SQLAlchemy.
cratedb_toolkit/materialized/readme.md Added README documentation for the materialized views subsystem, describing features and goals.
examples/materialized_view.py New example program demonstrating creation, registration, and refresh of a materialized view, including data setup and cleanup.
tests/conftest.py Updated copyright year; added materialized view table to reset list.
tests/materialized/conftest.py Added pytest fixtures for settings, database, and store setup for materialized view testing.
tests/materialized/test_core.py Added tests for MaterializedViewManager covering error cases and successful refresh.
tests/materialized/test_examples.py Added test to run the materialized view example and verify correct record count.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Example as Example Program
    participant Manager as MaterializedViewManager
    participant Store as MaterializedViewStore
    participant DB as CrateDB

    User->>Example: Run example script
    Example->>Store: Setup schema, create materialized view definition
    Example->>Manager: Refresh materialized view
    Manager->>Store: Retrieve materialized view definition
    Manager->>DB: Drop staging table (if exists)
    Manager->>DB: Create staging table from view SQL
    Manager->>DB: Refresh staging table
    Manager->>DB: Ensure live table exists
    Manager->>DB: Swap staging table with live table
    Manager->>DB: Refresh live table
    Manager->>DB: Drop staging table
    Example->>DB: Query and display results
Loading

Poem

In the warren where data flows anew,
Materialized views now join the crew!
With swaps and refreshes, tables align,
Queries run faster—oh, how divine!
Thanks to this patch, our toolkit grew,
A hop, a leap, and a view for you!
🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🔭 Outside diff range comments (1)
tests/conftest.py (1)

132-138: 💡 Verification agent

🧩 Analysis chain

Verify if automatic database reset is intended to be removed

The cratedb fixture no longer resets the database automatically before each test function. This might be intentional for the materialized view tests, but could affect other tests that expect a clean database state.


🏁 Script executed:

#!/bin/bash
# Check if there are any tests that might rely on database reset
grep -r "def test_.*cratedb" --include="test_*.py" tests/ | grep -v "tests/materialized/"

Length of output: 6109


🏁 Script executed:

#!/bin/bash
# Locate the definition of cratedb_custom_service in tests/conftest.py
grep -R "def cratedb_custom_service" -n tests/conftest.py

Length of output: 93


🏁 Script executed:

#!/bin/bash
# Display the implementation of cratedb_custom_service to check for database reset logic
sed -n '115,155p' tests/conftest.py

Length of output: 1359


🏁 Script executed:

#!/bin/bash
# Locate and display RESET_TABLES definition and any comment around it in tests/conftest.py
grep -R "RESET_TABLES" -n -C3 tests/conftest.py

Length of output: 816


Add per-test database reset in the cratedb fixture

The cratedb fixture currently just yields the session-scoped service, so the database is only reset once at session startup. To guarantee a clean state for every test (as the docstring promises), call reset before yielding.

• File: tests/conftest.py
• Location: cratedb fixture (around line 132)

Suggested change:

 @pytest.fixture(scope="function")
 def cratedb(cratedb_custom_service):
     """
     Provide a fresh canvas to each test case invocation, by resetting database content.
     """
+    cratedb_custom_service.reset(tables=RESET_TABLES)
     yield cratedb_custom_service
♻️ Duplicate comments (1)
cratedb_toolkit/materialized/core.py (1)

41-43: ⚠️ Potential issue

CREATE TABLE IF NOT EXISTS … AS (…) is not supported in CrateDB

A long-standing CrateDB limitation (see #14834) disallows combining IF NOT EXISTS with AS <query>. This DDL will raise SQLParseException. Remove the clause or guard existence beforehand:

-sql_ddl = f"CREATE TABLE IF NOT EXISTS {mview.staging_table_fullname} AS (\n{mview.sql}\n)"
+sql_ddl = f"CREATE TABLE {mview.staging_table_fullname} AS (\n{mview.sql}\n)"

You can swallow the RelationAlreadyExists error or test existence via information_schema.tables before issuing the statement.

🧹 Nitpick comments (21)
tests/materialized/test_examples.py (2)

1-1: Update copyright year

The copyright year should be updated to 2025 to match the implementation year.

-# Copyright (c) 2023, Crate.io Inc.
+# Copyright (c) 2025, Crate.io Inc.

5-11: Consider adding explicit assertions

While this test verifies the example runs without errors, adding specific assertions would better validate the expected behavior of the materialized view functionality.

 def test_example_materialized_view(store):
     """
     Verify that the program `examples/materialized_view.py` works.
     """
     from examples.materialized_view import main
+    
+    # Run the example
+    main(dburi=store.database.dburi)
+    
+    # Verify materialized view was created and populated
+    view = store.get_view_by_name("example_materialized_view")
+    assert view is not None
+    assert view.last_refresh_at is not None
-
-    main(dburi=store.database.dburi)
tests/materialized/readme.md (3)

8-9: Add missing preposition in the description

The sentence about high traffic reads needs a preposition for clarity.

 The subsystem emulates materialized views so that queries that take a long
-time to run can be cached, which is specifically useful when applied in
-scenarios with high traffic reads.
+time to run can be cached, which is specifically useful when applied in
+scenarios with high traffic of reads.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~9-~9: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads. ## Prior Art - https://github.com/nr...

(AI_HYDRA_LEO_MISSING_OF)


13-19: Format URLs as proper Markdown links

Use proper Markdown formatting for links instead of bare URLs to improve readability and maintainability.

-## Prior Art
+## Prior Art
 
-https://github.com/nroi/elfenbein
-https://github.com/nroi/pg_materialized_views_refresh_topologically
-https://github.com/darkside/monocle
-https://github.com/maggregor/maggregor
-https://github.com/adamfoneil/ViewMaterializer
-https://github.com/jhollinger/activerecord-viewmatic
-https://github.com/q-m/metabase-matview
+- [Elfenbein](https://github.com/nroi/elfenbein)
+- [PG Materialized Views Refresh Topologically](https://github.com/nroi/pg_materialized_views_refresh_topologically)
+- [Monocle](https://github.com/darkside/monocle)
+- [Maggregor](https://github.com/maggregor/maggregor)
+- [ViewMaterializer](https://github.com/adamfoneil/ViewMaterializer)
+- [ActiveRecord Viewmatic](https://github.com/jhollinger/activerecord-viewmatic)
+- [Metabase Matview](https://github.com/q-m/metabase-matview)
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

13-13: Bare URL used
null

(MD034, no-bare-urls)


14-14: Bare URL used
null

(MD034, no-bare-urls)


15-15: Bare URL used
null

(MD034, no-bare-urls)


16-16: Bare URL used
null

(MD034, no-bare-urls)


17-17: Bare URL used
null

(MD034, no-bare-urls)


18-18: Bare URL used
null

(MD034, no-bare-urls)


19-19: Bare URL used
null

(MD034, no-bare-urls)


1-20: Expand the README with usage examples and API reference

The README provides a good introduction but could be expanded to include usage examples, API reference, and configuration options for better user guidance.

Consider adding sections like:

  • Usage Example
  • API Reference
  • Configuration Options
  • Development and Testing

This would make the documentation more comprehensive for developers using this subsystem.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~9-~9: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads. ## Prior Art - https://github.com/nr...

(AI_HYDRA_LEO_MISSING_OF)


[grammar] ~11-~11: It appears that a preposition is missing after ‘Prior’.
Context: ... scenarios with high traffic reads. ## Prior Art - https://github.com/nroi/elfenbein - ...

(MISSING_PREPOSITION)

🪛 markdownlint-cli2 (0.17.2)

13-13: Bare URL used
null

(MD034, no-bare-urls)


14-14: Bare URL used
null

(MD034, no-bare-urls)


15-15: Bare URL used
null

(MD034, no-bare-urls)


16-16: Bare URL used
null

(MD034, no-bare-urls)


17-17: Bare URL used
null

(MD034, no-bare-urls)


18-18: Bare URL used
null

(MD034, no-bare-urls)


19-19: Bare URL used
null

(MD034, no-bare-urls)

cratedb_toolkit/materialized/schema.sql (1)

2-14: Consider shard-count configurability & quoting edge-cases

  1. Hard-coding CLUSTERED INTO 1 SHARDS is fine for demo/test workloads, but it can become a scalability bottleneck once the metadata table grows. Accepting the shard count from MaterializedViewSettings (with a sane default of 1) would make the schema reusable in larger installations.

  2. Although id, table_schema, table_name and sql are all declared TEXT, CrateDB implicitly uses keyword for TEXT without an analyzer unless specified. If you plan to filter with LIKE or full-text search on sql, you may want TEXT ANALYZER english. At a minimum, document the expectation.

  3. The identifiers are unquoted – fine for lowercase names – but CREATE TABLE will fail if a user provides an upper-case schema/table in settings.materialized_table. Wrapping {materialized_table.fullname} in double-quotes here (and ensuring fullname returns a quoted identifier) would harden the DDL.

cratedb_toolkit/materialized/core.py (1)

47-50: Dummy column placeholder may hide real schema mismatches

Creating the live table with a single dummy INT column ensures the identifier exists, but it masks divergent schemas between refreshes (e.g. columns added to the view). Consider:

  1. Creating the live table from the same SELECT the first time to mirror the staging schema, or
  2. Dropping the live table if it exists and recreating it from the staging table before the first swap.

This guarantees column consistency and avoids surprises when querying the live view.

tests/materialized/conftest.py (1)

35-36: Update fixture docstring to reflect its purpose

The docstring for the store fixture incorrectly states it provides a client database adapter, but it actually provides a MaterializedViewStore instance with schema already set up.

-    Provide a client database adapter, which is connected to the test database instance.
-    The retention policy database table schema has been established.
+    Provide a MaterializedViewStore instance, connected to the test database instance.
+    The materialized view management table schema has been established.
examples/materialized_view.py (3)

65-66: Address TODO comment about refactoring to MaterializedViewManager

The TODO comment indicates a potential refactoring opportunity to move the schema setup responsibility to the MaterializedViewManager class.

Consider moving the setup_schema call into the MaterializedViewManager constructor or providing a dedicated method to ensure the schema is properly set up before use.


152-153: Remove commented-out code

There's a commented-out line of code that should be removed to keep the codebase clean.

    def refresh(self):
        """
        Refresh materialized view.
        """
        self.manager.refresh(name="examples.raw_metrics_view")
-        # self.db.run_sql('REFRESH TABLE "examples"."raw_metrics_view";')

57-59: Consider making the test behavior configuration more explicit

The example changes its behavior based on the presence of an environment variable, which may not be immediately clear to readers of this example.

Consider making this behavior more explicit by adding a parameter to the constructor or documenting this behavior more clearly in comments.

cratedb_toolkit/materialized/model.py (2)

42-43: Reconsider staging table naming strategy

The current approach to generating staging table names might cause issues if the schema name already contains special characters or doesn't follow naming conventions.

Consider a more robust approach for creating staging table names, such as:

@property
def staging_table_fullname(self) -> str:
-    return f'"{self.table_schema}-staging"."{self.table_name}"'
+    return f'"{self.table_schema}_staging"."{self.table_name}"'

Or create a separate property for the staging schema name:

@property
def staging_schema_name(self) -> str:
    return f"{self.table_schema}_staging"

@property
def staging_table_fullname(self) -> str:
    return f'"{self.staging_schema_name}"."{self.table_name}"'

89-92: Simplify to_dict method

The current implementation of to_dict first converts the dataclass to a dictionary with dataclasses.asdict, but then manually assigns the materialized_table field. Since dataclasses.asdict already includes all fields, this manual assignment is likely unnecessary.

def to_dict(self):
    data = dataclasses.asdict(self)
-    data["materialized_table"] = self.materialized_table
    return data

If there's a specific reason for this explicit assignment, please add a comment explaining why it's needed.

tests/materialized/test_core.py (1)

23-26: Remove unused function foo

The function foo is defined but never used in the test file. It should be removed to avoid confusion.

-def foo(database):
-    database.run_sql(f'CREATE TABLE "{TESTDRIVE_EXT_SCHEMA}"."foobar" AS SELECT 1;')
-    database.run_sql(f'CREATE TABLE "{TESTDRIVE_DATA_SCHEMA}"."raw_metrics" AS SELECT 1;')
-
cratedb_toolkit/materialized/store.py (7)

83-86: Incorrect docstring needs updating

The docstring for the get_by_table method doesn't match its functionality. It mentions "effective policies to process, by strategy and tags" which appears to be copied from a different context.

def get_by_table(self, table_address: TableAddress) -> MaterializedView:
    """
-    Retrieve effective policies to process, by strategy and tags.
+    Retrieve a materialized view by its table schema and name.
+    Raises KeyError if the materialized view doesn't exist.
    """

117-126: Add exception handling to database operations

Database operations can fail for various reasons (connection issues, permission problems, etc.). Consider adding exception handling to make the code more robust.

def execute(self, statement) -> CursorResult:
    """
    Execute SQL statement, and return result object.

    TODO: Generalize, see `RetentionPolicyStore`.
    """
    with Session(self.database.engine) as session:
-        result = session.execute(statement)
-        session.commit()
-        return result
+        try:
+            result = session.execute(statement)
+            session.commit()
+            return result
+        except Exception as e:
+            session.rollback()
+            logger.error(f"Failed to execute statement: {e}")
+            raise

139-144: Incorrect docstring needs updating

The docstring for the exists method refers to a "retention policy" instead of a materialized view, which seems to be copied from another context.

def exists(self, mview: MaterializedView):
    """
-    Check if retention policy for specific table already exists.
+    Check if materialized view for specific table already exists.

    TODO: Generalize, see `RetentionPolicyStore`.
    """

70-81: Consider adding pagination for large result sets

The retrieve method fetches all records from the table without pagination. For tables with a large number of materialized views, this could lead to performance issues or memory problems.

Consider implementing pagination by adding optional limit and offset parameters:

-def retrieve(self):
+def retrieve(self, limit: t.Optional[int] = None, offset: t.Optional[int] = 0):
    """
    Retrieve all records from database table.

    TODO: Add filtering capabilities.
    TODO: Generalize, see `RetentionPolicyStore`.
+
+    :param limit: Maximum number of records to retrieve
+    :param offset: Number of records to skip
    """

    # Run SELECT statement, and return result.
    selectable = sa.select(self.table)
+    if limit is not None:
+        selectable = selectable.limit(limit)
+    if offset:
+        selectable = selectable.offset(offset)
    records = self.query(selectable)
    return records

45-68: Consider handling duplicate key errors more robustly

The current implementation checks for duplicates programmatically and has a specific error message for DuplicateKey errors. It might be more robust to rely on the database constraint and catch the specific exception type that would be raised.

def create(self, mview: MaterializedView, ignore: t.Optional[str] = None):
    """
    Create a new materialized view, and return its identifier.

    TODO: Generalize, see `RetentionPolicyStore`.
    """

    # TODO: Sanity check, whether target table already exists?

    ignore = ignore or ""

    # Sanity checks.
    if mview.table_schema is None:
        raise ValueError("Table schema needs to be defined")
    if mview.table_name is None:
        raise ValueError("Table name needs to be defined")
-    if self.exists(mview):
-        if not ignore.startswith("DuplicateKey"):
-            raise ValueError(f"Materialized view '{mview.table_schema}.{mview.table_name}' already exists")

    table = self.table
    # TODO: Add UUID as converter to CrateDB driver?
    identifier = str(uuid.uuid4())
    data = mview.to_storage_dict(identifier=identifier)
-    insertable = sa.insert(table).values(**data).returning(table.c.id)
-    cursor = self.execute(insertable)
-    identifier = cursor.one()[0]
+    try:
+        insertable = sa.insert(table).values(**data).returning(table.c.id)
+        cursor = self.execute(insertable)
+        identifier = cursor.one()[0]
+    except sa.exc.IntegrityError as e:
+        if not ignore.startswith("DuplicateKey"):
+            raise ValueError(f"Materialized view '{mview.table_schema}.{mview.table_name}' already exists") from e
+        # If ignore is set to "DuplicateKey", just get the existing ID
+        existing_view = self.get_by_table(TableAddress(schema=mview.table_schema, table=mview.table_name))
+        identifier = existing_view.id
    self.synchronize()
    return identifier

127-138: Add type annotations to improve code clarity

The query method returns a list of dictionaries, but the exact type of the dictionary values is not specified. Adding more specific type annotations would improve code clarity and IDE support.

-def query(self, statement) -> t.List[t.Dict]:
+def query(self, statement) -> t.List[t.Dict[str, t.Any]]:
    """
    Execute SQL statement, fetch result rows, and return them converted to dictionaries.

    TODO: Generalize, see `RetentionPolicyStore`.
    """
    cursor = self.execute(statement)
    rows = cursor.mappings().fetchall()
    records = [dict(row.items()) for row in rows]
    return records

102-116: Consider adding a method to drop the underlying materialized view table

Currently, delete only removes the entry from the management table, but doesn't drop the actual materialized view table. Consider adding this capability or documenting this limitation.

def delete(self, identifier: str) -> int:
    """
    Delete materialized view by identifier.
+   
+   Note: This only removes the entry from the management table,
+   it does not drop the actual materialized view table.

    TODO: Generalize, see `RetentionPolicyStore`.
    """
    table = self.table
    constraint = table.c.id == identifier
    deletable = sa.delete(table).where(constraint)
    result = self.execute(deletable)
    self.synchronize()
    if result.rowcount == 0:
        logger.warning(f"Materialized view not found with id: {identifier}")
    return result.rowcount

+def drop_table(self, table_address: TableAddress) -> None:
+    """
+    Drop the actual materialized view table.
+    
+    :param table_address: The schema and name of the table to drop
+    """
+    sql = f"DROP TABLE IF EXISTS {table_address.schema}.{table_address.table};"
+    self.database.run_sql(sql)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8876fb1 and c45f1af.

📒 Files selected for processing (12)
  • CHANGES.md (1 hunks)
  • cratedb_toolkit/materialized/core.py (1 hunks)
  • cratedb_toolkit/materialized/model.py (1 hunks)
  • cratedb_toolkit/materialized/schema.py (1 hunks)
  • cratedb_toolkit/materialized/schema.sql (1 hunks)
  • cratedb_toolkit/materialized/store.py (1 hunks)
  • examples/materialized_view.py (1 hunks)
  • tests/conftest.py (2 hunks)
  • tests/materialized/conftest.py (1 hunks)
  • tests/materialized/readme.md (1 hunks)
  • tests/materialized/test_core.py (1 hunks)
  • tests/materialized/test_examples.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
cratedb_toolkit/materialized/schema.py (3)
cratedb_toolkit/materialized/model.py (2)
  • MaterializedViewSettings (73-92)
  • to_dict (89-92)
tests/materialized/conftest.py (2)
  • database (25-29)
  • settings (12-21)
cratedb_toolkit/model.py (3)
  • safe (138-145)
  • dburi (103-107)
  • fullname (191-197)
🪛 LanguageTool
tests/materialized/readme.md

[uncategorized] ~9-~9: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads. ## Prior Art - https://github.com/nr...

(AI_HYDRA_LEO_MISSING_OF)


[grammar] ~11-~11: It appears that a preposition is missing after ‘Prior’.
Context: ... scenarios with high traffic reads. ## Prior Art - https://github.com/nroi/elfenbein - ...

(MISSING_PREPOSITION)

🪛 markdownlint-cli2 (0.17.2)
tests/materialized/readme.md

13-13: Bare URL used
null

(MD034, no-bare-urls)


14-14: Bare URL used
null

(MD034, no-bare-urls)


15-15: Bare URL used
null

(MD034, no-bare-urls)


16-16: Bare URL used
null

(MD034, no-bare-urls)


17-17: Bare URL used
null

(MD034, no-bare-urls)


18-18: Bare URL used
null

(MD034, no-bare-urls)


19-19: Bare URL used
null

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: Generic: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
  • GitHub Check: CFR: Python 3.12 on OS ubuntu-latest
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
  • GitHub Check: CFR for OS ubuntu-latest
  • GitHub Check: CFR: Python 3.9 on OS ubuntu-latest
  • GitHub Check: CFR for OS macos-latest
  • GitHub Check: CFR for OS macos-13
🔇 Additional comments (15)
CHANGES.md (1)

13-13: LGTM: Clear and concise changelog entry

The new entry appropriately documents the addition of the materialized views infrastructure in the unreleased section, following the existing format.

tests/conftest.py (2)

1-1: LGTM: Updated copyright year range

Copyright year range has been properly updated to include 2025.


30-30: LGTM: Added materialized view table to reset list

Appropriately added the new "materialized_view" table to the RESET_TABLES list, ensuring it will be properly reset during tests.

examples/materialized_view.py (3)

68-69: LGTM - MaterializedViewManager instance creation

Good implementation of the manager instantiation. This shows clear separation of concerns between the example code and the toolkit functionality.


88-99: LGTM - MaterializedView configuration

The MaterializedView setup with clean SQL query definition using the dedent function is well implemented. This shows good use of the toolkit's model classes.


148-153: LGTM - Refresh method implementation

Good implementation of the refresh method, delegating the materialized view refresh to the manager. This is a clear example of how to use the toolkit's API.

cratedb_toolkit/materialized/model.py (3)

10-36: LGTM - MaterializedView dataclass definition

Well-structured dataclass with appropriate field definitions and metadata. The comment about synchronizing with the schema.sql table definition is helpful for maintainers.


64-70: LGTM - Default table address implementation

Good implementation of the default table address with environment variable support, allowing for flexibility in configuration.


72-88: LGTM - MaterializedViewSettings dataclass definition

Well-structured settings class with appropriate defaults and documentation. The use of default_factory with lambdas and function references is good practice.

tests/materialized/test_core.py (4)

28-33: LGTM - Test for undefined materialized view

Good test implementation for the case when a materialized view is not defined. The test properly asserts the expected exception and its message.


35-40: LGTM - Test for missing schema

Well-structured test for the case when the schema is missing. The use of re.escape for matching the error message is a good practice to handle special characters in the error message.


42-49: LGTM - Test for missing table

Good test case for the scenario when the source table is missing. The test properly creates the necessary schema and asserts the expected exception.


51-62: LGTM - Test for successful materialized view refresh

Comprehensive test for the successful refresh scenario, with proper setup of the source table and verification of the results. The test follows the Arrange-Act-Assert pattern effectively.

cratedb_toolkit/materialized/store.py (2)

18-39: LGTM - Class initialization is well-structured

The class initialization is well-structured with appropriate validation of required parameters and logging. The setup of both generic and SQLAlchemy database adapters is clear and appropriate.


1-15: LGTM - Imports and logger setup are appropriate

The imports and logger setup at the top of the file are appropriate and follow good practices.

@amotl amotl changed the title [WIP] Add baseline infrastructure for emulating materialized views Add baseline infrastructure for emulating materialized views May 7, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🔭 Outside diff range comments (2)
tests/materialized/readme.md (2)

11-20: 🛠️ Refactor suggestion

Format the "Prior Art" section with proper Markdown links and descriptions

The current section lists bare URLs without context. Add descriptions for each project to explain how they relate to this implementation and format them as proper Markdown links.

-## Prior Art
+## Prior Art
 
-https://github.com/nroi/elfenbein
-https://github.com/nroi/pg_materialized_views_refresh_topologically
-https://github.com/darkside/monocle
-https://github.com/maggregor/maggregor
-https://github.com/adamfoneil/ViewMaterializer
-https://github.com/jhollinger/activerecord-viewmatic
-https://github.com/q-m/metabase-matview
+The following projects have implemented similar concepts and served as inspiration:
+
+- [Elfenbein](https://github.com/nroi/elfenbein) - A PostgreSQL extension for creating and refreshing materialized views.
+- [pg_materialized_views_refresh_topologically](https://github.com/nroi/pg_materialized_views_refresh_topologically) - Refreshes PostgreSQL materialized views in topological order.
+- [Monocle](https://github.com/darkside/monocle) - A Ruby gem for creating and refreshing materialized views in PostgreSQL.
+- [maggregor](https://github.com/maggregor/maggregor) - A PostgreSQL extension for managing materialized views.
+- [ViewMaterializer](https://github.com/adamfoneil/ViewMaterializer) - A .NET library for materializing views in SQL Server.
+- [activerecord-viewmatic](https://github.com/jhollinger/activerecord-viewmatic) - A Ruby gem for working with materialized views in Rails.
+- [metabase-matview](https://github.com/q-m/metabase-matview) - A Metabase extension for working with materialized views.
🧰 Tools
🪛 LanguageTool

[grammar] ~11-~11: It appears that a preposition is missing after ‘Prior’.
Context: ... scenarios with high traffic reads. ## Prior Art - https://github.com/nroi/elfenbein - ...

(MISSING_PREPOSITION)

🪛 markdownlint-cli2 (0.17.2)

13-13: Bare URL used
null

(MD034, no-bare-urls)


14-14: Bare URL used
null

(MD034, no-bare-urls)


15-15: Bare URL used
null

(MD034, no-bare-urls)


16-16: Bare URL used
null

(MD034, no-bare-urls)


17-17: Bare URL used
null

(MD034, no-bare-urls)


18-18: Bare URL used
null

(MD034, no-bare-urls)


19-19: Bare URL used
null

(MD034, no-bare-urls)


1-20: 🛠️ Refactor suggestion

Add usage examples and documentation sections

The README lacks information on how to use the materialized views infrastructure. Add sections with usage examples, configuration options, and API documentation.

## Usage

### Basic Example

```python
from cratedb_toolkit.materialized import MaterializedViewManager
from cratedb_toolkit.materialized.model import MaterializedView
from cratedb_toolkit.model import TableAddress

# Initialize the manager
manager = MaterializedViewManager(settings)

# Define a materialized view
mview = MaterializedView(
    name="daily_sales_summary",
    table_schema="analytics",
    table_name="daily_sales",
    query="SELECT date, sum(amount) as total FROM sales GROUP BY date",
    refresh_interval=3600,  # Refresh every hour
)

# Create the materialized view
manager.create(mview)

# Refresh the materialized view
manager.refresh(TableAddress(schema="analytics", table="daily_sales"))

Configuration

Describe the configuration options for the materialized views infrastructure.

API Reference

Document the key classes and methods in the materialized views infrastructure.

Development

Instructions for contributing to and testing the materialized views infrastructure.


<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 LanguageTool</summary>

[uncategorized] ~9-~9: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads.  ## Prior Art  - https://github.com/nr...

(AI_HYDRA_LEO_MISSING_OF)

---

[grammar] ~11-~11: It appears that a preposition is missing after ‘Prior’.
Context: ... scenarios with high traffic reads.  ## Prior Art  - https://github.com/nroi/elfenbein - ...

(MISSING_PREPOSITION)

</details>
<details>
<summary>🪛 markdownlint-cli2 (0.17.2)</summary>

13-13: Bare URL used
null

(MD034, no-bare-urls)

---

14-14: Bare URL used
null

(MD034, no-bare-urls)

---

15-15: Bare URL used
null

(MD034, no-bare-urls)

---

16-16: Bare URL used
null

(MD034, no-bare-urls)

---

17-17: Bare URL used
null

(MD034, no-bare-urls)

---

18-18: Bare URL used
null

(MD034, no-bare-urls)

---

19-19: Bare URL used
null

(MD034, no-bare-urls)

</details>

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>♻️ Duplicate comments (2)</summary><blockquote>

<details>
<summary>cratedb_toolkit/materialized/store.py (2)</summary><blockquote>

`61-67`: **Implement centralized UUID TypeDecorator for CrateDB**

Manual string conversion of UUIDs creates inconsistency in type handling throughout the codebase. The TODO comment indicates this is a known issue.

---

`153-158`: **REFRESH TABLE doesn't support a timeout parameter—implement timeout at the application level**

CrateDB's `REFRESH TABLE` command has no built-in timeout option and long-running refreshes on large tables can block your application.

</blockquote></details>

</blockquote></details>

<details>
<summary>🧹 Nitpick comments (6)</summary><blockquote>

<details>
<summary>cratedb_toolkit/materialized/store.py (6)</summary><blockquote>

`18-21`: **Enhance class documentation with more details**

The current docstring is too brief for a central component. Consider adding more comprehensive documentation that explains the purpose, responsibilities, and usage patterns of this class. Include information about table structure, interaction with other components, and the emulated materialized view concept.

```diff
 class MaterializedViewStore:
     """
-    A wrapper around the materialized view management table.
+    A wrapper around the materialized view management table that handles
+    persistence and retrieval of materialized view metadata.
+    
+    This class provides CRUD operations for managing materialized view records,
+    including the ability to create, retrieve, and delete view definitions,
+    as well as checking for their existence and synchronizing the management table.
+    
+    It serves as the persistence layer for the materialized view infrastructure,
+    working with the MaterializedViewManager to provide emulated materialized views.
     """

40-45: Docstring should clarify the return value type

The method documentation should specify that the returned identifier is a string UUID.

 def create(self, mview: MaterializedView, ignore: t.Optional[str] = None):
     """
-    Create a new materialized view, and return its identifier.
+    Create a new materialized view, and return its identifier as a string UUID.
 
     TODO: Generalize, see `RetentionPolicyStore`.
     """

23-32: Improve error message with more context

The error message when table name is missing could be more specific to help with troubleshooting.

 def __init__(self, settings: MaterializedViewSettings):
     self.settings = settings

     if self.settings.materialized_table.table is None:
-        raise ValueError("Unable to create MaterializedViewStore without table name")
+        raise ValueError(f"Unable to create MaterializedViewStore without table name for schema '{self.settings.materialized_table.schema}'")

     logger.info(
         f"Connecting to database {self.settings.database.safe}, table {self.settings.materialized_table.fullname}"
     )

93-100: Consider adding a more descriptive log message before querying

The current log message is somewhat generic. Consider adding more detail about what is being retrieved to make debugging easier.

     logger.info(f"View definition DQL: {selectable}")
+    logger.debug(f"Retrieving materialized view definition for table {table_address.schema}.{table_address.table}")
     try:
         record = self.query(selectable)[0]
     except IndexError as ex:
         raise KeyError(
             f"Synthetic materialized table definition does not exist: {table_address.schema}.{table_address.table}"
         ) from ex
     mview = MaterializedView.from_record(record)
     return mview

102-116: Return more meaningful information from delete method

The delete method returns only the row count. Consider enhancing it to return a more informative result, such as a boolean success indicator or the deleted entity data.

-def delete(self, identifier: str) -> int:
+def delete(self, identifier: str) -> bool:
     """
     Delete materialized view by identifier.
+
+    Returns:
+        bool: True if the materialized view was found and deleted, False otherwise.
 
     TODO: Generalize, see `RetentionPolicyStore`.
     """
     table = self.table
     constraint = table.c.id == identifier
     deletable = sa.delete(table).where(constraint)
     result = self.execute(deletable)
     self.synchronize()
     if result.rowcount == 0:
         logger.warning(f"Materialized view not found with id: {identifier}")
-    return result.rowcount
+    return result.rowcount > 0

1-159: Consider adding capability to get materialized view by ID

The class includes methods to create, retrieve all, get by table, and delete materialized views, but it's missing a method to retrieve a specific view by its ID. This would be useful for operations that have the ID but not the table information.

def get_by_id(self, identifier: str) -> MaterializedView:
    """
    Retrieve a materialized view by its identifier.
    
    Args:
        identifier: The UUID of the materialized view to retrieve.
        
    Returns:
        MaterializedView: The materialized view object.
        
    Raises:
        KeyError: If no materialized view with the given identifier exists.
    """
    table = self.table
    selectable = sa.select(table).where(table.c.id == identifier)
    logger.debug(f"Retrieving materialized view with id: {identifier}")
    try:
        record = self.query(selectable)[0]
    except IndexError as ex:
        raise KeyError(f"Materialized view not found with id: {identifier}") from ex
    mview = MaterializedView.from_record(record)
    return mview
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c45f1af and b92c027.

📒 Files selected for processing (12)
  • CHANGES.md (1 hunks)
  • cratedb_toolkit/materialized/core.py (1 hunks)
  • cratedb_toolkit/materialized/model.py (1 hunks)
  • cratedb_toolkit/materialized/schema.py (1 hunks)
  • cratedb_toolkit/materialized/schema.sql (1 hunks)
  • cratedb_toolkit/materialized/store.py (1 hunks)
  • examples/materialized_view.py (1 hunks)
  • tests/conftest.py (2 hunks)
  • tests/materialized/conftest.py (1 hunks)
  • tests/materialized/readme.md (1 hunks)
  • tests/materialized/test_core.py (1 hunks)
  • tests/materialized/test_examples.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • CHANGES.md
🚧 Files skipped from review as they are similar to previous changes (9)
  • tests/conftest.py
  • tests/materialized/test_examples.py
  • cratedb_toolkit/materialized/core.py
  • cratedb_toolkit/materialized/schema.sql
  • cratedb_toolkit/materialized/schema.py
  • tests/materialized/conftest.py
  • examples/materialized_view.py
  • cratedb_toolkit/materialized/model.py
  • tests/materialized/test_core.py
🧰 Additional context used
🪛 LanguageTool
tests/materialized/readme.md

[uncategorized] ~9-~9: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads. ## Prior Art - https://github.com/nr...

(AI_HYDRA_LEO_MISSING_OF)


[grammar] ~11-~11: It appears that a preposition is missing after ‘Prior’.
Context: ... scenarios with high traffic reads. ## Prior Art - https://github.com/nroi/elfenbein - ...

(MISSING_PREPOSITION)

🪛 markdownlint-cli2 (0.17.2)
tests/materialized/readme.md

13-13: Bare URL used
null

(MD034, no-bare-urls)


14-14: Bare URL used
null

(MD034, no-bare-urls)


15-15: Bare URL used
null

(MD034, no-bare-urls)


16-16: Bare URL used
null

(MD034, no-bare-urls)


17-17: Bare URL used
null

(MD034, no-bare-urls)


18-18: Bare URL used
null

(MD034, no-bare-urls)


19-19: Bare URL used
null

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: Kinesis: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
  • GitHub Check: CFR: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
  • GitHub Check: CFR: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: CFR for OS ubuntu-latest
  • GitHub Check: CFR for OS macos-latest
  • GitHub Check: CFR for OS macos-13

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🔭 Outside diff range comments (2)
cratedb_toolkit/materialized/core.py (1)

1-67: 🛠️ Refactor suggestion

Parameterize SQL statements to avoid injection risks

The SQL statements are constructed by directly embedding values from the materialized view object.
Consider parameterizing the statements where possible to mitigate SQL injection risks.

-sql_ddl = f"CREATE TABLE IF NOT EXISTS {mview.staging_table_fullname} AS (\n{mview.sql}\n)"
+# For statements like CREATE TABLE AS, parameterization may not be possible
+# Consider validating mview.sql to ensure it doesn't contain malicious SQL
+# For example, ensure it only contains SELECT statements
+if not mview.sql.strip().upper().startswith("SELECT"):
+    raise ValueError("Materialized view SQL must be a SELECT statement")
+sql_ddl = f"CREATE TABLE IF NOT EXISTS {mview.staging_table_fullname} AS (\n{mview.sql}\n)"
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 21-21: cratedb_toolkit/materialized/core.py#L21
Added line #L21 was not covered by tests


[warning] 24-24: cratedb_toolkit/materialized/core.py#L24
Added line #L24 was not covered by tests


[warning] 30-30: cratedb_toolkit/materialized/core.py#L30
Added line #L30 was not covered by tests


[warning] 32-35: cratedb_toolkit/materialized/core.py#L32-L35
Added lines #L32 - L35 were not covered by tests


[warning] 37-39: cratedb_toolkit/materialized/core.py#L37-L39
Added lines #L37 - L39 were not covered by tests


[warning] 41-45: cratedb_toolkit/materialized/core.py#L41-L45
Added lines #L41 - L45 were not covered by tests


[warning] 47-49: cratedb_toolkit/materialized/core.py#L47-L49
Added lines #L47 - L49 were not covered by tests


[warning] 58-62: cratedb_toolkit/materialized/core.py#L58-L62
Added lines #L58 - L62 were not covered by tests


[warning] 64-66: cratedb_toolkit/materialized/core.py#L64-L66
Added lines #L64 - L66 were not covered by tests

cratedb_toolkit/materialized/model.py (1)

10-93: 💡 Verification agent

🧩 Analysis chain

Add test coverage for model classes

The static analysis indicates a lack of test coverage for this critical foundation.


🏁 Script executed:

#!/bin/bash
# Check for existing tests of the MaterializedView and MaterializedViewSettings classes
rg -p "def test.*MaterializedView" --type py "tests/"

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Search for any tests referencing MaterializedView or MaterializedViewSettings
rg -n "MaterializedView" tests/
rg -n "MaterializedViewSettings" tests/

Length of output: 1491


🏁 Script executed:

#!/bin/bash
# Check for direct tests covering MaterializedView and MaterializedViewSettings methods
rg -n "table_fullname" tests/
rg -n "staging_table_fullname" tests/
rg -n "from_record" tests/
rg -n "to_storage_dict" tests/
rg -n "to_dict" tests/
rg -n "default_table_address" tests/

Length of output: 790


Add missing unit tests for MaterializedView and MaterializedViewSettings

We have existing smoke tests that instantiate these classes, but no assertions against their core behaviors. Please add targeted tests in tests/materialized/ covering:

  • MaterializedView.table_fullname and staging_table_fullname
    • Verify correct quoting with custom or default schema.
  • MaterializedView.from_record()
    • Round-trip instantiation from a dict.
  • MaterializedView.to_storage_dict()
    • Serializes all fields, respects an overridden id.
  • default_table_address()
    • Respects CRATEDB_EXT_SCHEMA; falls back to "ext".
  • MaterializedViewSettings.to_dict()
    • Returns a dict and embeds the materialized_table as a TableAddress object.

File under test:
• cratedb_toolkit/materialized/model.py

Ensure each of the above behaviors has at least one focused unit test.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 39-39: cratedb_toolkit/materialized/model.py#L39
Added line #L39 was not covered by tests


[warning] 43-43: cratedb_toolkit/materialized/model.py#L43
Added line #L43 was not covered by tests


[warning] 47-47: cratedb_toolkit/materialized/model.py#L47
Added line #L47 was not covered by tests


[warning] 55-55: cratedb_toolkit/materialized/model.py#L55
Added line #L55 was not covered by tests


[warning] 58-59: cratedb_toolkit/materialized/model.py#L58-L59
Added lines #L58 - L59 were not covered by tests


[warning] 61-61: cratedb_toolkit/materialized/model.py#L61
Added line #L61 was not covered by tests


[warning] 68-69: cratedb_toolkit/materialized/model.py#L68-L69
Added lines #L68 - L69 were not covered by tests


[warning] 90-92: cratedb_toolkit/materialized/model.py#L90-L92
Added lines #L90 - L92 were not covered by tests

♻️ Duplicate comments (10)
tests/materialized/readme.md (1)

5-9: Expand and clarify the "About" section
The introduction is concise but could offer more context on how the subsystem operates, its refresh strategies, and typical use cases.

Suggested revision:

-Basic subsystem to materialize SQL statements in CrateDB. 
-The subsystem emulates materialized views so that queries that take a long
-time to run can be cached, which is specifically useful when applied in
-scenarios with high traffic reads.
+This subsystem provides a foundation for emulating materialized views in CrateDB by materializing SQL queries into precomputed tables.
+It supports on-demand and scheduled refresh strategies, improving performance for complex or resource-intensive queries.
+By caching the results of long-running operations, it reduces database load and accelerates response times in high-read-traffic scenarios.
+Typical use cases include dashboards, analytics workflows, and reporting pipelines where query latency is critical.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~9-~9: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads. ## Prior Art - https://github.com/nr...

(AI_HYDRA_LEO_MISSING_OF)

cratedb_toolkit/materialized/schema.py (1)

19-22: settings.database.safe is a method – missing call operator

The log will show a bound method object instead of the redacted URI:

-        f"database '{settings.database.safe}', table {settings.materialized_table}"
+        f"database '{settings.database.safe()}', table {settings.materialized_table}"
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 19-19: cratedb_toolkit/materialized/schema.py#L19
Added line #L19 was not covered by tests

cratedb_toolkit/materialized/core.py (1)

32-34: Parsing schema.table via simple split(".") is brittle

"schema.table".split(".") fails when:

  • The view name is quoted ("my.schema"."my.table"),
  • Either part itself contains a dot,
  • No dot is given (user passes just the table name).

Prefer using the existing TableAddress.from_string helper:

-table_schema, table_name = name.split(".")
-table_address = TableAddress(schema=table_schema, table=table_name)
+try:
+    table_address = TableAddress.from_string(name)
+except ValueError as ex:
+    raise ValueError(
+        f"Materialized view name must be in <schema>.<table> form, got '{name}'"
+    ) from ex
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 32-35: cratedb_toolkit/materialized/core.py#L32-L35
Added lines #L32 - L35 were not covered by tests

cratedb_toolkit/materialized/store.py (7)

30-32: settings.database.safe is a method – missing call operator

The log will show a bound method object instead of the redacted URI:

-            f"Connecting to database {self.settings.database.safe}, table {self.settings.materialized_table.fullname}"
+            f"Connecting to database {self.settings.database.safe()}, table {self.settings.materialized_table.fullname}"
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 30-30: cratedb_toolkit/materialized/store.py#L30
Added line #L30 was not covered by tests


48-56: Implement target table existence check

Add a check in create to verify the target table before proceeding:

 # Sanity checks.
 if mview.table_schema is None:
     raise ValueError("Table schema needs to be defined")
 if mview.table_name is None:
     raise ValueError("Table name needs to be defined")
+
+# Verify that the target table actually exists.
+target = mview.table_fullname
+if not self.database.table_exists(target):
+    raise ValueError(f"Target table '{target}' does not exist")
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 50-50: cratedb_toolkit/materialized/store.py#L50
Added line #L50 was not covered by tests


61-68: Implement centralized UUID TypeDecorator

The comment # TODO: Add UUID as converter to CrateDB driver? indicates a need for proper UUID handling:

  1. Create a shared GUID TypeDecorator in a new module, e.g. cratedb_toolkit/types.py
  2. Update table definitions to use this type
  3. Use uuid.uuid4() directly instead of string conversion
# Example implementation for cratedb_toolkit/types.py
import uuid
from sqlalchemy.types import TypeDecorator, String

class GUID(TypeDecorator):
    """Platform-independent GUID type."""
    impl = String
    cache_ok = True

    def process_bind_param(self, value, dialect):
        if value is None:
            return None
        elif dialect.name == 'postgresql':
            return str(value)
        else:
            if not isinstance(value, uuid.UUID):
                return str(uuid.UUID(value))
            else:
                return str(value)

    def process_result_value(self, value, dialect):
        if value is None:
            return value
        else:
            if not isinstance(value, uuid.UUID):
                return uuid.UUID(value)
            return value
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 61-61: cratedb_toolkit/materialized/store.py#L61
Added line #L61 was not covered by tests


84-87: Fix incorrect docstring in get_by_table method

The docstring appears to be copied from another method:

 def get_by_table(self, table_address: TableAddress) -> MaterializedView:
     """
-    Retrieve effective policies to process, by strategy and tags.
+    Retrieve a materialized view by its table address (schema and table name).
+    
+    Raises KeyError if the materialized view doesn't exist.
     """

118-128: Add error handling in execute method

The current implementation doesn't handle database exceptions:

 def execute(self, statement) -> CursorResult:
     """
     Execute SQL statement, and return result object.
 
     TODO: Generalize, see `RetentionPolicyStore`.
     """
     with Session(self.database.engine) as session:
-        result = session.execute(statement)
-        session.commit()
-        return result
+        try:
+            result = session.execute(statement)
+            session.commit()
+            return result
+        except sa.exc.SQLAlchemyError as e:
+            session.rollback()
+            logger.error(f"Database error executing statement: {e}")
+            raise

140-145: Fix incorrect docstring in exists method

The docstring appears to be copied from another method:

 def exists(self, mview: MaterializedView):
     """
-    Check if retention policy for specific table already exists.
+    Check if a materialized view for the specific table already exists.
 
     TODO: Generalize, see `RetentionPolicyStore`.
     """

154-159: Add timeout handling for REFRESH TABLE command

The REFRESH TABLE command doesn't support a timeout parameter natively:

 def synchronize(self):
     """
     Synchronize data by issuing `REFRESH TABLE` statement.
     """
     sql = f"REFRESH TABLE {self.settings.materialized_table.fullname};"
-    self.database.run_sql(sql)
+    import concurrent.futures
+    with concurrent.futures.ThreadPoolExecutor() as executor:
+        future = executor.submit(self.database.run_sql, sql)
+        try:
+            # Set an appropriate timeout value (in seconds)
+            future.result(timeout=30)
+        except concurrent.futures.TimeoutError:
+            logger.warning(f"REFRESH TABLE operation timed out after 30 seconds")
+            # Cannot cancel the operation easily, but we can at least not block indefinitely
+            raise TimeoutError("REFRESH TABLE operation timed out")
🧹 Nitpick comments (5)
tests/materialized/readme.md (1)

13-19: Convert bare URLs to Markdown links
Replace bare URLs with descriptive link text to improve readability and comply with markdownlint rules.

Proposed diff:

 ## Prior Art
-
-- https://github.com/nroi/elfenbein
-- https://github.com/nroi/pg_materialized_views_refresh_topologically
-- https://github.com/darkside/monocle
-- https://github.com/maggregor/maggregor
-- https://github.com/adamfoneil/ViewMaterializer
-- https://github.com/jhollinger/activerecord-viewmatic
-- https://github.com/q-m/metabase-matview
+
+- [elfenbein](https://github.com/nroi/elfenbein)
+- [pg_materialized_views_refresh_topologically](https://github.com/nroi/pg_materialized_views_refresh_topologically)
+- [monocle](https://github.com/darkside/monocle)
+- [maggregor](https://github.com/maggregor/maggregor)
+- [ViewMaterializer](https://github.com/adamfoneil/ViewMaterializer)
+- [activerecord-viewmatic](https://github.com/jhollinger/activerecord-viewmatic)
+- [metabase-matview](https://github.com/q-m/metabase-matview)
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

13-13: Bare URL used
null

(MD034, no-bare-urls)


14-14: Bare URL used
null

(MD034, no-bare-urls)


15-15: Bare URL used
null

(MD034, no-bare-urls)


16-16: Bare URL used
null

(MD034, no-bare-urls)


17-17: Bare URL used
null

(MD034, no-bare-urls)


18-18: Bare URL used
null

(MD034, no-bare-urls)


19-19: Bare URL used
null

(MD034, no-bare-urls)

cratedb_toolkit/materialized/core.py (1)

23-24: Incorrect comment refers to "Retention policy"

The comment should refer to "Materialized view store API" instead.

-        # Retention policy store API.
+        # Materialized view store API.
         self.store = MaterializedViewStore(settings=self.settings)
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 24-24: cratedb_toolkit/materialized/core.py#L24
Added line #L24 was not covered by tests

cratedb_toolkit/materialized/model.py (2)

45-47: Add type hinting for record parameter

Add proper type hints to improve code clarity and enable better IDE support:

 @classmethod
-def from_record(cls, record) -> "MaterializedView":
+def from_record(cls, record: t.Dict[str, t.Any]) -> "MaterializedView":
     return cls(**record)
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 47-47: cratedb_toolkit/materialized/model.py#L47
Added line #L47 was not covered by tests


89-92: Fix to_dict method redundant materialized_table assignment

The dataclasses.asdict already includes materialized_table in the dictionary, making the explicit assignment redundant and potentially confusing:

 def to_dict(self):
+    """
+    Convert settings to a dictionary suitable for template formatting.
+    
+    Returns:
+        dict: Dictionary representation of settings.
+    """
     data = dataclasses.asdict(self)
-    data["materialized_table"] = self.materialized_table
     return data
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 90-92: cratedb_toolkit/materialized/model.py#L90-L92
Added lines #L90 - L92 were not covered by tests

cratedb_toolkit/materialized/store.py (1)

71-82: Add pagination to retrieve method

For large tables, retrieving all records at once can be inefficient:

-def retrieve(self):
+def retrieve(self, limit: int = 1000, offset: int = 0):
     """
     Retrieve all records from database table.
+    
+    Args:
+        limit: Maximum number of records to retrieve
+        offset: Number of records to skip
 
     TODO: Add filtering capabilities.
     TODO: Generalize, see `RetentionPolicyStore`.
     """
 
     # Run SELECT statement, and return result.
-    selectable = sa.select(self.table)
+    selectable = sa.select(self.table).limit(limit).offset(offset)
     records = self.query(selectable)
     return records
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b92c027 and 6b5677e.

📒 Files selected for processing (12)
  • CHANGES.md (1 hunks)
  • cratedb_toolkit/materialized/core.py (1 hunks)
  • cratedb_toolkit/materialized/model.py (1 hunks)
  • cratedb_toolkit/materialized/schema.py (1 hunks)
  • cratedb_toolkit/materialized/schema.sql (1 hunks)
  • cratedb_toolkit/materialized/store.py (1 hunks)
  • examples/materialized_view.py (1 hunks)
  • tests/conftest.py (2 hunks)
  • tests/materialized/conftest.py (1 hunks)
  • tests/materialized/readme.md (1 hunks)
  • tests/materialized/test_core.py (1 hunks)
  • tests/materialized/test_examples.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • examples/materialized_view.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • CHANGES.md
  • tests/materialized/test_examples.py
  • tests/conftest.py
  • cratedb_toolkit/materialized/schema.sql
  • tests/materialized/conftest.py
  • tests/materialized/test_core.py
🧰 Additional context used
🧬 Code Graph Analysis (3)
cratedb_toolkit/materialized/core.py (3)
cratedb_toolkit/materialized/model.py (3)
  • MaterializedViewSettings (73-92)
  • staging_table_fullname (42-43)
  • table_fullname (38-39)
cratedb_toolkit/materialized/store.py (3)
  • MaterializedViewStore (19-159)
  • get_by_table (84-101)
  • execute (118-127)
cratedb_toolkit/model.py (2)
  • TableAddress (182-201)
  • schema (174-178)
cratedb_toolkit/materialized/model.py (1)
cratedb_toolkit/model.py (3)
  • DatabaseAddress (55-178)
  • TableAddress (182-201)
  • schema (174-178)
cratedb_toolkit/materialized/schema.py (3)
cratedb_toolkit/materialized/model.py (2)
  • MaterializedViewSettings (73-92)
  • to_dict (89-92)
tests/materialized/conftest.py (2)
  • database (25-29)
  • settings (12-21)
cratedb_toolkit/model.py (3)
  • safe (138-145)
  • dburi (103-107)
  • fullname (191-197)
🪛 GitHub Check: codecov/patch
cratedb_toolkit/materialized/store.py

[warning] 25-25: cratedb_toolkit/materialized/store.py#L25
Added line #L25 was not covered by tests


[warning] 27-28: cratedb_toolkit/materialized/store.py#L27-L28
Added lines #L27 - L28 were not covered by tests


[warning] 30-30: cratedb_toolkit/materialized/store.py#L30
Added line #L30 was not covered by tests


[warning] 35-35: cratedb_toolkit/materialized/store.py#L35
Added line #L35 was not covered by tests


[warning] 38-39: cratedb_toolkit/materialized/store.py#L38-L39
Added lines #L38 - L39 were not covered by tests


[warning] 50-50: cratedb_toolkit/materialized/store.py#L50
Added line #L50 was not covered by tests


[warning] 53-59: cratedb_toolkit/materialized/store.py#L53-L59
Added lines #L53 - L59 were not covered by tests


[warning] 61-61: cratedb_toolkit/materialized/store.py#L61
Added line #L61 was not covered by tests

cratedb_toolkit/materialized/core.py

[warning] 21-21: cratedb_toolkit/materialized/core.py#L21
Added line #L21 was not covered by tests


[warning] 24-24: cratedb_toolkit/materialized/core.py#L24
Added line #L24 was not covered by tests


[warning] 30-30: cratedb_toolkit/materialized/core.py#L30
Added line #L30 was not covered by tests


[warning] 32-35: cratedb_toolkit/materialized/core.py#L32-L35
Added lines #L32 - L35 were not covered by tests


[warning] 37-39: cratedb_toolkit/materialized/core.py#L37-L39
Added lines #L37 - L39 were not covered by tests


[warning] 41-45: cratedb_toolkit/materialized/core.py#L41-L45
Added lines #L41 - L45 were not covered by tests


[warning] 47-49: cratedb_toolkit/materialized/core.py#L47-L49
Added lines #L47 - L49 were not covered by tests


[warning] 58-62: cratedb_toolkit/materialized/core.py#L58-L62
Added lines #L58 - L62 were not covered by tests


[warning] 64-66: cratedb_toolkit/materialized/core.py#L64-L66
Added lines #L64 - L66 were not covered by tests

cratedb_toolkit/materialized/model.py

[warning] 39-39: cratedb_toolkit/materialized/model.py#L39
Added line #L39 was not covered by tests


[warning] 43-43: cratedb_toolkit/materialized/model.py#L43
Added line #L43 was not covered by tests


[warning] 47-47: cratedb_toolkit/materialized/model.py#L47
Added line #L47 was not covered by tests


[warning] 55-55: cratedb_toolkit/materialized/model.py#L55
Added line #L55 was not covered by tests


[warning] 58-59: cratedb_toolkit/materialized/model.py#L58-L59
Added lines #L58 - L59 were not covered by tests


[warning] 61-61: cratedb_toolkit/materialized/model.py#L61
Added line #L61 was not covered by tests


[warning] 68-69: cratedb_toolkit/materialized/model.py#L68-L69
Added lines #L68 - L69 were not covered by tests


[warning] 90-92: cratedb_toolkit/materialized/model.py#L90-L92
Added lines #L90 - L92 were not covered by tests

cratedb_toolkit/materialized/schema.py

[warning] 19-19: cratedb_toolkit/materialized/schema.py#L19
Added line #L19 was not covered by tests


[warning] 25-25: cratedb_toolkit/materialized/schema.py#L25
Added line #L25 was not covered by tests


[warning] 27-28: cratedb_toolkit/materialized/schema.py#L27-L28
Added lines #L27 - L28 were not covered by tests


[warning] 30-32: cratedb_toolkit/materialized/schema.py#L30-L32
Added lines #L30 - L32 were not covered by tests


[warning] 35-36: cratedb_toolkit/materialized/schema.py#L35-L36
Added lines #L35 - L36 were not covered by tests

🪛 LanguageTool
tests/materialized/readme.md

[uncategorized] ~9-~9: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads. ## Prior Art - https://github.com/nr...

(AI_HYDRA_LEO_MISSING_OF)


[grammar] ~11-~11: It appears that a preposition is missing after ‘Prior’.
Context: ... scenarios with high traffic reads. ## Prior Art - https://github.com/nroi/elfenbein - ...

(MISSING_PREPOSITION)

🪛 markdownlint-cli2 (0.17.2)
tests/materialized/readme.md

13-13: Bare URL used
null

(MD034, no-bare-urls)


14-14: Bare URL used
null

(MD034, no-bare-urls)


15-15: Bare URL used
null

(MD034, no-bare-urls)


16-16: Bare URL used
null

(MD034, no-bare-urls)


17-17: Bare URL used
null

(MD034, no-bare-urls)


18-18: Bare URL used
null

(MD034, no-bare-urls)


19-19: Bare URL used
null

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: build-and-publish
  • GitHub Check: Generic: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.9 on OS ubuntu-latest
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: CFR for OS ubuntu-latest
  • GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
  • GitHub Check: CFR for OS macos-13
🔇 Additional comments (2)
cratedb_toolkit/materialized/schema.py (1)

12-37:

❌ Incorrect review comment

Add test coverage for schema setup function

The static analysis indicates a lack of test coverage for this critical functionality.


🏁 Script executed:

#!/bin/bash
# Check for existing tests of the setup_schema function
rg -p "setup_schema" --type py "tests/"

Length of output: 304


🏁 Script executed:

#!/bin/bash
# Show context around setup_schema usage in materialized tests
rg -n -A3 -B3 'setup_schema' tests/materialized/conftest.py

Length of output: 704


Remove outdated test coverage request
Tests already invoke setup_schema in both materialized and retention fixtures:

  • tests/materialized/conftest.py: lines 38–40 call setup_schema(settings)
  • tests/retention/conftest.py: similarly covers the retention schema setup

No additional test is needed for the schema setup function at this time.

Likely an incorrect or invalid review comment.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 19-19: cratedb_toolkit/materialized/schema.py#L19
Added line #L19 was not covered by tests


[warning] 25-25: cratedb_toolkit/materialized/schema.py#L25
Added line #L25 was not covered by tests


[warning] 27-28: cratedb_toolkit/materialized/schema.py#L27-L28
Added lines #L27 - L28 were not covered by tests


[warning] 30-32: cratedb_toolkit/materialized/schema.py#L30-L32
Added lines #L30 - L32 were not covered by tests


[warning] 35-36: cratedb_toolkit/materialized/schema.py#L35-L36
Added lines #L35 - L36 were not covered by tests

cratedb_toolkit/materialized/core.py (1)

26-67:

❌ Incorrect review comment

Add test coverage for refresh function

The static analysis indicates a lack of test coverage for this critical functionality.


🏁 Script executed:

#!/bin/bash
# Check for existing tests of the refresh function
rg -p "def test.*refresh" --type py "tests/"

Length of output: 44


🏁 Script executed:

#!/bin/bash
# Search for any usage of the refresh method or calls in the tests directory
rg -n "refresh(" --type py tests/

Length of output: 102


🏁 Script executed:

#!/bin/bash
# First, search using ripgrep with fixed string to avoid regex errors
rg -F "refresh(" -n --type py tests/ || echo "No matches found with rg"

# Fallback to grep in case ripgrep misses files
grep -R "refresh(" -n tests/ || echo "No matches found with grep"

Length of output: 853


Tests for refresh already exist

I found multiple invocations of mvm.refresh(...) in tests/materialized/test_core.py (lines 31, 39, 48, 59), covering the core refresh logic. The original concern about missing test coverage can be safely ignored.

Likely an incorrect or invalid review comment.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 30-30: cratedb_toolkit/materialized/core.py#L30
Added line #L30 was not covered by tests


[warning] 32-35: cratedb_toolkit/materialized/core.py#L32-L35
Added lines #L32 - L35 were not covered by tests


[warning] 37-39: cratedb_toolkit/materialized/core.py#L37-L39
Added lines #L37 - L39 were not covered by tests


[warning] 41-45: cratedb_toolkit/materialized/core.py#L41-L45
Added lines #L41 - L45 were not covered by tests


[warning] 47-49: cratedb_toolkit/materialized/core.py#L47-L49
Added lines #L47 - L49 were not covered by tests


[warning] 58-62: cratedb_toolkit/materialized/core.py#L58-L62
Added lines #L58 - L62 were not covered by tests


[warning] 64-66: cratedb_toolkit/materialized/core.py#L64-L66
Added lines #L64 - L66 were not covered by tests

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
cratedb_toolkit/materialized/store.py (3)

61-63: Improve UUID handling

Consider implementing a proper conversion method for UUID values rather than manually converting to strings.

Based on your team's feedback in previous reviews, this UUID conversion approach is acceptable for now, but a centralized UUID TypeDecorator could be implemented later to improve type safety and eliminate boilerplate.


119-126: Add error handling in execute method

The execute method lacks error handling for database exceptions. Consider adding try-except blocks to handle common database errors gracefully.

     def execute(self, statement) -> CursorResult:
         """
         Execute SQL statement, and return result object.
         """
         with Session(self.database.engine) as session:
-            result = session.execute(statement)
-            session.commit()
-            return result
+            try:
+                result = session.execute(statement)
+                session.commit()
+                return result
+            except sa.exc.SQLAlchemyError as e:
+                session.rollback()
+                logger.error(f"Database error executing statement: {e}")
+                raise

48-48: 🛠️ Refactor suggestion

Implement target table existence check

The TODO comment suggests a missing validation. Consider implementing the check to verify the target table exists before creating a materialized view for it.

         # TODO: Sanity check, whether target table already exists?
+        # Verify that the target table actually exists.
+        target = f"{mview.table_schema}.{mview.table_name}"
+        if not self.database.table_exists(target):
+            raise ValueError(f"Target table '{target}' does not exist")
🧹 Nitpick comments (5)
tests/materialized/test_examples.py (1)

1-12: LGTM: Well-structured test for the materialized view example

The test correctly verifies that the example program works by running the main function and checking that it creates and populates the materialized view with the expected number of records. Good use of the fixture for database connectivity.

I would suggest two optional improvements:

  1. Consider adding cleanup code to drop the created view after the test
  2. Verify not just the count but also some attributes of the created view
cratedb_toolkit/materialized/readme.md (4)

12-13: Fix grammatical issue in sentence structure

The phrase "applied in scenarios with high traffic reads" should include "high-traffic" as a compound adjective.

-time to run can be cached, which is specifically useful when applied in
-scenarios with high traffic reads. This approach can significantly reduce
+time to run can be cached, which is specifically useful when applied in
+scenarios with high-traffic reads. This approach can significantly reduce
🧰 Tools
🪛 LanguageTool

[uncategorized] ~12-~12: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads. This approach can significantly reduce...

(AI_HYDRA_LEO_MISSING_OF)


18-19: Add hyphen to compound adjective

For the compound adjective "on demand", consider adding a hyphen when it's used to modify the following noun.

-- Refresh materialized views on demand or on schedule
+- Refresh materialized views on-demand or on schedule
🧰 Tools
🪛 LanguageTool

[uncategorized] ~18-~18: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...efinitions - Refresh materialized views on demand or on schedule - Track metadata about m...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


22-23: Fix heading grammar

The heading "Prior Art" is missing a preposition. Consider using "Related Projects" or "Prior Art References" instead.

-## Prior Art
+## Related Projects
🧰 Tools
🪛 LanguageTool

[grammar] ~22-~22: It appears that a preposition is missing after ‘Prior’.
Context: ...rt for different refresh strategies ## Prior Art - https://github.com/nroi/elfenbein - ...

(MISSING_PREPOSITION)


24-30: Format URLs as proper Markdown links

Replace bare URLs with properly formatted Markdown links to improve readability and accessibility. Consider adding descriptions of what each project is.

-- https://github.com/nroi/elfenbein
-- https://github.com/nroi/pg_materialized_views_refresh_topologically
-- https://github.com/darkside/monocle
-- https://github.com/maggregor/maggregor
-- https://github.com/adamfoneil/ViewMaterializer
-- https://github.com/jhollinger/activerecord-viewmatic
-- https://github.com/q-m/metabase-matview
+- [elfenbein](https://github.com/nroi/elfenbein) - A tool for refreshing PostgreSQL materialized views
+- [pg_materialized_views_refresh_topologically](https://github.com/nroi/pg_materialized_views_refresh_topologically) - PostgreSQL extension for refreshing materialized views
+- [monocle](https://github.com/darkside/monocle) - Materialized view library
+- [maggregor](https://github.com/maggregor/maggregor) - Database materialized view management
+- [ViewMaterializer](https://github.com/adamfoneil/ViewMaterializer) - .NET library for materializing views
+- [activerecord-viewmatic](https://github.com/jhollinger/activerecord-viewmatic) - ActiveRecord extension for materialized views
+- [metabase-matview](https://github.com/q-m/metabase-matview) - Metabase extension for materialized views
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

24-24: Bare URL used
null

(MD034, no-bare-urls)


25-25: Bare URL used
null

(MD034, no-bare-urls)


26-26: Bare URL used
null

(MD034, no-bare-urls)


27-27: Bare URL used
null

(MD034, no-bare-urls)


28-28: Bare URL used
null

(MD034, no-bare-urls)


29-29: Bare URL used
null

(MD034, no-bare-urls)


30-30: Bare URL used
null

(MD034, no-bare-urls)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b5677e and d525c62.

📒 Files selected for processing (12)
  • CHANGES.md (1 hunks)
  • cratedb_toolkit/materialized/core.py (1 hunks)
  • cratedb_toolkit/materialized/model.py (1 hunks)
  • cratedb_toolkit/materialized/readme.md (1 hunks)
  • cratedb_toolkit/materialized/schema.py (1 hunks)
  • cratedb_toolkit/materialized/schema.sql (1 hunks)
  • cratedb_toolkit/materialized/store.py (1 hunks)
  • examples/materialized_view.py (1 hunks)
  • tests/conftest.py (2 hunks)
  • tests/materialized/conftest.py (1 hunks)
  • tests/materialized/test_core.py (1 hunks)
  • tests/materialized/test_examples.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • CHANGES.md
🚧 Files skipped from review as they are similar to previous changes (8)
  • tests/conftest.py
  • cratedb_toolkit/materialized/schema.sql
  • tests/materialized/conftest.py
  • cratedb_toolkit/materialized/schema.py
  • examples/materialized_view.py
  • cratedb_toolkit/materialized/core.py
  • tests/materialized/test_core.py
  • cratedb_toolkit/materialized/model.py
🧰 Additional context used
🧠 Learnings (1)
cratedb_toolkit/materialized/store.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#56
File: cratedb_toolkit/materialized/store.py:154-159
Timestamp: 2025-05-07T01:01:21.282Z
Learning: In CrateDB, the REFRESH TABLE SQL statement is used to make changes visible to subsequent commands. The table that does bookkeeping on materializations (like in MaterializedViewStore.synchronize) is typically small, so refreshing it doesn't cause significant delays that would require timeout handling.
🧬 Code Graph Analysis (1)
tests/materialized/test_examples.py (2)
tests/materialized/conftest.py (2)
  • store (33-40)
  • database (25-29)
examples/materialized_view.py (1)
  • main (155-198)
🪛 LanguageTool
cratedb_toolkit/materialized/readme.md

[uncategorized] ~12-~12: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads. This approach can significantly reduce...

(AI_HYDRA_LEO_MISSING_OF)


[uncategorized] ~18-~18: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...efinitions - Refresh materialized views on demand or on schedule - Track metadata about m...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~22-~22: It appears that a preposition is missing after ‘Prior’.
Context: ...rt for different refresh strategies ## Prior Art - https://github.com/nroi/elfenbein - ...

(MISSING_PREPOSITION)

🪛 markdownlint-cli2 (0.17.2)
cratedb_toolkit/materialized/readme.md

24-24: Bare URL used
null

(MD034, no-bare-urls)


25-25: Bare URL used
null

(MD034, no-bare-urls)


26-26: Bare URL used
null

(MD034, no-bare-urls)


27-27: Bare URL used
null

(MD034, no-bare-urls)


28-28: Bare URL used
null

(MD034, no-bare-urls)


29-29: Bare URL used
null

(MD034, no-bare-urls)


30-30: Bare URL used
null

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Generic: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: CFR: Python 3.12 on OS ubuntu-latest
  • GitHub Check: CFR for OS ubuntu-latest
  • GitHub Check: CFR: Python 3.9 on OS ubuntu-latest
  • GitHub Check: build-and-test
  • GitHub Check: CFR for OS macos-latest
  • GitHub Check: CFR for OS macos-13
🔇 Additional comments (2)
cratedb_toolkit/materialized/store.py (2)

137-140: Fix incorrect docstring in exists method

The docstring appears to be correct now, unlike what was mentioned in a previous review comment. It correctly mentions checking if a materialized view exists.


149-154: LGTM: Appropriate implementation of synchronize method

The synchronize method correctly issues a REFRESH TABLE statement to make changes visible. As noted in your retrieved learnings, the materialization bookkeeping table is typically small, so refreshing it won't cause significant delays that would require timeout handling.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (6)
cratedb_toolkit/materialized/schema.py (3)

19-22: ⚠️ Potential issue

Fix method call for settings.database.safe

The safe method on DatabaseAddress is missing parentheses, which will cause the log to display a bound method object instead of the redacted URI.

logger.info(
    f"Installing materialized view management table at "
-   f"database '{settings.database.safe}', table {settings.materialized_table}"
+   f"database '{settings.database.safe()}', table {settings.materialized_table}"
)
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 19-19: cratedb_toolkit/materialized/schema.py#L19
Added line #L19 was not covered by tests


25-28: 🛠️ Refactor suggestion

Add error handling for SQL template formatting

There's no error handling for the template string formatting operation, which could fail if required keys are missing.

# Read SQL DDL statement.
sql = read_text("cratedb_toolkit.materialized", "schema.sql")

tplvars = settings.to_dict()
-sql = sql.format_map(tplvars)
+try:
+    sql = sql.format_map(tplvars)
+except KeyError as ex:
+    raise ValueError(f"Missing required template variable in schema.sql: {ex}") from ex
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 25-25: cratedb_toolkit/materialized/schema.py#L25
Added line #L25 was not covered by tests


[warning] 27-28: cratedb_toolkit/materialized/schema.py#L27-L28
Added lines #L27 - L28 were not covered by tests


34-36: 🛠️ Refactor suggestion

Add error handling for SQL execution

The SQL execution operations lack error handling, which could lead to unhandled exceptions.

# Materialize table schema.
-run_sql(settings.database.dburi, sql)
-run_sql(settings.database.dburi, f"REFRESH TABLE {settings.materialized_table.fullname}")
+try:
+    run_sql(settings.database.dburi, sql)
+    run_sql(settings.database.dburi, f"REFRESH TABLE {settings.materialized_table.fullname}")
+except Exception as ex:
+    logger.error(f"Failed to set up schema: {ex}")
+    raise ValueError(f"Failed to set up materialized view management table: {ex}") from ex
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 35-36: cratedb_toolkit/materialized/schema.py#L35-L36
Added lines #L35 - L36 were not covered by tests

cratedb_toolkit/materialized/core.py (1)

36-45: 🛠️ Refactor suggestion

Add error handling for SQL operations

The SQL execution operations for dropping and creating the staging table lack error handling.

sql_ddl = f"DROP TABLE IF EXISTS {mview.staging_table_fullname}"
logger.info(f"Dropping materialized view (staging): {sql_ddl}")
-self.store.execute(sa.text(sql_ddl))
+try:
+    self.store.execute(sa.text(sql_ddl))
+except Exception as ex:
+    logger.error(f"Failed to drop staging table: {ex}")
+    raise RuntimeError(f"Failed to refresh materialized view: {ex}") from ex

sql_ddl = f"CREATE TABLE IF NOT EXISTS {mview.staging_table_fullname} AS (\n{mview.sql}\n)"
logger.info(f"Creating materialized view (staging): {sql_ddl}")
-self.store.execute(sa.text(sql_ddl))
-sql_refresh = f"REFRESH TABLE {mview.staging_table_fullname}"
-self.store.execute(sa.text(sql_refresh))
+try:
+    self.store.execute(sa.text(sql_ddl))
+    sql_refresh = f"REFRESH TABLE {mview.staging_table_fullname}"
+    self.store.execute(sa.text(sql_refresh))
+except Exception as ex:
+    logger.error(f"Failed to create or refresh staging table: {ex}")
+    raise RuntimeError(f"Failed to refresh materialized view: {ex}") from ex
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 36-38: cratedb_toolkit/materialized/core.py#L36-L38
Added lines #L36 - L38 were not covered by tests


[warning] 40-44: cratedb_toolkit/materialized/core.py#L40-L44
Added lines #L40 - L44 were not covered by tests

cratedb_toolkit/materialized/store.py (2)

120-128: Previous feedback about error handling still applies

The execute() helper commits unconditionally and doesn’t protect
against SQLAlchemyErrors. Wrapping the call in try/except with a rollback
and proper logging would make the store more robust.

(This point was raised in an earlier review; tagging as duplicate.)


48-59: 🛠️ Refactor suggestion

Add the missing target-table existence check before insert

The create() method still contains a TODO for verifying that the referenced target table actually exists.
Without this guard the store can persist definitions pointing to non-existent tables, leading to errors later when the manager tries to refresh or swap them.

@@ def create(self, mview: MaterializedView, ignore: t.Optional[str] = None):
-        # TODO: Sanity check, whether target table already exists?
+        # Verify that the referenced base table is present; abort early if not.
+        target_fullname = f"{mview.table_schema}.{mview.table_name}"
+        if not self.database.table_exists(target_fullname):
+            raise ValueError(f"Target table '{target_fullname}' does not exist")

( DatabaseAdapter.table_exists() is already available, so the change is minimal.)

Implementing this now prevents bad state from entering the management table and avoids harder-to-diagnose runtime failures later.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 50-50: cratedb_toolkit/materialized/store.py#L50
Added line #L50 was not covered by tests


[warning] 53-59: cratedb_toolkit/materialized/store.py#L53-L59
Added lines #L53 - L59 were not covered by tests

🧹 Nitpick comments (5)
cratedb_toolkit/materialized/readme.md (2)

15-20: Consider adding details on implementation approach

The features section provides a good overview but could benefit from a brief mention of the implementation approach (table swapping, metadata tracking table, etc.) to help users understand the underlying mechanics.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~18-~18: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...efinitions - Refresh materialized views on demand or on schedule - Track metadata about m...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


22-31: Format Prior Art links as proper markdown

The current bare URLs could be better formatted as proper markdown links with descriptive text.

-## Prior Art
+## Prior Art
-
-https://github.com/nroi/elfenbein
-https://github.com/nroi/pg_materialized_views_refresh_topologically
-https://github.com/darkside/monocle
-https://github.com/maggregor/maggregor
-https://github.com/adamfoneil/ViewMaterializer
-https://github.com/jhollinger/activerecord-viewmatic
-https://github.com/q-m/metabase-matview
+
+- [Elfenbein](https://github.com/nroi/elfenbein)
+- [pg_materialized_views_refresh_topologically](https://github.com/nroi/pg_materialized_views_refresh_topologically)
+- [Monocle](https://github.com/darkside/monocle)
+- [Maggregor](https://github.com/maggregor/maggregor)
+- [ViewMaterializer](https://github.com/adamfoneil/ViewMaterializer)
+- [activerecord-viewmatic](https://github.com/jhollinger/activerecord-viewmatic)
+- [metabase-matview](https://github.com/q-m/metabase-matview)
🧰 Tools
🪛 LanguageTool

[grammar] ~22-~22: It appears that a preposition is missing after ‘Prior’.
Context: ...rt for different refresh strategies ## Prior Art - https://github.com/nroi/elfenbein - ...

(MISSING_PREPOSITION)

🪛 markdownlint-cli2 (0.17.2)

24-24: Bare URL used
null

(MD034, no-bare-urls)


25-25: Bare URL used
null

(MD034, no-bare-urls)


26-26: Bare URL used
null

(MD034, no-bare-urls)


27-27: Bare URL used
null

(MD034, no-bare-urls)


28-28: Bare URL used
null

(MD034, no-bare-urls)


29-29: Bare URL used
null

(MD034, no-bare-urls)


30-30: Bare URL used
null

(MD034, no-bare-urls)

cratedb_toolkit/materialized/model.py (2)

51-67: Fix return type annotation in to_storage_dict

The return type annotation is incorrect - it should be Dict[str, Any] instead of Dict[str, str] since the dictionary can contain non-string values.

-def to_storage_dict(self, identifier: t.Optional[str] = None) -> t.Dict[str, str]:
+def to_storage_dict(self, identifier: t.Optional[str] = None) -> t.Dict[str, t.Any]:
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 60-60: cratedb_toolkit/materialized/model.py#L60
Added line #L60 was not covered by tests


[warning] 63-63: cratedb_toolkit/materialized/model.py#L63
Added line #L63 was not covered by tests


[warning] 65-65: cratedb_toolkit/materialized/model.py#L65
Added line #L65 was not covered by tests


[warning] 67-67: cratedb_toolkit/materialized/model.py#L67
Added line #L67 was not covered by tests


78-98: Add return type annotation to to_dict method

The to_dict method is missing a return type annotation. Consider adding it for consistency.

-def to_dict(self):
+def to_dict(self) -> t.Dict[str, t.Any]:
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 96-98: cratedb_toolkit/materialized/model.py#L96-L98
Added lines #L96 - L98 were not covered by tests

cratedb_toolkit/materialized/store.py (1)

40-41: Gracefully handle missing management table during autoload

Table(..., autoload_with=self.database.engine) will raise sa.exc.NoSuchTableError
if the management schema hasn’t been installed yet.
Consider capturing this scenario and raising a clearer, actionable error (or creating the schema automatically).

Example pattern:

from sqlalchemy.exc import NoSuchTableError
...
try:
    self.table = Table(
        self.settings.materialized_table.table,
        metadata,
        autoload_with=self.database.engine,
    )
except NoSuchTableError as exc:
    raise RuntimeError(
        f"Materialized-view management table "
        f"'{self.settings.materialized_table.fullname}' is missing. "
        "Did you run `setup_schema`?"
    ) from exc

This gives users an immediate hint instead of an opaque SQLAlchemy stack-trace.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 40-41: cratedb_toolkit/materialized/store.py#L40-L41
Added lines #L40 - L41 were not covered by tests

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d525c62 and 1025b95.

📒 Files selected for processing (12)
  • CHANGES.md (1 hunks)
  • cratedb_toolkit/materialized/core.py (1 hunks)
  • cratedb_toolkit/materialized/model.py (1 hunks)
  • cratedb_toolkit/materialized/readme.md (1 hunks)
  • cratedb_toolkit/materialized/schema.py (1 hunks)
  • cratedb_toolkit/materialized/schema.sql (1 hunks)
  • cratedb_toolkit/materialized/store.py (1 hunks)
  • examples/materialized_view.py (1 hunks)
  • tests/conftest.py (2 hunks)
  • tests/materialized/conftest.py (1 hunks)
  • tests/materialized/test_core.py (1 hunks)
  • tests/materialized/test_examples.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • CHANGES.md
  • tests/materialized/test_examples.py
  • tests/conftest.py
  • cratedb_toolkit/materialized/schema.sql
  • tests/materialized/conftest.py
  • tests/materialized/test_core.py
  • examples/materialized_view.py
🧰 Additional context used
🧠 Learnings (2)
cratedb_toolkit/materialized/store.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#56
File: cratedb_toolkit/materialized/store.py:154-159
Timestamp: 2025-05-07T01:01:21.282Z
Learning: In CrateDB, the REFRESH TABLE SQL statement is used to make changes visible to subsequent commands. The table that does bookkeeping on materializations (like in MaterializedViewStore.synchronize) is typically small, so refreshing it doesn't cause significant delays that would require timeout handling.
cratedb_toolkit/materialized/core.py (1)
Learnt from: amotl
PR: crate/cratedb-toolkit#56
File: cratedb_toolkit/materialized/core.py:32-34
Timestamp: 2025-05-07T00:58:58.930Z
Learning: In the CrateDB toolkit codebase, use `TableAddress.from_string(name)` for parsing schema.table identifiers instead of manually splitting with `name.split(".")`. This approach is more robust and handles edge cases like quoted identifiers and names containing dots.
🧬 Code Graph Analysis (1)
cratedb_toolkit/materialized/schema.py (3)
cratedb_toolkit/materialized/model.py (2)
  • MaterializedViewSettings (79-98)
  • to_dict (95-98)
tests/materialized/conftest.py (2)
  • database (25-29)
  • settings (12-21)
cratedb_toolkit/model.py (3)
  • safe (138-145)
  • dburi (103-107)
  • fullname (191-197)
🪛 GitHub Check: codecov/patch
cratedb_toolkit/materialized/store.py

[warning] 27-27: cratedb_toolkit/materialized/store.py#L27
Added line #L27 was not covered by tests


[warning] 29-30: cratedb_toolkit/materialized/store.py#L29-L30
Added lines #L29 - L30 were not covered by tests


[warning] 32-32: cratedb_toolkit/materialized/store.py#L32
Added line #L32 was not covered by tests


[warning] 37-37: cratedb_toolkit/materialized/store.py#L37
Added line #L37 was not covered by tests


[warning] 40-41: cratedb_toolkit/materialized/store.py#L40-L41
Added lines #L40 - L41 were not covered by tests


[warning] 50-50: cratedb_toolkit/materialized/store.py#L50
Added line #L50 was not covered by tests


[warning] 53-59: cratedb_toolkit/materialized/store.py#L53-L59
Added lines #L53 - L59 were not covered by tests

cratedb_toolkit/materialized/core.py

[warning] 21-21: cratedb_toolkit/materialized/core.py#L21
Added line #L21 was not covered by tests


[warning] 24-24: cratedb_toolkit/materialized/core.py#L24
Added line #L24 was not covered by tests


[warning] 30-30: cratedb_toolkit/materialized/core.py#L30
Added line #L30 was not covered by tests


[warning] 32-34: cratedb_toolkit/materialized/core.py#L32-L34
Added lines #L32 - L34 were not covered by tests


[warning] 36-38: cratedb_toolkit/materialized/core.py#L36-L38
Added lines #L36 - L38 were not covered by tests


[warning] 40-44: cratedb_toolkit/materialized/core.py#L40-L44
Added lines #L40 - L44 were not covered by tests


[warning] 46-48: cratedb_toolkit/materialized/core.py#L46-L48
Added lines #L46 - L48 were not covered by tests


[warning] 57-61: cratedb_toolkit/materialized/core.py#L57-L61
Added lines #L57 - L61 were not covered by tests


[warning] 63-65: cratedb_toolkit/materialized/core.py#L63-L65
Added lines #L63 - L65 were not covered by tests

cratedb_toolkit/materialized/model.py

[warning] 41-41: cratedb_toolkit/materialized/model.py#L41
Added line #L41 was not covered by tests


[warning] 45-45: cratedb_toolkit/materialized/model.py#L45
Added line #L45 was not covered by tests


[warning] 49-49: cratedb_toolkit/materialized/model.py#L49
Added line #L49 was not covered by tests


[warning] 60-60: cratedb_toolkit/materialized/model.py#L60
Added line #L60 was not covered by tests


[warning] 63-63: cratedb_toolkit/materialized/model.py#L63
Added line #L63 was not covered by tests


[warning] 65-65: cratedb_toolkit/materialized/model.py#L65
Added line #L65 was not covered by tests


[warning] 67-67: cratedb_toolkit/materialized/model.py#L67
Added line #L67 was not covered by tests


[warning] 74-75: cratedb_toolkit/materialized/model.py#L74-L75
Added lines #L74 - L75 were not covered by tests


[warning] 96-98: cratedb_toolkit/materialized/model.py#L96-L98
Added lines #L96 - L98 were not covered by tests

cratedb_toolkit/materialized/schema.py

[warning] 19-19: cratedb_toolkit/materialized/schema.py#L19
Added line #L19 was not covered by tests


[warning] 25-25: cratedb_toolkit/materialized/schema.py#L25
Added line #L25 was not covered by tests


[warning] 27-28: cratedb_toolkit/materialized/schema.py#L27-L28
Added lines #L27 - L28 were not covered by tests


[warning] 30-32: cratedb_toolkit/materialized/schema.py#L30-L32
Added lines #L30 - L32 were not covered by tests


[warning] 35-36: cratedb_toolkit/materialized/schema.py#L35-L36
Added lines #L35 - L36 were not covered by tests

🪛 LanguageTool
cratedb_toolkit/materialized/readme.md

[uncategorized] ~12-~12: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads. This approach can significantly reduce...

(AI_HYDRA_LEO_MISSING_OF)


[uncategorized] ~18-~18: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...efinitions - Refresh materialized views on demand or on schedule - Track metadata about m...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~22-~22: It appears that a preposition is missing after ‘Prior’.
Context: ...rt for different refresh strategies ## Prior Art - https://github.com/nroi/elfenbein - ...

(MISSING_PREPOSITION)

🪛 markdownlint-cli2 (0.17.2)
cratedb_toolkit/materialized/readme.md

24-24: Bare URL used
null

(MD034, no-bare-urls)


25-25: Bare URL used
null

(MD034, no-bare-urls)


26-26: Bare URL used
null

(MD034, no-bare-urls)


27-27: Bare URL used
null

(MD034, no-bare-urls)


28-28: Bare URL used
null

(MD034, no-bare-urls)


29-29: Bare URL used
null

(MD034, no-bare-urls)


30-30: Bare URL used
null

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: Generic: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.12 on OS ubuntu-latest
  • GitHub Check: CFR: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.9 on OS ubuntu-latest
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: CFR for OS ubuntu-latest
  • GitHub Check: build-and-test
  • GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
  • GitHub Check: CFR for OS macos-latest
  • GitHub Check: CFR for OS macos-13
🔇 Additional comments (9)
cratedb_toolkit/materialized/readme.md (1)

1-13: LGTM! Good introduction to the materialized views subsystem.

The introduction clearly explains the purpose and benefits of emulating materialized views in CrateDB, focusing on improving query performance for complex or resource-intensive operations.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~12-~12: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads. This approach can significantly reduce...

(AI_HYDRA_LEO_MISSING_OF)

cratedb_toolkit/materialized/schema.py (1)

30-32: LGTM! Good dry-run implementation

The dry-run mode is implemented correctly, allowing for testing without actual database operations.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 30-32: cratedb_toolkit/materialized/schema.py#L30-L32
Added lines #L30 - L32 were not covered by tests

cratedb_toolkit/materialized/core.py (3)

14-25: LGTM! Good class structure and initialization

The MaterializedViewManager class is well-structured with clear responsibilities and proper initialization.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 21-21: cratedb_toolkit/materialized/core.py#L21
Added line #L21 was not covered by tests


[warning] 24-24: cratedb_toolkit/materialized/core.py#L24
Added line #L24 was not covered by tests


26-35: LGTM! Good use of TableAddress.from_string

The implementation correctly uses TableAddress.from_string() for parsing schema.table identifiers, which is robust against edge cases like quoted identifiers.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 30-30: cratedb_toolkit/materialized/core.py#L30
Added line #L30 was not covered by tests


[warning] 32-34: cratedb_toolkit/materialized/core.py#L32-L34
Added lines #L32 - L34 were not covered by tests


50-56: LGTM! Good documentation of CrateDB limitation

The comments thoroughly explain the CrateDB limitation regarding table renaming across schemas and include relevant issue links.

cratedb_toolkit/materialized/model.py (4)

12-37: LGTM! Well-structured MaterializedView dataclass

The MaterializedView dataclass is well-structured with clear field definitions, appropriate optional typing, and helpful metadata.


39-45: LGTM! Proper use of quote_relation_name

The table name properties correctly use quote_relation_name to handle special characters in identifiers.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 41-41: cratedb_toolkit/materialized/model.py#L41
Added line #L41 was not covered by tests


[warning] 45-45: cratedb_toolkit/materialized/model.py#L45
Added line #L45 was not covered by tests


47-50: LGTM! Simple and effective factory method

The from_record class method provides a clean way to instantiate objects from database records.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 49-49: cratedb_toolkit/materialized/model.py#L49
Added line #L49 was not covered by tests


70-75: LGTM! Good environment variable support

The default_table_address function correctly uses an environment variable with a sensible default value.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 74-75: cratedb_toolkit/materialized/model.py#L74-L75
Added lines #L74 - L75 were not covered by tests

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
cratedb_toolkit/materialized/model.py (1)

1-99: 💡 Verification agent

🧩 Analysis chain

Ensure comprehensive test coverage

According to previous review comments, several lines in this file are not covered by tests. Consider adding tests for:

  1. The table_fullname and staging_table_fullname properties (lines 39, 43)
  2. The documentation and implementation of to_storage_dict with the identifier parameter (lines 55-61)

Comprehensive test coverage is essential to ensure the reliability of this core data model.


🏁 Script executed:

#!/bin/bash
# Check test coverage for model.py

# Find test files that might test model.py
echo "Looking for test files that might test model.py..."
find tests -type f -name "test_*.py" | xargs grep -l "model\.py\|MaterializedView\|MaterializedViewSettings"

# Check for test coverage of specific methods
echo "Looking for tests that might cover table_fullname and staging_table_fullname..."
find tests -type f -name "test_*.py" | xargs grep -l "table_fullname\|staging_table_fullname"

echo "Looking for tests that might cover to_storage_dict..."
find tests -type f -name "test_*.py" | xargs grep -l "to_storage_dict"

Length of output: 706


Add tests for MaterializedView properties and to_storage_dict

To ensure full coverage of cratedb_toolkit/materialized/model.py, please add or extend tests for:

  • The table_fullname and staging_table_fullname properties (around lines 39 and 43)
    • Verify that table_fullname quotes <schema>.<table> correctly
    • Verify that staging_table_fullname quotes <schema>-staging.<table> correctly
  • The to_storage_dict method (lines 55–61)
    • Confirm default serialization preserves the existing id
    • Confirm passing an identifier argument overrides the id in the returned dict

These tests will cover the key behaviors and edge cases of the core data model.

♻️ Duplicate comments (1)
cratedb_toolkit/materialized/model.py (1)

51-51: Fix return type annotation in to_storage_dict

The current return type annotation is incorrect. The method returns a dictionary with mixed value types, not just strings:

-def to_storage_dict(self, identifier: t.Optional[str] = None) -> t.Dict[str, str]:
+def to_storage_dict(self, identifier: t.Optional[str] = None) -> t.Dict[str, t.Any]:

This would make the type annotation consistent with the actual return value types.

🧹 Nitpick comments (4)
cratedb_toolkit/materialized/readme.md (3)

12-13: Minor grammar improvement needed

Consider adding the preposition "in" before "high traffic reads" for better readability:

- time to run can be cached, which is specifically useful when applied in
- scenarios with high traffic reads. This approach can significantly reduce
+ time to run can be cached, which is specifically useful when applied in
+ scenarios with high traffic in reads. This approach can significantly reduce
🧰 Tools
🪛 LanguageTool

[uncategorized] ~12-~12: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads. This approach can significantly reduce...

(AI_HYDRA_LEO_MISSING_OF)


24-30: Format URLs as proper markdown links

The bare URLs should be formatted as proper markdown links for better readability and usability:

- - https://github.com/nroi/elfenbein
- - https://github.com/nroi/pg_materialized_views_refresh_topologically
- - https://github.com/darkside/monocle
- - https://github.com/maggregor/maggregor
- - https://github.com/adamfoneil/ViewMaterializer
- - https://github.com/jhollinger/activerecord-viewmatic
- - https://github.com/q-m/metabase-matview
+ - [Elfenbein](https://github.com/nroi/elfenbein)
+ - [PG Materialized Views Refresh Topologically](https://github.com/nroi/pg_materialized_views_refresh_topologically)
+ - [Monocle](https://github.com/darkside/monocle)
+ - [Maggregor](https://github.com/maggregor/maggregor)
+ - [ViewMaterializer](https://github.com/adamfoneil/ViewMaterializer)
+ - [ActiveRecord Viewmatic](https://github.com/jhollinger/activerecord-viewmatic)
+ - [Metabase Matview](https://github.com/q-m/metabase-matview)
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

24-24: Bare URL used
null

(MD034, no-bare-urls)


25-25: Bare URL used
null

(MD034, no-bare-urls)


26-26: Bare URL used
null

(MD034, no-bare-urls)


27-27: Bare URL used
null

(MD034, no-bare-urls)


28-28: Bare URL used
null

(MD034, no-bare-urls)


29-29: Bare URL used
null

(MD034, no-bare-urls)


30-30: Bare URL used
null

(MD034, no-bare-urls)


1-31: Enhance README with usage examples and installation instructions

The README provides a good overview but could be enhanced with:

  1. Installation instructions
  2. Basic usage examples
  3. Configuration options
  4. Reference to API documentation or inline API examples

This would make the README more comprehensive and help users get started quickly with the materialized views subsystem.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~12-~12: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads. This approach can significantly reduce...

(AI_HYDRA_LEO_MISSING_OF)


[uncategorized] ~18-~18: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...efinitions - Refresh materialized views on demand or on schedule - Track metadata about m...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~22-~22: It appears that a preposition is missing after ‘Prior’.
Context: ...rt for different refresh strategies ## Prior Art - https://github.com/nroi/elfenbein - ...

(MISSING_PREPOSITION)

🪛 markdownlint-cli2 (0.17.2)

24-24: Bare URL used
null

(MD034, no-bare-urls)


25-25: Bare URL used
null

(MD034, no-bare-urls)


26-26: Bare URL used
null

(MD034, no-bare-urls)


27-27: Bare URL used
null

(MD034, no-bare-urls)


28-28: Bare URL used
null

(MD034, no-bare-urls)


29-29: Bare URL used
null

(MD034, no-bare-urls)


30-30: Bare URL used
null

(MD034, no-bare-urls)

cratedb_toolkit/materialized/model.py (1)

95-98: Potential issue in MaterializedViewSettings.to_dict method

The current implementation of to_dict first converts the entire dataclass to a dictionary using dataclasses.asdict(), but then replaces the serialized materialized_table with the actual object. This creates an inconsistency as the rest of the fields are serialized but this one isn't.

Either:

  1. Return the fully serialized dictionary (current behavior of dataclasses.asdict())
  2. Return a dictionary with all object references preserved

Consider:

def to_dict(self):
    data = dataclasses.asdict(self)
-   data["materialized_table"] = self.materialized_table
    return data

Or clarify the intention with a comment if this mixed behavior is desired.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1025b95 and 6c462b9.

📒 Files selected for processing (12)
  • CHANGES.md (1 hunks)
  • cratedb_toolkit/materialized/core.py (1 hunks)
  • cratedb_toolkit/materialized/model.py (1 hunks)
  • cratedb_toolkit/materialized/readme.md (1 hunks)
  • cratedb_toolkit/materialized/schema.py (1 hunks)
  • cratedb_toolkit/materialized/schema.sql (1 hunks)
  • cratedb_toolkit/materialized/store.py (1 hunks)
  • examples/materialized_view.py (1 hunks)
  • tests/conftest.py (2 hunks)
  • tests/materialized/conftest.py (1 hunks)
  • tests/materialized/test_core.py (1 hunks)
  • tests/materialized/test_examples.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • cratedb_toolkit/materialized/store.py
🚧 Files skipped from review as they are similar to previous changes (9)
  • tests/conftest.py
  • tests/materialized/test_examples.py
  • CHANGES.md
  • cratedb_toolkit/materialized/schema.sql
  • cratedb_toolkit/materialized/schema.py
  • cratedb_toolkit/materialized/core.py
  • tests/materialized/conftest.py
  • examples/materialized_view.py
  • tests/materialized/test_core.py
🧰 Additional context used
🪛 LanguageTool
cratedb_toolkit/materialized/readme.md

[uncategorized] ~12-~12: Possible missing preposition found.
Context: ... applied in scenarios with high traffic reads. This approach can significantly reduce...

(AI_HYDRA_LEO_MISSING_OF)


[uncategorized] ~18-~18: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...efinitions - Refresh materialized views on demand or on schedule - Track metadata about m...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[grammar] ~22-~22: It appears that a preposition is missing after ‘Prior’.
Context: ...rt for different refresh strategies ## Prior Art - https://github.com/nroi/elfenbein - ...

(MISSING_PREPOSITION)

🪛 markdownlint-cli2 (0.17.2)
cratedb_toolkit/materialized/readme.md

24-24: Bare URL used
null

(MD034, no-bare-urls)


25-25: Bare URL used
null

(MD034, no-bare-urls)


26-26: Bare URL used
null

(MD034, no-bare-urls)


27-27: Bare URL used
null

(MD034, no-bare-urls)


28-28: Bare URL used
null

(MD034, no-bare-urls)


29-29: Bare URL used
null

(MD034, no-bare-urls)


30-30: Bare URL used
null

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms (15)
  • GitHub Check: PyMongo: Python 3.12 on OS ubuntu-latest
  • GitHub Check: PyMongo: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.9 on OS ubuntu-latest
  • GitHub Check: CFR: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.12 on OS ubuntu-latest
  • GitHub Check: Cloud: Python 3.13 on OS ubuntu-latest
  • GitHub Check: CFR: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Kinesis: Python 3.9 on OS ubuntu-latest
  • GitHub Check: Generic: Python 3.8 on OS ubuntu-latest
  • GitHub Check: CFR for OS windows-latest
  • GitHub Check: CFR for OS ubuntu-latest
  • GitHub Check: build-and-test
  • GitHub Check: CFR for OS macos-latest
  • GitHub Check: CFR for OS macos-13

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant