Conversation

@treff7es
Contributor

No description provided.

@github-actions github-actions bot added ingestion PR or Issue related to the ingestion of metadata product PR or Issue related to the DataHub UI/UX devops PR or Issue related to DataHub backend & deployment labels Dec 19, 2025

codecov bot commented Dec 19, 2025

❌ 3 Tests Failed:

Tests completed: 12579 | Failed: 3 | Passed: 12576 | Skipped: 87
View the top 3 failed test(s) by shortest run time
tests.unit.test_unity_catalog_source.TestUnityCatalogSource::test_process_ml_model_generates_workunits
Stack Traces | 0.004s run time
self = <tests.unit.test_unity_catalog_source.TestUnityCatalogSource object at 0x7f0c21722a30>
mock_hive_proxy = <MagicMock name='HiveMetastoreProxy' id='139688225449008'>
mock_unity_proxy = <MagicMock name='UnityCatalogApiProxy' id='139688241344864'>

    @patch("datahub.ingestion.source.unity.source.UnityCatalogApiProxy")
    @patch("datahub.ingestion.source.unity.source.HiveMetastoreProxy")
    def test_process_ml_model_generates_workunits(
        self, mock_hive_proxy, mock_unity_proxy
    ):
        """Test that process_ml_model generates proper workunits."""
        from datetime import datetime
    
        from datahub.ingestion.api.common import PipelineContext
        from datahub.ingestion.source.unity.proxy_types import (
            Catalog,
            Metastore,
            Model,
            ModelVersion,
            Schema,
        )
    
        config = UnityCatalogSourceConfig.model_validate(
            {
                "token": "test_token",
                "workspace_url": "https://test.databricks.com",
                "warehouse_id": "test_warehouse",
                "include_hive_metastore": False,
            }
        )
    
        ctx = PipelineContext(run_id="test_run")
        source = UnityCatalogSource.create(config, ctx)
    
        # Create test schema
        metastore = Metastore(
            id="metastore",
            name="metastore",
            comment=None,
            global_metastore_id=None,
            metastore_id=None,
            owner=None,
            region=None,
            cloud=None,
        )
        catalog = Catalog(
            id="test_catalog",
            name="test_catalog",
            metastore=metastore,
            comment=None,
            owner=None,
            type=None,
        )
        schema = Schema(
            id="test_catalog.test_schema",
            name="test_schema",
            catalog=catalog,
            comment=None,
            owner=None,
        )
    
        # Create test model
        test_model = Model(
            id="test_catalog.test_schema.test_model",
            name="test_model",
            description="Test description",
            schema_name="test_schema",
            catalog_name="test_catalog",
            created_at=datetime(2023, 1, 1),
            updated_at=datetime(2023, 1, 2),
        )
    
        # Create test model version
        test_model_version = ModelVersion(
            id="test_catalog.test_schema.test_model_1",
            name="test_model_1",
            model=test_model,
            version="1",
            aliases=["prod"],
            description="Version 1",
            created_at=datetime(2023, 1, 3),
            updated_at=datetime(2023, 1, 4),
            created_by="test_user",
            run_details=None,
            signature=None,
        )
    
        # Process the model
        ml_model_workunits = list(source.process_ml_model(test_model, schema))
    
        # Should generate workunits (MLModelGroup creation and container assignment)
        assert len(ml_model_workunits) > 0
    
        assert len(source.report.ml_models.processed_entities) == 1
>       assert (
            source.report.ml_models.processed_entities[0][1]
            == "test_catalog.test_schema.test_model"
        )
E       AssertionError: assert 'e' == 'test_catalog.test_schema.test_model'
E         
E         - test_catalog.test_schema.test_model
E         + e

tests/unit/test_unity_catalog_source.py:550: AssertionError
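The stray `'e'` in the diff above is consistent with `processed_entities` holding bare id strings rather than tuples: indexing `[0][1]` then returns the second character of `"test_catalog.test_schema.test_model"`, which is `e`. A minimal illustration of the shape mismatch (the tuple layout shown is an assumption, not from the CI output):

```python
# If the report stores the entity id as a bare string, [0][1] picks a character:
processed_entities = ["test_catalog.test_schema.test_model"]
entry = processed_entities[0]
assert entry[1] == "e"  # second character of the id string -- the observed 'e'

# The test expects tuple entries, e.g. (timestamp, entity_id) -- hypothetical layout:
processed_entities = [("2023-01-01", "test_catalog.test_schema.test_model")]
assert processed_entities[0][1] == "test_catalog.test_schema.test_model"
```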
tests.integration.mode.test_mode::test_mode_ingest_failure
Stack Traces | 0.115s run time
pytestconfig = <_pytest.config.Config object at 0x7f0911699450>
tmp_path = PosixPath('.../pytest-of-runner/pytest-0/test_mode_ingest_failure0')

    @freeze_time(FROZEN_TIME)
    def test_mode_ingest_failure(pytestconfig, tmp_path):
        with patch(
            "datahub.ingestion.source.mode.requests.Session",
            side_effect=mocked_requests_failure,
        ):
            global test_resources_dir
            test_resources_dir = pytestconfig.rootpath / "tests/integration/mode"
    
            pipeline = Pipeline.create(
                {
                    "run_id": "mode-test",
                    "source": {
                        "type": "mode",
                        "config": {
                            "token": "xxxx",
                            "password": "xxxx",
                            "connect_uri": "https://app.mode.com/",
                            "workspace": "acryl",
                        },
                    },
                    "sink": {
                        "type": "file",
                        "config": {
                            "filename": f"{tmp_path}/mode_mces.json",
                        },
                    },
                }
            )
            pipeline.run()
            with pytest.raises(PipelineExecutionError) as exec_error:
                pipeline.raise_from_status()
            assert exec_error.value.args[0] == "Source reported errors"
            assert len(exec_error.value.args[1]) == 1
            error_dict: StructuredLogEntry
>           _level, error_dict = exec_error.value.args[1][0]
            ^^^^^^^^^^^^^^^^^^
E           TypeError: cannot unpack non-iterable StructuredLogEntry object

.../integration/mode/test_mode.py:209: TypeError
tests.integration.mode.test_mode::test_mode_ingest_json_failure
Stack Traces | 0.197s run time
pytestconfig = <_pytest.config.Config object at 0x7f0911699450>
tmp_path = PosixPath('.../pytest-of-runner/pytest-0/test_mode_ingest_json_failure0')

    @freeze_time(FROZEN_TIME)
    def test_mode_ingest_json_failure(pytestconfig, tmp_path):
        with patch(
            "datahub.ingestion.source.mode.requests.Session",
            side_effect=lambda *args, **kwargs: MockResponseJson(
                json_error_list=["https://app.mode.com/api/modeuser"]
            ),
        ):
            global test_resources_dir
            test_resources_dir = pytestconfig.rootpath / "tests/integration/mode"
    
            pipeline = Pipeline.create(
                {
                    "run_id": "mode-test",
                    "source": {
                        "type": "mode",
                        "config": {
                            "token": "xxxx",
                            "password": "xxxx",
                            "connect_uri": "https://app.mode.com/",
                            "workspace": "acryl",
                        },
                    },
                    "sink": {
                        "type": "file",
                        "config": {
                            "filename": f"{tmp_path}/mode_mces.json",
                        },
                    },
                }
            )
            pipeline.run()
            pipeline.raise_from_status(raise_warnings=False)
            with pytest.raises(PipelineExecutionError) as exec_error:
                pipeline.raise_from_status(raise_warnings=True)
            assert len(exec_error.value.args[1]) > 0
            error_dict: StructuredLogEntry
>           _level, error_dict = exec_error.value.args[1][0]
            ^^^^^^^^^^^^^^^^^^
E           TypeError: cannot unpack non-iterable StructuredLogEntry object

.../integration/mode/test_mode.py:287: TypeError
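Both mode failures above come from the same unpacking: the test expects each failure in `args[1]` to be a `(level, entry)` tuple, but the pipeline now reports bare `StructuredLogEntry` objects. A minimal sketch of the mismatch, using a stand-in dataclass rather than DataHub's real class:

```python
from dataclasses import dataclass

@dataclass
class StructuredLogEntry:  # illustrative stand-in for DataHub's class
    title: str
    message: str

failures = [StructuredLogEntry("modeuser", "API call failed")]

# Old expectation -- fails, a dataclass instance is not iterable:
try:
    _level, error_dict = failures[0]
except TypeError as e:
    print(e)  # cannot unpack non-iterable StructuredLogEntry object

# One way a test could accept both the old tuple shape and the new bare entry:
item = failures[0]
error_dict = item[1] if isinstance(item, tuple) else item
```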

To view more test analytics, go to the Test Analytics Dashboard


alwaysmeticulous bot commented Dec 19, 2025

✅ Meticulous spotted 0 visual differences across 967 screens tested: view results.

Meticulous evaluated ~8 hours of user flows against your PR.

Last updated for commit 87527a6. This comment will update as new commits are pushed.


codecov bot commented Dec 19, 2025

Bundle Report

Bundle size has no change ✅

…ssors/standard_schema_processor.py

Co-authored-by: aikido-pr-checks[bot] <169896070+aikido-pr-checks[bot]@users.noreply.github.com>
try:
    if not url.startswith(("http://", "https://")):
        raise ValueError("Invalid URL scheme")
    response = requests.get(url, timeout=10)
@aikido-pr-checks bot (Contributor) commented Dec 19, 2025


Potential user input in HTTP request may allow SSRF attack - high severity
If an attacker can control the URL input to this HTTP request, they may be able to perform an SSRF attack. This kind of attack is even more dangerous if the application returns the response of the request to the user: it could let the attacker retrieve information from higher-privileged services within the network (such as the metadata service, which is commonly available in cloud environments and could expose credentials).

Remediation: If possible, only allow requests to allowlisted domains. If not, consult the article linked above to learn about other mitigating techniques, such as disabling redirects, blocking private IPs, and making sure private services have internal authentication. If you return data from the request to the user, validate it first to make sure you don't return arbitrary data.
View details in Aikido Security
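A minimal sketch of the allowlist approach suggested in the remediation. The host set and helper name are illustrative, not from this PR; exact hostname matching is used because prefix checks alone can be bypassed with lookalike domains:

```python
from urllib.parse import urlparse

# Illustrative allowlist -- not DataHub configuration.
ALLOWED_HOSTS = {"app.mode.com", "api.example.com"}

def is_allowed(url: str) -> bool:
    parsed = urlparse(url)
    # Require an explicit http(s) scheme and an exact hostname match;
    # exact matching blocks tricks like "app.mode.com.evil.net".
    return parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_HOSTS

print(is_allowed("https://app.mode.com/api/runs"))        # True
print(is_allowed("https://app.mode.com.evil.net/steal"))  # False
print(is_allowed("file:///etc/passwd"))                   # False
```

Note that `url.startswith(("http://", "https://"))` in the snippet under review checks only the scheme, not the destination host, so it does not by itself prevent SSRF.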

@github-actions github-actions bot requested a deployment to datahub-project-web-react (Preview) December 19, 2025 16:18 Abandoned
2 participants