fix(powerbi): emit CorpUserInfoClass instead of CorpUserKeyClass for users #15748

dinesh-verma-datahub · 2025-12-22T07:16:39Z

Problem

PowerBI ingestion was emitting CorpUserKeyClass (just username) instead of CorpUserInfoClass, causing existing user profiles from LDAP/SCIM/Okta to be overwritten with incomplete data.

Solution

Emit CorpUserInfoClass with full profile (displayName, email, active)
Add overwrite_existing_users config (default: false) to protect existing users
Add customProperties for traceability (powerbi_user_id, powerbi_graph_id, etc.)
Mark non-human principals (Apps, Service Principals) as active=false

Changes

config.py: Added overwrite_existing_users with validation
powerbi.py: Fixed to_datahub_user() to emit CorpUserInfoClass + skip logic
powerbi_pre.md: Added User Ownership Configuration documentation
test_powerbi_user_creation.py: 26 comprehensive unit tests
golden_*.json: Regenerated 16 golden files
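
The behaviour described under Solution can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in powerbi.py: `PowerBIUser`, `UserInfo`, `to_user_info`, the `user_exists` callback, and the principal type strings are all hypothetical stand-ins (the real change emits `CorpUserInfoClass` from `to_datahub_user()`).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Assumed principal type labels for non-human identities; the real values may differ.
NON_HUMAN_PRINCIPALS = {"App", "ServicePrincipal"}


@dataclass
class PowerBIUser:
    """Hypothetical stand-in for the connector's PowerBI user model."""
    id: str
    display_name: str
    email: Optional[str]
    principal_type: str


@dataclass
class UserInfo:
    """Hypothetical stand-in for CorpUserInfoClass (same field names as in the PR text)."""
    displayName: str
    email: Optional[str]
    active: bool
    customProperties: Dict[str, str] = field(default_factory=dict)


def to_user_info(
    user: PowerBIUser,
    overwrite_existing_users: bool,
    user_exists: Callable[[str], bool],
) -> Optional[UserInfo]:
    """Return a full profile, or None when an existing user must be left untouched."""
    # Skip logic: with overwrite disabled, never touch a user that already exists.
    if not overwrite_existing_users and user_exists(user.id):
        return None
    return UserInfo(
        displayName=user.display_name,
        email=user.email,
        # Apps and service principals are marked inactive.
        active=user.principal_type not in NON_HUMAN_PRINCIPALS,
        # Traceability back to the PowerBI identity.
        customProperties={"powerbi_user_id": user.id},
    )
```

With `overwrite_existing_users=False`, a user already known to DataHub yields `None`, so a profile that came from LDAP, SCIM, or Okta is left as it is.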


github-actions · 2025-12-22T07:16:50Z

Linear: ING-1312

codecov · 2025-12-22T07:19:05Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.


- Fix import sorting in test file (ruff I001)
- Fix markdown formatting (prettier)
- Add troubleshooting section to docs
- Add caching strategy comment
- Clarify graph access requirements in config

- Add breaking change entry to updating-datahub.md for user overwrite behavior
- Add YAML example to Known Limitations workaround section in powerbi_pre.md
- Strengthen test for _check_user_exists to ensure 100% coverage

The relative path to powerbi_pre.md doesn't resolve in the docs-website build context. Replaced it with a plain-text reference.

skrydal · 2025-12-22T11:18:03Z

metadata-ingestion/docs/sources/powerbi/powerbi_pre.md

+  config:
+    ownership:
+      create_corp_user: true
+      overwrite_existing_users: true # Temporarily enable to refresh all users


If I have stateful ingestion enabled and do one run with this option enabled, then a second run without it, will the second run accidentally delete the users ingested by the first?

skrydal · 2025-12-22T11:20:29Z

metadata-ingestion/docs/sources/powerbi/powerbi_pre.md

+- **Graph access requirement:** `overwrite_existing_users=false` requires DataHub graph access to
+  check if users exist. This works automatically when using the DataHub REST sink. File-based sinks
+  (e.g., writing to JSON files) don't have graph access.


Minor comment: there is an optional config, datahub_api (set at the same level as sink or source). It should be possible to provide the graph connection through this parameter automatically when a file sink is used.
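
To illustrate the suggestion, here is a sketch of a recipe, written as a Python dict, that pairs a file sink with a top-level datahub_api block. The server URL, token, credentials, and filename are placeholders, and whether the PowerBI source actually picks up the graph connection from datahub_api is an assumption based on this comment, not something confirmed in the thread.

```python
# Hypothetical recipe: a file sink has no graph access of its own, so a
# top-level datahub_api block supplies the connection for user-existence checks.
recipe = {
    "source": {
        "type": "powerbi",
        "config": {
            "tenant_id": "<tenant-id>",          # placeholder
            "client_id": "<client-id>",          # placeholder
            "client_secret": "<client-secret>",  # placeholder
            "ownership": {
                "create_corp_user": True,
                "overwrite_existing_users": False,
            },
        },
    },
    "sink": {
        "type": "file",
        "config": {"filename": "powerbi_mces.json"},  # placeholder path
    },
    # Same level as source and sink, as described in the comment above.
    "datahub_api": {
        "server": "http://localhost:8080",  # placeholder URL
        "token": "<token>",                 # placeholder
    },
}
```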

skrydal · 2025-12-22T11:27:19Z

metadata-ingestion/docs/sources/powerbi/powerbi_pre.md

+
+### Troubleshooting
+
+**Warning: "Graph unavailable - creating all users"**


I would consider raising an error here and stopping execution instead of logging a warning.
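
A rough sketch of the two options being weighed, assuming a hypothetical helper `can_check_existing_users` and a hypothetical `GraphUnavailableError`; neither name comes from the PR. The `fail_fast` flag exists only to show both behaviours side by side.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class GraphUnavailableError(RuntimeError):
    """Hypothetical error for a missing DataHub graph connection."""


def can_check_existing_users(
    graph: Optional[object],
    overwrite_existing_users: bool,
    fail_fast: bool,
) -> bool:
    """Return True when existence checks can run, False when all users get created."""
    if overwrite_existing_users:
        # Every user is written anyway, so no graph lookup is needed.
        return False
    if graph is not None:
        return True
    if fail_fast:
        # Reviewer's suggestion: stop instead of silently overwriting profiles.
        raise GraphUnavailableError(
            "overwrite_existing_users=false requires DataHub graph access"
        )
    # Behaviour described in the troubleshooting docs: warn and create all users.
    logger.warning("Graph unavailable - creating all users")
    return False
```

Failing fast trades convenience for safety: a misconfigured sink stops the run instead of overwriting the profiles the option is meant to protect.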

skrydal · 2025-12-22T11:34:38Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/powerbi.py

+            )
+            return []
+
+        # Log at DEBUG level for data quality monitoring (INFO is too noisy)


This comment is not needed; please remove it.

skrydal · 2025-12-22T11:34:41Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/powerbi.py

-        user_key = CorpUserKeyClass(username=user.id)

-        user_key_mcp = self.new_mcp(
+        # Check if we should skip this user


This comment is not needed; please remove it.

skrydal · 2025-12-22T11:34:54Z

metadata-ingestion/src/datahub/ingestion/source/powerbi/powerbi.py

+        logger.debug(f"Mapping user {user.displayName}(id={user.id}) to DataHub user")

-        # Create an URN for user
+        # Build user URN


This comment is not needed; please remove it.

skrydal · 2025-12-22T11:42:33Z