@azeezport azeezport commented Oct 13, 2025

User description

Description

What - Updated the Bitbucket Server integration to enable file and folder kind support. The integration now supports:

  • File kind
  • Folder kind

Why - To enable Bitbucket Server users to automatically sync their file and folder resources to Port

How -

Type of change

Please leave one option from the following and delete the rest:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • New Integration (non-breaking change which adds a new integration)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Non-breaking change (fix of existing functionality that will not change current behavior)
  • Documentation (added/updated documentation)

All tests should be run against the Port production environment (using a testing org).

Core testing checklist

  • Integration able to create all default resources from scratch
  • Resync finishes successfully
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Scheduled resync able to abort existing resync and start a new one
  • Tested with at least 2 integrations from scratch
  • Tested with Kafka and Polling event listeners
  • Tested deletion of entities that don't pass the selector

Integration testing checklist

  • Integration able to create all default resources from scratch
  • Completed a full resync from a freshly installed integration and it completed successfully
  • Resync able to create entities
  • Resync able to update entities
  • Resync able to detect and delete entities
  • Resync finishes successfully
  • If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the examples folder in the integration directory.
  • If resource kind is updated, run the integration with the example data and check if the expected result is achieved
  • If new resource kind is added or updated, validate that live-events for that resource are working as expected
  • Docs PR link here

Preflight checklist

  • Handled rate limiting
  • Handled pagination
  • Implemented the code in async
  • Supported multi-account

Screenshots

Include screenshots from your environment showing how the resources of the integration will look.

API Documentation

Provide links to the API documentation used for this integration.


PR Type

Enhancement


Description

  • Added file and folder kind support to Bitbucket Server integration

  • Implemented pattern-based file and folder discovery with glob matching (see the sketch after this list)

  • Added webhook processors for real-time file and folder updates

  • Enhanced client with directory browsing and raw content fetching capabilities
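
For context on the glob matching mentioned above, here is a minimal sketch of how such filename matching could work. The helper name matches_file_pattern mirrors the one used in helpers/file.py; the exact pattern semantics (fnmatch-style globs, directory handling) are assumptions, not a copy of the PR's implementation.

# Minimal sketch of glob-based file matching; pattern semantics are assumed
# (fnmatch-style), not taken verbatim from the PR.
from fnmatch import fnmatch
from typing import List
import posixpath

def matches_file_pattern(file_path: str, path_pattern: str, filenames: List[str]) -> bool:
    """Return True if file_path sits under path_pattern and its basename
    matches one of the configured filename globs."""
    directory, name = posixpath.split(file_path)
    # An empty pattern or "*" matches any directory
    if path_pattern not in ("", "*") and not fnmatch(directory, path_pattern):
        return False
    return any(fnmatch(name, pattern) for pattern in filenames)

# Example: matches_file_pattern("services/api/port.yml", "services/*", ["port.yml"]) -> True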


Diagram Walkthrough

flowchart LR
  client["BitbucketClient"] -- "browse & list" --> helpers["File/Folder Helpers"]
  helpers -- "pattern matching" --> resync["Resync Handlers"]
  webhooks["Webhook Processors"] -- "live updates" --> resync
  resync -- "entities" --> port["Port"]

File Walkthrough

Relevant files

Enhancement (8 files)
  • client.py: Added file/folder browsing and content fetching methods (+106/-0)
  • file.py: Implemented file pattern matching and processing logic (+323/-0)
  • folder.py: Implemented folder pattern matching and recursive directory listing (+259/-0)
  • integration.py: Added file and folder resource config models (+59/-2)
  • main.py: Added resync handlers and webhook processors registration (+31/-1)
  • __init__.py: Exported new file and folder webhook processors (+4/-0)
  • file_pattern_webhook_processor.py: Created webhook processor for file pattern events (+105/-0)
  • folder_pattern_webhook_processor.py: Created webhook processor for folder pattern events (+100/-0)

Configuration changes (3 files)
  • blueprints.json: Removed default blueprint definitions from repository (+0/-245)
  • port-app-config.yml: Removed default port app configuration file (+0/-73)
  • spec.yaml: Added file and folder kinds to integration spec (+2/-0)

Additional files (1 file)
  • __init__.py [link]


qodo-merge-pro bot commented Oct 13, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Excessive data fetch

Description: Potential large file download risk: raw file content is fetched up to a 2MB cap without
repository-level guardrails or content-type checks, which may enable excessive bandwidth
usage or processing of unexpected binary content on webhook/resync.
file.py [150-176]

Referred Code
# 2) Conditional content fetch
content: Optional[bytes] = None
content_type_header: Optional[str] = None
if not skip_parsing:
    file_size = int(file_obj.get("size") or 0)
    if file_size == 0:
        # Zero-byte files are safe to fetch, but raw still returns cleanly
        content, content_type_header = await client.get_file_raw(project_key, repo_slug, file_path, at=at)
    elif file_size <= size_limit_bytes:
        content, content_type_header = await client.get_file_raw(project_key, repo_slug, file_path, at=at)
    else:
        logger.info(
            f"Skipping content download for large file ({file_size} bytes): {file_path}"
        )

file_obj = _finalize_metadata_fallbacks(file_obj, content_type_header)

result = {
    "content": content,  # bytes or None
    "repo": repo,
    "project": {"key": project_key},


 ... (clipped 6 lines)
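
One way to add the missing guardrail is a content-type allowlist checked before the fetched body is kept. The sketch below is illustrative only: the allowlist, the helper name, and the assumption that get_file_raw returns a (bytes, content_type) pair are hypothetical, not part of this PR.

# Hypothetical content-type guardrail; the allowlist and helper name are
# assumptions, not code from this PR.
from typing import Optional

TEXTUAL_PREFIXES = ("text/", "application/json", "application/yaml", "application/xml")

def is_textual_content(content_type_header: Optional[str]) -> bool:
    """Accept only media types that look like text/config content."""
    if not content_type_header:
        return False
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type.startswith(TEXTUAL_PREFIXES)

# Sketch of use inside the conditional fetch above:
# content, content_type_header = await client.get_file_raw(...)
# if not is_textual_content(content_type_header):
#     content = None  # drop unexpected binary payloads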
Webhook DoS risk

Description: Webhook-triggered full file scan of repositories can be abused to cause resource-intensive
operations if webhook endpoint is exposed or event filtering is too broad, leading to
potential DoS via repeated re-hydration.
file_pattern_webhook_processor.py [90-104]

Referred Code
# Hydrate the repo object (optional, used by helper result)
repo_obj = await self._client.get_single_repository(project_key, repo_slug) or {
    "slug": repo_slug,
    "project": {"key": project_key},
}

# Run your existing per-repo file pipeline (no commits, just current state)
updated: List[dict] = []
async for batch in process_repository_files(self._client, repo_obj, pattern):
    # process_repository_files yields lists (batches) of file results; we flatten for WebhookEventRawResults
    updated.extend(batch)

return WebhookEventRawResults(
    updated_raw_results=updated,
    deleted_raw_results=[],  # no deletes (full refresh model)
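
A simple mitigation for the re-hydration concern would be a per-repository cooldown, so repeated webhook deliveries cannot trigger back-to-back full scans. The sketch below is a hedged illustration; the cooldown window, cache shape, and function name are assumptions, not part of this PR.

# Hypothetical per-repo debounce; COOLDOWN_SECONDS and the cache shape are
# illustrative assumptions only.
import time
from typing import Dict, Tuple

COOLDOWN_SECONDS = 60.0
_last_scan: Dict[Tuple[str, str], float] = {}  # (project_key, repo_slug) -> last scan time

def should_scan(project_key: str, repo_slug: str) -> bool:
    """Allow at most one full scan per repository per cooldown window."""
    key = (project_key, repo_slug)
    now = time.monotonic()
    if now - _last_scan.get(key, 0.0) < COOLDOWN_SECONDS:
        return False
    _last_scan[key] = now
    return True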
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
No custom compliance provided

Follow the guide to enable custom compliance check.

Compliance status legend:
  🟢 - Fully Compliant
  🟡 - Partially Compliant
  🔴 - Not Compliant
  ⚪ - Requires Further Human Verification
  🏷️ - Compliance label


qodo-merge-pro bot commented Oct 13, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: High-level
Restore deleted default resource configurations

The PR removes the default blueprints.json and port-app-config.yml files. These
files should be restored to provide a working out-of-the-box configuration for
users.

Examples:

integrations/bitbucket-server/.port/resources/blueprints.json [1]
integrations/bitbucket-server/.port/resources/port-app-config.yml [1]

Solution Walkthrough:

Before:

// File: integrations/bitbucket-server/.port/resources/blueprints.json
(File content removed)

// File: integrations/bitbucket-server/.port/resources/port-app-config.yml
(File content removed)

After:

// File: integrations/bitbucket-server/.port/resources/blueprints.json
[
    {
        "identifier": "bitbucketProject",
        "title": "Bitbucket Project",
        ...
    },
    {
        "identifier": "bitbucketRepository",
        "title": "Bitbucket Repository",
        ...
    }
]

// File: integrations/bitbucket-server/.port/resources/port-app-config.yml
resources:
  - kind: project
    ...
  - kind: repository
    ...
Suggestion importance [1-10]: 9

Why: This suggestion correctly identifies a critical regression where default configuration files (blueprints.json and port-app-config.yml) were removed, breaking the out-of-the-box experience for users.

Impact: High
Category: Possible issue
Correctly access nested webhook configuration
Suggestion Impact: The method was refactored to read the file pattern from resource.selector.files, including handling for missing selector/files and returning the pattern directly, matching the suggested fix.

code diff:

-    def _build_file_pattern_from_resource(self, resource: ResourceConfig) -> Optional[BitbucketServerFilePattern]:
+    def _build_file_pattern_from_resource(
+    self, resource: ResourceConfig
+    ) -> Optional[BitbucketServerFilePattern]:
+        """
+        Build a BitbucketServerFilePattern from the resource's selector.
+        resource.selector is a BitbucketServerFileSelector; the actual pattern
+        is found on its `files` attribute.
+        """
         try:
-            cfg = resource.config or {}
-            return BitbucketServerFilePattern(**cfg)
+            selector = getattr(resource, "selector", None)
+            if not selector:
+                logger.warning("[FilePatternWebhook] Resource has no selector")
+                return None
+
+            files_attr = getattr(selector, "files", None)
+            if not files_attr:
+                logger.warning("[FilePatternWebhook] selector.files is empty or missing")
+                return None
+
+            # selector.files may be a single BitbucketServerFilePattern or a list of them.
+            if isinstance(files_attr, list):
+                # If multiple are configured, pick the first (or extend the processor to iterate them).
+                pattern = files_attr[0] if files_attr else None
+            else:
+                pattern = files_attr
+
+            if pattern is None:
+                logger.warning("[FilePatternWebhook] No usable file pattern found in selector.files")
+                return None
+
+            # Optional: quick sanity checks to avoid surprises at runtime
+            if not getattr(pattern, "project_key", None):
+                logger.warning("[FilePatternWebhook] Pattern missing project_key")
+            if not getattr(pattern, "filenames", None):
+                logger.warning("[FilePatternWebhook] Pattern has no filenames")
+
+            return pattern
+
         except Exception as e:
-            logger.error(f"[FilePatternWebhook] Failed to build BitbucketServerFilePattern from resource config: {e}")
+            logger.error(
+                f"[FilePatternWebhook] Failed to build BitbucketServerFilePattern from resource.selector.files: {e}"
+            )
             return None

Correct the logic in _build_file_pattern_from_resource to access the
BitbucketServerFilePattern from resource.selector.files instead of incorrectly
using resource.config.

integrations/bitbucket-server/webhook_processors/processors/file_pattern_webhook_processor.py [60-66]

 def _build_file_pattern_from_resource(self, resource: ResourceConfig) -> Optional[BitbucketServerFilePattern]:
     try:
-        cfg = resource.config or {}
-        return BitbucketServerFilePattern(**cfg)
+        # resource.config is an alias for selector, which is BitbucketServerFileSelector
+        # The actual pattern is in the `files` attribute of the selector
+        return resource.selector.files
     except Exception as e:
         logger.error(f"[FilePatternWebhook] Failed to build BitbucketServerFilePattern from resource config: {e}")
         return None

[Suggestion processed]

Suggestion importance [1-10]: 9

Why: The suggestion correctly identifies a bug where the code attempts to instantiate BitbucketServerFilePattern from the wrong configuration object, which would cause a runtime error and break the file webhook functionality.

Impact: High
Fix incorrect folder pattern extraction
Suggestion Impact: The commit removed the flawed _build_folder_pattern_from_resource and implemented _iter_folder_patterns_from_resource to read patterns from resource.selector.folders, then processed all matching patterns safely. This addresses the core issue identified by the suggestion, though not by simply returning None but by correctly iterating over the list.

code diff:

-    def _build_folder_pattern_from_resource(self, resource: ResourceConfig) -> Optional[BitbucketServerFolderPattern]:
-        try:
-            cfg = resource.config or {}
-            return BitbucketServerFolderPattern(**cfg)
-        except Exception as e:
-            logger.error(f"[FolderPatternWebhook] Failed to build BitbucketServerFolderPattern from resource config: {e}")
-            return None
+    def _iter_folder_patterns_from_resource(self, resource: ResourceConfig) -> Iterable[BitbucketServerFolderPattern]:
+        """
+        Returns an iterator over BitbucketServerFolderPattern objects from resource.selector.folders.
+        Handles single-or-list cases safely.
+        """
+        selector = getattr(resource, "selector", None)
+        if not selector:
+            logger.warning("[FolderPatternWebhook] Resource has no selector")
+            return []
+
+        folders_attr = getattr(selector, "folders", None)
+        if not folders_attr:
+            logger.warning("[FolderPatternWebhook] selector.folders is empty or missing")
+            return []
+
+        # Could be a single pattern or a list
+        if isinstance(folders_attr, list):
+            return [p for p in folders_attr if p]
+        return [folders_attr]
+
+    def _pattern_matches_repo(self, pattern: BitbucketServerFolderPattern, project_key: str, repo_slug: str) -> bool:
+        if pattern.project_key not in (project_key, "*"):
+            return False
+        if pattern.repos and repo_slug not in pattern.repos and "*" not in pattern.repos:
+            return False
+        return True
 
     # ---- main ----
 
@@ -71,30 +90,43 @@
         if not project_key or not repo_slug:
             return WebhookEventRawResults(updated_raw_results=[], deleted_raw_results=[])
 
-        pattern = self._build_folder_pattern_from_resource(resource)
-        if not pattern:
-            logger.warning("[FolderPatternWebhook] No usable folder pattern in resource; skipping.")
+        patterns = list(self._iter_folder_patterns_from_resource(resource))
+        if not patterns:
             return WebhookEventRawResults(updated_raw_results=[], deleted_raw_results=[])
 
-        # Respect repo filter if present
-        if pattern.repos and repo_slug not in pattern.repos and "*" not in pattern.repos:
-            return WebhookEventRawResults(updated_raw_results=[], deleted_raw_results=[])
-
-        # Respect project filter if present
-        if pattern.project_key not in (project_key, "*"):
-            return WebhookEventRawResults(updated_raw_results=[], deleted_raw_results=[])
-
-        # Hydrate the repo object (for folder helper signature)
+        # Fetch repo once; reused for all patterns
         repo_obj = await self._client.get_single_repository(project_key, repo_slug) or {
             "slug": repo_slug,
             "project": {"key": project_key},
         }
         repo_info = (repo_obj, project_key)
 
-        # Run your existing per-repo folder pipeline (no commits, just current state)
-        updated = await process_repository_folders(self._client, repo_info, pattern)
+        updated: List[dict] = []
+        seen_keys: set[tuple[str, str, str]] = set()  # (project_key, repo_slug, folder_path)
+
+        for pattern in patterns:
+            if not self._pattern_matches_repo(pattern, project_key, repo_slug):
+                continue
+
+            try:
+                matches = await process_repository_folders(self._client, repo_info, pattern)
+            except Exception as e:
+                logger.error(
+                    f"[FolderPatternWebhook] Failed processing repo {project_key}/{repo_slug} for pattern {getattr(pattern, 'path', '')}: {e}"
+                )
+                continue
+
+            # De-duplicate across patterns by (project, repo, folder.path)
+            for m in matches:
+                folder = m.get("folder") or {}
+                folder_path = folder.get("path", "")
+                key = (project_key, repo_slug, folder_path)
+                if key in seen_keys:
+                    continue
+                seen_keys.add(key)
+                updated.append(m)
 
         return WebhookEventRawResults(
             updated_raw_results=updated,
-            deleted_raw_results=[],  # no deletes (full refresh model)
+            deleted_raw_results=[],  # full-refresh model (no commit-based deletes)
         )

Fix the _build_folder_pattern_from_resource function, which incorrectly attempts
to create a BitbucketServerFolderPattern from resource.config. The configuration
is a list of patterns under resource.selector.folders, and the current logic
will fail.

integrations/bitbucket-server/webhook_processors/processors/folder_pattern_webhook_processor.py [57-63]

 def _build_folder_pattern_from_resource(self, resource: ResourceConfig) -> Optional[BitbucketServerFolderPattern]:
-    try:
-        cfg = resource.config or {}
-        return BitbucketServerFolderPattern(**cfg)
-    except Exception as e:
-        logger.error(f"[FolderPatternWebhook] Failed to build BitbucketServerFolderPattern from resource config: {e}")
-        return None
+    # This method is logically flawed as `resource.selector.folders` is a list.
+    # A webhook event for a single repo change can't easily determine which of the multiple
+    # folder patterns to re-evaluate. The current implementation will always fail.
+    # For now, returning None to prevent crashes, but this webhook processor
+    # for folders needs a more detailed implementation to be useful.
+    logger.warning(
+        "[FolderPatternWebhook] The logic to handle folder patterns from webhooks is not fully implemented, as it needs to decide which pattern to use from a list. Skipping."
+    )
+    return None

[Suggestion processed]

Suggestion importance [1-10]: 9

Why: The suggestion correctly identifies a critical bug where the code would fail at runtime due to trying to instantiate BitbucketServerFolderPattern from a selector object containing a list of patterns, which is invalid.

Impact: High
Category: General
Stream file listing to reduce memory
Suggestion Impact: The commit implements streaming by replacing the all_items list with an async for over list_files_recursively_stream and adds the new async generator function to recursively yield items.

code diff:

-        all_items: List[Dict[str, Any]] = []
-        await list_files_recursively(client, project_key, repo_slug, "", all_items)
-
-        for item in all_items:
+        # Process items as a stream instead of collecting them all in memory first
+        async for item in list_files_recursively_stream(client, project_key, repo_slug, ""):
             # We only consider FILE items as candidates to match the filename patterns
             if item.get("type") != "FILE":
                 continue
@@ -298,6 +296,41 @@
         logger.error(f"Failed to list files in repository {repo_slug}: {e}")
 
 
+async def list_files_recursively_stream(
+    client: "BitbucketClient",
+    project_key: str,
+    repo_slug: str,
+    path: str,
+) -> AsyncGenerator[Dict[str, Any], None]:
+    """Recursively list all items under the given path using /files API as a stream."""
+    try:
+        path_to_use = "" if path in ("", "*") else path
+
+        async for contents in client.get_directory_contents(project_key, repo_slug, path_to_use):
+            for item in contents:
+                # Yield the item first
+                if isinstance(item, dict) and "path" in item:
+                    yield item
+                    if item.get("type") == "DIRECTORY":
+                        async for sub_item in list_files_recursively_stream(client, project_key, repo_slug, item["path"]):
+                            yield sub_item
+                elif isinstance(item, str):
+                    is_dir = item.endswith("/")
+                    file_obj = {
+                        "path": item.rstrip("/") if is_dir else item,
+                        "type": "DIRECTORY" if is_dir else "FILE",
+                    }
+                    yield file_obj
+                    if is_dir:
+                        async for sub_item in list_files_recursively_stream(client, project_key, repo_slug, file_obj["path"]):
+                            yield sub_item
+                else:
+                    logger.debug(f"Unknown item shape from directory listing: {item!r}")
+
+    except Exception as e:
+        logger.error(f"Error listing directory '{path}' in {repo_slug}: {e}")
+

Refactor process_repository_files and list_files_recursively to use an async
generator. This will stream file items instead of collecting them all in memory,
improving memory efficiency for large repositories.

integrations/bitbucket-server/helpers/file.py [261-298]

 async def process_repository_files(
     client: "BitbucketClient",
     repo: Dict[str, Any],
     file_pattern: BitbucketServerFilePattern,
 ) -> AsyncGenerator[List[Dict[str, Any]], None]:
     """
     Process files in a repository that match the pattern.
     Honors `skip_parsing` to avoid fetching file bodies.
     """
     repo_slug = repo["slug"]
     project_key = repo["project"]["key"]
 
     try:
-        all_items: List[Dict[str, Any]] = []
-        await list_files_recursively(client, project_key, repo_slug, "", all_items)
-
-        for item in all_items:
+        # Process items as a stream instead of collecting them all in memory first
+        async for item in list_files_recursively_stream(client, project_key, repo_slug, ""):
             # We only consider FILE items as candidates to match the filename patterns
             if item.get("type") != "FILE":
                 continue
 
             file_path = item.get("path", "")
             if matches_file_pattern(file_path, file_pattern.path, file_pattern.filenames):
                 result = await process_matching_file(
                     client,
                     project_key,
                     repo_slug,
                     file_path,
                     repo,
                     at=None,  # Optional: thread a ref through BitbucketServerFilePattern if you need it
                     size_limit_bytes=2_000_000,
                     skip_parsing=file_pattern.skip_parsing,
                 )
                 if result:
                     yield [result]
 
     except Exception as e:
         logger.error(f"Failed to list files in repository {repo_slug}: {e}")
 
+
+async def list_files_recursively_stream(
+    client: "BitbucketClient",
+    project_key: str,
+    repo_slug: str,
+    path: str,
+) -> AsyncGenerator[Dict[str, Any], None]:
+    """Recursively list all items under the given path using /files API as a stream."""
+    try:
+        path_to_use = "" if path in ("", "*") else path
+
+        async for contents in client.get_directory_contents(project_key, repo_slug, path_to_use):
+            for item in contents:
+                # Yield the item first
+                if isinstance(item, dict) and "path" in item:
+                    yield item
+                    if item.get("type") == "DIRECTORY":
+                        async for sub_item in list_files_recursively_stream(client, project_key, repo_slug, item["path"]):
+                            yield sub_item
+                elif isinstance(item, str):
+                    is_dir = item.endswith("/")
+                    file_obj = {
+                        "path": item.rstrip("/") if is_dir else item,
+                        "type": "DIRECTORY" if is_dir else "FILE",
+                    }
+                    yield file_obj
+                    if is_dir:
+                        async for sub_item in list_files_recursively_stream(client, project_key, repo_slug, file_obj["path"]):
+                            yield sub_item
+                else:
+                    logger.debug(f"Unknown item shape from directory listing: {item!r}")
+
+    except Exception as e:
+        logger.error(f"Error listing directory '{path}' in {repo_slug}: {e}")
+

[Suggestion processed]

Suggestion importance [1-10]: 7

Why: The suggestion proposes a valid performance and memory optimization by streaming file items instead of buffering them in a list, which is beneficial for large repositories.

Impact: Medium
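
For reference, here is a minimal sketch of consuming the streamed listing introduced above; the client argument is assumed to be an already-constructed BitbucketClient and its setup is elided.

# Minimal consumption sketch for list_files_recursively_stream; `client`
# construction is assumed/elided.
async def count_files(client, project_key: str, repo_slug: str) -> int:
    total = 0
    async for item in list_files_recursively_stream(client, project_key, repo_slug, ""):
        if item.get("type") == "FILE":
            total += 1
    return total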


This pull request is automatically being deployed by Amplify Hosting.

Access this pull request here: https://pr-2284.d1ftd8v2gowp8w.amplifyapp.com
