Skip to content

Filter out HNS directories from listing #149

Merged
noa-neria merged 6 commits into
run-ai:masterfrom
wonwuakpa-msft:wendi/fix-hns-list
Jun 14, 2026
Merged

Filter out HNS directories from listing #149
noa-neria merged 6 commits into
run-ai:masterfrom
wonwuakpa-msft:wendi/fix-hns-list

Conversation

@wonwuakpa-msft

@wonwuakpa-msft wonwuakpa-msft commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Description

Previously, directory stubs (that did not end with a trailing slash) were not being filtered properly. The API lists directories as blobs, which causes the model streamer to attempt to download directories as if they were files. This results in errors such as IsADirectoryError when the streamer tries to create a file where a directory already exists

This PR introduces a mechanism to rely on the blob directory metatdata hdi_isfolder to determine the entity type (either directory or file)

Testing:

Added unit tests,
Manually imported the module for validation

Summary by CodeRabbit

  • Bug Fixes

    • Listing results now reliably exclude ADLS/HNS directory stubs by inspecting blob metadata (case-insensitive) and size, for both recursive and non-recursive listings — reducing false-positive directories.
  • Tests

    • Added tests covering recursive/non-recursive listings and allow/ignore pattern filtering (e.g., include/exclude by extension) to validate the corrected behavior.

@greptile-apps

greptile-apps Bot commented Jun 5, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a bug where ADLS/HNS directory stubs — blobs without a trailing / but with hdi_isfolder=true metadata — were not being filtered during listing, causing the model streamer to attempt downloading them as files and hitting IsADirectoryError.

  • Adds _is_adls_directory(blob) helper that inspects blob.metadata for the hdi_isfolder key (checking both lower-case and title-case variants) and gates on blob.size == 0; both list_blobs (recursive path) and walk_blobs (non-recursive path) are updated to request metadata via include=[\"metadata\"].
  • Adds four unit tests covering recursive and non-recursive listing, allow-pattern, and ignore-pattern filtering; mock objects use spec=BlobProperties and set size/metadata explicitly to match the real SDK contract.

Confidence Score: 5/5

Safe to merge; the new filtering logic is correct, the None-safety guard is properly short-circuited, and the prior self-parameter crash has been resolved.

The core change — threading include=[metadata] into both listing calls and checking hdi_isfolder via a module-level helper — is logically sound and well-tested. The previously flagged self parameter crash is fixed, and the None guard is correct. The only remaining note is a defensive one about the size == 0 co-condition, which is not a current defect given that ADLS directory stubs reliably have size 0.

No files require special attention; both changed files are straightforward and the test coverage for the new helper is adequate.

Important Files Changed

Filename Overview
py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py Adds _is_adls_directory helper and threads include=[metadata] into both list_blobs and walk_blobs calls to filter out ADLS/HNS directory stubs. The self parameter bug from the previous review round has been removed and the None-safety guard via short-circuit and is correct.
py/runai_model_streamer_azure/runai_model_streamer_azure/files/tests/test_files.py Adds four new test cases covering recursive/non-recursive listing and allow/ignore pattern filtering with ADLS directory stubs; mock setup is correct. Tests use list-based return values for both list_blobs and walk_blobs.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[list_files called] --> B{recursive?}
    B -- yes --> C[list_blobs with include=metadata]
    B -- no --> D[walk_blobs with include=metadata]
    C --> E[for each item]
    D --> F[for each item]
    E --> G{hasattr name AND NOT _is_adls_directory}
    F --> H{hasattr name AND NOT endswith slash AND NOT _is_adls_directory}
    G -- pass --> I[append to paths]
    H -- pass --> I
    G -- filtered --> J[skip item]
    H -- filtered --> J
    I --> K[_filter_ignore paths asterisk-slash]
    K --> L{allow_pattern?}
    L -- yes --> M[_filter_allow]
    L -- no --> N{ignore_pattern?}
    M --> N
    N -- yes --> O[_filter_ignore]
    N -- no --> P[return results]
    O --> P
Loading

Reviews (6): Last reviewed commit: "addressed comments" | Re-trigger Greptile

Comment thread py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py Outdated
Comment thread py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py Outdated
Comment thread py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py Outdated
@wonwuakpa-msft wonwuakpa-msft changed the title Remove directory stub from listing in HNS accounts Filter out HNS directories from listing Jun 5, 2026
@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

list_files now requests blob metadata for both recursive and non-recursive listings and excludes ADLS directory stubs by checking zero-byte blobs with hdi_isfolder=true metadata via a new _is_adls_directory helper.

Changes

ADLS Directory Filtering in File Listing

Layer / File(s) Summary
ADLS directory detection helper and types
py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py
BlobProperties is imported for typing. A new _is_adls_directory(blob) helper identifies ADLS directory stubs as zero-byte blobs with hdi_isfolder=true metadata, handling lowercase and capitalized key variants.
Integration into list_files
py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py
list_files requests metadata in recursive (list_blobs) and non-recursive (walk_blobs) listing calls and filters results using _is_adls_directory in addition to existing filename-suffix directory filtering.
Unit tests for list_files
py/runai_model_streamer_azure/runai_model_streamer_azure/files/tests/test_files.py
Adds tests with MagicMock and BlobProperties to validate recursive vs non-recursive listings and allow_pattern/ignore_pattern behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit peeks at metadata bright,
Sniffs zero bytes in the moonlight,
“Hdi_isfolder?” — a whispered clue,
Stubs vanish; listings breathe anew,
Hopping tidy through blobs tonight. 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and specifically addresses the main change: filtering out HNS (Hierarchical Namespace Storage / ADLS Gen2) directories from blob listings to prevent them from being treated as files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py (1)

176-190: ⚡ Quick win

Add regression coverage for zero-byte files vs. HNS directory stubs.

This branch now changes download eligibility based on size == 0 plus metadata. Please add focused tests for both recursive and non-recursive listing so we lock in that ordinary zero-byte files are kept while hdi_isfolder=true stubs are excluded.

Also applies to: 225-235

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py`
around lines 176 - 190, The listing logic now excludes ADLS directory stubs
based on zero size plus metadata; add regression tests that cover both recursive
(code path using container_client.list_blobs) and non‑recursive
(container_client.walk_blobs) flows to assert that ordinary zero‑byte files
(size == 0 without hdi_isfolder metadata) are returned while ADLS directory
stubs (size == 0 with metadata hdi_isfolder=true) are excluded; create
fixtures/mocked BlobProperties and BlobPrefix items for both cases and assert
behavior for the functions that call _is_adls_directory and the two listing
branches (the block referencing container_client.list_blobs and the similar
block around container_client.walk_blobs, also mirrored in the second occurrence
around the 225-235 region).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py`:
- Around line 225-235: The helper function _is_adls_directory is defined with an
extra self parameter but is called with a single blob argument (e.g., from
glob() and pull_files()), causing a TypeError; change the signature of
_is_adls_directory to accept only a single parameter (blob: BlobProperties) and
keep the existing body intact so callers like glob() and pull_files() can pass
the blob without error.

---

Nitpick comments:
In `@py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py`:
- Around line 176-190: The listing logic now excludes ADLS directory stubs based
on zero size plus metadata; add regression tests that cover both recursive (code
path using container_client.list_blobs) and non‑recursive
(container_client.walk_blobs) flows to assert that ordinary zero‑byte files
(size == 0 without hdi_isfolder metadata) are returned while ADLS directory
stubs (size == 0 with metadata hdi_isfolder=true) are excluded; create
fixtures/mocked BlobProperties and BlobPrefix items for both cases and assert
behavior for the functions that call _is_adls_directory and the two listing
branches (the block referencing container_client.list_blobs and the similar
block around container_client.walk_blobs, also mirrored in the second occurrence
around the 225-235 region).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: a1db58a2-5a33-4e8f-b3aa-b0a9b8d7c75c

📥 Commits

Reviewing files that changed from the base of the PR and between 2c209e7 and ce4bba6.

📒 Files selected for processing (1)
  • py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py

Comment thread py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py Outdated
@kyleknap

kyleknap commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Just more of a heads up on this PR @noa-neria @hasethuraman ... @wonwuakpa-msft is on my team and this PR is aimed to address this issue encountered when trying to load from Azure Blob Storage accounts that have Azure Data Lake Storage (ADLS) Gen 2 enabled on them. Effectively, vLLM + model streamer crashes currently because ADLS Gen 2 returns directories back that look like files when listing blobs and will attempt to write those directories as files. I'll take lead at reviewing this and let you both know when it is ready for review.

@kyleknap kyleknap left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. I just had a few smaller suggestions.

Looking more through the code, I'm thinking we can add tests in this file: https://github.com/run-ai/runai-model-streamer/blob/master/py/runai_model_streamer_azure/runai_model_streamer_azure/files/tests/test_files.py for now. Unfortunately, because azurite does not support ADLS: Azure/Azurite#553 we cannot necessarily spin up azurite enabled with ADLS to get the test coverage here. So, the next best option I'm thinking is we mock the BlobServiceClient and inject it into list_files to simulate the different test cases/behaviors of ADLS.

Comment thread py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py Outdated
Comment thread py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py Outdated
Comment thread py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py Outdated
Comment thread py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
py/runai_model_streamer_azure/runai_model_streamer_azure/files/tests/test_files.py (1)

60-72: ⚡ Quick win

Consider testing the capitalized "Hdi_isfolder" metadata variant.

The production code's _is_adls_directory function checks both hdi_isfolder and Hdi_isfolder (capitalized) metadata keys, but this test only covers the lowercase variant. Adding a test blob with {"Hdi_isfolder": "true"} would ensure complete coverage of the ADLS directory detection logic.

📋 Suggested test case addition

Add a test blob to the test_blobs list:

     test_blobs = [
         self.make_blob("file1.txt"),
         self.make_blob("file2.txt"),
         self.make_blob("dir1", size=0, metadata={"hdi_isfolder": "true"}),  # ADLS directory stub
+        self.make_blob("dir1_cap", size=0, metadata={"Hdi_isfolder": "true"}),  # ADLS directory stub (capitalized)
         self.make_blob("dir2"),
         self.make_blob("dir3/", size=0, metadata={"hdi_isfolder": "false"}),
     ]

The expected result remains ["file1.txt", "file2.txt", "dir2"] since dir1_cap should also be filtered out.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@py/runai_model_streamer_azure/runai_model_streamer_azure/files/tests/test_files.py`
around lines 60 - 72, Add a test blob exercising the capitalized ADLS metadata
key so the ADLS-directory detection logic is covered: in
test_listfiles_recursive add a blob via make_blob (e.g. name "dir1_cap" or
similar) with size=0 and metadata={"Hdi_isfolder":"true"} to the test_blobs
list, keep client = self.make_mock_blob_client(..., recursive=True) and the call
to files.list_files(..., recursive=True) unchanged, and keep the expected
assertion result as ["file1.txt", "file2.txt", "dir2"] so the new ADLS-stub is
also filtered out by the production _is_adls_directory logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In
`@py/runai_model_streamer_azure/runai_model_streamer_azure/files/tests/test_files.py`:
- Around line 60-72: Add a test blob exercising the capitalized ADLS metadata
key so the ADLS-directory detection logic is covered: in
test_listfiles_recursive add a blob via make_blob (e.g. name "dir1_cap" or
similar) with size=0 and metadata={"Hdi_isfolder":"true"} to the test_blobs
list, keep client = self.make_mock_blob_client(..., recursive=True) and the call
to files.list_files(..., recursive=True) unchanged, and keep the expected
assertion result as ["file1.txt", "file2.txt", "dir2"] so the new ADLS-stub is
also filtered out by the production _is_adls_directory logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 08636b09-9d6d-4a87-8d5a-d4c47c089d25

📥 Commits

Reviewing files that changed from the base of the PR and between ebcf7ce and 63af8a3.

📒 Files selected for processing (2)
  • py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py
  • py/runai_model_streamer_azure/runai_model_streamer_azure/files/tests/test_files.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py

@kyleknap kyleknap left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I just had a few more smaller comments.

Comment thread py/runai_model_streamer_azure/runai_model_streamer_azure/files/files.py Outdated

@kyleknap kyleknap left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one last small comment. Otherwise, this looks good to me. I was also able to confirm that we are now able to load weights from an Azure storage account with ADLS Gen 2 enabled.

@kyleknap kyleknap left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! 🚢 @noa-neria this should be ready for review on your end.

@noa-neria noa-neria left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks LGTM

@noa-neria noa-neria merged commit b296b12 into run-ai:master Jun 14, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants