
Add file type validation#13802

Open
spider-yamet wants to merge 4 commits into infiniflow:main from spider-yamet:fix/validation-file-type

Conversation


@spider-yamet spider-yamet commented Mar 26, 2026

What problem does this PR solve?

This PR fixes WebDAV sync behavior for unsupported file types (#13795).

Previously, the WebDAV connector selected files mainly by modification time and a size threshold, so files with unsupported extensions could still reach the download/document-generation path. This caused unnecessary processing and behavior inconsistent with connectors that validate file type earlier.

This change adds extension validation in two places:

  1. Early filter during recursive listing to skip unsupported files before they enter the download flow.
  2. Defensive filter before download/document creation to prevent unsupported files from being processed if any listing edge case slips through.

It also wires allow_images into the WebDAV sync path so image extension handling follows connector policy.
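
As a rough illustration of the two filters and the allow_images wiring described above (this is not the diff in this PR), the idea can be sketched as follows. The extension sets, the `RemoteFile` type, and the function names `is_supported`, `list_candidates`, and `sync` are all hypothetical and chosen only for this example; the connector's real names and supported extensions may differ.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable, Iterable, Iterator, List

# Hypothetical extension sets, for illustration only.
DOCUMENT_EXTENSIONS = frozenset({".pdf", ".txt", ".md"})
IMAGE_EXTENSIONS = frozenset({".png", ".jpg", ".jpeg"})


@dataclass(frozen=True)
class RemoteFile:
    """Hypothetical stand-in for one entry returned by a WebDAV listing."""
    path: str
    size: int


def is_supported(path: str, allow_images: bool) -> bool:
    """Return True if the file extension is accepted under the given policy."""
    ext = PurePosixPath(path).suffix.lower()
    if ext in DOCUMENT_EXTENSIONS:
        return True
    # Image extensions are accepted only when the policy allows them.
    return allow_images and ext in IMAGE_EXTENSIONS


def list_candidates(
    entries: Iterable[RemoteFile], allow_images: bool
) -> Iterator[RemoteFile]:
    """Early filter: skip unsupported files while walking the listing."""
    for entry in entries:
        if is_supported(entry.path, allow_images):
            yield entry


def sync(
    entries: Iterable[RemoteFile],
    download: Callable[[str], object],
    allow_images: bool = False,
) -> List[str]:
    """Download supported files and return the paths that were processed."""
    processed = []
    for entry in list_candidates(entries, allow_images):
        # Defensive filter: re-check right before download in case a
        # listing edge case let an unsupported file through.
        if not is_supported(entry.path, allow_images):
            continue
        download(entry.path)
        processed.append(entry.path)
    return processed
```

In this sketch the second check is redundant by construction; it only matters if the listing stage is later changed or bypassed, which is the edge case the defensive filter is meant to cover.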

Scope is intentionally limited to WebDAV for a focused bug-fix PR.

Type of change

  • Bug Fix (non-breaking change which fixes an issue)

How was this tested?

  • Manual verification with mixed file types under the configured WebDAV path:
    • supported: .pdf, .txt, .md
    • unsupported: .exe, .bin, .dat
  • Triggered full sync and polling sync.
  • Confirmed unsupported files are skipped before download.
  • Confirmed supported files are still indexed normally.
  • Confirmed image handling follows allow_images setting.

Fixes: #13795

@dosubot dosubot bot added the size:S and 🐞 bug labels Mar 26, 2026
@spider-yamet
Contributor Author

@yingfeng Would love to hear your opinion on this PR. Thanks

@Magicbook1108 Magicbook1108 added the ci label Mar 26, 2026

codecov bot commented Mar 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.72%. Comparing base (cb78ce0) to head (94cabba).

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13802      +/-   ##
==========================================
- Coverage   98.14%   96.72%   -1.43%     
==========================================
  Files          10       10              
  Lines         702      702              
  Branches      112      112              
==========================================
- Hits          689      679      -10     
- Misses          3        5       +2     
- Partials       10       18       +8     


@spider-yamet
Contributor Author

Would appreciate your feedback @Magicbook1108 @yingfeng :)


Labels

🐞 bug, ci, size:S

Development

Successfully merging this pull request may close these issues.

[Bug]: WebDAV sync does not filter unsupported files before processing

2 participants