Bug report: file_exclusions CSV corruption + request for unlimited import size #1334
Txindoki666 started this conversation in Ideas
Hi, I've been having trouble indexing certain folders and files with Smart Connections (SC) for a while now. I've spent several days working with Claude to fix the issue, and I think it's a bug. I asked it to write a brief technical report on the fix, though the fix is temporary and I'm not sure it will survive an update. Basically, I'm trying to index both very large notes and very small ones, since both matter in my workflow. In the process, Claude found the bug that was preventing normal notes from being indexed.
Thank you very much.
Here is Claude’s message:
Hello,
I'd like to report a bug affecting Smart Connections indexing and make a related feature request.
BUG: file_exclusions CSV corruption when file paths contain commas
Smart Connections stores file exclusions as a comma-separated string in smart_env.json:
smart_sources.file_exclusions: "path/to/file,another/path"
The parsing logic (main.js ~line 7027) splits this field by comma and appends ** to each token, creating glob patterns:
csv.split(",").map(file => `${file}**`)
The problem: file paths that contain commas get fragmented when stored in this field. This affects files inside folders whose names include commas, and files whose own names include commas (e.g. academic references with comma-separated author lists).
Example: a file at path:
"Arts, cinema, photography/Author Name, Year.md"
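when added to file_exclusions, produces these tokens after splitting on commas (derived from the parsing logic described above; whether the plugin trims whitespace is not confirmed here):
"Arts", " cinema", " photography/Author Name", " Year.md"
With ** appended, the first token becomes the pattern Arts**, which is broad enough to match any path beginning with "Arts". That is consistent with whole folders being excluded rather than a single file.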
In my vault of 8,212 notes, this caused two entire folders (1,161 notes total) to be silently excluded from the index. Smart Connections reported 100% embedding coverage but was only seeing ~90% of the vault. There were no warnings or errors — the notes simply never appeared in the embed queue.
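The fragmentation described above can be reproduced in isolation. The sketch below is a stand-in that follows the split-and-append logic quoted in this report; `parseExclusions` is a hypothetical name and the function is not copied from the plugin's main.js.

```javascript
// Hypothetical stand-in for the parsing step described in the report.
// It splits the stored CSV string on commas and appends ** to each token.
function parseExclusions(csv) {
  return csv.split(",").map((file) => `${file}**`);
}

// A single excluded path whose folder name and file name both contain commas.
const stored = "Arts, cinema, photography/Author Name, Year.md";
const patterns = parseExclusions(stored);

console.log(patterns);
// [ 'Arts**', ' cinema**', ' photography/Author Name**', ' Year.md**' ]
```

One stored path turns into four patterns, and none of them is the path that was meant to be excluded.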
The secondary mechanism that compounds this: the auto_excluded_files list (populated at list_recursive, line ~2014, for paths > 200 chars) is merged into file_exclusions on every save (line ~1342). This means the corruption regenerates automatically on each Obsidian session, making it impossible to fix by simply editing smart_env.json.
Workaround I applied: renamed the affected folders to remove commas from their names. This resolved the folder-level exclusion, but files with commas in their own names (common in academic note titles with author lists) remain at risk if they ever end up in file_exclusions for any reason.
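Suggested fix: store exclusions as a JSON array instead of a CSV string, or replace the comma separator in file_exclusions with a character that cannot appear in file paths. Note that on Linux a file name may legally contain a newline or a pipe, so of the candidate separators (newline \n, pipe |, null byte) only the null byte is ruled out on every platform. That makes the JSON array the more robust option.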
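As a rough illustration of the JSON array option, the sketch below round-trips exclusions without any separator. The helper names and the flat settings shape are assumptions made for this example, not the plugin's actual API or the real layout of smart_env.json.

```javascript
// Hypothetical helpers; names and settings shape are assumptions.
function serializeExclusions(paths) {
  // Each path is its own array element, so commas need no escaping.
  return JSON.stringify({ file_exclusions: paths });
}

function loadExclusions(json) {
  const value = JSON.parse(json).file_exclusions;
  if (Array.isArray(value)) return value;
  // Legacy CSV string: splitting stays ambiguous for paths that contain commas.
  return typeof value === "string" && value.length > 0 ? value.split(",") : [];
}

const original = ["Arts, cinema, photography/Author Name, Year.md", "drafts/"];
const restored = loadExclusions(serializeExclusions(original));

console.log(restored.length); // 2
```

The legacy branch is only a fallback for old settings files. It cannot recover paths that were already fragmented.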
FEATURE REQUEST: remove or greatly increase the max_import_size limit
The current default max_import_size of 1,000,000 bytes (~976 KB) silently excludes large notes — particularly web clips, academic papers, and reference documents — without any indication in the UI that they are being skipped. The only way to discover this is through the DevTools console ("Skipping large file" warnings).
Request: either remove the upper size limit entirely, or set a much higher default (e.g. 10–50 MB), and add a visible notification in the plugin UI when files are skipped due to size.
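A minimal sketch of what the requested behaviour could look like: files over the limit are collected instead of being dropped silently, so the UI has something to report. The function name, the file objects, and the sample sizes are all hypothetical; only the 1,000,000-byte default comes from this report.

```javascript
// Hypothetical sketch; not the plugin's import code.
const DEFAULT_MAX_IMPORT_SIZE = 1000000; // default limit cited in the report, in bytes

// Split files into those that fit under the limit and those that exceed it.
function partitionBySize(files, maxBytes = DEFAULT_MAX_IMPORT_SIZE) {
  const imported = [];
  const skipped = [];
  for (const file of files) {
    (file.size > maxBytes ? skipped : imported).push(file);
  }
  return { imported, skipped };
}

const { imported, skipped } = partitionBySize([
  { path: "web-clip.md", size: 2000000 },
  { path: "note.md", size: 1200 },
]);

// A real plugin would surface this in its UI; console output stands in here.
if (skipped.length > 0) {
  const names = skipped.map((f) => f.path).join("; ");
  console.log(`Skipped ${skipped.length} file(s) over the size limit: ${names}`);
}
```

Returning the skipped list alongside the imported one keeps the size policy and the notification separate, so the limit can be raised without touching the reporting.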
Thank you for the excellent plugin. These two changes would significantly improve reliability for users with large academic vaults.
System: Linux Mint 22.3/Windows 11, Obsidian 1.12.7, Smart Connections 4.5.0, Intel i5-4460 (no GPU), 16 GB RAM, 8,212 notes.
Additional finding: The auto_excluded_files merge mechanism (main.js ~line 1342) makes it impossible to permanently fix file_exclusions by editing smart_env.json while the vault contains any file whose path exceeds 200 characters. On every Obsidian session, SC resets auto_excluded_files = [] (line ~1618), repopulates it during list_recursive, then merges it back into file_exclusions on save. If any of those paths contain commas (possible in path components before the 200-char cutoff), the corruption regenerates automatically. The only permanent workaround available to users is renaming folders and files to remove commas from their names.
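The regeneration cycle described above can be modelled in a few lines. This is a simplified simulation under stated assumptions: `runSession`, the settings object, and the sample paths are hypothetical, and only the 200-character threshold and the reset, repopulate, merge order are taken from this report.

```javascript
// Hypothetical model of one Obsidian session; none of this is plugin code.
function runSession(settings, vaultPaths, maxPathLength = 200) {
  // Reset, then repopulate with every path longer than the threshold.
  const autoExcluded = vaultPaths.filter((p) => p.length > maxPathLength);
  const existing = settings.file_exclusions
    ? settings.file_exclusions.split(",")
    : [];
  // Merge back into the CSV field on save, without duplicates.
  const merged = [...new Set([...existing, ...autoExcluded])];
  return { ...settings, file_exclusions: merged.join(",") };
}

// An over-long path that also contains a comma in its folder name.
const longPath = "Arts, cinema/" + "x".repeat(200) + ".md";

// The user has manually emptied the field before starting the session.
const cleaned = { file_exclusions: "" };
const afterSession = runSession(cleaned, [longPath, "short.md"]);

const tokens = afterSession.file_exclusions.split(",");
console.log(tokens.length); // 2
console.log(tokens[0]); // Arts
```

Even with the field emptied by hand, one session is enough to write the long path back, and the next parse splits it into two tokens.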