- `--ignore-failures`: continues processing when decryption fails; logs errors with message identification (UID, From, Date, Subject) and reports them in the summary; exit code is non-zero if any failures occurred
- `--move-failures`: moves failed messages to a `.failed` sibling folder (e.g. `INBOX` → `INBOX.failed`), creating the folder if needed; implies continuing on decryption errors
- `--additional-privatekey` / `--additional-passphrase`: repeatable options to specify multiple PEM key files; on decryption failure with the primary key, additional keys are tried in order if the error looks like a key mismatch (heuristic based on the openssl error message)
- Unencrypted key support: `load_private_key()` tries loading without a passphrase first; if the key is unencrypted, the passphrase argument is ignored
- Message identification on errors: `extract_message_info()` and `format_message_id()` extract From, Date, Subject from headers for all error messages
- Ctrl-C handling: see Ctrl-C / Signal Handling below
- Dryrun safety: dryrun mode makes no mailbox modifications at all (no APPEND, no STORE, no folder creation, no moves)
- Skip `\Deleted` messages: messages already marked `\Deleted` (e.g. from a previous interrupted run) are skipped to allow safe re-runs
- `--debug` flag: prints timestamped trace output for every IMAP operation to diagnose performance issues
- `--workers N`: parallel decryption within each folder via dual-connection pipeline architecture (see Parallelism Architecture)
- `--connections N`: folder-level parallelism with independent IMAP connections (see Parallelism Architecture)
- Background progress ticker: live throughput display every 3 seconds with active folder list when `--connections > 1`
- Per-folder and overall throughput metrics: msg/s rate in progress output, per-folder breakdown, and wall-clock rate in summary
- Dual-connection pipeline: when `--workers > 1`, each folder uses two IMAP connections, a reader (FETCH on readonly SELECT) and a writer (APPEND + batch STORE `\Deleted`). The reader never blocks on write operations, keeping the decrypt worker pool saturated. The writer batches `\Deleted` flags across 10 messages, reducing SELECT/UNSELECT cycles by 10×. Increases throughput from ~32 msg/s to ~67 msg/s with `--workers 32`.
Problem: When a folder is SELECTed via IMAP, Dovecot holds a file-level dotlock on the Maildir. Any APPEND to the same folder (even from the same connection or a separate connection) must acquire the same lock. This causes:
- Single connection: APPEND succeeds but the server sends unsolicited `* N EXISTS` / `* N RECENT` notifications. Python's `imaplib` accumulates these in its internal `_untagged_response` buffer, eventually corrupting the response parser and causing subsequent commands (STORE, FETCH) to hang indefinitely.
- Dual connection: The second connection's APPEND blocks waiting for the first connection's dotlock. Dovecot logs show `27.149 in locks` waits, dotlock overrides, and eventual disconnection of the main connection for inactivity.
Workaround: UNSELECT the folder before each APPEND, then re-SELECT for STORE. Per-message flow:
FETCH UID (RFC822) → while folder is SELECTed
decrypt + reconstruct → in memory
UNSELECT → releases dotlock, no EXPUNGE
APPEND decrypted message → no competing lock
SELECT folder → re-open for STORE
STORE +FLAGS (\Deleted) → mark original
(repeat for next message)
CLOSE → expunge all \Deleted at end
UIDs are persistent across UNSELECT/SELECT cycles so the pre-fetched UID list remains valid.
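The per-message flow above could be sketched roughly as follows. This is an illustration only, not the project's code: `replace_message` and its `decrypt` callback are hypothetical names, and `client` is assumed to behave like an `imapclient.IMAPClient` with the folder already selected read-write.

```python
from typing import Callable

# Flags that must not be copied onto the decrypted message.
_SKIP_FLAGS = {b"\\Deleted", b"\\Recent"}


def replace_message(client, folder: str, uid: int,
                    decrypt: Callable[[bytes], bytes]) -> None:
    """Replace one encrypted message with its decrypted copy (hypothetical helper)."""
    # FETCH while the folder is still SELECTed.
    data = client.fetch([uid], [b"RFC822", b"FLAGS", b"INTERNALDATE"])[uid]
    plaintext = decrypt(data[b"RFC822"])  # decrypt + reconstruct in memory
    flags = [f for f in data[b"FLAGS"] if f not in _SKIP_FLAGS]

    client.unselect_folder()              # release the dotlock, no EXPUNGE
    client.append(folder, plaintext, flags=flags,
                  msg_time=data.get(b"INTERNALDATE"))
    client.select_folder(folder)          # re-open for STORE
    client.add_flags([uid], [b"\\Deleted"])  # mark the original
```

The caller would loop over the pre-fetched UID list and close the folder once at the end so that all `\Deleted` originals are expunged together.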
Rejected alternatives:
- Three-phase batch (FETCH all → UNSELECT → batch APPEND → SELECT → batch STORE): avoids lock contention but holds all decrypted messages in memory (risk of OOM for large folders) and, if interrupted between the APPEND and STORE phases, leaves duplicates without the originals marked `\Deleted`.
- Per-message approach is safer: each message is fully processed (APPEND + STORE) before moving to the next, so an interruption leaves at most one duplicate, which is handled by the `\Deleted` skip logic on re-run.
Impact: Extra UNSELECT + SELECT per message adds ~1ms overhead. CLOSE at end expunges all \Deleted messages.
Problem: Python's imaplib did not properly consume unsolicited server responses (* EXISTS, * RECENT, * EXPUNGE, * FLAGS). These accumulated in imaplib._untagged_response and corrupted tagged response matching for subsequent commands.
Resolution: No longer applicable — imapclient handles unsolicited responses correctly. The UNSELECT-before-APPEND pattern (issue #1) is retained because it also addresses Dovecot dotlock contention independently of the IMAP library.
Problem: The original imaplib-based code conditionally included date_time in the argument list. When internaldate was None, only 3 arguments were passed — final_message (bytes) was interpreted as the date_time parameter, causing a TypeError.
Resolution: No longer applicable — imapclient.append() uses keyword arguments (flags=, msg_time=) so argument ordering issues cannot occur.
Problem: If the script is interrupted (Ctrl-C, crash, or hung connection) after APPEND but before the user runs EXPUNGE, the folder contains both the decrypted copy and the original (marked \Deleted). On the next run, the original would be decrypted again, creating a duplicate.
Fix: Messages with \Deleted in their flags are skipped. Additionally, \Deleted is stripped from flags when APPENDing decrypted copies so the new message doesn't inherit the delete marker.
Problem: The decrypted APPEND was copying all original flags including \Deleted, so the new decrypted message was immediately marked for deletion.
Fix: \Deleted is filtered out of the flags list before APPEND.
Problem: Dovecot rejects APPEND commands that include the \Recent system flag: BAD [Error in IMAP command APPEND: Invalid system flag \RECENT]. Per RFC 3501, \Recent is a server-managed flag — only the server can set it; clients cannot include it in APPEND.
Fix: Both APPEND paths (main decrypt and move_message_to_failed()) now filter out \Recent from the flags list before building the APPEND flags string.
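A flag filter along these lines would cover both the `\Deleted` and `\Recent` cases. The name `clean_flags()` comes from this document, but the body below is only a guess at its shape; accepting both `bytes` and `str` flags and matching case-insensitively are assumptions.

```python
from typing import Iterable, List, Union

Flag = Union[bytes, str]

# Flags that must never be copied onto the APPENDed message.
_UNSAFE_FLAGS = frozenset({"\\deleted", "\\recent"})


def clean_flags(flags: Iterable[Flag]) -> List[Flag]:
    """Return flags without \\Deleted and \\Recent, preserving order and type."""
    def normalised(flag: Flag) -> str:
        text = flag.decode("ascii", "replace") if isinstance(flag, bytes) else flag
        return text.lower()

    return [flag for flag in flags if normalised(flag) not in _UNSAFE_FLAGS]
```

Matching on the lower-cased name also catches spellings such as `\RECENT`, which is how the flag appears in the Dovecot error message.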
Problem: The mail is stored on /Volumes/Media/ which is bind-mounted into Docker via VirtioFS. Dotlock operations (create → link → unlink) traverse Container → Linux VM → VirtioFS → macOS → external volume, making metadata operations very slow. This causes "dotlock was overridden (locked 0 secs ago)" warnings and ~3s stalls per lock contention — even with no indexer-worker involved.
The dovecot-uidlist.lock is always a dotlock regardless of the lock_method setting (which was already fcntl). This is hardcoded in Dovecot's Maildir implementation.
Fix: Move index and control files (which contain the dotlock files) to the container's native filesystem by setting mail_index_path and mail_control_path in dovecot.conf:
mail_index_path = /tmp/dovecot-index/%{user | lower}
mail_control_path = /tmp/dovecot-control/%{user | lower}
This keeps the actual mail on the bind-mounted volume but puts all lock/index operations on fast native ext4 inside the container.
Problem: After APPEND, Dovecot's indexer-worker fires asynchronously to index the new message (triggered by fts_autoindex = yes in the vendor FTS config). The indexer-worker and the IMAP process race on dovecot-uidlist.lock, causing "Our dotlock file … was overridden" warnings and "dotlock was immediately recreated under us" errors.
Attempted client-side mitigations (all insufficient):
- `time.sleep()` after STORE: indexer can take unpredictable time
- UNSELECT instead of CLOSE: avoids EXPUNGE but doesn't prevent indexer triggering on APPEND
- Two-phase (APPEND then STORE): indexer still races during APPEND phase
- Three-phase (FETCH all, batch APPEND, batch STORE): indexer still races between consecutive APPENDs
Root cause: fts_autoindex = yes in the vendor FTS config triggers the indexer-worker on every APPEND. This is a server-side issue that cannot be solved client-side.
Fix: Disable FTS auto-indexing in dovecot.conf by adding fts_autoindex = no after the !include_try directives to override the vendor default. The indexer can be triggered manually after migration is complete (doveadm index).
Note: process_limit = 0 for service indexer-worker was attempted but Dovecot 2.4.2 rejects it: process_limit must be higher than 0.
Problem: Python's ThreadPoolExecutor registers an atexit handler (_python_exit()) that calls thread.join() on all worker threads. This means sys.exit() blocks indefinitely when pool threads are still running — even after shutdown(wait=False).
Solution — three layers:
- First Ctrl-C: `_handle_sigint()` sets the `_interrupted` flag via `set_interrupted()`. All processing loops check this flag and stop after completing the current in-progress message. Pending `ThreadPoolExecutor` futures are cancelled.
- Second Ctrl-C: calls `os._exit(130)` to terminate immediately, bypassing atexit handlers and stuck thread joins.
- Normal exit: `main()` uses `os._exit(exit_code)` instead of `sys.exit()` to avoid blocking on atexit handlers from lingering thread pool threads (both folder-level and inner decrypt-worker pools).
Additional measures for --connections > 1:
- Folder-level pool uses explicit `pool.shutdown(wait=False, cancel_futures=True)` instead of a context manager (`with ThreadPoolExecutor()` calls `shutdown(wait=True)` in `__exit__`)
- Each `_process_one_folder()` worker checks `is_interrupted()` at the very top before connecting to IMAP, so queued futures bail out immediately
- Inner decrypt worker pools in `_process_parallel()` also use `pool.shutdown(wait=False)`
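One possible shape for the folder-level pool combines incremental submission, an interrupt check, and the explicit shutdown. `run_folders` and its parameters are hypothetical names for illustration, and `cancel_futures=True` needs Python 3.9 or later.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_folders(folders, process, is_interrupted, connections=2):
    """Process folders with at most `connections` in flight (hypothetical helper)."""
    pool = ThreadPoolExecutor(max_workers=connections)
    todo = iter(folders)
    pending, results = set(), []
    try:
        while True:
            # Top up one folder at a time so an interrupt stops new submissions.
            while len(pending) < connections and not is_interrupted():
                name = next(todo, None)
                if name is None:
                    break
                pending.add(pool.submit(process, name))
            if not pending:
                break
            # Drain every future that has completed so far.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            results.extend(future.result() for future in done)
    finally:
        # Unlike the context manager, this never blocks on worker threads.
        pool.shutdown(wait=False, cancel_futures=True)
    return results
```

Because folders are submitted only when a slot frees up, very little is ever queued, so after an interrupt the loop only waits for folders that are already running.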
Problem: With --connections > 1, multiple threads writing per-message \r progress updates garble the terminal output. Interleaved Processing: headers and result lines from different folders are hard to read.
Solution — quiet_progress flag:
When --connections > 1, process_folder() receives quiet_progress=True which suppresses:
- Per-message `\r` progress updates (e.g. `[25/238] 28.6 msg/s — UID 25: decrypted`)
- "Processing N encrypted messages with M workers ..." banner
- "Stopping early due to interrupt" messages (one per connection)
- Final `print(flush=True)` newline after `\r` progress
What IS shown with --connections > 1:
- `Processing: FolderName ...` header for each folder (thread-safe via `_print_lock`)
- Per-folder result line showing total messages and encrypted count for every folder (plus decrypted count and msg/s rate for folders with encrypted messages)
- Background ticker every 3 seconds showing aggregate throughput and active folder names
- Error messages for decryption failures
- Summary with wall-clock time, overall rate, per-connection rate, and per-folder breakdown for all processed folders
Problem: The background progress ticker needs to show which folders are actively being processed. Naively tracking from the start of _process_one_folder() shows folders that are still connecting or scanning (no encrypted messages) as "active".
Solution: The on_decrypt_start callback in process_folder() is invoked only when encrypted messages are found and decryption is about to begin, passing the encrypted count. decrypt-smime.py passes on_decrypt_start=lambda enc: _add_active_folder(display_name, enc) so the folder only appears in the active dict during actual decryption work, along with its total encrypted count for progress tracking. _remove_active_folder() in the finally block removes it when done (safe no-op if never added).
Problem: With --connections > 1, per-folder scan counts (total messages, encrypted count) were only printed after the entire folder finished processing. Folders with encrypted messages that took a long time to decrypt would show in the [active:] ticker but with no context about how many messages they had. This made it unclear whether a folder had work to do or was just slow.
Solution: The on_scan_complete callback in process_folder() is invoked immediately after the scan phase with (total_messages, encrypted_count). _process_one_folder() prints scan counts right away (e.g. Archives/2012: 2742 messages, 126 encrypted), then prints a separate decrypt result line when the folder finishes (e.g. Archives/2012: 124 decrypted, 19.7 msg/s). This gives immediate visibility into which folders have encrypted messages and how many, before decryption even begins.
Problem: Python's imaplib.IMAP4.append() passed the mailbox name directly to the IMAP command without quoting. For folders with spaces (e.g. My Folder), the server received APPEND My Folder (flags) ... and parsed My as the mailbox and Folder as the next argument, returning [TRYCREATE] Mailbox doesn't exist: My.
Resolution: No longer applicable — imapclient quotes folder names correctly in all operations including append(), select_folder(), and list_folders().
Problem: Python 3.12 introduced strict RFC 5322 validation in email.policy.default. When parsing message headers with this policy, accessing address fields (From, To, etc.) that contain CR or LF characters from folded headers raises ValueError: invalid arguments; address parts cannot contain CR or LF. This caused ERROR processing INBOX: invalid arguments; address parts cannot contain CR or LF when extract_message_info() or is_smime_encrypted() accessed headers on real-world messages with non-standard folding.
Fix: Switched both is_smime_encrypted() and extract_message_info() from email.policy.default to email.policy.compat32, which does not enforce strict address validation. Additionally, extract_message_info() now wraps each header access in try/except so that even if an individual header is malformed, the other fields are still extracted and processing continues with an <invalid {field} header> placeholder.
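The defensive header access described above might be sketched like this. The body is an assumption rather than the project's actual `extract_message_info()`; collapsing folded whitespace into single spaces is an extra choice made here for readable error messages.

```python
import email
import email.policy


def extract_message_info(raw: bytes) -> dict:
    """Best-effort From/Date/Subject extraction for error messages."""
    # compat32 returns header values as-is instead of validating addresses.
    msg = email.message_from_bytes(raw, policy=email.policy.compat32)
    info = {}
    for field in ("From", "Date", "Subject"):
        try:
            value = msg.get(field)
            # Collapse CR/LF folding so the value fits on one log line.
            info[field] = " ".join(str(value).split()) if value is not None else ""
        except Exception:
            # One malformed header must not prevent reporting the others.
            info[field] = f"<invalid {field} header>"
    return info
```

For example, a `From` header folded across two lines comes back as a single line, and a missing `Date` header yields an empty string.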
Note: The planned refactoring item "Modernise email.policy" (switch to email.policy.default) has been cancelled — compat32 is required for compatibility with real-world mail.
Problem: Some older S/MIME encrypted messages (observed on 2012-era emails from pragmaticbookshelf.com) fail decryption with:
openssl cms -decrypt failed: Error reading SMIME Content Info
error:068000D1:asn1 encoding routines:SMIME_read_ASN1_ex:no content type:crypto/asn1/asn_mime.c:422:
OpenSSL's SMIME reader (SMIME_read_ASN1_ex) fails to parse the Content-Type header from the full RFC822 message. This can be caused by transport headers (long Received chains, unusual folding) confusing the parser, or by older S/MIME implementations using non-standard MIME formatting. The messages display fine in Thunderbird because Thunderbird extracts the PKCS7 payload directly rather than relying on OpenSSL's SMIME parser.
Fix: decrypt_smime_message() now uses a three-strategy fallback:
- Full message as SMIME (`-inform SMIME`): Original behaviour, works for most messages.
- Minimal SMIME wrapper (`_build_minimal_smime()`): Strips all transport/envelope headers (Received, Return-Path, DKIM, etc.) and keeps only `MIME-Version`, `Content-Type`, `Content-Transfer-Encoding`, and `Content-Disposition`, the only headers OpenSSL's SMIME reader needs. This fixes cases where extra headers confuse `SMIME_read_ASN1_ex`.
- Raw DER payload (`_extract_pkcs7_der()`): Extracts the PKCS7 binary payload by parsing the email with Python's `email` module (`get_payload(decode=True)` handles base64 decoding), then passes the raw DER bytes to `openssl cms -decrypt -inform DER`. This bypasses OpenSSL's MIME parsing entirely, similar to how Thunderbird handles it.
The fallback only triggers on "content type" / "no content" errors. Other errors (wrong key, bad decrypt, corrupted data) propagate immediately without attempting fallback strategies. The shared _run_openssl_decrypt() helper eliminates code duplication across all three strategies.
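To make the second strategy and the fallback trigger concrete, here is a rough sketch. Both functions are illustrative guesses: the real `_build_minimal_smime()` may parse headers differently, and `should_try_fallback` is a hypothetical name for the error check.

```python
import re

_KEEP_HEADERS = frozenset({
    "mime-version", "content-type",
    "content-transfer-encoding", "content-disposition",
})


def build_minimal_smime(raw: bytes) -> bytes:
    """Keep only the MIME headers OpenSSL's SMIME reader needs, plus the body."""
    match = re.search(rb"\r?\n\r?\n", raw)
    if match is None:
        return raw  # no header/body separator; leave untouched
    head, body = raw[:match.start()], raw[match.end():]
    kept, keeping = [], False
    for line in head.splitlines():
        if line[:1] in (b" ", b"\t"):
            # Folded continuation line: belongs to the previous header.
            if keeping:
                kept.append(line)
            continue
        name = line.split(b":", 1)[0].strip().lower().decode("ascii", "replace")
        keeping = name in _KEEP_HEADERS
        if keeping:
            kept.append(line)
    return b"\r\n".join(kept) + b"\r\n\r\n" + body


def should_try_fallback(openssl_stderr: str) -> bool:
    """Only MIME-parsing errors justify trying the next strategy."""
    text = openssl_stderr.lower()
    return "content type" in text or "no content" in text
```

Errors such as a wrong key or a bad decrypt do not match either phrase, so they would surface immediately instead of being retried.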
The following changes to dovecot.conf are required for the decryption tool to work efficiently:
# Move index/control files to container-native filesystem (issue #7)
mail_index_path = /tmp/dovecot-index/%{user | lower}
mail_control_path = /tmp/dovecot-control/%{user | lower}
# Disable FTS auto-indexing (issue #8) — must be AFTER !include_try
fts_autoindex = no

After making these changes, restart Dovecot: `docker compose restart dovecot`
Motivation: The single-file decrypt-smime.py grew to ~1170 lines, making it difficult to add parallelism and reason about individual concerns. Folders with thousands of messages need parallel decryption, so the architecture cleanly separates IMAP I/O from CPU/subprocess-bound work.
New structure:
| File | Lines | Responsibility |
|---|---|---|
| `decrypt-smime.py` | ~490 | Entry point: signal handling, folder-level parallelism, progress ticker, summary |
| `smime/cli.py` | ~60 | argparse definitions including `--workers` and `--connections` |
| `smime/imap.py` | ~150 | All imapclient interaction: connect, login, folder ops, flag utilities, batch operations |
| `smime/crypto.py` | ~450 | Key loading, S/MIME detection, `openssl cms` decryption (with SMIME/DER fallback), message reconstruction; thread-safe |
| `smime/processor.py` | ~760 | Folder scanning, sequential and pipeline-parallel processing, IMAP replace/move, global decrypted counter |
Thread safety design:
- `smime/crypto.py` functions are thread-safe (no IMAP I/O, only `openssl` subprocesses and in-memory operations)
- `smime/imap.py` functions are NOT thread-safe (all use a single `imapclient` connection)
- Each parallel folder gets its own pair of `IMAPClient` instances (reader + writer)
Two-level parallelism:
- `--connections N` (folder-level parallelism): N folders processed simultaneously, each on its own pair of IMAP connections. Safe because Dovecot dotlocks are per-folder, so different folders have independent locks.
  - Folders submitted incrementally (not all at once) so Ctrl-C stops new submissions immediately
  - Completed futures batch-drained to keep pool saturated and active-folder list accurate
  - Background ticker thread prints aggregate throughput every 3 seconds
- `--workers N` (within each folder): a dual-connection pipeline separates read and write I/O:
  - Reader (connection 1, readonly SELECT): continuously FETCHes messages → submits to `ThreadPoolExecutor` for decryption. Never blocks on write operations.
  - Workers (thread pool): up to N `openssl cms -decrypt` subprocesses run concurrently.
  - Writer (connection 2, dedicated thread): consumes completed decryptions from a queue → APPENDs each decrypted message → batch-STOREs `\Deleted` on original UIDs every 10 messages to amortise SELECT/UNSELECT overhead.
  - Memory bounded to ~`workers + batch_size` full messages per folder.
  - In `--dryrun` mode, no writer connection is opened; falls back to single-connection parallel path.
Both levels can be combined: --connections 5 --workers 32 runs 5 folders in parallel, each with 2 IMAP connections and 32 decrypt workers.
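The reader, worker and writer roles of the dual-connection pipeline can be sketched with plain callables standing in for the two IMAP connections. Everything here is hypothetical (`run_pipeline`, `fetch`, `decrypt`, `append`, `store_deleted`); the bounded queue and semaphore are one way to get the roughly `workers + batch_size` memory bound, not necessarily how `smime/processor.py` does it, and error handling is omitted.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel telling the writer to flush and stop


def run_pipeline(uids, fetch, decrypt, append, store_deleted,
                 workers=4, batch_size=10):
    results = queue.Queue(maxsize=batch_size)   # decrypted, not yet written
    slots = threading.BoundedSemaphore(workers)  # fetched, not yet decrypted

    def writer():
        pending = []
        while (item := results.get()) is not _DONE:
            uid, plaintext = item
            append(plaintext)                # writer connection: APPEND
            pending.append(uid)
            if len(pending) >= batch_size:
                store_deleted(pending)       # one batched STORE \Deleted
                pending = []
        if pending:
            store_deleted(pending)           # flush the final partial batch

    def work(uid, raw):
        try:
            results.put((uid, decrypt(raw)))
        finally:
            slots.release()

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for uid in uids:
            slots.acquire()                  # reader pauses only when workers are full
            pool.submit(work, uid, fetch(uid))
    results.put(_DONE)
    writer_thread.join()
```

The reader loop never calls `append` or `store_deleted` itself, which is the property the dual-connection design relies on.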
Throughput metrics:
- Background ticker every 3s with per-folder progress: `⏱ 253 decrypted, 9s elapsed, 28.1 msg/s [Archives/2012 50/126, Sent 200/10721]`
- Per-folder scan result (immediate): `Drafts: 112 messages, 0 encrypted` or `Sent: 11487 messages, 10721 encrypted`
- Per-folder decrypt result (on completion): `Sent: 10721 decrypted, 33.5 msg/s`
- Summary per-folder breakdown shows all processed folders with total + encrypted counts
Performance observed (Dovecot 2.4.2 on Docker with VirtioFS, Mac mini M4):
| Configuration | Rate | Notes |
|---|---|---|
| `--workers 1` (sequential) | ~4.3 msg/s | Baseline, single connection |
| `--workers 10` | ~10 msg/s | 2.3× speedup (single-connection pipeline) |
| `--workers 32` | ~32 msg/s | 7.4× speedup (single-connection pipeline) |
| `--workers 32` (dual-conn pipeline) | ~67 msg/s | 15.6× speedup, reader never blocked |
| `--connections 5 --workers 32` | ~47 msg/s | Folder-level parallelism (pre-pipeline) |
The previous bottleneck at higher worker counts was the sequential IMAP replace phase on the same connection (UNSELECT → APPEND → SELECT → STORE per message, ~30-230ms each). The dual-connection pipeline eliminates this by running FETCH on a readonly reader connection while a dedicated writer thread handles APPEND + batch STORE on a separate connection. The writer batches \Deleted flags across 10 messages, reducing SELECT/UNSELECT cycles by 10×.
A thread-safe global counter in smime/processor.py (_global_decrypted with threading.Lock) is incremented at every successful decryption across all connections. This powers the background ticker in decrypt-smime.py without requiring cross-thread communication of per-folder results.
Functions: _increment_global_decrypted(), get_global_decrypted(), reset_global_decrypted().
When --connections > 1, a daemon thread _progress_ticker() prints aggregate throughput every 3 seconds with per-folder progress:
⏱ 253 decrypted, 9s elapsed, 28.1 msg/s [Archives/2012 50/126, Sent 200/10721]
Each active folder shows decrypted/total so you can see individual folder progress and identify which folders are making headway. The active folder dict is tracked via _active_folders with callbacks:
- `on_decrypt_start`: adds folder with encrypted count when decryption begins
- `on_message_decrypted`: increments per-folder decrypted counter after each successful decrypt
- `_remove_active_folder()` in the `finally` block removes the folder when done
The ticker is started before the folder pool and stopped in a finally block via _progress_stop threading Event. It uses _print_lock for thread-safe output.
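A ticker loop driven by a `threading.Event` might look like this. `format_ticker_line` and the `get_decrypted`, `get_active`, `emit` and `interval` parameters are hypothetical; the real `_progress_ticker()` presumably reads the global counter and `_active_folders` directly and prints under `_print_lock`.

```python
import threading
import time


def format_ticker_line(decrypted, elapsed, active):
    """Render one ticker line; `active` maps folder name -> (done, total)."""
    rate = decrypted / elapsed if elapsed > 0 else 0.0
    folders = ", ".join(f"{name} {done}/{total}"
                        for name, (done, total) in active.items())
    return (f"⏱ {decrypted} decrypted, {elapsed:.0f}s elapsed, "
            f"{rate:.1f} msg/s [{folders}]")


def progress_ticker(stop, get_decrypted, get_active, emit=print, interval=3.0):
    """Emit a line every `interval` seconds until `stop` is set."""
    start = time.monotonic()
    # Event.wait() returns False on timeout, True once stop.set() is called.
    while not stop.wait(interval):
        emit(format_ticker_line(get_decrypted(),
                                time.monotonic() - start,
                                get_active()))
```

Waiting on the event rather than calling `time.sleep()` lets the thread exit as soon as the `finally` block sets the stop event, instead of finishing a full 3-second sleep first.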
See plans/refactor-smime-plan.md for the original phased implementation plan.
The original single-file decrypt-smime.py grew to ~1170 lines with significant duplication, manual IMAP response parsing via imaplib, and ad-hoc data structures. The refactoring simplified the codebase using imapclient, functional programming idioms, and Python standard library features.
The single biggest simplification. imapclient eliminated ~150 lines of manual IMAP response parsing:
- `parse_list_response()` → replaced by `client.list_folders()`
- `decode_modified_utf7()` → handled transparently by imapclient
- `extract_flags_from_fetch()`, `extract_uid_from_fetch()`, `extract_internaldate_from_fetch()` → FETCH returns pre-parsed dicts with typed values
- `format_imap_flags()` → `imapclient.append()` accepts flag lists natively
- Folder quoting workarounds → imapclient quotes correctly (resolved Known Issues #2, #3, #13)
Replaced ad-hoc dicts with MessageRecord @dataclass. Provides IDE autocompletion, eliminates dict-key typo risks, and formalises the label field.
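For illustration, such a record could be declared as below. Only the class name and the `label` field are mentioned in this document; every other field and all defaults are assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class MessageRecord:
    uid: int
    flags: Tuple[bytes, ...] = ()            # assumed field
    internaldate: Optional[datetime] = None  # assumed field
    raw: bytes = b""                         # assumed field
    label: str = ""                          # identification string for log output


record = MessageRecord(uid=42, flags=(b"\\Seen",), label="UID 42")
```

A misspelled attribute such as `record.lable` raises `AttributeError` at the point of use, whereas a misspelled dict key passed to `.get()` silently returns `None`.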
| Pattern | Location | Result |
|---|---|---|
| filter+map | `scan_folder()` | `filter(None, map(_parse_item, data))` replaces while-loop |
| List comprehensions | `filter_encrypted()` | Two comprehensions replace manual loop+counter |
| `itertools.chain` | `reconstruct_message()` | Three header assembly loops collapsed into one chain |
| Precomputed `frozenset` | `_ENVELOPE_LOWER`, `_OVERRIDE_LOWER` | O(1) set lookup replaces O(n) list scan |
| `TemporaryDirectory` | `decrypt_smime_message()` | Auto-cleaned temp dir replaces manual lifecycle |
| Dict comprehension | `reconstruct_message()` `override_map` | Walrus operator dict comprehension |
| Pattern | Location | Result |
|---|---|---|
| Shared error handler | `_handle_message_outcome()` | Extracted ~60 lines of shared decision tree |
| Shared `clean_flags()` | `clean_flags()` | Extracted to `smime/imap.py` utility |
| `_submit_next()` helper | `_submit_next()` | Three identical copies → one function |
| `_accumulate()` helper | `_accumulate()` | Two identical result-accumulation blocks → one function |
Switch from email.policy.compat32 to email.policy.default in smime/crypto.py for cleaner header access.
Cancelled — see Known Issue #14. email.policy.default enforces strict RFC 5322 validation on address headers, which fails on real-world messages containing CR/LF in folded headers. The compat32 policy is required for compatibility.