fix: do not split "B"-encoded-words at UTF-8 char boundaries by link2xt · Pull Request #42 · stalwartlabs/mail-builder

link2xt · 2025-07-27T00:11:09Z

Fixes #40

link2xt · 2025-07-27T00:12:26Z

I also tested that it fixes the problem in chatmail/core#7039

link2xt · 2025-08-07T14:44:05Z

@mdecimus Could you review this? This is the remaining piece of fixing the issue with long unicode group names in Delta Chat: chatmail/core#7039

mdecimus · 2025-08-08T15:39:33Z

src/headers/text.rs

+                        let chunk = self.text.as_bytes().get(last_pos..pos).unwrap_or_default();
+                        base64_encode_mime(chunk, &mut output, true)?;
+
+                        output.write_all(b"?=\r\n\t=?utf-8?B?")?;


There is some code duplication here: =?utf-8?B? is written here and also on line 47.
I would prefer an approach using an iterator that returns the bytes at their correct offsets, which are then encoded and wrapped in =?utf-8?B? and ?=.
In fact, I don't think a custom iterator is needed at all: you could use char_indices (and avoid having to do (ch as i8) >= -0x40), keep track of the offsets, and then write the encoded word once you have reached the right size.
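For readers following along, the char_indices idea in the review comment could look roughly like the sketch below. It is not the code that was merged: `split_at_char_boundaries`, `encode_b_words`, the hand-rolled `base64` helper, and the `max_bytes` parameter are all hypothetical names introduced for illustration, and `max_bytes` is assumed to count raw UTF-8 bytes before encoding. Because `char_indices` only yields offsets that start a character, no byte-level test such as `(ch as i8) >= -0x40` is needed, and the `=?utf-8?B?` prefix is written in a single place.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Minimal standard base64 encoder with padding (illustration only).
fn base64(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for group in input.chunks(3) {
        let b1 = *group.get(1).unwrap_or(&0);
        let b2 = *group.get(2).unwrap_or(&0);
        let n = (u32::from(group[0]) << 16) | (u32::from(b1) << 8) | u32::from(b2);
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if group.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if group.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Splits `text` into pieces of at most `max_bytes` UTF-8 bytes, cutting only
/// at character boundaries. A single character longer than `max_bytes` is
/// kept whole rather than split.
fn split_at_char_boundaries(text: &str, max_bytes: usize) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut start = 0;
    for (pos, ch) in text.char_indices() {
        let end = pos + ch.len_utf8();
        // Adding this character would overflow the chunk: close it at `pos`,
        // which char_indices guarantees is a character boundary.
        if end - start > max_bytes && pos > start {
            chunks.push(&text[start..pos]);
            start = pos;
        }
    }
    if start < text.len() {
        chunks.push(&text[start..]);
    }
    chunks
}

/// Encodes each chunk as its own "B" encoded-word, folded with CRLF + tab.
fn encode_b_words(text: &str, max_bytes: usize) -> String {
    split_at_char_boundaries(text, max_bytes)
        .into_iter()
        .map(|chunk| format!("=?utf-8?B?{}?=", base64(chunk.as_bytes())))
        .collect::<Vec<_>>()
        .join("\r\n\t")
}
```

With `max_bytes` set to 3, the string "aéé" is split into "aé" and "é", so each encoded-word decodes to valid UTF-8 on its own. A real implementation would also have to choose `max_bytes` so that each encoded-word stays within the line-length limit for encoded-words, which this sketch does not attempt.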

link2xt · 2025-08-08

I force-pushed the change; it now uses char_indices.

mdecimus · 2025-08-10T15:19:40Z

I made some changes to the B-encoding function, can you check that it works for you so I can release?

link2xt · 2025-08-11T09:57:56Z

I tested with commit 2ec9d02, it works.

Why remove debug_assert though?

-        // There is always a header or continuation whitespace before inline text.
-        debug_assert!(bytes_written > 0);
-

If the function is called with bytes_written equal to 0, it starts the header with \t. This should not happen, so the debug_assert ensured that such calls would not be introduced in the future.
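To illustrate the point about the removed assertion, here is a minimal sketch of how a `debug_assert!` can guard that precondition. The function `write_continuation`, its signature, and its return value are assumptions made for this example and do not reflect the actual code in src/headers/text.rs.

```rust
use std::io::{self, Write};

/// Hypothetical helper: folds the header line and writes `text` on the
/// continuation line. `bytes_written` is the number of bytes already
/// written on the current line; the return value is the length of the
/// new line (the tab plus the text).
fn write_continuation(
    mut output: impl Write,
    text: &str,
    bytes_written: usize,
) -> io::Result<usize> {
    // A header name or earlier continuation must precede inline text;
    // otherwise the header itself would begin with folding whitespace.
    debug_assert!(bytes_written > 0, "inline text written before any header bytes");
    output.write_all(b"\r\n\t")?;
    output.write_all(text.as_bytes())?;
    Ok(1 + text.len())
}
```

`debug_assert!` is compiled out when debug assertions are disabled (the default for release builds), so a check like this costs nothing in production while still catching a bad call in tests and debug builds.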

link2xt mentioned this pull request Jul 27, 2025

Text headers break UTF-8 mid-character #40

Closed

mdecimus reviewed Aug 8, 2025

fix: do not split "B"-encoded-words at UTF-8 char boundaries

069efe1

link2xt force-pushed the link2xt/b-encoded-words-utf8-split branch from 1cfb664 to 069efe1 Compare August 8, 2025 23:39

mdecimus mentioned this pull request Aug 9, 2025

Fix latest Q-encoded-words missing question mark (#46) #48

Merged

mdecimus merged commit 177fa27 into stalwartlabs:main Aug 10, 2025
1 check passed
