Contributor

@DanAtkinson DanAtkinson commented Oct 15, 2025

Hello

This PR provides a working example of how to control the way HtmlToOpenXml handles images referenced over HTTP/HTTPS.

This is done by setting ImageProcessing in the HtmlConverter object initializer:

HtmlConverter htmlConverter = new(mainPart, new DefaultWebRequest())
{
    // Default value of ImageProcessingMode is ImageProcessingMode.Embed (the current behaviour).
    ImageProcessing = ImageProcessingMode.LinkExternal
};

I've also provided handling for data URIs, but my primary focus here has been to swap external images out so that they use Blip.Link instead of Blip.Embed. As stated above, the default behaviour remains the same (i.e. ImageProcessingMode.Embed) unless explicitly overridden.
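For readers less familiar with the distinction, the sketch below shows roughly what linking looks like at the Open XML SDK level: the image URL is registered as an external relationship on the part, and the Blip references it through Link rather than Embed. This is an illustration under assumptions, not the code in this PR; the class and helper names (LinkedImageSketch, CreateLinkedBlip) are hypothetical.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using A = DocumentFormat.OpenXml.Drawing;

static class LinkedImageSketch
{
    // Standard relationship type for images in Office Open XML packages.
    private const string ImageRelationshipType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image";

    // Hypothetical helper (not part of HtmlToOpenXml): returns a Blip that points
    // at the remote image instead of embedding its bytes in the package.
    public static A.Blip CreateLinkedBlip(MainDocumentPart mainPart, Uri imageUri)
    {
        // An external relationship stores only the target URI, not the image data.
        ExternalRelationship relationship =
            mainPart.AddExternalRelationship(ImageRelationshipType, imageUri);

        // r:link references the external relationship; r:embed would instead
        // reference an ImagePart stored inside the .docx.
        return new A.Blip { Link = relationship.Id };
    }
}
```

Because only the URL is stored, a linked image should be expected to display only when the consuming application can reach that URL at the time the document is opened.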

Example implementation - taken from a modified form of examples\Demo\Program.cs

// Read the HTML to convert.
string inputFile = "C:\\ConvertedHtml.html";
string htmlContent = await File.ReadAllTextAsync(inputFile);

// Create an empty Word document with a main part and body.
string outputPath = "C:\\ConvertedHtml.docx";
using WordprocessingDocument wordDoc = WordprocessingDocument.Create(outputPath, WordprocessingDocumentType.Document);
MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();
mainPart.Document = new Document(new DocumentFormat.OpenXml.Wordprocessing.Body());

// Link external images instead of embedding them.
HtmlConverter htmlConverter = new(mainPart, new DefaultWebRequest())
{
    ImageProcessing = ImageProcessingMode.LinkExternal
};

await htmlConverter.ParseBody(htmlContent);

// Validation helper, as used in the demo program this example is taken from.
AssertThatOpenXmlDocumentIsValid(wordDoc);

mainPart.Document.Save();

In my tests, a Word document containing 30 external images shrank from 4 MB with the images embedded to 10 KB with them linked, so the size reduction is considerable.

Further changes could be made later, for example additional rules dictating which domains should or shouldn't be embedded, or a mechanism for HTML-level customisation of image processing (for example, an attribute such as <img src="..." data-imageProcessing="Embed" />).
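As a rough illustration of how such rules might combine, here is a minimal sketch of a per-image decision. Nothing in it exists in this PR: ImageModeResolver, its parameters, and the precedence order (non-HTTP sources are always embedded, then the attribute, then a host list, then the converter-wide default) are all assumptions, and the enum is a local stand-in containing only the two members named above.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in for illustration; the real enum may have other members.
enum ImageProcessingMode { Embed, LinkExternal }

// Hypothetical resolver, not part of HtmlToOpenXml.
static class ImageModeResolver
{
    public static ImageProcessingMode Resolve(
        Uri src,
        string? attributeOverride,       // value of a data-imageProcessing attribute, if any
        ISet<string> alwaysEmbedHosts,   // hosts whose images should always be embedded
        ImageProcessingMode defaultMode) // converter-wide setting
    {
        // Only HTTP/HTTPS images have a remote target to link to; data URIs,
        // relative paths and other schemes are embedded.
        if (!src.IsAbsoluteUri ||
            (src.Scheme != Uri.UriSchemeHttp && src.Scheme != Uri.UriSchemeHttps))
        {
            return ImageProcessingMode.Embed;
        }

        // An explicit, valid per-element attribute takes precedence.
        if (Enum.TryParse(attributeOverride, ignoreCase: true, out ImageProcessingMode parsed)
            && Enum.IsDefined(typeof(ImageProcessingMode), parsed))
        {
            return parsed;
        }

        // Next, apply the domain rule.
        if (alwaysEmbedHosts.Contains(src.Host))
        {
            return ImageProcessingMode.Embed;
        }

        // Otherwise fall back to the converter-wide default.
        return defaultMode;
    }
}
```

A caller would build the host set with StringComparer.OrdinalIgnoreCase and pass the attribute value as null when the element has none.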

I also added 5 new unit tests (in ImgTests.cs) to support my changes and corrected the spelling of "asynchronous" and "intended" in a few places where it was flagged. My apologies if any formatting changes slipped in: I use ReSharper, and Visual Studio automatically reformats braces, imports, access modifiers, etc. I've done my best to discard those changes from the PR.

Kind regards
Dan Atkinson - Atcore Technology Ltd

…dled during conversion.

The idea of this commit is to provide a working example of how to control the way HtmlToOpenXml handles images referenced over HTTP/HTTPS. I've also provided handling for data URIs so that only data URI images can be embedded.

Owner

onizet commented Oct 16, 2025

This is a delightful PR! Complete, with test cases, and documented. That's very nice, thank you very much.
There are a few small improvements to make, such as src not being used inside ImageExpression.CreateBlip(),
or asserting that the HttpMock hasn't been called with LinkExternal.

In my experience, when I request changes in a PR, I never hear back from the author. So I'm going to merge this PR and make those minor changes myself.
Thank you for your valuable support.

@onizet onizet merged commit 10d0f9b into onizet:dev Oct 16, 2025
3 checks passed
onizet added a commit that referenced this pull request Oct 19, 2025
@onizet onizet mentioned this pull request Oct 20, 2025