OnRemovingTag not called for malformed or obfuscated tags (due to AngleSharp skipping them) #571
arunsurfer
started this conversation in
General
Replies: 3 comments
I can't repro. RemovingTag is getting called. Can you provide a short code snippet that illustrates the issue?
using Ganss.Xss; // HtmlSanitizer namespace (Ganss.XSS before version 6.0)
using Microsoft.VisualStudio.TestTools.UnitTesting;

public class HtmlSanitizerWrapper
{
    private readonly HtmlSanitizer _htmlSanitizer;
    private bool _inputHtmlModified;

    public HtmlSanitizerWrapper()
    {
        _htmlSanitizer = new HtmlSanitizer();
        // Any removal event marks the input as modified.
        _htmlSanitizer.RemovingTag += OnRemovingTag;
        _htmlSanitizer.RemovingAttribute += OnRemovingAttribute;
        _htmlSanitizer.RemovingStyle += OnRemovingStyle;
    }

    public (string sanitizedHtml, bool inputHtmlModified) SanitizeHtml(string html)
    {
        // Reset the flag so an earlier call cannot leak its result into this one.
        _inputHtmlModified = false;
        var sanitizedHtml = _htmlSanitizer.Sanitize(html);
        return (sanitizedHtml, _inputHtmlModified);
    }

    private void OnRemovingTag(object sender, RemovingTagEventArgs eventArgs)
    {
        _inputHtmlModified = true;
    }

    private void OnRemovingAttribute(object sender, RemovingAttributeEventArgs eventArgs)
    {
        _inputHtmlModified = true;
    }

    private void OnRemovingStyle(object sender, RemovingStyleEventArgs eventArgs)
    {
        _inputHtmlModified = true;
    }
}

[TestClass]
public class HtmlSanitizerWrapperTests
{
    [TestMethod]
    [DataRow("%22INJECT+HERE%22TEST<object+d\r\nata=\"javascript:alert%26%230000000040%271%27%29\"", true)]
    [DataRow("Test<object+data=\"javascript:alert%26%230000000040%271%27%29\"", true)]
    public void SanitizeHtml_ShouldReturnExpectedResults(string inputHtml, bool expectedModification)
    {
        // Arrange
        var sanitizer = new HtmlSanitizerWrapper();

        // Act
        var (sanitizedHtml, inputHtmlModified) = sanitizer.SanitizeHtml(inputHtml);

        // Assert
        Assert.AreEqual(expectedModification, inputHtmlModified);
    }
}
The input strings are different here than in the first comment. In particular, the closing
Hi, thanks for the awesome library. It has been extremely helpful for keeping our input safe. I wanted to raise a concern about how malformed or obfuscated HTML tags (e.g. <sc++ipt>, <ob+ect>, <object+data>, or special characters like <script>) are silently skipped by AngleSharp during parsing.
When content contains malformed tags, AngleSharp often discards them as text. Because of this, OnRemovingTag is not called for them.
Example: "TEST"<obj+ect data="javascript:alert('1')">
This content is:
Would it be possible to:
Expose malformed or skipped tags as part of the sanitization pipeline?
I understand this may be a limitation of AngleSharp's compliant parsing, but even a utility hook or helper in HtmlSanitizer to pre-scan for malformed tags would be hugely helpful.
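For illustration, a caller-side pre-scan along these lines might look like the sketch below. It is not part of HtmlSanitizer or AngleSharp: `TagLikePreScan` and `ContainsUnterminatedTag` are hypothetical names, and the regex is only a heuristic. It assumes the inputs of interest resemble the two test rows in this thread, where a tag-like sequence (`<` followed by a letter) is never closed by `>`, and that the parser drops a tag that is still open at end of input. A `<` or `>` inside a quoted attribute value, or plain text such as `x <y`, will confuse it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: a heuristic pre-scan, not an HtmlSanitizer or AngleSharp API.
public static class TagLikePreScan
{
    // Matches '<' followed by a letter and then a run of characters that
    // contains neither '<' nor '>' and ends at the next '<' or at end of input,
    // i.e. something that looks like a tag but is never closed with '>'.
    private static readonly Regex UnterminatedTag =
        new Regex(@"<[A-Za-z][^<>]*(?=<|\z)", RegexOptions.Compiled);

    // Returns true when the input contains at least one unterminated tag-like sequence.
    public static bool ContainsUnterminatedTag(string html)
    {
        return !string.IsNullOrEmpty(html) && UnterminatedTag.IsMatch(html);
    }
}
```

In the wrapper from this thread, the result could be OR-ed with the flag set by the removal events, so that input counts as modified when either an event fired or the pre-scan found a suspicious sequence.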
Thanks again for your amazing work. Happy to help test or contribute if this direction is supported.