Summary
The <base> tag passes through the default Cleaner configuration. While page_structure=True removes html, head, and title tags, there is no specific handling for <base>, allowing an attacker to inject it and hijack relative links on the page.
Details
The <base> tag is not currently in the page_structure kill set. Even though the specification says <base> must be inside <head>, browsers accept <base> tags outside of the head.
If an attacker injects a <base> tag, it changes the base URL for all relative URLs on the page (links, images, scripts) to a domain controlled by the attacker.
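The effect on URL resolution can be approximated with the standard library's `urljoin`, which follows the same relative-reference rules that browsers apply. This is a minimal sketch: the page URL `https://example.com/profile/view` and the helper name `resolve_all` are assumptions made for illustration, and the attacker base is taken from the PoC.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that renders the attacker's HTML.
PAGE_URL = "https://example.com/profile/view"
# Base URL the injected <base href="..."> would establish (from the PoC).
ATTACKER_BASE = "http://evil.com/"

def resolve_all(base, references):
    """Resolve each reference against the given base URL."""
    return {ref: urljoin(base, ref) for ref in references}

references = ["/account", "assets/app.js", "https://cdn.example.com/lib.js"]

# Without an injected <base>, the document URL acts as the base.
legitimate = resolve_all(PAGE_URL, references)
# With the injected <base>, the attacker's URL acts as the base.
hijacked = resolve_all(ATTACKER_BASE, references)
```

In this sketch both the root-relative `/account` and the path-relative `assets/app.js` end up on the attacker's host, while the absolute CDN URL is unchanged, since only relative references depend on the base.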
PoC
from lxml_html_clean import clean_html
# The base tag is preserved in the output
result = clean_html('<base href="http://evil.com/"><a href="/account">Account</a>')
print(result)
# Output: <div><base href="http://evil.com/">...<a href="/account">Account</a></div>
Impact
The injection of a <base> tag allows an attacker to hijack the resolution of all relative URLs on the page. This results in three critical attack vectors:
- Phishing & Redirection: Attackers can redirect user navigation (e.g., <a href="/login">) and form submissions (e.g., <form action="/auth">) to an attacker-controlled domain, effectively stealing credentials or sensitive data without the user realizing they have left the legitimate site.
- Cross-Site Scripting (XSS): If the victim application loads JavaScript files using relative paths (e.g., <script src="assets/app.js">), the browser will attempt to fetch the script from the attacker's domain. This upgrades the vulnerability from HTML injection to full Stored XSS.
- Defacement: Relative references to images (<img>) and stylesheets (<link>) will be loaded from the attacker's server, allowing for UI redressing or defacement.
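Until the cleaner handles the tag itself, an application could check cleaned output for a surviving <base> element before rendering it. The following is only a sketch of such a post-check using the standard library's `html.parser`; the class name `BaseTagDetector` and the function `find_base_tags` are hypothetical and are not part of lxml_html_clean.

```python
from html.parser import HTMLParser

class BaseTagDetector(HTMLParser):
    """Collects the href of every <base> element seen (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.base_hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <BASE HREF=...>
        # is matched as well. Self-closing <base/> is routed here by default.
        if tag == "base":
            self.base_hrefs.append(dict(attrs).get("href"))

def find_base_tags(html):
    """Return the href values of all <base> tags found in the markup."""
    detector = BaseTagDetector()
    detector.feed(html)
    detector.close()
    return detector.base_hrefs

# Markup shaped like the PoC output, with the elided part left out.
suspicious = '<div><base href="http://evil.com/"><a href="/account">Account</a></div>'
clean = '<div><a href="/account">Account</a></div>'
```

A non-empty result from `find_base_tags` would indicate that the markup can still redirect relative URLs and should be rejected or stripped before being served.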