43 changes: 33 additions & 10 deletions src/url.rs
@@ -1,11 +1,35 @@
use base64::{prelude::BASE64_STANDARD, Engine};
use percent_encoding::percent_decode_str;
use percent_encoding::{percent_decode_str, percent_encode, AsciiSet, CONTROLS};
pub use url::Url;

use crate::core::{detect_media_type, parse_content_type};

pub const EMPTY_IMAGE_DATA_URL: &str = "data:image/png,\
%89PNG%0D%0A%1A%0A%00%00%00%0DIHDR%00%00%00%0D%00%00%00%0D%08%04%00%00%00%D8%E2%2C%F7%00%00%00%11IDATx%DAcd%C0%09%18G%A5%28%96%02%00%0A%F8%00%0E%CB%8A%EB%16%00%00%00%00IEND%AEB%60%82";
// https://datatracker.ietf.org/doc/html/rfc3986#section-2.2
const DATA_ESC: &AsciiSet = &CONTROLS
.add(b' ')
.add(b':')
.add(b'/')
.add(b'?')
.add(b'#')
.add(b'[')
.add(b']')
.add(b'@')
.add(b'!')
.add(b'$')
.add(b'&')
.add(b'\'')
.add(b'(')
.add(b')')
.add(b'*')
.add(b'+')
.add(b',')
.add(b';')
.add(b'=')
// make nesting and HTML embedding safe
.add(b'"')
.add(b'%');

pub fn clean_url(url: Url) -> Url {
let mut url = url.clone();
@@ -33,15 +57,14 @@ pub fn create_data_url(media_type: &str, charset: &str, data: &[u8], final_asset
"".to_string()
};

data_url.set_path(
format!(
"{}{};base64,{}",
media_type,
c,
BASE64_STANDARD.encode(data)
)
.as_str(),
);
let base64 = BASE64_STANDARD.encode(data);
let urlenc = percent_encode(data, DATA_ESC).to_string();

if urlenc.len() < base64.len() {
Review comment (Member):

I like the logic, but I worry that running both base64 and percent-encoding on every single asset will cost extra CPU and RAM. I also don't see a benefit to using base64 for plaintext data, even if it occasionally comes out a few bytes shorter. I think the best approach is to default to percent_encode for plaintext data and use base64 for non-printable data (fonts, non-SVG images, etc.). There's a data type detector somewhere in this codebase, I think it's called "is_plaintext()", and that should be enough for this function to decide whether it needs base64 or not. I also believe this isn't only about file size; it may matter more how much CPU time it takes to decode the URL into a blob. Something tells me base64 takes longer than percent-encoding there, but I might be wrong. Last but not least, being able to see what's in a data URL without decoding it is invaluable for humans, so percent-encoding should be preferred for that reason too, not just because the data URL is shorter.

data_url.set_path(format!("{}{},{}", media_type, c, urlenc).as_str());
} else {
data_url.set_path(format!("{}{};base64,{}", media_type, c, base64).as_str());
}

data_url
}
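The suggestion in the review comment above could be sketched roughly as follows. This is a minimal, std-only illustration and not the project's code: `is_plaintext_media_type`, `percent_encode_data`, `base64_encode`, and `data_url_string` are hypothetical names, the plaintext check is a guess at what the detector mentioned in the comment might do, and charset handling is left out. The escape set is meant to mirror `DATA_ESC` from this diff (controls, space, the RFC 3986 reserved characters, `"` and `%`, plus all non-ASCII bytes).

```rust
// Hypothetical stand-in for the plaintext detector mentioned in the review;
// the real helper's name and signature are not shown in this diff.
fn is_plaintext_media_type(media_type: &str) -> bool {
    let mt = media_type.to_ascii_lowercase();
    mt.starts_with("text/")
        || mt == "image/svg+xml"
        || mt == "application/json"
        || mt == "application/javascript"
}

// Percent-encode control bytes, non-ASCII bytes, and the characters that
// DATA_ESC adds on top of CONTROLS in this diff.
fn percent_encode_data(data: &[u8]) -> String {
    const ESCAPED: &[u8] = b" :/?#[]@!$&'()*+,;=\"%";
    let mut out = String::with_capacity(data.len());
    for &b in data {
        if b < 0x20 || b >= 0x7F || ESCAPED.contains(&b) {
            out.push_str(&format!("%{:02X}", b));
        } else {
            out.push(b as char);
        }
    }
    out
}

// Standard base64 with '=' padding, written out by hand so the sketch
// needs no external crates.
fn base64_encode(data: &[u8]) -> String {
    const TABLE: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(TABLE[(n >> 18) as usize & 63] as char);
        out.push(TABLE[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { TABLE[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { TABLE[n as usize & 63] as char } else { '=' });
    }
    out
}

// Pick the encoding from the media type alone, so each asset is encoded
// only once: percent-encoding for plaintext, base64 for everything else.
fn data_url_string(media_type: &str, data: &[u8]) -> String {
    if is_plaintext_media_type(media_type) {
        format!("data:{},{}", media_type, percent_encode_data(data))
    } else {
        format!("data:{};base64,{}", media_type, base64_encode(data))
    }
}
```

Under these assumptions, `data_url_string("text/html", b"target")` gives `data:text/html,target`, matching the updated expectation in `tests/session/retrieve_asset.rs`, while binary media types such as `image/png` still go through base64. Compared with the length comparison in the diff, this trades a possibly shorter URL for doing only one encoding pass per asset.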
6 changes: 3 additions & 3 deletions tests/cli/basic.rs
@@ -109,11 +109,11 @@ mod passing {

@charset "UTF-8";

@import "data:text/css;base64,Ym9keXtiYWNrZ3JvdW5kLWNvbG9yOiMwMDA7Y29sb3I6I2ZmZn0K";
@import "data:text/css,body{background-color%3A%23000%3Bcolor%3A%23fff}%0A";

@import url("data:text/css;base64,Ym9keXtiYWNrZ3JvdW5kLWNvbG9yOiMwMDA7Y29sb3I6I2ZmZn0K");
@import url("data:text/css,body{background-color%3A%23000%3Bcolor%3A%23fff}%0A");

@import url("data:text/css;base64,Ym9keXtiYWNrZ3JvdW5kLWNvbG9yOiMwMDA7Y29sb3I6I2ZmZn0K");
@import url("data:text/css,body{background-color%3A%23000%3Bcolor%3A%23fff}%0A");

</style>
<meta name="robots" content="none"></meta></head><body></body></html>
4 changes: 2 additions & 2 deletions tests/css/embed_css.rs
@@ -175,9 +175,9 @@ mod passing {
"\
@charset \"UTF-8\";\n\
\n\
@import \"data:text/css;base64,aHRtbHtiYWNrZ3JvdW5kLWNvbG9yOiMwMDB9\";\n\
@import \"data:text/css,html{background-color%3A%23000}\";\n\
\n\
@import url(\"data:text/css;base64,aHRtbHtjb2xvcjojZmZmfQ==\")\n\
@import url(\"data:text/css,html{color%3A%23fff}\")\n\
"
);
}
4 changes: 2 additions & 2 deletions tests/session/retrieve_asset.rs
@@ -33,7 +33,7 @@ mod passing {
assert_eq!(&charset, "US-ASCII");
assert_eq!(
url::create_data_url(&media_type, &charset, &data, &final_url),
Url::parse("data:text/html;base64,dGFyZ2V0").unwrap(),
Url::parse("data:text/html,target").unwrap(),
);
assert_eq!(
final_url,
@@ -70,7 +70,7 @@ mod passing {
.unwrap();
assert_eq!(&media_type, "text/javascript");
assert_eq!(&charset, "");
let data_url = "data:text/javascript;base64,ZG9jdW1lbnQuYm9keS5zdHlsZS5iYWNrZ3JvdW5kQ29sb3IgPSAiZ3JlZW4iOwpkb2N1bWVudC5ib2R5LnN0eWxlLmNvbG9yID0gInJlZCI7Cg==";
let data_url = "data:text/javascript,document.body.style.backgroundColor%20%3D%20%22green%22%3B%0Adocument.body.style.color%20%3D%20%22red%22%3B%0A";
assert_eq!(
url::create_data_url(&media_type, &charset, &data, &final_url),
Url::parse(data_url).unwrap()