Commit 33bc1bb

AG-36910 Improve 'href-sanitizer' — add 'removeParam' and 'removeHash' values in 'transform' option.
Squashed commit of the following:

commit 737dda8
Author: jellizaveta <[email protected]>
Date: Mon Oct 28 20:26:25 2024 +0300

    add comment

commit da34a66
Author: jellizaveta <[email protected]>
Date: Mon Oct 28 20:04:41 2024 +0300

    update script

commit f3c3616
Author: jellizaveta <[email protected]>
Date: Mon Oct 28 19:49:42 2024 +0300

    fix comments, update script

commit 99439d6
Author: jellizaveta <[email protected]>
Date: Mon Oct 28 15:45:40 2024 +0300

    refactor

commit bafa849
Author: jellizaveta <[email protected]>
Date: Mon Oct 28 15:16:15 2024 +0300

    update compatibility table

commit 83e1d96
Merge: 7fb9cf0 e4cb5f3
Author: jellizaveta <[email protected]>
Date: Mon Oct 28 15:05:16 2024 +0300

    Merge branch 'fix/AG-36910' of ssh://bit.int.agrd.dev:7999/adguard-filters/scriptlets into fix/AG-36910

commit 7fb9cf0
Author: jellizaveta <[email protected]>
Date: Mon Oct 28 15:01:33 2024 +0300

    update var names, docs, conditions

commit c601f8d
Author: jellizaveta <[email protected]>
Date: Fri Oct 25 20:37:26 2024 +0300

    update docs

commit ff6f047
Author: jellizaveta <[email protected]>
Date: Fri Oct 25 20:31:22 2024 +0300

    moved the calculations inside the function

commit e4cb5f3
Author: Slava Leleka <[email protected]>
Date: Fri Oct 25 20:10:51 2024 +0300

    src/scriptlets/href-sanitizer.ts edited online with Bitbucket

commit 9736333
Author: jellizaveta <[email protected]>
Date: Fri Oct 25 19:59:04 2024 +0300

    fix docs

commit 1ecfacb
Merge: 5ccadb6 a875fdf
Author: jellizaveta <[email protected]>
Date: Fri Oct 25 19:47:47 2024 +0300

    merge master, resolve conflicts

commit 5ccadb6
Author: jellizaveta <[email protected]>
Date: Fri Oct 25 19:37:30 2024 +0300

    AG-36910 Improve 'href-sanitizer' — add 'removeParam' and 'removeHash' values in 'transform' option. #460
1 parent a875fdf commit 33bc1bb

4 files changed, +183 -11 lines changed

Diff for: CHANGELOG.md

+2
@@ -18,6 +18,7 @@ The format is based on [Keep a Changelog], and this project adheres to [Semantic
 - `prevent-canvas` scriptlet [#451]
 - `parentSelector` option to search for nodes for `remove-node-text` scriptlet [#397]
 - `transform` option with `base64decode` value for `href-sanitizer` scriptlet [#455]
+- `removeParam` and `removeHash` values in `transform` option for `href-sanitizer` scriptlet [#460]
 - new values to `set-cookie` and `set-local-storage-item` scriptlets: `forbidden`, `forever` [#458]
 
 ### Changed
@@ -35,6 +36,7 @@ The format is based on [Keep a Changelog], and this project adheres to [Semantic
 [#397]: https://github.com/AdguardTeam/Scriptlets/issues/397
 [#458]: https://github.com/AdguardTeam/Scriptlets/issues/458
 [#457]: https://github.com/AdguardTeam/Scriptlets/issues/457
+[#460]: https://github.com/AdguardTeam/Scriptlets/issues/460
 
 ## [v1.12.1] - 2024-09-20
 

Diff for: scripts/compatibility-table.json

+1 -1
@@ -212,7 +212,7 @@
     },
     {
         "adg": "set-attr",
-        "ubo": "set-attr.js"
+        "ubo": "set-attr.js (removed)"
     },
     {
         "adg": "set-constant",

Diff for: src/scriptlets/href-sanitizer.ts

+126 -6
@@ -31,8 +31,12 @@ import {
  *     - `text` — use the text content of the matched element,
  *     - `[<attribute-name>]` copy the value from attribute `attribute-name` on the same element,
  *     - `?<parameter-name>` copy the value from URL parameter `parameter-name` of the same element's `href` attribute.
- * - `transform` — optional, defaults to no transforming:
+ * - `transform` — optional, defaults to no transforming. Possible values:
  *     - `base64decode` — decode the base64 string from specified attribute.
+ *     - `removeHash` — remove the hash from the URL.
+ *     - `removeParam[:<parameters>]` — remove the specified parameters from the URL,
+ *       where `<parameters>` is a comma-separated list of parameter names;
+ *       if no parameter is specified, remove all parameters.
  *
  * > Note that in the case where the discovered value does not correspond to a valid URL with the appropriate
  * > http or https protocols, the value will not be set.
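The `removeParam[:<parameters>]` syntax documented in this hunk can be illustrated with a small parser. This is a minimal sketch and not the scriptlet's own code: `parseTransform` and `ParsedTransform` are hypothetical names, and trimming whitespace and dropping empty names are assumptions that go beyond what the doc comment states.

```typescript
// Hypothetical helper for illustration only; not part of the scriptlet.
interface ParsedTransform {
    // The part before ':', e.g. 'removeParam' or 'removeHash'.
    marker: string;
    // Parameter names after ':'; empty when none are given.
    paramNames: string[];
}

function parseTransform(transform: string): ParsedTransform {
    const separatorIndex = transform.indexOf(':');
    if (separatorIndex === -1) {
        // No ':' present, so the whole value is the marker.
        return { marker: transform, paramNames: [] };
    }
    const marker = transform.slice(0, separatorIndex);
    const paramNames = transform
        .slice(separatorIndex + 1)
        .split(',')
        // Assumption: tolerate stray spaces and empty entries.
        .map((name) => name.trim())
        .filter((name) => name.length > 0);
    return { marker, paramNames };
}
```

Under these assumptions, an empty `paramNames` list corresponds to the documented "remove all parameters" case.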
@@ -111,6 +115,60 @@ import {
  *     </div>
  *     ```
  *
+ * 5. Remove the hash from the URL:
+ *
+ *     ```adblock
+ *     example.org#%#//scriptlet('href-sanitizer', 'a[href*="foo.com"]', '[href]', 'removeHash')
+ *     ```
+ *
+ *     ```html
+ *     <!-- before -->
+ *     <div>
+ *         <a href="http://www.foo.com/out/#aHR0cDovL2V4YW1wbGUuY29tLz92PTEyMw=="></a>
+ *     </div>
+ *
+ *     <!-- after -->
+ *     <div>
+ *         <a href="http://www.foo.com/out/"></a>
+ *     </div>
+ *     ```
+ *
+ * 6. Remove all parameters from the URL:
+ *
+ *     ```adblock
+ *     example.org#%#//scriptlet('href-sanitizer', 'a[href*="foo.com"]', '[href]', 'removeParam')
+ *     ```
+ *
+ *     ```html
+ *     <!-- before -->
+ *     <div>
+ *         <a href="https://foo.com/123123?utm_source=nova&utm_medium=tg&utm_campaign=main"></a>
+ *     </div>
+ *
+ *     <!-- after -->
+ *     <div>
+ *         <a href="https://foo.com/123123"></a>
+ *     </div>
+ *     ```
+ *
+ * 7. Remove the specified parameter(s) from the URL:
+ *
+ *     ```adblock
+ *     example.org#%#//scriptlet('href-sanitizer', 'a[href*="foo.com"]', '[href]', 'removeParam:utm_source,utm_medium')
+ *     ```
+ *
+ *     ```html
+ *     <!-- before -->
+ *     <div>
+ *         <a href="https://foo.com/123123?utm_source=nova&utm_medium=tg&utm_campaign=main"></a>
+ *     </div>
+ *
+ *     <!-- after -->
+ *     <div>
+ *         <a href="https://foo.com/123123?utm_campaign=main"></a>
+ *     </div>
+ *     ```
+ *
  * @added v1.10.25.
  */
 

@@ -125,7 +183,13 @@ export function hrefSanitizer(
         return;
     }
 
-    const BASE64_TRANSFORM_MARKER = 'base64decode';
+    // transform markers
+    const BASE64_DECODE_TRANSFORM_MARKER = 'base64decode';
+    const REMOVE_HASH_TRANSFORM_MARKER = 'removeHash';
+    const REMOVE_PARAM_TRANSFORM_MARKER = 'removeParam';
+    // separator markers
+    const MARKER_SEPARATOR = ':';
+    const COMMA = ',';
 
     // Regular expression to find not valid characters at the beginning and at the end of the string,
     // \x21-\x7e is a range that includes the ASCII characters from ! (hex 21) to ~ (hex 7E).
@@ -337,14 +401,64 @@ export function hrefSanitizer(
         return validEncodedHash ? decodeBase64SeveralTimes(validEncodedHash, DECODE_ATTEMPTS_NUMBER) : '';
     };
 
+    /**
+     * Removes the hash from the URL.
+     * @param url URL to remove the hash from
+     * @returns URL without the hash or empty string if no hash is found
+     */
+    const removeHash = (url: string) => {
+        const urlObj = new URL(url, window.location.origin);
+
+        if (!urlObj.hash) {
+            return '';
+        }
+
+        urlObj.hash = '';
+        return urlObj.toString();
+    };
+
+    /**
+     * Removes the specified parameter(s) from the URL.
+     * @param url URL to remove the parameter(s) from
+     * @param transformValue transform value: the marker, optionally followed by the parameter names to remove
+     * @returns URL without the parameter(s) or empty string if no parameter is found
+     */
+    const removeParam = (url: string, transformValue: string) => {
+        const urlObj = new URL(url, window.location.origin);
+
+        // get the parameter names to remove
+        const paramNamesToRemoveStr = transformValue.split(MARKER_SEPARATOR)[1];
+
+        if (!paramNamesToRemoveStr) {
+            urlObj.search = '';
+            return urlObj.toString();
+        }
+
+        const initSearchParamsLength = urlObj.searchParams.toString().length;
+
+        const removeParams = paramNamesToRemoveStr.split(COMMA);
+        removeParams.forEach((param) => {
+            if (urlObj.searchParams.has(param)) {
+                urlObj.searchParams.delete(param);
+            }
+        });
+
+        // if none of the parameters is found, return empty string
+        if (initSearchParamsLength === urlObj.searchParams.toString().length) {
+            return '';
+        }
+
+        return urlObj.toString();
+    };
+
     /**
      * Extracts the base64 part from a string.
      * If no base64 string is found, `null` is returned.
      * @param url String to extract the base64 part from.
      * @returns The base64 part of the string, or `null` if none is found.
      */
     const decodeBase64URL = (url: string) => {
-        const { search, hash } = new URL(url);
+        const { search, hash } = new URL(url, document.location.href);
 
         if (search.length > 0) {
             return decodeSearchString(search);
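The behaviour of the two new helpers can be tried outside the browser with the standard `URL` API. The following is a minimal sketch under stated assumptions rather than the scriptlet's code: `removeHashSketch`, `removeParamSketch` and `ASSUMED_BASE` are hypothetical names, the base URL is passed in explicitly instead of being read from `window.location`, the parameter names arrive as an already-split array, and the "nothing changed" check compares the serialized query strings rather than their lengths.

```typescript
// Assumed base used to resolve relative hrefs in this sketch.
const ASSUMED_BASE = 'https://example.org/';

// Returns the URL without its hash, or '' when there is no hash to remove.
function removeHashSketch(url: string, base: string = ASSUMED_BASE): string {
    const urlObj = new URL(url, base);
    if (!urlObj.hash) {
        return '';
    }
    urlObj.hash = '';
    return urlObj.toString();
}

// Returns the URL without the named parameters.
// An empty name list removes the whole query string;
// '' is returned when none of the named parameters is present.
function removeParamSketch(url: string, names: string[], base: string = ASSUMED_BASE): string {
    const urlObj = new URL(url, base);
    if (names.length === 0) {
        urlObj.search = '';
        return urlObj.toString();
    }
    const before = urlObj.searchParams.toString();
    names.forEach((name) => urlObj.searchParams.delete(name));
    return urlObj.searchParams.toString() === before ? '' : urlObj.toString();
}
```

The second argument of `new URL()` matters for relative hrefs: without a base, `new URL('/out/#abc')` throws a `TypeError`. Path-relative hrefs such as `out#abc` also resolve differently against an origin than against a full page URL, which is worth keeping in mind when choosing the base.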
@@ -394,13 +508,19 @@ export function hrefSanitizer(
             return;
         }
         let newHref = extractNewHref(elem, attribute);
-
         // apply transform if specified
         if (transform) {
-            switch (transform) {
-                case BASE64_TRANSFORM_MARKER:
+            switch (true) {
+                case transform === BASE64_DECODE_TRANSFORM_MARKER:
                     newHref = base64Decode(newHref);
                     break;
+                case transform === REMOVE_HASH_TRANSFORM_MARKER:
+                    newHref = removeHash(newHref);
+                    break;
+                case transform.startsWith(REMOVE_PARAM_TRANSFORM_MARKER): {
+                    newHref = removeParam(newHref, transform);
+                    break;
+                }
                 default:
                     logMessage(source, `Invalid transform option: "${transform}"`);
                     return;
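The move from `switch (transform)` to `switch (true)` lets one `case` use a prefix test while the others stay exact comparisons. A minimal sketch of that dispatch pattern follows; `classifyTransform` and `TransformKind` are hypothetical names, and the stricter `removeParam` check (exact marker, or marker followed by `:`) is an assumption of this sketch, not what the commit does with `startsWith`.

```typescript
// Hypothetical classification of a transform value; for illustration only.
type TransformKind = 'base64decode' | 'removeHash' | 'removeParam' | 'invalid';

function classifyTransform(transform: string): TransformKind {
    // Each case is a boolean expression; the first one equal to `true` wins.
    switch (true) {
        case transform === 'base64decode':
            return 'base64decode';
        case transform === 'removeHash':
            return 'removeHash';
        // Assumption: accept only 'removeParam' or 'removeParam:<names>'.
        case transform === 'removeParam' || transform.startsWith('removeParam:'):
            return 'removeParam';
        default:
            return 'invalid';
    }
}
```

With a plain `startsWith('removeParam')` test, a value such as `removeParameter` would also reach the `removeParam` branch; whether that should be rejected is a design choice this sketch makes explicit.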

Diff for: tests/scriptlets/href-sanitizer.test.js

+54 -4
@@ -27,10 +27,12 @@ const createElem = (href, text, attributeName, attributeValue) => {
 };
 
 const removeElem = () => {
-    const elem = document.getElementById('testHref');
-    if (elem) {
-        elem.remove();
-    }
+    const elem = document.querySelectorAll('#testHref');
+    elem.forEach((el) => {
+        if (el) {
+            el.remove();
+        }
+    });
 };
 
 const beforeEach = () => {
@@ -64,6 +66,54 @@ test('Checking if alias name works', (assert) => {
     assert.strictEqual(codeByAdgParams, codeByUboParams, 'ubo name - ok');
 });
 
+test('Sanitize href - remove all parameters from href', (assert) => {
+    const expectedHref = 'https://foo.com/123123';
+    const elem = createElem('https://foo.com/123123?utm_source=nova&utm_medium=tg&utm_campaign=main');
+    const selector = 'a[href^="https://foo.com/123123"]';
+
+    const scriptletArgs = [selector, '[href]', 'removeParam'];
+    runScriptlet(name, scriptletArgs);
+
+    assert.strictEqual(elem.getAttribute('href'), expectedHref, 'all params from href were removed');
+    assert.strictEqual(window.hit, 'FIRED');
+});
+
+test('Sanitize href - remove parameters from href', (assert) => {
+    const expectedHref = 'https://foo.com/watch?utm_campaign=main';
+    const elem = createElem('https://foo.com/watch?v=dbjPnXaacAU&pp=ygUEdGVzdA%3D%3D&utm_campaign=main');
+    const selector = 'a[href^="https://foo.com/watch"]';
+
+    const scriptletArgs = [selector, '[href]', 'removeParam:v,pp'];
+    runScriptlet(name, scriptletArgs);
+
+    assert.strictEqual(elem.getAttribute('href'), expectedHref, 'v and pp params from href were removed');
+    assert.strictEqual(window.hit, 'FIRED');
+});
+
+test('Sanitize href - remove parameter from href', (assert) => {
+    const expectedHref = 'https://example.org/watch?v=dbjPnXaacAU';
+    const elem = createElem('https://example.org/watch?v=dbjPnXaacAU&pp=ygUEdGVzdA%3D%3D');
+    const selector = 'a[href^="https://example.org/watch"]';
+
+    const scriptletArgs = [selector, '[href]', 'removeParam:pp'];
+    runScriptlet(name, scriptletArgs);
+
+    assert.strictEqual(elem.getAttribute('href'), expectedHref, 'pp param from href was removed');
+    assert.strictEqual(window.hit, 'FIRED');
+});
+
+test('Sanitize href - remove hash', (assert) => {
+    const expectedHref = 'https://example.org/?article';
+    const elem = createElem('https://example.org/?article#utm_source=Facebook');
+    const selector = 'a[href]';
+
+    const scriptletArgs = [selector, '[href]', 'removeHash'];
+    runScriptlet(name, scriptletArgs);
+
+    assert.strictEqual(elem.getAttribute('href'), expectedHref, 'hash from href was removed');
+    assert.strictEqual(window.hit, 'FIRED');
+});
+
 test('Sanitize href - no URL was found in base64', (assert) => {
     // encoded string is 'some text, no urls'
     const hrefWithBase64 = 'http://foo.com/#c29tZSB0ZXh0LCBubyB1cmxz';
