- **Payload-aware heuristics**: Flags long hidden sequences, dense suspicious regions, and explicit payload-plus-decoder correlations while keeping standalone decoder noise out of default results.
-
- **Context-aware severity**: Uses bounded content-based file shape checks, file-kind classification, local finding region checks, and decoder proximity to reduce low-value invisible-character noise without downgrading bidi controls or long suspicious runs.
+
- **Context-aware severity**: Uses bounded content-based file shape checks, conservative file-role hints, local finding region checks, and decoder proximity to reduce low-value invisible-character noise without downgrading bidi controls, long suspicious runs, or build and release contexts.
- **Noise reduction for asset contexts**: Suppresses obvious private-use glyph mappings in font-like SVG assets so icon fonts do not dominate the report.
- **Safe repository traversal**: Skips symlinks, binary files, oversize files, and common dependency or build directories.
- **CI-friendly behavior**: Uses deterministic ordering, human or JSON output, and exit codes `0`, `1`, and `2`.
@@ -223,22 +223,24 @@ Every finding is assigned one of four severity levels: `LOW`, `MEDIUM`, `HIGH`,
### How Severity Is Computed
-
Severity is derived from four inputs, all computed from file content and local context:
+
Severity is derived from five inputs, all computed from file content and local context:
1. **Sequence length**: how many suspicious runes appear in the finding. Isolated characters (1) are treated differently from short runs (2–5), medium runs (6–15), long runs (16–63), and very long runs (64+). Longer sequences receive higher severity regardless of context.
2. **File shape**: the file is classified as `code_like`, `data_like`, `prose_like`, or `unknown` based on bounded content analysis (first 64 KiB / 400 non-empty lines). Code-like files with brackets, operators, and keywords produce higher severity for the same finding than prose-like files with natural language.
-
3. **Finding region**: the immediate context around each finding is classified as whitespace-like, string-like, comment-like, token-like, prose-like, or unknown. An invisible character inside an identifier (`token_like`) is more severe than one inside a comment or whitespace region.
+
3. **File role hints**: conservative path and filename hints distinguish locale data, ordinary test source, and build or release paths. These hints are advisory only. They never suppress bidi controls, payloads, correlations, long suspicious runs, or `testdata` and fixture inputs.
-
4. **Decoder proximity**: if a decode or dynamic-execution marker (`eval(`, `Buffer.from(`, `atob(`, etc.) appears within 5 lines of a finding, severity is escalated by one level. Markers within 20 lines escalate findings that are already `HIGH`.
+
4. **Finding region**: the immediate context around each finding is classified as whitespace-like, string-like, comment-like, token-like, prose-like, or unknown. An invisible character inside an identifier (`token_like`) is more severe than one inside a comment or whitespace region.
+
+
5. **Decoder proximity**: if a decode or dynamic-execution marker (`eval(`, `Buffer.from(`, `atob(`, etc.) appears within 5 lines of a finding, severity is escalated by one level. Markers within 20 lines escalate findings that are already `HIGH`.
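
The length buckets and decoder-proximity windows listed above can be illustrated with a short Python sketch. This is not ghostscan's implementation: the names `Severity`, `length_bucket`, and `escalate_for_decoder` are hypothetical, and the sketch assumes that the 20-line rule also escalates by exactly one level and that escalation is capped at `CRITICAL`, neither of which the text states.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def length_bucket(rune_count: int) -> str:
    """Map a suspicious-rune count to the length bucket named in the text."""
    if rune_count < 1:
        raise ValueError("a finding contains at least one rune")
    if rune_count == 1:
        return "isolated"
    if rune_count <= 5:
        return "short"
    if rune_count <= 15:
        return "medium"
    if rune_count <= 63:
        return "long"
    return "very_long"


NEAR_WINDOW = 5   # lines: any finding is escalated
FAR_WINDOW = 20   # lines: only findings that are already HIGH are escalated


def escalate_for_decoder(severity: Severity, finding_line: int,
                         marker_lines: list[int]) -> Severity:
    """Escalate by one level when a decode/execute marker is close enough.

    Assumption: both windows escalate by one level, capped at CRITICAL.
    """
    if not marker_lines:
        return severity
    distance = min(abs(finding_line - m) for m in marker_lines)
    near = distance <= NEAR_WINDOW
    far_and_high = distance <= FAR_WINDOW and severity == Severity.HIGH
    if near or far_and_high:
        return Severity(min(severity + 1, Severity.CRITICAL))
    return severity
```

Under these assumptions, a `LOW` finding with a marker four lines away becomes `MEDIUM`, while a marker fifteen lines away only affects findings that are already `HIGH`.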
### Per-Rule Behavior
| Rule | Base severity logic |
|------|-------------------|
| `unicode/bidi` | Always `HIGH`. Bidi controls are never downgraded by context, comments, prose, or path hints. |
-
| `unicode/invisible` | Ranges from `LOW` to `CRITICAL` depending on sequence length, file shape, and region. A file-start BOM is suppressed. A single non-leading `U+FEFF` is still reported but defaults to `LOW`; isolated characters in identifiers are `HIGH`; long runs are `CRITICAL`. |
+
| `unicode/invisible` | Ranges from `LOW` to `CRITICAL` depending on sequence length, file shape, file role, and region. A file-start BOM is suppressed. A single non-leading `U+FEFF` is still reported but defaults to `LOW`; isolated characters in identifiers are `HIGH`; long runs are `CRITICAL`. |
| `unicode/private-use` | `CRITICAL` for long runs, `HIGH` for short/medium runs and code-like token regions, `MEDIUM` in prose or data contexts. |
| `unicode/payload` | `HIGH` for normal sequences, `CRITICAL` for long runs. |
| `unicode/correlation` | Always `CRITICAL`. A payload near a decoder is the strongest signal. |
@@ -254,6 +256,8 @@ ghostscan treats isolated and very short invisible-character findings differentl
- isolated invisible characters default to `LOW` unless they appear inside a token-like region or are elevated by nearby decode/execute markers
- short runs in prose-like, comment-like, whitespace-like, and data-like contexts default to `LOW`
+
- low-signal invisible findings may be suppressed in ordinary test source only when they appear in benign string, comment, whitespace, or prose contexts with no nearby decode, execution, shell, or build markers
+
- build, release, packaging, CI, shell, and parser-sensitive fixture inputs are not softened by test-like path hints alone
- short runs in code-like strings or unknown regions stay visible and usually land at `MEDIUM`
- token-like invisible findings remain `HIGH`
- long invisible runs and payload findings stay strong regardless of surrounding file shape
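
The rules above for isolated and short invisible-character findings can be read as a small decision procedure. The Python sketch below is one possible interpretation, not ghostscan's code. All function names, the `test_source` role label, and the rule precedence are assumptions. It also assumes that "low-signal" means `LOW`, that "data-like" refers to the file shape, and that the "usually `MEDIUM`" case is always `MEDIUM`. Decoder-proximity escalation is left out.

```python
# Hypothetical names throughout; an illustration of the listed rules only.
BENIGN_REGIONS = {"prose_like", "comment_like", "whitespace_like"}
BENIGN_FILE_SHAPES = {"prose_like", "data_like"}
SUPPRESSIBLE_REGIONS = {"string_like", "comment_like",
                        "whitespace_like", "prose_like"}


def short_invisible_severity(rune_count: int, file_shape: str,
                             region: str) -> str:
    """Base severity for an isolated (1) or short (2-5) invisible finding."""
    if not 1 <= rune_count <= 5:
        raise ValueError("this sketch only covers isolated and short runs")
    if region == "token_like":
        return "HIGH"      # token-like findings remain HIGH
    if rune_count == 1:
        return "LOW"       # isolated characters default to LOW
    if region in BENIGN_REGIONS or file_shape in BENIGN_FILE_SHAPES:
        return "LOW"       # short runs in benign contexts default to LOW
    return "MEDIUM"        # e.g. code-like strings or unknown regions


def may_suppress_in_tests(severity: str, file_role: str, region: str,
                          marker_nearby: bool) -> bool:
    """True only for LOW findings in ordinary test source, in a benign
    region, with no nearby decode, execution, shell, or build marker."""
    return (
        severity == "LOW"
        and file_role == "test_source"
        and region in SUPPRESSIBLE_REGIONS
        and not marker_nearby
    )
```

With this reading, a three-rune run inside a string in a code-like file lands at `MEDIUM` and is never a suppression candidate, whereas the same run inside a comment is `LOW` and may be suppressed only in ordinary test source with no markers nearby.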