Commit c01415f

Matching engine design
1 parent eab5d05 commit c01415f

File tree

1 file changed

+117
-38
lines changed

src/Imazen.Routing/Matching/matching.md

+117-38
Original file line number | Diff line number | Diff line change
@@ -1,26 +1,133 @@
11
# Design of route matcher syntax
22

3+
This system never backtracks, ensuring that the matching process is always O(n) in the length of the path. Not all conditions are involved in capturing; many are validated after input is segmented.
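To illustrate the non-backtracking claim, here is a minimal Python sketch of a single left-to-right pass. It is not the Imazen.Routing implementation; the function name `match_simple` and the pre-parsed `("lit", text)` / `("cap", name)` tuple format are assumptions made for this example, and conditions are deliberately left out because they would be validated after segmentation.

```python
def match_simple(pattern_parts, path):
    """Match `path` in one forward pass; return captures or None.

    `pattern_parts` is a hypothetical pre-parsed pattern: a list of
    ("lit", text) and ("cap", name) tuples. A capture ends at the first
    occurrence of the first character of the literal that follows it,
    so the scan position never moves backwards.
    """
    pos = 0
    captures = {}
    for i, (kind, value) in enumerate(pattern_parts):
        if kind == "lit":
            if not path.startswith(value, pos):
                return None
            pos += len(value)
            continue
        # Capture: find where it stops without trying alternatives.
        if i + 1 == len(pattern_parts):
            end = len(path)  # last part captures the remainder
        else:
            next_kind, next_value = pattern_parts[i + 1]
            if next_kind != "lit" or not next_value:
                raise ValueError("sketch assumes a non-empty literal after each capture")
            end = path.find(next_value[0], pos)
            if end == -1:
                return None
        captures[value] = path[pos:end]
        pos = end
    return captures if pos == len(path) else None


parts = [("lit", "/images/"), ("cap", "sku"), ("lit", "/"),
         ("cap", "image_id"), ("lit", "."), ("cap", "format")]
print(match_simple(parts, "/images/12/34.jpg"))
```

Each character of the path is examined a bounded number of times, which is what keeps the pass linear in the path length.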
34

45

5-
/images/{seo_string_ignored}/{sku:guid}/{image_id:int:range(0,1111111)}{width:int:after(_):optional}.{format:only(jpg|png|gif)}
6-
/azure/centralstorage/skus/${sku}/images/${image_id}.${format}?format=avif&w=${width}
6+
When a querystring is specified in the expression, it is structurally parsed and matched regardless of
7+
how the querystring is arranged, and extra unspecified keys are ignored.
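The order-independent querystring behavior described above can be sketched roughly as follows. This is an illustration only: `match_query` and the `(predicate, optional)` spec format are hypothetical names, and the sketch assumes that when a key is repeated the last value wins.

```python
from urllib.parse import parse_qsl


def match_query(required, query_string):
    """Return captured values if every required key matches, else None.

    `required` maps a query key to a (predicate, optional) pair.
    Keys present in the querystring but absent from `required` are ignored.
    """
    # dict() keeps the last value for a repeated key (an assumption of this sketch).
    actual = dict(parse_qsl(query_string, keep_blank_values=True))
    captures = {}
    for key, (predicate, optional) in required.items():
        if key not in actual:
            if optional:
                continue
            return None
        if not predicate(actual[key]):
            return None
        captures[key] = actual[key]
    return captures


spec = {"w": (str.isdigit, True)}
# Both orderings produce the same captures; `format` is simply ignored.
print(match_query(spec, "format=png&w=200"))
print(match_query(spec, "w=200&format=png"))
```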
8+
9+
`/images/{sku:int}/{image_id:int}.{format:only(jpg|png|gif)}?w={width:int:?}`
10+
`/san/productimages/{{image_id}}.{{format}}?format=webp&w={{width:default(40)}}`
11+
12+
`/images/{sku:int}/{image_id:int}.{format:only(jpg|png|gif)}?w={w:int:?}&width={width:int:?}&http-accept={:contains(image/webp)}`
13+
14+
15+
`/san/productimages/{{image_id}}.{{format}}?format=webp&w={{width:or-var(w):default(40)}}`
16+
17+
18+
# MatchExpression Syntax Reference
19+
20+
## Segment Boundary Matching
21+
22+
**These affect where captures start and stop, and how the input is divided up.**
23+
24+
- `equals(string)`: Matches a segment that equals the specified string.
25+
- `equals-i(string)`: Matches a segment that equals the specified string, ignoring case.
26+
- `starts-with(string)`: Matches a segment that starts with the specified string.
27+
- `starts-with-i(string)`: Matches a segment that starts with the specified string, ignoring case.
28+
- `ends-with(string)`: Matches a segment that ends with the specified string.
29+
- `ends-with-i(string)`: Matches a segment that ends with the specified string, ignoring case.
30+
- `len(int)`: Matches a segment with a fixed length specified by the integer.
31+
- `equals(char)`: Matches a segment that equals the specified character.
32+
- `prefix(string)`: Matches a segment that starts with the specified string, not including it in the captured value.
33+
- `prefix-i(string)`: Matches a segment that starts with the specified string, ignoring case and not including it in the captured value.
34+
- `suffix(string)`: Matches a segment that ends with the specified string, not including it in the captured value.
35+
- `suffix-i(string)`: Matches a segment that ends with the specified string, ignoring case and not including it in the captured value.
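The difference between `starts-with`/`ends-with` and `prefix`/`suffix` is that the latter exclude the matched text from the captured value. A small Python sketch of that idea follows; `apply_prefix_suffix` is a hypothetical helper, and the case-insensitive branch assumes ASCII input.

```python
def apply_prefix_suffix(segment, prefix="", suffix="", ignore_case=False):
    """Return the capture with prefix and suffix stripped, or None on mismatch."""
    probe = segment.lower() if ignore_case else segment
    p = prefix.lower() if ignore_case else prefix
    s = suffix.lower() if ignore_case else suffix
    # Prefix and suffix must not overlap inside the segment.
    if len(segment) < len(prefix) + len(suffix):
        return None
    if not probe.startswith(p) or not probe.endswith(s):
        return None
    # Slice the original segment so the capture keeps its original casing.
    return segment[len(prefix): len(segment) - len(suffix)]


print(apply_prefix_suffix("img_42.jpg", prefix="img_", suffix=".jpg"))
print(apply_prefix_suffix("IMG_42.JPG", "img_", ".jpg", ignore_case=True))
```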
36+
37+
## After-matching Segment Conditions
38+
39+
**These are validated *after* the input is parsed into matching segments. Thus, they do not affect
40+
where a capture starts and stops.**
41+
42+
- `alpha()`: Matches a segment that contains only alphabetic characters.
43+
- `alpha-lower()`: Matches a segment that contains only lowercase alphabetic characters.
44+
- `alpha-upper()`: Matches a segment that contains only uppercase alphabetic characters.
45+
- `alphanumeric()`: Matches a segment that contains only alphanumeric characters.
46+
- `hex()`: Matches a segment that contains only hexadecimal characters.
47+
- `int32()`: Matches a segment that represents a valid 32-bit integer.
48+
- `int64()`: Matches a segment that represents a valid 64-bit integer.
49+
- `guid()`: Matches a segment that represents a valid GUID.
50+
- `equals(string1|string2|...)`: Matches a segment that equals one of the specified strings.
51+
- `equals-i(string1|string2|...)`: Matches a segment that equals one of the specified strings, ignoring case.
52+
- `starts-with(string1|string2|...)`: Matches a segment that starts with one of the specified strings.
53+
- `starts-with-i(string1|string2|...)`: Matches a segment that starts with one of the specified strings, ignoring case.
54+
- `ends-with(string1|string2|...)`: Matches a segment that ends with one of the specified strings.
55+
- `ends-with-i(string1|string2|...)`: Matches a segment that ends with one of the specified strings, ignoring case.
56+
- `contains(string)`: Matches a segment that contains the specified string.
57+
- `contains-i(string)`: Matches a segment that contains the specified string, ignoring case.
58+
- `contains(string1|string2|...)`: Matches a segment that contains one of the specified strings.
59+
- `contains-i(string1|string2|...)`: Matches a segment that contains one of the specified strings, ignoring case.
60+
- `range(min,max)`: Matches a segment that represents an integer within the specified range (inclusive).
61+
- `range(min,)`: Matches a segment that represents an integer greater than or equal to the specified minimum value.
62+
- `range(,max)`: Matches a segment that represents an integer less than or equal to the specified maximum value.
63+
- `length(min,max)`: Matches a segment with a length within the specified range (inclusive).
64+
- `length(min,)`: Matches a segment with a length greater than or equal to the specified minimum length.
65+
- `length(,max)`: Matches a segment with a length less than or equal to the specified maximum length.
66+
- `image-ext-supported()`: Matches a segment that represents a supported image file extension.
67+
- `allowed-chars(CharacterClass)`: Matches a segment that contains only characters from the specified character class.
68+
- `starts-with-chars(count,CharacterClass)`: Matches a segment that starts with a specified number of characters from the given character class.
69+
- `image-ext-supported()`: Matches a segment that represents a supported (for image processing) image file extension.
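Because these conditions run only after segmentation, they can be modeled as plain predicates over already-captured strings. The sketch below assumes that every condition on a segment must pass (logical AND); the helper names are hypothetical, and `is_guid` relies on Python's `uuid.UUID`, which is more lenient about formatting than a strict GUID check might be.

```python
import re
import uuid

_INT = re.compile(r"-?[0-9]+")


def is_int_in_range(value, lo=None, hi=None):
    """Rough analogue of range(min,max); either bound may be omitted."""
    if not _INT.fullmatch(value):
        return False
    n = int(value)
    return (lo is None or n >= lo) and (hi is None or n <= hi)


def length_between(value, lo=None, hi=None):
    """Rough analogue of length(min,max); either bound may be omitted."""
    n = len(value)
    return (lo is None or n >= lo) and (hi is None or n <= hi)


def is_guid(value):
    """Rough analogue of guid(), using the standard library parser."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def validate(captures, conditions):
    """Check every predicate for every named capture; all must pass."""
    return all(pred(captures[name])
               for name, preds in conditions.items()
               for pred in preds)


captures = {"sku": "12345678-1234-1234-1234-123456789abc", "image_id": "42"}
conditions = {
    "sku": [is_guid],
    "image_id": [lambda v: is_int_in_range(v, 0, 1111111)],
}
print(validate(captures, conditions))
```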
70+
71+
## Optional and Wildcard Segments
72+
73+
- `{segment:condition1:condition2:...:?}`: Marks a segment as optional by appending `?` to the end of the segment conditions.
74+
- `{?}`: Matches any segment optionally.
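One way to read the `{segment:condition1:condition2:...:?}` form is as a name, a list of conditions, and an optional marker. The following Python sketch parses that shape; `parse_segment` is a hypothetical helper, and it assumes colons inside parentheses belong to a condition's arguments and that no escaped characters are present.

```python
def _split_top_level(body):
    """Split on ':' characters that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == ":" and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_segment(expr):
    """Parse "{name:cond1:cond2:?}" into (name, conditions, optional)."""
    if not (expr.startswith("{") and expr.endswith("}")):
        raise ValueError("segment must be wrapped in { }")
    parts = _split_top_level(expr[1:-1])
    optional = bool(parts) and parts[-1] == "?"
    if optional:
        parts.pop()
    name = parts[0] if parts else ""
    return name, parts[1:], optional


print(parse_segment("{width:int:?}"))
print(parse_segment("{?}"))
```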
75+
76+
## Character Classes
77+
78+
Character classes can be specified using square bracket notation, such as `[a-zA-Z]` to match alphabetic characters or `[0-9]` to match digits. Character classes are not affected by the `ignore-case` flag.
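A character class can be thought of as a set of allowed characters. The Python sketch below expands simple classes and applies them the way `allowed-chars` and `starts-with-chars` are described; the helper names are hypothetical, and the parser handles only literal characters and `a-b` ranges (no negation or escapes). Membership is always case-sensitive, in line with the note that classes ignore the case flag.

```python
def parse_char_class(spec):
    """Expand a simple class such as "[a-z0-9_]" into a set of characters."""
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError("character class must be wrapped in [ ]")
    body = spec[1:-1]
    chars = set()
    i = 0
    while i < len(body):
        # A range needs three characters: start, '-', end.
        if i + 2 < len(body) and body[i + 1] == "-":
            start, end = ord(body[i]), ord(body[i + 2])
            if start > end:
                raise ValueError("range start must not exceed range end")
            chars.update(chr(c) for c in range(start, end + 1))
            i += 3
        else:
            chars.add(body[i])
            i += 1
    return chars


def allowed_chars(value, char_class):
    """True if every character of `value` is in the class."""
    return all(c in char_class for c in value)


def starts_with_chars(value, count, char_class):
    """True if the first `count` characters are all in the class."""
    return len(value) >= count and all(c in char_class for c in value[:count])


hex_lower = parse_char_class("[0-9a-f]")
print(allowed_chars("c01415f", hex_lower))   # lowercase hex only
print(allowed_chars("C01415F", hex_lower))   # uppercase is not in the class
```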
79+
80+
## Expression flags
81+
82+
At the end of the match expression, you can specify `[flags,comma-separated]`
83+
84+
* `ignore-case` Makes path matching case-insensitive, except for character classes.
85+
* `case-sensitive` Makes path matching case-sensitive.
86+
* `raw` Matches the raw path and querystring together, rather than structurally parsing and matching the querystring
87+
* `sort-raw-query-first` Alphabetically sorts the querystring key/value pairs before performing raw matching
88+
* `ignore-path` Applies the given query matcher to all paths.
89+
* `require-accept-webp` Only matches if the Accept header is present and includes `image/webp` specifically.
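As a rough illustration of how a trailing flag list might be separated from the rest of an expression, consider the Python sketch below. `split_flags` is a hypothetical helper; it assumes the flags are the final `[...]` group of the expression and that this group contains no escaped brackets.

```python
KNOWN_FLAGS = {
    "ignore-case", "case-sensitive", "raw",
    "sort-raw-query-first", "ignore-path", "require-accept-webp",
}


def split_flags(expression):
    """Split "pattern[flag1,flag2]" into (pattern, set_of_flags)."""
    if not expression.endswith("]"):
        return expression, set()
    start = expression.rfind("[")
    if start == -1:
        raise ValueError("closing ] without an opening [")
    flags = {f.strip() for f in expression[start + 1:-1].split(",") if f.strip()}
    unknown = flags - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    return expression[:start], flags


pattern, flags = split_flags("/images/{id:int}[ignore-case,raw]")
print(pattern, sorted(flags))
```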
90+
91+
## Escaping Special Characters
92+
93+
Special characters like `{`, `}`, `:`, `?`, `*`, `[`, `]`, `(`, `)`, `|`, and `\` can be escaped using a backslash (`\`) to match them literally in segment conditions or literals.
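Backslash escaping of this kind is usually handled while scanning the expression, by marking the character after each backslash as literal. A minimal Python sketch, with hypothetical helper names `tokenize`, `escape`, and `unescape`:

```python
SPECIAL = set("{}:?*[]()|\\")


def tokenize(text):
    """Return (char, escaped) pairs; escaped=True means "treat literally"."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            if i + 1 >= len(text):
                raise ValueError("dangling backslash at end of input")
            out.append((text[i + 1], True))
            i += 2
        else:
            out.append((ch, False))
            i += 1
    return out


def escape(literal):
    """Prefix every special character with a backslash."""
    return "".join("\\" + c if c in SPECIAL else c for c in literal)


def unescape(text):
    """Drop escaping backslashes, keeping the characters they protect."""
    return "".join(ch for ch, _ in tokenize(text))


print(escape("a{b}"))
print(unescape(escape("w=(1|2)?")))
```

Keeping the `escaped` marker alongside each character lets a later parsing stage distinguish a literal `{` from one that opens a segment.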
94+
95+
## URL rewriting and querystring merging
796

97+
# URL templates
898

9-
/images/{path:*:has_supported_image_type}
10-
/azure/container/${path}
99+
Variables can be inserted in target strings using `${name}` or `${name:transform:transform2}`.
11100

101+
### Transformations
102+
* `lower` e.g. {var:lower}
103+
* `upper`
104+
* more to come
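Template substitution with chained transforms could look roughly like the Python sketch below. It assumes the `${name:transform}` spelling, applies transforms left to right, and supports only the `lower` and `upper` transforms listed above; `render` and `TRANSFORMS` are names invented for this example.

```python
import re

TRANSFORMS = {"lower": str.lower, "upper": str.upper}

# ${name} optionally followed by :transform segments.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)((?::[A-Za-z0-9_-]+)*)\}")


def render(template, variables):
    """Substitute ${name:transform...} placeholders from `variables`."""
    def replace(match):
        value = variables[match.group(1)]  # KeyError for unknown variables
        for transform in filter(None, match.group(2).split(":")):
            value = TRANSFORMS[transform](value)  # KeyError for unknown transforms
        return value
    return _VAR.sub(replace, template)


print(render(
    "/azure/centralstorage/skus/${sku:lower}/images/${image_id}.${format}",
    {"sku": "ABC", "image_id": "7", "format": "jpg"},
))
```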
12105

106+
## Flags
13107

108+
* `[stop-here]` - prevents application of further rewrite rules
109+
*
14110

15111

112+
TODO: sha256/auth stuff
113+
114+
process_image=true
115+
pass_through=true
116+
allow_pass_through=true
117+
stop_here=true
118+
case_sensitive=true/false (IIS/ASP.NET default to insensitive, but it's a bad default)
119+
120+
121+
122+
/images/{seo_string_ignored}/{sku:guid}/{image_id:int:range(0,1111111)}{width:int:after(_):optional}.{format:only(jpg|png|gif)}
123+
/azure/centralstorage/skus/{sku:lower}/images/{image_id}.{format}?format=avif&w={width}
124+
125+
/images/{path:has_supported_image_type}
126+
/azure/container/{path}
16127

17128

18129

19130

20-
We only want non-backtracking functionality.
21-
all conditions are AND, and variable strings are parsed before conditions are applied, with the following exceptions:
22-
after, until.
23-
If a condition lacks until, it is taken from the following character.
24131

25132

26133

@@ -33,12 +140,12 @@ They will terminate their matching when the character that follows them is reach
33140
"/image_{id:int}_seoname"
34141
"/image_{id:int}_{w:int}_seoname"
35142
"/image_{id:int}_{w:int:until(_):optional}seoname"
36-
"/image_{id:int}_{w:int:until(_)}/{**}"
143+
"/image_{id:int}_{w:int:until(_)}/{}"
37144

38145
A trailing `?` means the variable, together with its trailing character, is optional (including the leading character might also be useful?).
39146

40147
Partial matches
41-
match_path="/images/{path:**}"
148+
match_path="/images/{path}"
42149
remove_matched_part_for_children
43150

44151
or
@@ -51,8 +158,6 @@ match_path_and_query
51158
match_query
52159

53160

54-
Variables can be inserted in target strings using ${name:transform}
55-
where transform can be `lower`, `upper`, `trim`, `trim(a-zA-Z\t\:\\-))
56161

57162

58163
## conditions
@@ -63,32 +168,6 @@ ends_with(.jpg|.png|.gif), until(), after(), includes(),
63168

64169
until and after specify trailing and leading characters that are part of the matching group, but are only useful if combined with `optional`.
65170

66-
TODO: sha256/auth stuff
67-
68-
69-
respond_400_on_variable_condition_failure=true
70-
process_image=true
71-
pass_throgh=true
72-
allow_pass_through=true
73-
stop_here=true
74-
case_sensitive=true/false (IIS/ASP.NET default to insensitive, but it's a bad default)
75-
76-
[routes.accepts_any]
77-
accept_header_has_type="*/*"
78-
add_query_value="accepts=*"
79-
set_query_value="format=auto"
80-
81-
[routes.accepts_webp]
82-
accept_header_has_type="image/webp"
83-
add_query_value="accepts=webp"
84-
set_query_value="format=auto"
85-
86-
[routes.accepts_avif]
87-
accept_header_has_type="image/avif"
88-
add_query_value="accepts=avif"
89-
set_query_value="format=auto"
90-
91-
92171
# Escaping characters
93172

94173
JSON/TOML escapes include
