Commit 1e2da6d
authored
fix: ipv4 address regex (#3808)
I noticed the ipv4 regex is wrong (it only capture one or two-digit
octets, e.g. `n.nn.n.nn`). Here's a correction and a bumped test for it.
If you wish I can break out the ipv4 test to its own case, so we don't
interfere with the existing `EMAIL_META_DATA_INPUT` ipv6 extraction
test.
Side note: The comment at `unstructured/nlp/patterns.py#95` includes a
bad ipv4 address example (last octet is wrongfully left-padded with a
zero). I left it as it is because I'm not sure if the intention is to
include "non-conventional" ipv4 addresses, like octal or hexadecimal
octets.1 parent 4379d88 commit 1e2da6d
File tree
3 files changed
+8
-4
lines changed- test_unstructured/cleaners
- unstructured/nlp
3 files changed
+8
-4
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
3 | 7 | | |
4 | 8 | | |
5 | 9 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
8 | | - | |
| 8 | + | |
9 | 9 | | |
10 | 10 | | |
11 | 11 | | |
| |||
37 | 37 | | |
38 | 38 | | |
39 | 39 | | |
40 | | - | |
| 40 | + | |
41 | 41 | | |
42 | 42 | | |
43 | 43 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
92 | 92 | | |
93 | 93 | | |
94 | 94 | | |
95 | | - | |
| 95 | + | |
96 | 96 | | |
97 | | - | |
| 97 | + | |
98 | 98 | | |
99 | 99 | | |
100 | 100 | | |
| |||
0 commit comments