Commit c57f2ec

Update documentation for first release [skip ci]
1 parent a23bb4f commit c57f2ec

8 files changed

+134
-159
lines changed

LICENSE

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
MIT License
2+
3+
Copyright (c) 2025 WISC Lab
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

README.md

Lines changed: 110 additions & 152 deletions
@@ -1,216 +1,174 @@
1-
# ShredGuard
1+
# ShredGuard by [WISC Lab](https://kidspeech.wisc.edu/)
22

3-
A CLI tool to scan files for configurable regex patterns (PHI identifiers) and optionally replace matches with deterministic pseudonyms. Integrates with pre-commit framework.
3+
[![CI](https://github.com/WiscLab/shred-guard/actions/workflows/ci.yml/badge.svg)](https://github.com/WISCLab/shred-guard/actions/workflows/ci.yml) [![CD](https://github.com/WISCLab/shred-guard/actions/workflows/cd.yml/badge.svg)](https://github.com/WISCLab/shred-guard/actions/workflows/cd.yml)
44

5-
## Features
5+
Scan files for PHI (Protected Health Information) patterns and replace them with deterministic pseudonyms. Integrates seamlessly with pre-commit hooks.
66

7-
- Scan files for PHI patterns using configurable regex
8-
- Replace matches with deterministic pseudonyms (same value = same ID)
9-
- Pre-commit integration for automated checks
10-
- Respects `.gitignore` patterns
11-
- Ruff-style output format (`file:line:col: SGxxx Message`)
12-
- Cross-platform support (Windows, macOS, Linux)
137

14-
## Installation
8+
## Contents
159

16-
```bash
17-
pip install shredguard
18-
```
10+
- [Value Proposition](https://raw.githubusercontent.com/WiscLab/shred-guard/main/value-proposition.svg)
11+
- [Installation](#installation)
12+
- [Quick Start](#quick-start)
13+
- [Commands](#commands)
14+
- [shredguard init](#shredguard-init)
15+
- [shredguard check](#shredguard-check)
16+
- [shredguard fix](#shredguard-fix)
17+
- [Configuration](#configuration)
18+
- [Pre-commit](#pre-commit)
19+
- [Reference](#reference)
20+
- [CLI Options](#cli-options)
21+
- [Configuration Reference](#configuration-reference)
22+
- [Built-in Pattern Suggestions](#built-in-pattern-suggestions)
23+
- [Exit Codes](#exit-codes)
24+
- [Binary File Handling](#binary-file-handling)
25+
- [License](#license)
1926

20-
Or install from source:
27+
## Installation
2128

2229
```bash
23-
pip install -e .
30+
pip install shred-guard
2431
```
2532

2633
## Quick Start
2734

28-
1. Add configuration to your `pyproject.toml`:
29-
30-
```toml
31-
[tool.shredguard]
32-
[[tool.shredguard.patterns]]
33-
regex = "SUB-\\d{4,6}"
34-
description = "Subject ID"
35-
36-
[[tool.shredguard.patterns]]
37-
regex = "\\b\\d{3}-\\d{2}-\\d{4}\\b"
38-
description = "SSN-like pattern"
39-
40-
[[tool.shredguard.patterns]]
41-
regex = "MRN\\d{6,10}"
42-
description = "Medical Record Number"
43-
```
44-
45-
2. Run a check:
35+
Run the interactive setup wizard:
4636

4737
```bash
48-
shredguard check .
38+
shredguard init
4939
```
5040

51-
3. Fix (replace) found patterns:
41+
This walks you through:
42+
- Selecting PHI patterns to detect (SSNs, emails, MRNs, custom patterns)
43+
- Configuring file restrictions
44+
- Setting up pre-commit hooks
5245

53-
```bash
54-
shredguard fix --prefix REDACTED .
55-
```
46+
## Commands
5647

57-
## Pre-commit Setup
48+
### `shredguard init`
5849

59-
Add to your `.pre-commit-config.yaml`:
60-
61-
```yaml
62-
repos:
63-
- repo: https://github.com/shredguard/shredguard
64-
rev: v0.1.0
65-
hooks:
66-
- id: shredguard-check
67-
```
68-
69-
Or to automatically fix:
70-
71-
```yaml
72-
repos:
73-
- repo: https://github.com/shredguard/shredguard
74-
rev: v0.1.0
75-
hooks:
76-
- id: shredguard-fix
77-
args: [--prefix, REDACTED]
78-
```
79-
80-
## CLI Reference
50+
Interactive setup wizard. Creates your configuration and optionally sets up pre-commit integration.
8151

8252
### `shredguard check`
8353

84-
Scan files for PHI patterns.
54+
Scan for PHI patterns:
8555

8656
```bash
87-
shredguard check [OPTIONS] [FILES]...
57+
shredguard check . # Scan current directory
58+
shredguard check data/ notes.txt # Scan specific paths
8859
```
8960

90-
**Arguments:**
91-
- `FILES` - Files or directories to scan (default: current directory)
92-
93-
**Options:**
94-
- `--all-files` - Scan all files (typically used with pre-commit)
95-
- `--no-gitignore` - Don't respect `.gitignore` patterns
96-
- `--config PATH` - Path to config file (default: searches for `pyproject.toml`)
97-
- `--verbose, -v` - Show verbose output (skipped files, etc.)
98-
99-
**Exit codes:**
100-
- `0` - No matches found
101-
- `1` - Matches found or error
61+
Output uses ruff-style formatting:
62+
```
63+
patient_notes.txt:1:9: SG001 Subject ID [SUB-1234]
64+
patient_notes.txt:2:6: SG002 SSN [123-45-6789]
65+
```
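
The format above can be reproduced in a few lines of Python. This is a minimal sketch, not ShredGuard's actual scanner: the name `scan_text` and the `(regex, description)` tuple layout are assumptions made for illustration. It assumes codes follow pattern order (`SG001` for the first pattern) and that line and column numbers are 1-based, which is consistent with the sample output.

```python
import re


def scan_text(path, text, patterns):
    """Return ruff-style findings for `text`.

    `patterns` is a list of (regex, description) tuples; this layout is
    hypothetical and only mirrors the fields shown in the README.
    """
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for index, (regex, description) in enumerate(patterns):
            for match in re.finditer(regex, line):
                code = f"SG{index + 1:03d}"  # code derived from pattern order
                col = match.start() + 1      # 1-based column
                findings.append(
                    f"{path}:{line_no}:{col}: {code} {description} [{match.group(0)}]"
                )
    return findings


patterns = [
    (r"SUB-\d{4,6}", "Subject ID"),
    (r"\b\d{3}-\d{2}-\d{4}\b", "SSN"),
]
text = "Patient SUB-1234 was enrolled on 2024-01-15\nSSN: 123-45-6789"
for finding in scan_text("patient_notes.txt", text, patterns):
    print(finding)
```
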
10266

10367
### `shredguard fix`
10468

105-
Replace PHI patterns with pseudonyms.
69+
Replace PHI with pseudonyms:
10670

10771
```bash
108-
shredguard fix [OPTIONS] [FILES]...
72+
shredguard fix . # Replace with REDACTED-0, REDACTED-1, ...
73+
shredguard fix --prefix ANON . # Custom prefix: ANON-0, ANON-1, ...
74+
shredguard fix --output-map mapping.json . # Save original -> pseudonym mapping
10975
```
11076

111-
**Arguments:**
112-
- `FILES` - Files or directories to scan and fix (default: current directory)
113-
114-
**Options:**
115-
- `--prefix PREFIX` - Prefix for pseudonyms (default: `REDACTED`)
116-
- `--output-map PATH` - Write JSON mapping of originals to pseudonyms
117-
- `--all-files` - Scan all files
118-
- `--no-gitignore` - Don't respect `.gitignore` patterns
119-
- `--config PATH` - Path to config file
120-
- `--verbose, -v` - Show verbose output
77+
Replacements are deterministic: the same value always gets the same pseudonym within a run.
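
One way to picture this behavior is a small Python sketch. It is not ShredGuard's implementation: the `Pseudonymizer` class and its methods are hypothetical names, and it assumes IDs are handed out in order of first encounter starting at 0, as the `REDACTED-0, REDACTED-1, ...` examples suggest.

```python
import re


class Pseudonymizer:
    """Hypothetical helper: maps each distinct matched value to PREFIX-N."""

    def __init__(self, prefix="REDACTED"):
        self.prefix = prefix
        self.mapping = {}  # original value -> pseudonym, in first-seen order

    def pseudonym(self, value):
        # A value seen before reuses its pseudonym; a new value gets the next ID.
        if value not in self.mapping:
            self.mapping[value] = f"{self.prefix}-{len(self.mapping)}"
        return self.mapping[value]

    def replace(self, text, regexes):
        # Combine all patterns so matches are processed left to right.
        combined = re.compile("|".join(f"(?:{r})" for r in regexes))
        return combined.sub(lambda m: self.pseudonym(m.group(0)), text)


regexes = [r"SUB-\d{4,6}", r"\b\d{3}-\d{2}-\d{4}\b"]
anonymizer = Pseudonymizer(prefix="ANON")
cleaned = anonymizer.replace(
    "Patient SUB-1234 enrolled\nSSN: 123-45-6789\nSUB-1234 returned", regexes
)
print(cleaned)
```

The `mapping` dictionary plays the role of the file written by `--output-map`; it could be serialized with `json.dumps(anonymizer.mapping, indent=2)`.
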
12178

122-
## Configuration Reference
79+
## Configuration
12380

124-
Configuration is read from `pyproject.toml` under the `[tool.shredguard]` section.
81+
Configuration lives in `pyproject.toml` (or another TOML file passed with `--config`):
12582

126-
### Patterns
83+
```toml
84+
[tool.shredguard]
12785

128-
Define patterns to scan for:
86+
[[tool.shredguard.patterns]]
87+
regex = "SUB-\\d{4,6}"
88+
description = "Subject ID"
12989

130-
```toml
13190
[[tool.shredguard.patterns]]
132-
regex = "SUB-\\d{4,6}" # Required: regex pattern
133-
description = "Subject ID" # Optional: description for output
134-
files = ["*.csv", "data/**"] # Optional: only scan matching files
135-
exclude_files = ["*_test.*"] # Optional: exclude matching files
91+
regex = "\\b\\d{3}-\\d{2}-\\d{4}\\b"
92+
description = "SSN"
13693
```
13794

138-
**Pattern fields:**
139-
- `regex` (required) - Regular expression pattern to match
140-
- `description` (optional) - Human-readable description shown in output
141-
- `files` (optional) - List of glob patterns; only scan files matching these
142-
- `exclude_files` (optional) - List of glob patterns; skip files matching these
143-
144-
### Pattern Codes
145-
146-
Patterns are assigned stable codes (SG001, SG002, etc.) based on their order in the config file.
95+
Each pattern can optionally include `files` and `exclude_files` globs to control which files are scanned.
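
As a rough illustration of how such globs might narrow the set of scanned files, here is a Python sketch. The function name `pattern_applies` is hypothetical, and the matching semantics are an assumption: it uses the standard library's `fnmatch`, where `*` also matches path separators, which may differ from what ShredGuard does internally.

```python
from fnmatch import fnmatch


def pattern_applies(path, files=None, exclude_files=None):
    """Decide whether a pattern should be checked against `path`.

    Assumed semantics: an exclude match always wins; if `files` is given,
    the path must match at least one of its globs; otherwise every path
    is in scope.
    """
    if exclude_files and any(fnmatch(path, glob) for glob in exclude_files):
        return False
    if files:
        return any(fnmatch(path, glob) for glob in files)
    return True


files = ["*.csv", "data/**"]
exclude_files = ["*_test.*"]
for candidate in ["data/visits.csv", "data/visits_test.csv", "notes.txt"]:
    print(candidate, pattern_applies(candidate, files, exclude_files))
```
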
14796

148-
## Example Workflow
97+
## Pre-commit
14998

150-
1. Create test files with PHI:
99+
Add to `.pre-commit-config.yaml`:
151100

152-
```bash
153-
echo "Patient SUB-1234 was enrolled on 2024-01-15" > patient_notes.txt
154-
echo "SSN: 123-45-6789" >> patient_notes.txt
101+
```yaml
102+
repos:
103+
- repo: local
104+
hooks:
105+
- id: shredguard-check
106+
name: shredguard check
107+
entry: shredguard check
108+
language: system
109+
types: [text]
155110
```
156111
157-
2. Check for PHI:
112+
Or let `shredguard init` set this up for you.
158113

159-
```bash
160-
$ shredguard check patient_notes.txt
161-
patient_notes.txt:1:9: SG001 Subject ID [SUB-1234]
162-
patient_notes.txt:2:6: SG002 SSN-like pattern [123-45-6789]
114+
## Reference
163115

164-
Found 2 matches in 1 file
165-
```
116+
### CLI Options
166117

167-
3. Replace PHI with pseudonyms:
118+
**`shredguard check [OPTIONS] [FILES]...`**
168119

169-
```bash
170-
$ shredguard fix --prefix ANON --output-map mapping.json patient_notes.txt
171-
Replaced 2 occurrences of 2 unique values in 1 file
172-
Mapping written to: mapping.json
173-
```
120+
| Option | Description |
121+
|--------|-------------|
122+
| `--all-files` | Scan all files recursively |
123+
| `--no-gitignore` | Don't respect `.gitignore` patterns |
124+
| `--config PATH` | Path to config file |
125+
| `-v, --verbose` | Show verbose output (skipped files, etc.) |
174126

175-
4. Verify replacement:
127+
**`shredguard fix [OPTIONS] [FILES]...`**
176128

177-
```bash
178-
$ cat patient_notes.txt
179-
Patient ANON-0 was enrolled on 2024-01-15
180-
SSN: ANON-1
181-
182-
$ cat mapping.json
183-
{
184-
"SUB-1234": "ANON-0",
185-
"123-45-6789": "ANON-1"
186-
}
187-
```
129+
| Option | Description |
130+
|--------|-------------|
131+
| `--prefix TEXT` | Prefix for pseudonyms (default: `REDACTED`) |
132+
| `--output-map PATH` | Write JSON mapping of originals to pseudonyms |
133+
| `--all-files` | Scan all files recursively |
134+
| `--no-gitignore` | Don't respect `.gitignore` patterns |
135+
| `--config PATH` | Path to config file |
136+
| `-v, --verbose` | Show verbose output |
188137

189-
## Deterministic Replacement
138+
### Configuration Reference
190139

191-
ShredGuard uses deterministic pseudonym assignment:
192-
- The same matched value always gets the same pseudonym within a single run
193-
- IDs are assigned in order of first encounter (0, 1, 2, ...)
194-
- The mapping can be saved to a JSON file for reference
140+
```toml
141+
[[tool.shredguard.patterns]]
142+
regex = "SUB-\\d{4,6}" # Required: regex pattern
143+
description = "Subject ID" # Optional: shown in output
144+
files = ["*.csv", "data/**"] # Optional: only scan matching files
145+
exclude_files = ["*_test.*"] # Optional: skip matching files
146+
```
195147

196-
## Binary File Detection
148+
### Built-in Pattern Suggestions
197149

198-
Binary files are automatically detected and skipped using a null byte heuristic (checking first 8KB). Use `--verbose` to see which files are skipped.
150+
When running `shredguard init`, you can choose from these common patterns:
199151

200-
## Development
152+
| Pattern | Description |
153+
|---------|-------------|
154+
| `SUB-\d{4,6}` | Subject ID |
155+
| `\b\d{3}-\d{2}-\d{4}\b` | Social Security Number |
156+
| `MRN\d{6,10}` | Medical Record Number |
157+
| `[email pattern]` | Email addresses |
158+
| `[phone pattern]` | Phone numbers (10 digits) |
159+
| `\b\d{5}(?:-\d{4})?\b` | ZIP codes |
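
Before adopting a suggested pattern, it can help to try it against sample strings. The sketch below uses Python's `re` module with the four regexes that are spelled out in the table; the sample values are invented, and the email and phone patterns are omitted because the table does not show them.

```python
import re

# Regexes copied from the table above; sample strings are made up.
SUGGESTED = {
    "Subject ID": r"SUB-\d{4,6}",
    "Social Security Number": r"\b\d{3}-\d{2}-\d{4}\b",
    "Medical Record Number": r"MRN\d{6,10}",
    "ZIP codes": r"\b\d{5}(?:-\d{4})?\b",
}

SAMPLES = {
    "Subject ID": "SUB-12345",
    "Social Security Number": "123-45-6789",
    "Medical Record Number": "MRN0012345",
    "ZIP codes": "53706-1234",
}

for name, regex in SUGGESTED.items():
    found = re.findall(regex, SAMPLES[name])
    print(f"{name}: {found}")

# Broad patterns can overlap: the ZIP regex also matches the digits
# inside a five-digit subject ID.
zip_in_subject_id = re.findall(SUGGESTED["ZIP codes"], "SUB-12345")
```

Overlaps like the last one are a reason to restrict broad patterns with `files` or `exclude_files` globs.
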
201160

202-
Install development dependencies:
161+
### Exit Codes
203162

204-
```bash
205-
pip install -e ".[dev]"
206-
```
163+
| Code | Meaning |
164+
|------|---------|
165+
| `0` | Success (no matches found for `check`) |
166+
| `1` | Matches found or error |
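
A script can act on these exit codes by wrapping the CLI call. The following Python sketch assumes `shredguard` is installed and on `PATH`; the helper name `phi_check_passed` and its injectable `run` parameter are illustrative choices, not part of ShredGuard.

```python
import subprocess


def phi_check_passed(paths, run=subprocess.run):
    """Return True when `shredguard check` exits with code 0.

    `run` is injectable so the helper can be exercised without the CLI.
    Exit code 1 covers both "matches found" and "error", so a False
    result does not say which of the two occurred.
    """
    result = run(["shredguard", "check", *paths])
    return result.returncode == 0
```

Calling `phi_check_passed(["."])` would scan the current directory, mirroring `shredguard check .`.
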
207167

208-
Run tests:
168+
### Binary File Handling
209169

210-
```bash
211-
pytest
212-
```
170+
Binary files are automatically detected and skipped (null byte check in first 8KB). Use `--verbose` to see skipped files.
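
The null byte heuristic is simple enough to sketch in Python. This is an illustration under the stated assumptions (read the first 8 KB, treat any `\x00` as a sign of binary content); the function name `looks_binary` is hypothetical and the real implementation may differ.

```python
def looks_binary(path, sample_size=8192):
    """Return True if a null byte appears in the first `sample_size` bytes."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)
```

A file whose first null byte appears after the sampled 8 KB would be treated as text under this heuristic.
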
213171

214172
## License
215173

216-
MIT
174+
[MIT](https://github.com/WiscLab/shred-guard/blob/main/LICENSE)

pyproject.toml

Lines changed: 1 addition & 3 deletions
@@ -7,14 +7,12 @@ name = "shred-guard"
77
dynamic = ["version"]
88
description = "A CLI tool to scan files for configurable regex patterns (PHI identifiers) and optionally replace matches with deterministic pseudonyms"
99
readme = "README.md"
10-
license = "MIT"
1110
requires-python = ">=3.10"
1211
authors = [
13-
{ name = "ShredGuard Contributors" }
12+
{ name = "@BeckettFrey" }
1413
]
1514
keywords = ["phi", "pii", "redaction", "pre-commit", "security"]
1615
classifiers = [
17-
"Development Status :: 4 - Beta",
1816
"Environment :: Console",
1917
"Intended Audience :: Developers",
2018
"License :: OSI Approved :: MIT License",

src/shredguard/cli.py

Lines changed: 1 addition & 1 deletion
@@ -373,7 +373,7 @@ def init() -> None:
373373
click.echo(f" [{i}] {pattern['description']}")
374374
click.secho(f" regex: {pattern['regex']}", fg="bright_black")
375375

376-
if click.confirm(f" Include this pattern?", default=True):
376+
if click.confirm(" Include this pattern?", default=True):
377377
selected_patterns.append(pattern.copy())
378378
click.echo()
379379

tests/test_fixer.py

Lines changed: 0 additions & 1 deletion
@@ -7,7 +7,6 @@
77
from shredguard.config import Pattern
88
from shredguard.fixer import (
99
Fixer,
10-
FixResult,
1110
PrefixCollisionError,
1211
check_prefix_collisions,
1312
apply_fixes,

tests/test_gitignore.py

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,5 @@
11
"""Tests for shredguard.gitignore module."""
22

3-
import pytest
43
from pathlib import Path
54

65
from shredguard.gitignore import (

tests/test_scanner.py

Lines changed: 0 additions & 1 deletion
@@ -1,6 +1,5 @@
11
"""Tests for shredguard.scanner module."""
22

3-
import pytest
43
from pathlib import Path
54

65
from shredguard.config import Pattern

value-proposition.svg

Lines changed: 1 addition & 0 deletions
