19 changes: 19 additions & 0 deletions .github/workflows/unit-tests.yml
@@ -30,3 +30,22 @@ jobs:

      - name: Run unit tests
        run: just test-unit

  permission-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install the project
        run: uv sync --all-extras

      - name: Run permission tests
        run: uv run pytest packages/syft-permissions/tests/ -n auto -v
223 changes: 223 additions & 0 deletions packages/syft-permissions/docs/permission-user-docs.md
@@ -0,0 +1,223 @@
# SyftBox Permissions Guide

## The basics

You control who can access files in your datasite by placing a **`syft.pub.yaml`** file in any directory. The datasite owner always has full access to their own datasite -- you cannot lock yourself out.

When your datasite is created, SyftBox sets up your root directory as private and your public folder as readable by everyone. Everything else starts private until you add permission files.

## 1. Simple permissions

Place a `syft.pub.yaml` in a directory. It controls access to all files in that directory and its subdirectories.

```yaml
terminal: false

rules:
  - pattern: '**'
    access:
      admin: []
      write: []
      read: ['*']
```

- **`pattern`** -- which files the rule applies to (`"**"` means everything)
- **`access`** -- who gets what:
  - **`read`**: can read files
  - **`write`**: can read and create/modify files
  - **`admin`**: full control, including changing the permission file itself
- **`terminal`** -- controls whether subdirectories can have their own permission files (explained in section 4)

Values in the access lists are email addresses. Two special values:

- `*` -- everyone (public)
- `*@company.com` -- everyone at a domain
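
As a rough illustration of how the access levels and special values combine, here is a minimal Python sketch. The names `user_matches` and `can` are hypothetical and not part of the SyftBox API. The sketch assumes that each level implies the ones below it and that wildcard entries are matched shell-style.

```python
from fnmatch import fnmatchcase

# Levels ordered from weakest to strongest; a stronger level implies the weaker ones.
LEVELS = ["read", "write", "admin"]


def user_matches(user_email: str, allowed: list[str]) -> bool:
    """True if the email matches any entry: an exact email, `*@domain`, or `*`."""
    return any(entry == "*" or fnmatchcase(user_email, entry) for entry in allowed)


def can(level: str, user_email: str, access: dict[str, list[str]]) -> bool:
    """True if the user holds `level` or any stronger level (hypothetical helper)."""
    granting_levels = LEVELS[LEVELS.index(level):]
    return any(user_matches(user_email, access.get(lvl, [])) for lvl in granting_levels)


access = {
    "read": ["*@company.com"],
    "write": [],
    "admin": ["alice@company.com"],
}
print(can("read", "bob@company.com", access))     # domain wildcard grants read
print(can("write", "alice@company.com", access))  # admin implies write
print(can("write", "bob@company.com", access))    # read does not imply write
```

With this access block, Bob can read through the domain wildcard but cannot write, while Alice can write because she is an admin.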

**Make a folder publicly readable:**

```yaml
rules:
  - pattern: '**'
    access:
      read: ['*']
      write: []
      admin: []
```

**Let your company read and write, one person as admin:**

```yaml
rules:
  - pattern: '**'
    access:
      read: ['*@company.com']
      write: ['*@company.com']
      admin: ['alice@company.com']
```

## 2. Patterns

You can have multiple rules in one file with different patterns to give different permissions to different file types.

**Share CSVs with specific people, keep everything else private:**

```yaml
rules:
  - pattern: '**/*.csv'
    access:
      read: ['alice@example.com', 'bob@example.com']
      write: []
      admin: []

  - pattern: '**'
    access:
      read: []
      write: []
      admin: []
```

When a file matches more than one pattern, **the most specific pattern wins**:

- Longer, more precise patterns beat shorter ones
- Exact paths (`reports/q1.csv`) beat wildcards (`**/*.csv`)
- `"**"` is always the lowest priority -- it's your catch-all fallback

So in the example above, a `.csv` file matches both `**/*.csv` and `**`. The CSV rule is more specific, so Alice and Bob can read it. A `.txt` file only matches `**`, so nobody can read it.

Common patterns:

| Pattern | Matches |
| ----------------------- | -------------------------------------------------- |
| `"**"` | Everything (catch-all) |
| `"*.csv"` | CSV files in this directory |
| `"**/*.csv"` | CSV files in this directory and all subdirectories |
| `"reports/**"` | Everything inside `reports/` |
| `"reports/2024/q1.csv"` | That one specific file |
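
To make the "most specific pattern wins" rule concrete, here is a small Python sketch. It is an illustration only: the glob translation is simplified, and the scoring formula in `specificity` is an assumption chosen to reproduce the ordering described above, not the scoring the engine actually uses. All function names are hypothetical.

```python
import re
from typing import Optional


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob into a regex: `**` crosses directories, `*` and `?` do not."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def specificity(pattern: str) -> int:
    """Assumed scoring: literal characters add weight, wildcards subtract it."""
    if pattern == "**":
        return -1000  # the catch-all always ranks last
    literals = sum(1 for ch in pattern if ch not in "*?")
    wildcards = sum(1 for ch in pattern if ch in "*?")
    return 2 * literals - 10 * wildcards


def most_specific(path: str, patterns: list[str]) -> Optional[str]:
    """Return the best-scoring pattern that matches `path`, or None if none match."""
    matching = [p for p in patterns if glob_to_regex(p).match(path)]
    return max(matching, key=specificity) if matching else None


patterns = ["**", "**/*.csv", "reports/q1.csv"]
print(most_specific("reports/q1.csv", patterns))   # the exact path wins
print(most_specific("data/2024/a.csv", patterns))  # the CSV rule beats the catch-all
print(most_specific("notes.txt", patterns))        # only the catch-all matches
```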

## 3. The `USER` token

There is a special `USER` token you can use in access lists. It means "whoever is currently requesting access." This only makes sense when combined with a **template pattern** that includes the user's identity in the path.

**Example: give each user access to their own folder**

Imagine you have a directory structure like:

```
shared/
  alice@example.com/
  bob@example.com/
```

You can write one rule that gives each user access to only their own folder:

```yaml
rules:
  - pattern: '{{.UserEmail}}/**'
    access:
      read: ['USER']
      write: ['USER']
      admin: []
```

When Alice requests `shared/alice@example.com/file.txt`, the template `{{.UserEmail}}` resolves to `alice@example.com`, the path matches, and `USER` resolves to Alice. So she gets access to her folder but not Bob's.

Available template variables: `{{.UserEmail}}`, `{{.UserHash}}`, `{{.Year}}`, `{{.Month}}`, `{{.Date}}`.

**Important:** if you use `USER` without a template pattern, it just means "any authenticated user" -- the same as `*`.
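
The resolution step can be sketched in a few lines of Python. This is a simplified model loosely following `compile_rule` in this package, not a copy of it: `resolve_rule` is a hypothetical name, and the sketch assumes that only `{{.UserEmail}}` is substituted in the pattern.

```python
USER_PLACEHOLDER = "USER"
USER_EMAIL_TEMPLATE = "{{.UserEmail}}"  # assumption: the only template handled here


def resolve_rule(
    pattern: str, access: dict[str, list[str]], requester: str
) -> tuple[str, dict[str, list[str]]]:
    """Resolve the template in the pattern and the USER token in the access lists."""
    if USER_EMAIL_TEMPLATE in pattern:
        # Template rule: USER means exactly the requesting user.
        pattern = pattern.replace(USER_EMAIL_TEMPLATE, requester)
        replacement = requester
    else:
        # No template: USER degrades to "everyone".
        replacement = "*"
    resolved = {
        level: [replacement if entry == USER_PLACEHOLDER else entry for entry in entries]
        for level, entries in access.items()
    }
    return pattern, resolved


access = {"read": ["USER"], "write": ["USER"], "admin": []}
print(resolve_rule("{{.UserEmail}}/**", access, "alice@example.com"))
print(resolve_rule("**", access, "alice@example.com"))
```

In the first call the rule only covers `alice@example.com/**` and only Alice appears in the lists. In the second call there is no template, so `USER` becomes `*`.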

## 4. Multiple permission files across directories

You can place `syft.pub.yaml` files at different levels of your directory tree. When someone accesses a file, the system walks from the root toward the file and picks **the single closest `syft.pub.yaml`**. Only that file's rules are used.

**There is no merging.** The closest file fully replaces any parent permission files. The parent's rules are completely ignored.

### Example

```
~/Datasite/
  syft.pub.yaml              # (A)
  projects/
    syft.pub.yaml            # (B)
    reports/
      syft.pub.yaml          # (C)
      q1.csv
      readme.txt
    notes/
      todo.txt
```

**File (A)** at the root -- everything private:

```yaml
rules:
  - pattern: '**'
    access:
      read: []
      write: []
      admin: []
```

**File (B)** in `projects/` -- company can read:

```yaml
rules:
  - pattern: '**'
    access:
      read: ['*@company.com']
      write: []
      admin: []
```

**File (C)** in `projects/reports/` -- only Alice can read CSVs:

```yaml
rules:
  - pattern: '**/*.csv'
    access:
      read: ['alice@example.com']
      write: []
      admin: []

  - pattern: '**'
    access:
      read: []
      write: []
      admin: []
```

Now let's trace what happens for each file:

**Accessing `projects/reports/q1.csv`:**
The closest permission file is **(C)**. The system checks (C)'s rules and finds that `**/*.csv` matches -- Alice can read. Note that the company-wide access from **(B)** does **not** apply here. File (B) is ignored because (C) is closer.

**Accessing `projects/reports/readme.txt`:**
The closest permission file is still **(C)**. The `**/*.csv` pattern doesn't match a `.txt` file. The `**` catch-all does match, and it says `read: []`. So nobody can read it. Again, (B)'s company-wide access does **not** fill in -- (C) is in charge and (C) says no.

**Accessing `projects/notes/todo.txt`:**
There is no permission file inside `notes/`, so the closest one on the path is **(B)**. Anyone at `@company.com` can read.

**Accessing a file at the root:**
Only **(A)** exists on the path. Nobody can read (everything is private).
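
The lookup traced above can be modelled with a short Python sketch. It assumes paths are written relative to the datasite root, with `""` standing for the root itself. The function name `closest_permission_dir` is hypothetical and the real engine may organise the lookup differently.

```python
from pathlib import PurePosixPath
from typing import Optional


def closest_permission_dir(file_path: str, dirs_with_rules: set[str]) -> Optional[str]:
    """Return the nearest ancestor directory that contains a permission file."""
    # `parents` yields the nearest directory first and ends with "." for the root.
    for parent in PurePosixPath(file_path).parents:
        key = "" if str(parent) == "." else str(parent)
        if key in dirs_with_rules:
            return key  # first hit is the closest one; nothing is merged
    return None


# Directories holding (A), (B) and (C) from the example.
dirs_with_rules = {"", "projects", "projects/reports"}

print(closest_permission_dir("projects/reports/q1.csv", dirs_with_rules))  # (C)
print(closest_permission_dir("projects/notes/todo.txt", dirs_with_rules))  # (B)
print(closest_permission_dir("top-level.txt", dirs_with_rules))            # (A)
```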

### What if no rule matches?

If the closest permission file has rules but none of them match the file being accessed, **access is denied**. The system does not fall back to a parent permission file.
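
A deny-by-default check might look like the following Python sketch. It is deliberately simplified: `fnmatchcase` stands in for the real glob matcher (its `*` also matches `/`), the first matching rule is used instead of the most specific one, and `can_read` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def can_read(path: str, user: str, rules: list[tuple[str, list[str]]]) -> bool:
    """Check read access against the rules of the closest permission file only."""
    for pattern, readers in rules:
        if fnmatchcase(path, pattern):
            return "*" in readers or user in readers
    # No rule matched: deny. Parent permission files are never consulted.
    return False


# A permission file with a single CSV rule and no catch-all.
rules = [("*.csv", ["alice@example.com"])]

print(can_read("q1.csv", "alice@example.com", rules))      # rule matches, Alice is listed
print(can_read("readme.txt", "alice@example.com", rules))  # no rule matches, denied
```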

### The `terminal` flag

Setting `terminal: true` on a permission file **prevents subdirectories from overriding it**. The system stops walking deeper and won't look at any `syft.pub.yaml` files further down the tree.

```yaml
terminal: true

rules:
  - pattern: '**'
    access:
      read: []
      write: []
      admin: []
```

In the example above: if file **(B)** had `terminal: true`, then file **(C)** would be completely ignored. Everything under `projects/` -- including `projects/reports/q1.csv` -- would be governed by (B)'s rules only.

This is useful when you want to guarantee a folder stays locked down and no nested permission file can accidentally open it up.
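
One way to picture the effect of `terminal` is a walk from the root toward the file that remembers the deepest permission file seen so far and stops early at a terminal one. The Python sketch below models this under the assumption that paths are relative to the datasite root (`""` is the root). The name `governing_dir` and the dictionary layout are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def governing_dir(file_path: str, terminal_by_dir: dict[str, bool]) -> Optional[str]:
    """Return the directory whose permission file governs `file_path`.

    `terminal_by_dir` maps each directory holding a permission file to its
    `terminal` flag.
    """
    # Build the chain of directories from the root down to the file's parent.
    dirs = [""]
    for part in PurePosixPath(file_path).parts[:-1]:
        dirs.append(f"{dirs[-1]}/{part}" if dirs[-1] else part)

    chosen = None
    for directory in dirs:
        if directory in terminal_by_dir:
            chosen = directory
            if terminal_by_dir[directory]:
                break  # terminal: ignore permission files further down
    return chosen


# (B) is terminal, so (C) in projects/reports is ignored.
print(governing_dir(
    "projects/reports/q1.csv",
    {"": False, "projects": True, "projects/reports": False},
))
```

With `terminal` set to `False` on `projects`, the same call would return `projects/reports` instead.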
20 changes: 20 additions & 0 deletions packages/syft-permissions/pyproject.toml
@@ -0,0 +1,20 @@
[project]
name = "syft-permissions"
version = "0.1.0"
description = "Permission system for Syft datasites"
authors = [{ name = "OpenMined", email = "info@openmined.org" }]
license = { text = "Apache-2.0" }
requires-python = ">=3.10"

dependencies = [
    "pydantic>=2.11.7",
    "pyyaml>=6.0",
    "wcmatch>=10.0",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/syft_permissions"]
16 changes: 16 additions & 0 deletions packages/syft-permissions/src/syft_permissions/__init__.py
@@ -0,0 +1,16 @@
from syft_permissions.engine.service import ACLService
from syft_permissions.engine.request import ACLRequest, AccessLevel, User
from syft_permissions.spec.ruleset import RuleSet, PERMISSION_FILE_NAME
from syft_permissions.spec.rule import Rule
from syft_permissions.spec.access import Access

__all__ = [
    "ACLService",
    "ACLRequest",
    "AccessLevel",
    "User",
    "RuleSet",
    "Rule",
    "Access",
    "PERMISSION_FILE_NAME",
]
Empty file.
101 changes: 101 additions & 0 deletions packages/syft-permissions/src/syft_permissions/engine/compiled_rule.py
@@ -0,0 +1,101 @@
import fnmatch
from dataclasses import dataclass

from syft_permissions.engine.matchers import (
    SUPPORTED_TEMPLATE,
    Matcher,
    create_matcher,
)
from syft_permissions.engine.request import ACLRequest, AccessLevel
from syft_permissions.spec.access import Access
from syft_permissions.spec.rule import Rule


USER_PLACEHOLDER = "USER"


@dataclass
class ResolvedAccess:
    """Access lists after USER placeholders have been resolved.

    Values can be:
    - exact email: "alice@example.com"
    - domain wildcard: "*@example.com"
    - everyone: "*"

    Unlike Access, "USER" never appears here; it has been replaced
    with the requesting user's email (in template rules) or "*" (otherwise).
    """

    admin: list[str]
    write: list[str]
    read: list[str]


class ACLRule:
    def __init__(self, pattern: str, access: ResolvedAccess, matcher: Matcher):
        self.pattern = pattern
        self.access = access
        self.matcher = matcher

    def match(self, path: str, user: str) -> bool:
        return self.matcher.match(path, user)

    def has_admin(self, user_email: str) -> bool:
        return _user_in_list(user_email, self.access.admin)

    def has_write(self, user_email: str) -> bool:
        return self.has_admin(user_email) or _user_in_list(
            user_email, self.access.write
        )

    def has_read(self, user_email: str) -> bool:
        return self.has_write(user_email) or _user_in_list(user_email, self.access.read)

    def check_access(self, request: ACLRequest) -> bool:
        user_email = request.user.id
        if request.level == AccessLevel.ADMIN:
            return self.has_admin(user_email)
        if request.level == AccessLevel.WRITE:
            return self.has_write(user_email)
        return self.has_read(user_email)


def compile_rule(rule: Rule, user: str) -> ACLRule:
    """Compile a spec Rule into an ACLRule with resolved pattern and access lists."""
    pattern = rule.pattern
    access = rule.access

    if SUPPORTED_TEMPLATE in pattern:
        pattern = pattern.replace(SUPPORTED_TEMPLATE, user)
        resolved = _resolve_access(access, user)
    else:
        resolved = _resolve_access(access, "*")

    matcher = create_matcher(pattern)
    return ACLRule(pattern=pattern, access=resolved, matcher=matcher)


def _resolve_access(access: Access, user_replacement: str) -> ResolvedAccess:
    """Resolve USER placeholders in access lists."""
    return ResolvedAccess(
        admin=_replace_in_list(access.admin, user_replacement),
        write=_replace_in_list(access.write, user_replacement),
        read=_replace_in_list(access.read, user_replacement),
    )


def _replace_in_list(lst: list[str], replacement: str) -> list[str]:
    return [replacement if item == USER_PLACEHOLDER else item for item in lst]


def _user_in_list(user_email: str, allowed: list[str]) -> bool:
    for pattern in allowed:
        if pattern == "*":
            return True
        if "*" in pattern or "?" in pattern:
            if fnmatch.fnmatch(user_email, pattern):
                return True
        elif user_email == pattern:
            return True
    return False