
Commit 974436b

Merge branch 'main' into koen/enclave-datasets
2 parents a3026b6 + 44ed2e2 commit 974436b

File tree

11 files changed: +2211 additions, -5 deletions


README.md

Lines changed: 8 additions & 1 deletion

@@ -8,6 +8,13 @@

Syft client lets data scientists submit computations that are run by data owners on private data — all through cloud storage their organizations already use (Google Drive, Microsoft 365, etc.). No new infrastructure required.

## Docs

- [Workflow](docs/workflow.md) — End-to-end privacy-preserving data analysis workflow
- [API Reference](docs/API.md) — All public client methods and properties
- [Authentication & Setup](docs/auth.md) — Google Cloud OAuth setup for local/Jupyter usage
- [Background Services](packages/syft-bg/README.md) — Email notifications, auto-approval, and TUI dashboard

## Features

- **Privacy-preserving** — Private data never leaves the data owner's machine; only approved results are shared

@@ -28,7 +35,7 @@ import syft_client as sc

```python
# Login (Colab auth; outside Colab, pass token_path)
do = sc.login_do(email="do@org.com")
ds = sc.login_ds(email="ds@org.com")
```

docs/API.md

Lines changed: 199 additions & 0 deletions

@@ -0,0 +1,199 @@

# Client API Reference

## Creating a Client

### `login_do(email, token_path=None)`

Create a Data Owner client.

```python
# Google Colab
do_client = login_do(email="owner@example.com")

# Jupyter Lab (local)
do_client = login_do(email="owner@example.com", token_path="path/to/token.json")
```

### `login_ds(email, token_path=None)`

Create a Data Scientist client.

```python
# Google Colab
ds_client = login_ds(email="scientist@example.com")

# Jupyter Lab (local)
ds_client = login_ds(email="scientist@example.com", token_path="path/to/token.json")
```

---

## Properties

### `client.email`

The email address of the client.

### `client.peers`

Get the list of peers. Auto-syncs before returning.

- **DO**: Returns approved peers followed by pending peer requests.
- **DS**: Returns all connected peers.

Returns a `PeerList`.

### `client.jobs`

Get the list of jobs. Auto-syncs before returning.

Returns a `JobsList`.

### `client.datasets`

Get the dataset manager. Auto-syncs before returning.

Returns a `SyftDatasetManager`. Use `.get_all()` or `.get(name, datasite)` to query datasets.

---
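The query pattern of the dataset manager can be illustrated with a minimal stand-in. This is a sketch only: the real `SyftDatasetManager` auto-syncs with Google Drive, and everything here other than the `get_all` / `get(name, datasite)` signatures (the `Dataset` record shape, the class internals) is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    # Hypothetical shape of a dataset record, for illustration only.
    name: str
    datasite: str  # email of the owning Data Owner

class FakeDatasetManager:
    """Stand-in for SyftDatasetManager; the real one syncs via Drive."""

    def __init__(self, datasets):
        self._datasets = list(datasets)

    def get_all(self):
        # Return every dataset visible to this client.
        return list(self._datasets)

    def get(self, name, datasite):
        # Look up one dataset by name and owning datasite.
        for ds in self._datasets:
            if ds.name == name and ds.datasite == datasite:
                return ds
        return None

manager = FakeDatasetManager([Dataset("my dataset", "owner@example.com")])
print(len(manager.get_all()))                            # 1
print(manager.get("my dataset", "owner@example.com").name)  # my dataset
```

With a real client, `client.datasets` returns the manager already populated, so only the two query calls apply.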
## Peer Management

### `client.add_peer(peer_email)`

Request a peer connection.

- **DS** calls this to request access to a DO.
- The DO must approve the request before syncing is enabled.

```python
ds_client.add_peer("owner@example.com")
```

### `client.load_peers()`

Reload the peer list from the transport layer.

### `client.approve_peer_request(email_or_peer)`

Approve a pending peer request. **DO only.**

```python
do_client.approve_peer_request("scientist@example.com")
```

### `client.reject_peer_request(email_or_peer)`

Reject a pending peer request. **DO only.**

```python
do_client.reject_peer_request("scientist@example.com")
```

---

## Syncing

### `client.sync(auto_checkpoint=True, checkpoint_threshold=50)`

Sync local state with Google Drive.

- **DO**: Pulls incoming messages from approved peers and optionally creates a checkpoint.
- **DS**: Pushes pending changes and pulls results from peers.

```python
client.sync()
```

---

## Datasets

### `client.create_dataset(name, mock_path, private_path=None, summary=None, users=None, upload_private=False)`

Create and upload a dataset. **DO only.**

- `mock_path`: Path to public mock data (shared with approved peers).
- `private_path`: Path to private data (never leaves the DO).
- `users`: List of emails to share with, or `"any"` for all approved peers.

```python
do_client.create_dataset(
    name="my dataset",
    mock_path="/path/to/mock.csv",
    private_path="/path/to/private.csv",
    summary="Example dataset",
    users=["scientist@example.com"],
)
```

### `client.delete_dataset(name, datasite)`

Delete a dataset. **DO only.**

```python
do_client.delete_dataset(name="my dataset", datasite="owner@example.com")
```

### `client.share_dataset(tag, users)`

Share an existing dataset with additional users. **DO only.**

- `tag`: Dataset name.
- `users`: List of email addresses or `"any"`.

```python
do_client.share_dataset("my dataset", users=["new_user@example.com"])
```

---
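`create_dataset` expects a mock file that mirrors the private file's schema without exposing any private values. One way to derive such a mock from a private CSV, as a standard-library sketch (the file names, helper name, and placeholder scheme are all illustrative, not part of syft_client):

```python
import csv

def make_mock(private_path, mock_path, n_rows=3, placeholder="XXX"):
    """Copy only the header of a private CSV; fill rows with placeholders."""
    with open(private_path, newline="") as f:
        header = next(csv.reader(f))  # read the column names only
    with open(mock_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for _ in range(n_rows):
            # No private cell is ever copied, only the schema.
            writer.writerow([placeholder] * len(header))
    return header

# Example: build a private file, then derive a schema-matching mock.
with open("private.csv", "w", newline="") as f:
    csv.writer(f).writerows([["age", "income"], ["41", "52000"]])

header = make_mock("private.csv", "mock.csv")
print(header)  # ['age', 'income']
```

The resulting pair of paths can then be passed as `mock_path` and `private_path` to `create_dataset`.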
## Jobs

### `client.submit_python_job(user, code_path, job_name=None, entrypoint=None)`

Submit a Python job to a Data Owner. **DS only.**

- `user`: DO email to submit the job to.
- `code_path`: Path to a Python script or folder.
- `entrypoint`: Entry script (auto-detected if `main.py` exists in the folder).

```python
ds_client.submit_python_job(
    user="owner@example.com",
    code_path="/path/to/script.py",
)
```

### `client.submit_bash_job(user, code_path, job_name=None)`

Submit a bash job to a Data Owner. **DS only.**

```python
ds_client.submit_bash_job(
    user="owner@example.com",
    code_path="/path/to/script.sh",
)
```

### `client.process_approved_jobs(stream_output=True, timeout=None, force_execution=False)`

Run all approved jobs. **DO only.**

- `stream_output`: Stream stdout/stderr in real time.
- `timeout`: Timeout in seconds per job (default: 300).
- `force_execution`: Skip version compatibility checks.

```python
do_client.process_approved_jobs()
```

---
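On the DS side, results only appear after the DO has approved and processed the job, so a submit-then-poll loop around `sync()` is the typical shape. A runnable sketch with a stub client (only `submit_python_job`, `sync`, and the `jobs` property mirror the real API; the stub's internals, the `StubJob` fields, and the status strings are hypothetical):

```python
class StubJob:
    def __init__(self, name):
        self.name = name
        self.status = "pending"  # hypothetical states: pending -> done

class StubClient:
    """Stand-in for a DS client; the real one syncs via Google Drive."""

    def __init__(self):
        self._jobs = []

    def submit_python_job(self, user, code_path, job_name=None):
        job = StubJob(job_name or code_path)
        self._jobs.append(job)
        return job

    def sync(self):
        # The real client pushes pending changes and pulls results;
        # here we simply simulate the DO having finished every job.
        for job in self._jobs:
            job.status = "done"

    @property
    def jobs(self):
        return self._jobs

ds = StubClient()
job = ds.submit_python_job(user="owner@example.com", code_path="analysis.py")

while job.status != "done":  # poll until the DO has processed the job
    ds.sync()

print(job.status)  # done
```

With a real client, each `sync()` round-trips through Drive, so the loop would typically sleep between iterations.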
## Cleanup

### `client.delete_syftbox(verbose=True, broadcast_delete_events=True)`

Delete all SyftBox state: Google Drive files, local caches, and the local folder.

- `broadcast_delete_events`: Notify approved peers about deleted files before cleanup.

docs/auth.md

Lines changed: 84 additions & 0 deletions

@@ -0,0 +1,84 @@

# Authentication

When you log in with a Gmail account in Google Colab, Colab handles authentication automatically via a browser pop-up. Once authenticated, Syft Client uses Google Drive as its communication protocol — all messages, events, and files are synced through the Drive API.

**If you're using Google Colab, you can skip the rest of this page.**

## Local / Jupyter Lab Setup

To use Syft Client outside of Google Colab, you need to set up a Google Cloud project with OAuth credentials.

## Step 1: Create a Google Cloud Project

1. Go to the [Google Cloud Console](https://console.cloud.google.com/)
2. Click **Select a project** in the top navigation bar
3. Click **New Project** in the dialog that appears
4. Enter a project name (e.g., "Syft Client")
5. Click **Create**
6. Wait for the project to be created, then select it

## Step 2: Enable the Google Drive API

1. In your project, go to **APIs & Services** > **Library**
2. Search for "Google Drive API"
3. Click on **Google Drive API**
4. Click **Enable**

## Step 3: Configure OAuth Consent Screen

1. Go to **APIs & Services** > **OAuth consent screen**
2. Select **External** user type (unless you have a Google Workspace organization)
3. Click **Create**
4. Fill in the required fields:
   - **App name**: "Syft Client" (or your preferred name)
   - **User support email**: Your email address
   - **Developer contact information**: Your email address
5. Click **Save and Continue**
6. On the **Scopes** page:
   - Click **Add or Remove Scopes**
   - Search for and select `https://www.googleapis.com/auth/drive`
   - Click **Update**
   - Click **Save and Continue**
7. On the **Test users** page:
   - Click **Add Users**
   - Add the email addresses of users who will test the app
   - Click **Save and Continue**
8. Review the summary and click **Back to Dashboard**

## Step 4: Create OAuth Client Credentials

1. Go to **APIs & Services** > **Credentials**
2. Click **Create Credentials** > **OAuth client ID**
3. Select **Desktop app** as the application type
4. Enter a name (e.g., "Syft Client Desktop")
5. Click **Create**
6. **Download the JSON file** - this contains your client credentials
7. Save this file securely (e.g., as `credentials.json`)

## Step 5: Publish the App

For testing, your app can remain in "Testing" mode with up to 100 test users. To allow any Google user to authenticate:

1. Go to **APIs & Services** > **OAuth consent screen**
2. Click **Publish App**
3. Confirm publishing when prompted

**Important:** If your app is not published (i.e., it remains in "Testing" mode), OAuth tokens expire every 7 days and users will need to re-authenticate. Publishing the app removes this limitation.

> **Note:** Publishing may require verification for apps requesting sensitive scopes like Google Drive access.

## Generating a Token

Once you've completed the Google Cloud Console setup, generate a token:

```bash
python scripts/create_token.py --credentials path/to/credentials.json --output token.json
```

Then pass the token path when logging in:

```python
do_client = login_do(email="your@email.com", token_path="path/to/token.json")
```

If your app is not published, tokens expire every 7 days and you'll need to regenerate them.

docs/workflow.md

Lines changed: 43 additions & 0 deletions

@@ -0,0 +1,43 @@

# Privacy-Preserving Data Analysis Workflow

The following diagram demonstrates the complete workflow for privacy-preserving data analysis using Beach Notebooks, involving both the Data Owner (DO) and the Data Scientist (DS).

```mermaid
sequenceDiagram
    participant DO as Data Owner
    participant DON as DO Notebook
    participant DSN as DS Notebook
    participant DS as Data Scientist

    Note over DO,DON: 1. Dataset Publication
    DON->>DO: Create & publish dataset
    DO-->>DS: Dataset available

    Note over DS,DSN: 2. Mock Data Testing
    DSN->>DS: Download mock data
    DS->>DSN: Test analysis code

    Note over DS,DSN: 3. Job Submission
    DSN->>DO: Submit analysis job

    Note over DO,DON: 4. Job Review
    DON->>DO: View pending jobs
    DO->>DON: Review code
    DON->>DO: Approve job

    Note over DO,DON: 5. Job Processing
    DON->>DO: Process approved jobs
    DO->>DS: Results available

    Note over DS,DSN: 6. View Results
    DSN->>DS: Retrieve results
```

## Workflow Steps

1. **Dataset Publication**: The Data Owner publishes a dataset with both mock (public) and private components.
2. **Mock Data Testing**: The Data Scientist downloads the mock data to explore the structure and test their analysis code locally.
3. **Job Submission**: Once satisfied with the code on mock data, the Data Scientist submits the analysis job to the Data Owner.
4. **Job Review**: The Data Owner views pending jobs, reviews the code for safety and privacy, and approves it.
5. **Job Processing**: The Data Owner processes the approved jobs, executing the code on the private data in a controlled environment.
6. **View Results**: The Data Scientist retrieves the results of the analysis.
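The review gate in steps 4-5 is the privacy boundary: submitted code touches private data only after an explicit approval. A standalone sketch of that lifecycle (all names and states here are hypothetical illustrations, not the syft_client API):

```python
# Hypothetical model of the DO-side job lifecycle from steps 4-5.
PENDING, APPROVED, REJECTED, DONE = "pending", "approved", "rejected", "done"

class Job:
    def __init__(self, name, code):
        self.name, self.code, self.status = name, code, PENDING
        self.result = None

def review(job, looks_safe):
    # Step 4: the DO reads the submitted code and decides.
    job.status = APPROVED if looks_safe else REJECTED

def process_approved(jobs, private_data):
    # Step 5: only approved code is ever executed on private data.
    for job in jobs:
        if job.status == APPROVED:
            job.result = job.code(private_data)
            job.status = DONE

private_data = [3, 1, 4, 1, 5]
jobs = [
    Job("mean", lambda d: sum(d) / len(d)),  # aggregate: safe to approve
    Job("dump", lambda d: d),                # leaks raw rows: reject
]

review(jobs[0], looks_safe=True)
review(jobs[1], looks_safe=False)
process_approved(jobs, private_data)

print(jobs[0].status, jobs[0].result)  # done 2.8
print(jobs[1].status, jobs[1].result)  # rejected None
```

The rejected job never runs, so only the approved aggregate result (step 6) ever becomes visible to the Data Scientist.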
