diff --git a/docs/source/auth-providers.md b/docs/source/auth-providers.md index e5ca201..ce812eb 100644 --- a/docs/source/auth-providers.md +++ b/docs/source/auth-providers.md @@ -23,6 +23,7 @@ Giftless provides the following authentication and authorization modules by defa * `giftless.auth.jwt:JWTAuthenticator` - uses [JWT tokens](https://jwt.io/) to both identify the user and grant permissions based on scopes embedded in the token payload. +* `giftless.auth.github:GithubAuthenticator` - uses [GitHub Personal Access Tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) to both identify the user and grant permissions based on those for a GitHub repository of the same organization/name. * `giftless.auth.allow_anon:read_only` - grants read-only permissions on everything to every request; Typically, this is only useful in testing environments or in very limited deployments. @@ -75,7 +76,7 @@ Basic HTTP authentication. You can disable this functionality or change the expected username using the `basic_auth_user` configuration option. -### Configuration Options +### `giftless.auth.jwt` Configuration Options The following options are available for the `jwt` auth module: * `algorithm` (`str`): JWT algorithm to use, e.g. `HS256` (default) or `RS256`. Must match the algorithm @@ -191,6 +192,37 @@ The `leeway` parameter allows for providing a leeway / grace time to be considered when checking expiry times, to cover for clock skew between servers. +## GitHub Authenticator +This authenticator lets you provide a frictionless LFS backend for existing GitHub repositories. It plays nicely with `git` credential helpers and allows you to use GitHub as the single authentication & authorization provider. 
+
+### Details
+The authenticator uses [GitHub Personal Access Tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens), the same ones used for cloning a GitHub repo over HTTPS. The provided token is used in a couple of GitHub API calls that identify the token's owner and [its permissions](https://docs.github.com/en/rest/collaborators/collaborators?apiVersion=2022-11-28#get-repository-permissions-for-a-user) for the GitHub organization & repository. The token must be passed as the password part of `Basic` HTTP auth (the username is ignored). `Bearer` token HTTP auth is also supported, although git clients are unlikely to use it.
+
+For the authenticator to work properly, the token must have the `read:org` scope for a "Classic" token or the `metadata:read` permission for a fine-grained one.
+
+ Note: Authentication via SSH, which could otherwise be used to verify the user, is [not possible with GitHub at the time of writing](https://github.com/datopian/giftless/issues/128#issuecomment-2037190728).
+
+The GitHub repository permissions are mapped to [Giftless permissions](#permissions) directly: identities able to write to the repository will be able to write, and likewise for read; invalid tokens or identities with no repository access will get rejected.
+
+To minimize the traffic to GitHub for each LFS action, most of the auth data is temporarily cached in memory. This improves performance, but it also means that permission changes made on GitHub do not take effect until the cached entries expire.
+
+### GitHub Auth Flow
+Here's a description of the authentication & authorization flow. If any of these steps fails, the request gets rejected.
+
+1. The URI of the primary git LFS (HTTP) [`batch` request](https://github.com/git-lfs/git-lfs/blob/main/docs/api/batch.md) is used (as usual) to determine which GitHub organization and repository is being targeted (e.g. `https:////.git/info/lfs/...`). The request's `Authorization` header is also searched for the required GitHub personal access token.
+2. The token is then used in a [`/user`](https://docs.github.com/en/rest/users/users?apiVersion=2022-11-28#get-the-authenticated-user) GitHub API call to get its identity data.
+3. Next, the GitHub API is asked for the [user's permissions](https://docs.github.com/en/rest/collaborators/collaborators?apiVersion=2022-11-28#get-repository-permissions-for-a-user) to the org/repo in question.
+4. Based on the information above, the user is granted or denied access.
+
+### `giftless.auth.github` Configuration Options
+* `api_url` (`str` = `"https://api.github.com"`): Base URL for the GitHub API (enterprise servers have API at `"https:///api/v3/"`).
+* `api_version` (`str | None` = `"2022-11-28"`): Target GitHub API version; set to `None` to use GitHub's latest (rather experimental).
+* `cache` (`dict`): Cache configuration section
+  * `token_max_size` (`int` = `32`): Max number of entries in the token -> user LRU cache. This cache holds the authentication data for a token. Evicted tokens will need to be re-authenticated.
+  * `auth_max_size` (`int` = `32`): Max number of [un]authorized org/repo entries in the TTL (LRU) cache kept for each user. Evicted repos will need to get re-authorized.
+  * `auth_write_ttl` (`float` = `15 * 60`): Max age [seconds] of a user's org/repo authorizations able to `WRITE`. A repo writer will also need to be re-authorized after this period.
+  * `auth_other_ttl` (`float` = `30`): Max age [seconds] of a user's org/repo authorizations **not** able to `WRITE`. A repo reader or a rejected user will get a chance for a permission upgrade after this period.
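As a rough illustration of the token handling described under Details (the token travels in the password part of `Basic` auth, the username is ignored, and `Bearer` is accepted too), here is a minimal Python sketch. The helper name `extract_token` is hypothetical and this is not the code `giftless.auth.github` actually uses; it only assumes the standard `Authorization` header formats.

```python
import base64
from typing import Optional


def extract_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token carried by an Authorization header value, or None.

    Hypothetical helper for illustration only, not part of Giftless.
    """
    if not authorization:
        return None
    scheme, _, payload = authorization.strip().partition(" ")
    payload = payload.strip()
    if not payload:
        return None
    if scheme.lower() == "bearer":
        # Bearer auth carries the token as-is
        return payload
    if scheme.lower() == "basic":
        try:
            decoded = base64.b64decode(payload, validate=True).decode("utf-8")
        except ValueError:
            # covers both malformed base64 and undecodable bytes
            return None
        # Basic credentials are "username:password"; the username is ignored
        _, sep, password = decoded.partition(":")
        return password if sep and password else None
    return None
```

A `git` credential helper would typically produce the `Basic` form, so that is the branch most requests are expected to take; anything the sketch cannot parse is treated as "no token", which corresponds to a rejected request in the flow above.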
+ ## Understanding Authentication and Authorization Providers This part is more abstract, and will help you understand how Giftless handles @@ -220,6 +252,10 @@ Very simply, an `Identity` object encapsulates information about the current use request, and is expected to have the following interface: ```python +from typing import Optional +from giftless.auth.identity import Permission + + class Identity: name: Optional[str] = None id: Optional[str] = None @@ -244,9 +280,12 @@ Authorizer classes may use the default built-in `DefaultIdentity`, or implement subclass of their own. #### Permissions -Giftless defines the following permissions on entites: +Giftless defines the following permissions on entities: ```python +from enum import Enum + + class Permission(Enum): READ = "read" READ_META = "read-meta" diff --git a/docs/source/conf.py b/docs/source/conf.py index 2e1204a..58de00d 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -11,7 +11,7 @@ # documentation root, use os.path.abspath to make it absolute, like shown here. # import os -import importlib +import importlib.metadata from recommonmark.transform import AutoStructify diff --git a/docs/source/configuration.md b/docs/source/configuration.md index 735082b..84cb552 100644 --- a/docs/source/configuration.md +++ b/docs/source/configuration.md @@ -126,5 +126,17 @@ clients using these URLs. By default, the JWT auth provider is used here. There is typically no need to override the default behavior. +#### `LEGACY_ENDPOINTS` +This is a `bool` flag, default `true` (deprecated, use `false` where possible), that affects the base URI of all the service endpoints. Previously, the endpoints didn't adhere to the rules for [automatic LFS server discovery](https://github.com/git-lfs/git-lfs/blob/main/docs/api/server-discovery.md), which needed additional routing or client configuration. + +The default base URI for all giftless endpoints is now `//.git/info/lfs` while the legacy one is `//`. 
+* `` is a simple organization name not containing slashes (common for GitHub)
+* `` is a more versatile organization path which can contain slashes (common for GitLab)
+* `` is a simple repository name not containing slashes
+
+With `LEGACY_ENDPOINTS` set to `true`, **both the current and legacy** endpoints work simultaneously. When using the `basic_streaming` transfer adapter, the **legacy URI** is used for the object URLs in the batch API responses, for backward compatibility.
+
+Setting `LEGACY_ENDPOINTS` to `false` makes everything use the current base URI; requests to the legacy URIs will get rejected.
+
 #### `DEBUG`
 If set to `true`, enables more verbose debugging output in logs.
diff --git a/docs/source/github-lfs.md b/docs/source/github-lfs.md
new file mode 100644
index 0000000..3768c24
--- /dev/null
+++ b/docs/source/github-lfs.md
@@ -0,0 +1,58 @@
+Shadowing GitHub LFS
+====================
+
+This guide shows how to use Giftless as the LFS server for an existing GitHub repository (not using GitHub LFS). Thanks to a handful of tricks it also acts as a full remote HTTPS-based `git` repository, making this a zero client configuration setup.
+
+This guide uses `docker compose`, so you need to [install it](https://docs.docker.com/compose/install/). It also relies on you using HTTPS for cloning GitHub repos. The SSH way is not supported.
+
+### Running docker containers
+To run the setup, `git clone https://github.com/datopian/giftless`, step into the `examples/github-lfs` directory and run `docker compose up`.
+
+This will run two containers:
+- `giftless`: Locally built Giftless server configured to use solely the [GitHub authentication provider](auth-providers.md#github-authenticator) and a local docker compose volume as the storage backend.
+- `proxy`: An [Envoy reverse proxy](https://www.envoyproxy.io/) which acts as the frontend listening on local port 5000, configured to route LFS traffic to `giftless` and pretty much anything else to `[api.]github.com`. **The proxy listens on unencrypted HTTP.** Setting up the proxy to provide TLS termination is very much possible, but isn't yet covered (your turn, thanks for the contribution!).
+
+Feel free to explore the `compose.yaml`, which contains all the details.
+
+### Cloning a GitHub repository via proxy
+The frontend proxy forwards the usual `git` traffic to GitHub, so pick or create a test repository on GitHub where you have write access and clone it via the proxy hostname (just replace `github.com` with wherever you host the proxy):
+```shell
+git clone http://localhost:5000/$YOUR_ORG/$YOUR_REPO
+```
+If you don't use a credential helper, you might get asked a few times for the same credentials before the call gets through. [Make sure to get one](https://git-scm.com/doc/credential-helpers) before it drives you insane.
+
+Thanks to the [automatic LFS server discovery](https://github.com/git-lfs/git-lfs/blob/main/docs/api/server-discovery.md) this is all you should need to become LFS-enabled!
+
+### Pushing binary blobs
+Let's try pushing some binary blobs then! See also [Quickstart](quickstart.md#create-a-local-repository-and-push-some-file).
+```shell
+# create some blob
+dd if=/dev/urandom of=blob.bin bs=1M count=1
+# make it tracked by LFS
+git lfs track blob.bin
+# the LFS tracking is written in .gitattributes, which you also want committed
+git add .gitattributes blob.bin
+git commit -m 'Hello LFS!'
+# push it, assuming the local branch is main
+# this might fail the first time, when git automatically runs 'git config lfs.locksverify false'
+git push -u origin main
+```
+
+This should eventually succeed, and you will find the LFS digest in place of the blob on GitHub and the binary blob on your local storage:
+```shell
+docker compose exec -it giftless find /lfs-storage
+/lfs-storage
+/lfs-storage/$YOUR_ORG
+/lfs-storage/$YOUR_ORG/$YOUR_REPO
+/lfs-storage/$YOUR_ORG/$YOUR_REPO/deadbeefb10bb10bad40beaa8c68c4863e8b00b7e929efbc6dcdb547084b01
+```
+
+Next time anyone clones the repo (via the proxy), the binary blob will get properly downloaded. Failing to use the proxy hostname will make `git` use GitHub's own LFS, which is a paid service you are obviously trying to avoid.
+
+### Service teardown
+
+Finally, to shut down your containers, break (`^C`) the current compose run and clean up dead containers with:
+```shell
+docker compose down [--volumes]
+```
+Using `--volumes` tears down the `lfs-storage` volume too, so make sure that is what you want.
\ No newline at end of file
diff --git a/docs/source/guides.rst b/docs/source/guides.rst
index fdca645..bee0b81 100644
--- a/docs/source/guides.rst
+++ b/docs/source/guides.rst
@@ -9,3 +9,4 @@ This section includes several how-to guides designed to get you started with Gif
    quickstart
    using-gcs
    jwt-auth-guide
+   github-lfs
diff --git a/examples/github-lfs/.env b/examples/github-lfs/.env
new file mode 100644
index 0000000..6f6f800
--- /dev/null
+++ b/examples/github-lfs/.env
@@ -0,0 +1,6 @@
+# listening (proxy) port on the host
+SERVICE_PORT=5000
+# inner port giftless listens on
+GIFTLESS_PORT=5000
+# inner port the reverse proxy listens on
+PROXY_PORT=8080
diff --git a/examples/github-lfs/compose.yaml b/examples/github-lfs/compose.yaml
new file mode 100644
index 0000000..65f928a
--- /dev/null
+++ b/examples/github-lfs/compose.yaml
@@ -0,0 +1,162 @@
+name: github-lfs
+
+volumes:
+  lfs-storage: {}
+
+services:
+  giftless:
+    image: docker.io/datopian/giftless:latest
+    volumes:
+      - lfs-storage:/lfs-storage
+    environment:
+      GIFTLESS_DEBUG: "1"
+      GIFTLESS_CONFIG_STR: |
+        # use endpoints at //.git/info/lfs/ only
+        LEGACY_ENDPOINTS: false
+        AUTH_PROVIDERS:
+          - factory: giftless.auth.github:factory
+        TRANSFER_ADAPTERS:
+          basic:
+            factory: giftless.transfer.basic_streaming:factory
+            options:
+              # use the lfs-storage volume as local storage
+              storage_class: giftless.storage.local_storage:LocalStorage
+              storage_options:
+                path: /lfs-storage
+        # disable the default JWT pre-auth provider, object up/downloads get also authorized via GitHub
+        PRE_AUTHORIZED_ACTION_PROVIDER: null
+    command: "--http=0.0.0.0:$GIFTLESS_PORT -M -T --threads 2 -p 2 --manage-script-name --callable app"
+    pull_policy: never # prefer local build
+    build:
+      cache_from:
+        - docker.io/datopian/giftless:latest
+      context: ../..
+
+  proxy:
+    image: docker.io/envoyproxy/envoy:v1.30-latest
+    configs:
+      - source: envoy
+        target: /etc/envoy/envoy.yaml
+    command: "/usr/local/bin/envoy -c /etc/envoy/envoy.yaml"
+    ports:
+      - "$SERVICE_PORT:$PROXY_PORT"
+    depends_on:
+      giftless:
+        condition: service_started
+
+configs:
+  envoy:
+    content: |
+      static_resources:
+        listeners:
+          - address:
+              socket_address:
+                address: 0.0.0.0
+                port_value: $PROXY_PORT # proxy port
+            filter_chains:
+              - filters:
+                  - name: envoy.filters.network.http_connection_manager
+                    typed_config:
+                      "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
+                      stat_prefix: ingress_http
+                      http_filters:
+                        - name: envoy.filters.http.router
+                          typed_config:
+                            "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
+                            suppress_envoy_headers: true
+                      access_log:
+                        - name: envoy.access_loggers.file
+                          typed_config:
+                            "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
+                            path: /dev/stdout
+                      generate_request_id: false
+                      preserve_external_request_id: true
+                      route_config:
+                        name: ingress_route
+                        virtual_hosts:
+                          - name: giftless
+                            domains:
+                              - "*"
+                            routes:
+                              - name: giftless
+                                # Only this goes to the giftless service
+                                match:
+                                  safe_regex:
+                                    regex: (?:/[^/]+){2,}\.git/info/lfs(?:/.*|$)
+                                route:
+                                  timeout: 0s # don't break long-running downloads
+                                  cluster: giftless
+                              - name: api_github_com
+                                # Routing 3rd party tools assuming this is a GitHub Enterprise URL /api/v#/X to public api.github.com/X
+                                match:
+                                  safe_regex: &api_regex
+                                    regex: /api/v\d(?:/(.*)|$)
+                                route:
+                                  regex_rewrite:
+                                    pattern: *api_regex
+                                    substitution: /\1
+                                  host_rewrite_literal: api.github.com
+                                  timeout: 3600s
+                                  cluster: api_github_com
+                                request_headers_to_remove:
+                                  - x-forwarded-proto
+                              - name: github_com
+                                # Anything else is forwarded directly to GitHub
+                                match:
+                                  prefix: "/"
+                                route:
+                                  host_rewrite_literal: github.com
+                                  timeout: 3600s
+                                  cluster: github_com
+                                request_headers_to_remove:
+                                  - x-forwarded-proto
+        clusters:
+          - name: giftless
+            connect_timeout: 0.25s
+            type: strict_dns
+            lb_policy: round_robin
+            load_assignment:
+              cluster_name: giftless
+              endpoints:
+                - lb_endpoints:
+                    - endpoint:
+                        address:
+                          socket_address:
+                            address: giftless # inner giftless hostname
+                            port_value: $GIFTLESS_PORT # local giftless port
+          - name: api_github_com
+            type: logical_dns
+            # Comment out the following line to test on v6 networks
+            dns_lookup_family: v4_only
+            load_assignment:
+              cluster_name: api_github_com
+              endpoints:
+                - lb_endpoints:
+                    - endpoint:
+                        address:
+                          socket_address:
+                            address: api.github.com
+                            port_value: 443
+            transport_socket:
+              name: envoy.transport_sockets.tls
+              typed_config:
+                "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
+                sni: api.github.com
+          - name: github_com
+            type: logical_dns
+            # Comment out the following line to test on v6 networks
+            dns_lookup_family: v4_only
+            load_assignment:
+              cluster_name: github_com
+              endpoints:
+                - lb_endpoints:
+                    - endpoint:
+                        address:
+                          socket_address:
+                            address: github.com
+                            port_value: 443
+            transport_socket:
+              name: envoy.transport_sockets.tls
+              typed_config:
+                "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
+                sni: github.com
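To sanity-check which request paths the three Envoy routes above would send where, the matching logic can be approximated in Python. This is only a sketch: `pick_route` is a hypothetical helper, the two patterns are copied from the `route_config`, and it assumes that Python's `re` treats these particular patterns the same way as Envoy's RE2-based `safe_regex`, which has to match the whole path (query string excluded).

```python
import re
from typing import Tuple

# Patterns copied from the Envoy route_config; re.fullmatch mimics the
# whole-path matching that Envoy applies to safe_regex route matches.
LFS_PATH = re.compile(r"(?:/[^/]+){2,}\.git/info/lfs(?:/.*|$)")
API_PATH = re.compile(r"/api/v\d(?:/(.*)|$)")


def pick_route(path: str) -> Tuple[str, str]:
    """Return (cluster, upstream path) for a request path; first match wins."""
    if LFS_PATH.fullmatch(path):
        # LFS traffic stays local and keeps its path
        return "giftless", path
    if API_PATH.fullmatch(path):
        # mirrors regex_rewrite: /api/v#/X becomes /X on api.github.com
        return "api_github_com", API_PATH.sub(r"/\1", path)
    # everything else is plain git traffic for github.com
    return "github_com", path


for sample in (
    "/org/repo.git/info/lfs/objects/batch",
    "/api/v3/user",
    "/org/repo.git/info/refs",
):
    print(sample, "->", pick_route(sample))
```

Note that the LFS pattern requires at least two path segments before `.git/info/lfs`, so a path with only a repository name and no organization falls through to `github_com`.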