
Commit 9179c5e

seanshahkarami, FranciscoLozCoding, and Copilot authored
Add tools for loading manifest data from inventory tools (#64)
* added initial loading script
* add inventory tools integration to manifest loading script
* add cron job to periodically run manifest loading script
* added TODO
* refactor: migrate to new Docker setup with separate dev and prod configurations
* chore: add 'env' to .dockerignore to exclude environment files
* chore: add issuer-key.pem to .gitignore to exclude sensitive key files
* refactor: update Makefile to support environment-specific Docker Compose configurations
* docs: update README to clarify environment configurations and Make commands
* remove not needed token
* Make script executable
* Update README.md: remove duplicate word
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Move the dockerfile edits to dev, so I can test them first
* fix: update cron job command to use manage.py shell for loading manifest
* Added TODO
* fix: update cron job command to redirect output and set permissions
* fix: update docker-compose to set environment variables and start cron service
* feat: add support for force-recreate in docker-compose for dev environment
* fix: use settings.INVENTORY_TOOLS directly in load_manifest.py for cloning repository
* feat: update inventory tools settings to include repository and version management
* feat: update inventory tools configuration in settings and docker-compose for clarity
* feat: add username and token settings for inventory tools authentication in load_manifest.py
* feat: update inventory tools settings to remove username and streamline token usage
* chore: update .gitignore to include data directory
* feat: refactor subprocess handling in load_manifest.py for improved logging
* Added instructions for setting up keys for INVENTORY_TOOLS
* feat: add SSH directory validation and logging in load_manifest.py
* updated readme
* feat: add INV_TOOLS_SSH_DIR environment variable for SSH directory configuration
* feat: configure SSH directory in docker-compose for inventory tools
* feat: add INV_TOOLS_SSH_TOOLS environment variable for SSH tools directory
* feat: update docker-compose to configure SSH tools and directory paths
* feat: validate INV_TOOLS_SSH_TOOLS setting in load_manifest.py
* feat: update Dockerfile to install rsync and configure safe directory for git
* refactor: remove unused import of shutil in load_manifest.py
* update TODOs
* refactor: remove INV_TOOLS_SSH_DIR setting and only use INV_TOOLS_SSH_TOOLS
* feat: configure safe directory for git in load_manifest.py
* change SSH directory path in load_manifest.py
* feat: add INV_TOOLS_SSH_TOOLS_PW to environment configuration
* docs: add INV_TOOLS_SSH_TOOLS_PW passphrase requirement to README
* feat: add INV_TOOLS_SSH_CONFIG and INV_TOOLS_SSH_TOOLS_PW to base settings
* feat: enhance SSH configuration setup in load_manifest.py
* feat: update SSH tools and configuration paths in docker-compose.yaml
* feat: add shell option to run_subprocess and update ssh-agent setup
* docs: update README with TODOs for INVENTORY_TOOLS environment variables and SSH setup instructions
* feat: change directory to repo after executing git commands in load_manifest.py
* refactor: clean up TODO comments in load_manifest.py
* added TODO
* refactor: changed the script to implement django's custom commands
* refactor: remove unused Inventory Tools keys from base settings
* feat: change working directory to REPO_DIR before scraping nodes and loading manifests
* Added TODO
* fix: update cron job to run autoloadmanifest every 30 minutes
* added update create modem data handling in loadmanifest command
* feat: enhance manifest loading by syncing node, modem, computes, and sensors
* feat: add custom hardware option and improve hardware retrieval in sync process
* feat: add comprehensive tests for loadmanifest command functionality
* Made _sync_computes() easily extendable without changing core logic
* fix: change custom hardware naming
* Added compute and sensor mappers for easy extensions
* refactor: abstract compute and sensor alias resolution with mappers
* fix: skip processing for unreachable devices in loadmanifest command
* refactor: update hardware resolution to use SensorHardware model and get_or_create
* added TODO
* add tests for handling reachable and unreachable devices
* add notes for adding compute and sensor mappers
* add k8s source in sensor mappers
* feat: add node resource handling to load manifest command and tests
* fix: update get_repo method to require token options when repo is url
* uncomment line
* fix: update inventory tools repository path and adjust comments for clarity
* fix: update inventory tools repository path in docker-compose configuration
* using vsns from inventory tools. added --no-scrape flag. check if node id exists before updating models.
* updated hardware mapper to determine rpi 4gb vs 8gb from device model and resources.
* added migration
* included memory in test data
* updated manifest data in tests to include model and memory

---------

Co-authored-by: FranciscoLozCoding <francisco.lozano33@outlook.com>
Co-authored-by: Francisco Lozano <77420455+FranciscoLozCoding@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent bd3cfa1 commit 9179c5e

13 files changed (+997, -16 lines)


.gitignore

Lines changed: 2 additions & 1 deletion
@@ -7,4 +7,5 @@ fixtures/
 .coverage
 .DS_Store
 .coverage.*
-issuer-key.pem
+data/
+*.pem

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -6,4 +6,4 @@ COPY requirements /app/requirements
 RUN pip install --no-cache-dir -r requirements/prod.txt
 COPY . /app
 # include staticfiles for whitenoise
-RUN python manage.py collectstatic --noinput
+RUN python manage.py collectstatic --noinput

README.md

Lines changed: 53 additions & 9 deletions
@@ -14,18 +14,43 @@ There are two development / deployment configurations both in `env` folder:
 * dev: intended for fast, local dev on host machine. debug flags are enabled.
 * prod: intended for testing in docker compose prior to deploying to production cluster. debug flags are disabled and more security settings are enabled.
 
-For both environments, you will have to set up the keys in `env/<environment>/.env` so that the `downloads` app can work.
+Optionally, for either of these environments you can configure...
+* user login via Globus OIDC
+* http redirect to files captured by nodes (`downloads` app)
+* Automatic updates of node manifests (`INVENTORY_TOOLS`)
 
-- S3_ACCESS_KEY: This is the access key for your S3 storage
-- S3_SECRET_KEY: This is the secret key for your S3 storage
-- PELICAN_KEY_PATH: This is the path to the .pem file for Pelican in the docker container
-- PELICAN_KEY_ID: The id for the jwt public key used for Pelican found in `jwks.json` (https://sagecontinuum.org/.well-known/openid-configuration)
+Optionally, you can configure user login via Globus OIDC for either of these environments.
 
->NOTE: If you are not working on the `downloads` app this can be ignored.
+### Keys
+For both environments, you will have to set up these keys in `env/<environment>/.env` so that the `downloads` app can work.
 
-Optionally, you can configure user login via Globus OIDC for either of these environments.
+- `S3_ACCESS_KEY`: This is the access key for your S3 storage
+- `S3_SECRET_KEY`: This is the secret key for your S3 storage
+- `PELICAN_KEY_PATH`: This is the path to the .pem file for Pelican in the docker container
+- `PELICAN_KEY_ID`: The id for the jwt public key used for Pelican found in `jwks.json` (https://sagecontinuum.org/.well-known/openid-configuration)
+
+> NOTE: If you are not working on the `downloads` app this can be ignored.
+
+You will also have to set up these keys in `env/<environment>/.env` so that `INVENTORY_TOOLS` can work. `INVENTORY_TOOLS` is used to update the manifest automatically.
+
+- INV_TOOLS_TOKEN: This is a github token with clone/pull access to our [inventory tools repo](https://github.com/waggle-sensor/waggle-inventory-tools)
+- INV_TOOLS_SSH_TOOLS_PW: This is the passphrase for our ecdsa-sage-waggle SSH IdentityFile
+
+>NOTE: If you are not working on `INVENTORY_TOOLS` this can be ignored.
+
+### Environment Variables
+
+>TODO: add instructions for INVENTORY_TOOLS env variables and how to set them up
+
+### Volumes
+
+>TODO: Add instructions in setting up waggle_inv_tools for ssh access to nodes within the django container
+
+>TODO: Add a make command to set up INVENTORY_TOOLS ssh and ssh tools volume. aka clone repos and set up ssh config
 
-### Local development using dev configuration
+>NOTE: If you are not working on `INVENTORY_TOOLS` this can be ignored.
+
+## Local development using dev configuration
 
 _I highly recommend creating a virtual env when working on the app. I typically use:_
 
@@ -79,6 +104,25 @@ To implement the model edits to the server run:
 make migrate
 ```
 
+### Local development Inventory Tools
+
+>TODO: add docs on running inventory tools locally
+
+```sh
+./manage.py loadmanifest --repo <inventory_tools_local_path> --vsns
+W08E
+```
+
+or to run for one node
+
+```sh
+./manage.py loadmanifest --repo <inventory_tools_local_path> --vsns
+W08E
+```
+
+>TODO: add docs on how to add mappers for computes and sensors
+
+## Running a local production server
 ### Running a local production server
 
 To stand up the prod environment in docker compose, simply run:
@@ -105,7 +149,7 @@ Finally, when you're done working, you can stop everything using:
 make stop ENV=prod
 ```
 
-### Enable user login via Globus OIDC
+## Enable user login via Globus OIDC
 
 You can configure user login via Globus OIDC by performing the following _one time_ setup:
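
The README runs `loadmanifest` with `--repo` and `--vsns`, and the auto-loading command added in this commit gives each `INV_TOOLS_*` variable a matching flag whose default comes from the environment. The sketch below shows that pattern with plain `argparse` and a dict standing in for the environment (the commit itself uses django-environ's `Env` inside a Django management command); `build_parser` and `missing_options` are hypothetical names. Note that argparse stores `--ssh-tools` under the key `ssh_tools`, so when reporting a missing option the sketch maps the key back to the hyphenated flag the user actually types.

```python
import argparse
from typing import Mapping

# Hypothetical table mirroring the command's required options: argparse dest -> env variable.
REQUIRED = {
    "repo": "INV_TOOLS_REPO",
    "ssh_tools": "INV_TOOLS_SSH_TOOLS",
    "ssh_config": "INV_TOOLS_SSH_CONFIG",
    "ssh_pw": "INV_TOOLS_SSH_TOOLS_PW",
}


def build_parser(env: Mapping[str, str]) -> argparse.ArgumentParser:
    """Build a parser whose defaults fall back to values in `env` (e.g. os.environ)."""
    parser = argparse.ArgumentParser(prog="autoloadmanifest")
    # Each option defaults to its environment variable, or None when unset.
    parser.add_argument("--repo", default=env.get("INV_TOOLS_REPO"))
    parser.add_argument("--token", default=env.get("INV_TOOLS_TOKEN"))
    parser.add_argument("--ssh-tools", default=env.get("INV_TOOLS_SSH_TOOLS"))
    parser.add_argument("--ssh-config", default=env.get("INV_TOOLS_SSH_CONFIG"))
    parser.add_argument("--ssh-pw", default=env.get("INV_TOOLS_SSH_TOOLS_PW"))
    # nargs="+" collects one or more VSNs into a list; None means "all nodes".
    parser.add_argument("--vsns", nargs="+", default=None)
    return parser


def missing_options(options: dict, required: Mapping[str, str] = REQUIRED) -> list[str]:
    """Return one message per required option that is unset."""
    messages = []
    for key, env_name in required.items():
        if not options.get(key):
            # argparse turns --ssh-tools into ssh_tools; convert back for the message.
            flag = "--" + key.replace("_", "-")
            messages.append(f"{env_name} is not set (add arg {flag} or env variable {env_name})")
    return messages


env = {"INV_TOOLS_REPO": "/app/waggle-inventory-tools"}
opts = vars(build_parser(env).parse_args(["--vsns", "W08E"]))
for message in missing_options(opts):
    print(message)
```

Unlike `check_required_options`, which stops at the first missing value, this variant collects every missing option, so a single run reports all of them.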

env/dev/.env

Lines changed: 7 additions & 1 deletion
@@ -1,4 +1,10 @@
 S3_ACCESS_KEY=FILL_IN
 S3_SECRET_KEY=FILL_IN
 PELICAN_KEY_PATH=/app/env/issuer-key.pem #path to pem file in docker container
-PELICAN_KEY_ID=FILL_IN
+<<<<<<< HEAD
+PELICAN_KEY_ID=FILL_IN
+INV_TOOLS_TOKEN=FILL_IN
+INV_TOOLS_SSH_TOOLS_PW=FILL_IN
+=======
+PELICAN_KEY_ID=FILL_IN
+>>>>>>> main
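
`INV_TOOLS_TOKEN` only matters when `INV_TOOLS_REPO` is a remote URL: the new command then clones from a URL with the token spliced in via `self.REPO.replace("https://", f"https://{self.REPO_TOKEN}@")`. A minimal sketch of the same step using `urllib.parse` instead of string replacement (a swapped-in technique, not the commit's code), assuming an HTTPS URL; `authenticated_clone_url` is a hypothetical name:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def authenticated_clone_url(repo_url: str, token: str) -> str:
    """Embed a token as the userinfo part of an HTTPS clone URL (hypothetical helper)."""
    parts = urlsplit(repo_url)
    if parts.scheme != "https":
        # SSH-style remotes such as git@github.com:org/repo.git have no https scheme.
        raise ValueError(f"expected an https:// URL, got {repo_url!r}")
    # Percent-encode the token so characters like '@' or '/' cannot break the URL.
    netloc = f"{quote(token, safe='')}@{parts.netloc}"
    return urlunsplit(parts._replace(netloc=netloc))


print(authenticated_clone_url(
    "https://github.com/waggle-sensor/waggle-inventory-tools.git", "abc123"))
# https://abc123@github.com/waggle-sensor/waggle-inventory-tools.git
```

The scheme check and percent-encoding make SSH-style remotes or unusual tokens fail loudly instead of producing a malformed URL. Either way, git records the clone URL, token included, as the `origin` remote in the clone's `.git/config`.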

env/dev/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -6,4 +6,4 @@ COPY requirements /app/requirements
 RUN pip install --no-cache-dir -r requirements/dev.txt
 COPY . /app
 # include staticfiles for whitenoise
-RUN python manage.py collectstatic --noinput
+RUN python manage.py collectstatic --noinput

env/dev/docker-compose.yaml

Lines changed: 8 additions & 3 deletions
@@ -33,7 +33,12 @@ services:
       - "PELICAN_LIFETIME=60"
       - "PELICAN_ROOT_URL=https://nrdstor.nationalresearchplatform.org:8443/sage"
       - "PELICAN_ROOT_FOLDER=/node-data"
-    env_file:
-      - .env
+      - "INV_TOOLS_REPO=/app/waggle-inventory-tools"
+      #NOTE: uncomment these lines for automatic updates of the inventory tools repo
+      # - "INV_TOOLS_REPO=https://github.com/waggle-sensor/waggle-inventory-tools.git"
+      # - "INV_TOOLS_TOKEN=${INV_TOOLS_TOKEN}"
+      - "INV_TOOLS_SSH_TOOLS=/root/git"
+      - "INV_TOOLS_SSH_CONFIG=/root/ssh"
+      - "INV_TOOLS_SSH_TOOLS_PW=${INV_TOOLS_SSH_TOOLS_PW}"
     volumes:
-      - ../../:/app:rw
+      - ../../:/app:rw
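
The compose file hands `INV_TOOLS_SSH_TOOLS_PW` to the container, and `set_ssh` uses it after starting `ssh-agent -s` and copying the variables the agent prints (`SSH_AUTH_SOCK`, `SSH_AGENT_PID`) into `os.environ`, so later `ssh-add` and `ssh` subprocesses can find the agent. The parsing step can be isolated as a pure function using the same regular expression as the command; this sketch assumes the agent's Bourne-shell output format, and `parse_agent_env` and the sample output are illustrative:

```python
import re

# Matches lines such as "SSH_AUTH_SOCK=/tmp/ssh-XXXX/agent.123; export SSH_AUTH_SOCK;"
_ASSIGNMENT = re.compile(r"(\w+)=([^;]+);")


def parse_agent_env(output: str) -> dict[str, str]:
    """Extract the variables that `ssh-agent -s` asks the shell to export."""
    env = {}
    for line in output.splitlines():
        match = _ASSIGNMENT.match(line)
        if match:  # lines like "echo Agent pid 42;" contain no "=" and are skipped
            key, value = match.groups()
            env[key] = value
    return env


sample = (
    "SSH_AUTH_SOCK=/tmp/ssh-abc/agent.41; export SSH_AUTH_SOCK;\n"
    "SSH_AGENT_PID=42; export SSH_AGENT_PID;\n"
    "echo Agent pid 42;\n"
)
print(parse_agent_env(sample))
# {'SSH_AUTH_SOCK': '/tmp/ssh-abc/agent.41', 'SSH_AGENT_PID': '42'}
```

Returning a dict keeps the parsing testable without launching an agent; the caller can still apply it with `os.environ.update(parse_agent_env(output))`.
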
Lines changed: 187 additions & 0 deletions
@@ -0,0 +1,187 @@
+"""Custom Django command to Automatically load manifest data using data scraped from nodes into the database."""
+import os
+import subprocess
+import re
+from environ import Env
+from manifests.management.commands.loadmanifest import Command as LoadManifestCommand
+
+class Command(LoadManifestCommand):
+    help = """
+    Automatically Load manifest data using data scraped from nodes into the database. This builds off of
+    the loadmanifest command but sets up SSH and scraping tools for nodes.
+    """
+    env = Env()
+
+    def handle(self, *args, **options):
+        """
+        Handle the command execution.
+        """
+        # Check if required options are set
+        if not self.check_required_options(options):
+            return
+        self.set_constants(options)
+
+        # Check if SSH directories exist
+        if not self.check_ssh_dirs():
+            return
+
+        self.log("Starting manifest loading process...")
+        os.chdir(self.WORKDIR)
+
+        # Set up SSH and clone the repo
+        self.set_ssh()
+        self.get_repo(options)
+
+        # Get the list of VSNs to scrape/load
+        vsns = self.get_vsns(options)
+        self.scrape_nodes(vsns)
+        self.load_manifests(vsns)
+
+        self.log("Manifest loading process completed.")
+
+    def add_arguments(self, parser):
+        """
+        Add command line arguments for the command.
+        """
+        parser.add_argument("--repo", type=str, default=self.env("INV_TOOLS_REPO", str, None), help="Inventory Tools Repository URL or local path")
+        parser.add_argument("--token", type=str, default=self.env("INV_TOOLS_TOKEN", str, None), help="Github Token with access to INV_TOOLS_REPO URL")
+        parser.add_argument("--repo_ver", type=str, default=self.env("INV_TOOLS_VERSION", str, None), help="Branch, tag, or commit SHA of INV_TOOLS_REPO URL to use")
+        parser.add_argument("--ssh-tools", type=str, default=self.env("INV_TOOLS_SSH_TOOLS", str, None), help="Directory holding SSH tools used for SSHing into nodes")
+        parser.add_argument("--ssh-config", type=str, default=self.env("INV_TOOLS_SSH_CONFIG", str, None), help="SSH directory holding config files")
+        parser.add_argument("--ssh-pw", type=str, default=self.env("INV_TOOLS_SSH_TOOLS_PW", str, None), help="Password for SSH IdentityFile")
+        parser.add_argument("--vsns", nargs="+", type=str, default=None, help="Optional list of VSNs to scrape/load. If not provided, all from DB will be used.")
+
+    def check_required_options(self, options, required=None):
+        """
+        Check if required options are set. If not, log a message and return False.
+        """
+        if required is None:
+            required = {
+                "repo": "INV_TOOLS_REPO",
+                "ssh_tools": "INV_TOOLS_SSH_TOOLS",
+                "ssh_config": "INV_TOOLS_SSH_CONFIG",
+                "ssh_pw": "INV_TOOLS_SSH_TOOLS_PW",
+            }
+        for opt_key, setting_name in required.items():
+            if not options.get(opt_key):
+                self.log(f"{setting_name} is not set (add arg --{opt_key} or env variable {setting_name}). Manifest loading will fail.")
+                return False
+        return True
+
+    def set_constants(self, options):
+        """
+        Set constants for the Command.
+        """
+        self.WORKDIR = "/app"
+        self.REPO = options["repo"]
+        self.REPO_VERSION = options["repo_ver"]
+        self.REPO_TOKEN = options["token"]
+        self.SSH_PWD = options["ssh_pw"]
+        self.SSH_TOOLS_DIR = options["ssh_tools"]
+        self.SSH_TEMPLATE = options["ssh_config"]
+        self.REPO_DIR = os.path.join(self.WORKDIR, "waggle-inventory-tools")
+        self.DATA_DIR = os.path.join(self.REPO_DIR, "data")
+        self.SSH_CONFIG_TEMPLATE = os.path.join(self.SSH_TEMPLATE, "config")
+        self.HONEYHOUSE_DIR = os.path.join(self.SSH_TOOLS_DIR, "honeyhouse-config")
+        self.PRIV_CONFIG_DIR = os.path.join(self.SSH_TOOLS_DIR, "private_config")
+        self.DEVOPS_DIR = os.path.join(self.SSH_TOOLS_DIR, "devOps")
+
+    def check_ssh_dirs(self):
+        """
+        Check if the SSH directories and files exist.
+        """
+        paths = [self.SSH_TEMPLATE, self.SSH_CONFIG_TEMPLATE, self.HONEYHOUSE_DIR, self.PRIV_CONFIG_DIR, self.DEVOPS_DIR]
+        for path in paths:
+            if not os.path.exists(path):
+                self.log(f"{path} does not exist. Manifest loading will not proceed.")
+                return False
+        return True
+
+    def is_commit_sha(self, ref):
+        """Return True if ref looks like a git commit SHA."""
+        return len(ref) >= 7 and all(c in "0123456789abcdef" for c in ref.lower())
+
+    def get_repo(self, options):
+        """
+        Clone the inventory tools repository if it doesn't exist, or use the cached one.
+        Then checkout the specified version (branch, tag, or commit SHA).
+        """
+        if os.path.exists(self.REPO): # Local repo directory
+            self.log(f"Using local repo directory: {self.REPO}")
+            self.REPO_DIR = self.REPO # Override default REPO_DIR
+        else:
+            # Check if required options are set for remote repo
+            required = {"repo": "INV_TOOLS_REPO", "token": "INV_TOOLS_TOKEN",}
+            if not self.check_required_options(options, required):
+                return
+            # Remote repo URL (e.g., https://...)
+            auth_repo_url = self.REPO.replace("https://", f"https://{self.REPO_TOKEN}@")
+            if not os.path.exists(self.REPO_DIR):
+                self.log(f"Cloning repo from {self.REPO} to {self.REPO_DIR}")
+                self.run_subprocess(["git", "clone", auth_repo_url, self.REPO_DIR])
+            else:
+                self.log(f"Using cached repo at {self.REPO_DIR}")
+
+            # Ensure Git knows it's safe to operate here
+            self.run_subprocess(["git", "config", "--global", "--add", "safe.directory", self.REPO_DIR])
+            self.run_subprocess(["git", "-C", self.REPO_DIR, "fetch", "--all", "--tags"])
+
+            # Checkout branch/tag/commit
+            if not self.REPO_VERSION:
+                self.log("Checking out latest from main branch.")
+                self.run_subprocess(["git", "-C", self.REPO_DIR, "checkout", "main"])
+                self.run_subprocess(["git", "-C", self.REPO_DIR, "pull", "origin", "main"])
+            else:
+                self.log(f"Checking out version: {self.REPO_VERSION}")
+                self.run_subprocess(["git", "-C", self.REPO_DIR, "fetch", "--all", "--tags"])
+                if self.is_commit_sha(self.REPO_VERSION):
+                    # It's a commit SHA
+                    self.run_subprocess(["git", "-C", self.REPO_DIR, "checkout", self.REPO_VERSION])
+                else:
+                    # Check if the version is a branch or tag
+                    result = subprocess.run(
+                        ["git", "-C", self.REPO_DIR, "rev-parse", "--verify", f"origin/{self.REPO_VERSION}"],
+                        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
+                    )
+                    if result.returncode == 0:
+                        # It's a branch
+                        self.run_subprocess(["git", "-C", self.REPO_DIR, "checkout", self.REPO_VERSION])
+                        self.run_subprocess(["git", "-C", self.REPO_DIR, "pull", "origin", self.REPO_VERSION])
+                    else:
+                        # It's a tag
+                        self.run_subprocess(["git", "-C", self.REPO_DIR, "checkout", f"tags/{self.REPO_VERSION}"])
+
+        # Set working directory to the repo
+        os.chdir(self.REPO_DIR)
+
+    def set_ssh(self):
+        """
+        Set up SSH configuration for the nodes.
+        """
+        ssh_dir = os.path.expanduser("~/.ssh")
+        ssh_config = os.path.join(ssh_dir, "config")
+        ssh_key_path = os.path.join(self.PRIV_CONFIG_DIR ,"misc/waggle-sage/ecdsa-sage-waggle")
+
+        # Check if SSH config file exists
+        if not os.path.exists(ssh_config):
+            self.log("Setting up SSH config for nodes.")
+            self.run_subprocess(["cp", "-r", self.SSH_TEMPLATE, ssh_dir])
+            self.run_subprocess(["mkdir", "-p", os.path.join(ssh_dir, "master-socket")])
+
+        # Start ssh-agent
+        self.log("Starting ssh-agent...")
+        output = subprocess.check_output(["ssh-agent", "-s"], text=True)
+
+        # Extract SSH_AUTH_SOCK and SSH_AGENT_PID using regex
+        for line in output.splitlines():
+            match = re.match(r'(\w+)=([^;]+);', line)
+            if match:
+                key, value = match.groups()
+                os.environ[key] = value
+
+        # Add the key to the agent
+        self.run_subprocess(["ssh-add", ssh_key_path], input_data=self.SSH_PWD + "\n")
+
+        # NOTE:
+        # - The scraped node data will go into the directory /app/waggle-inventory-tools/data/<vsn>/.
+        # /app/waggle-inventory-tools/data will be an attached volume to systems' /etc/waggle/manifest so users can access it outside of the container/pod.
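
The checkout logic in `get_repo` picks one of four paths: no version (track `main`), a commit SHA, a remote branch, or a tag. A side-effect-free sketch that returns the git commands for each case, assuming the caller has already run the `git rev-parse --verify origin/<version>` probe and passes its result as `is_remote_branch` (`checkout_plan` is a hypothetical name; the extra `fetch` calls are omitted):

```python
import string
from typing import Optional

HEX_DIGITS = set(string.hexdigits.lower())  # the characters 0-9 and a-f


def is_commit_sha(ref: str) -> bool:
    """Same heuristic as the command: 7+ hex characters look like a (short) SHA."""
    return len(ref) >= 7 and all(c in HEX_DIGITS for c in ref.lower())


def checkout_plan(repo_dir: str, version: Optional[str], is_remote_branch: bool) -> list[list[str]]:
    """Return the git commands get_repo would run for `version`, without running them."""
    git = ["git", "-C", repo_dir]
    if not version:
        # No version requested: follow the tip of main.
        return [git + ["checkout", "main"], git + ["pull", "origin", "main"]]
    if is_commit_sha(version):
        return [git + ["checkout", version]]
    if is_remote_branch:
        # `rev-parse --verify origin/<version>` succeeded, so treat it as a branch.
        return [git + ["checkout", version], git + ["pull", "origin", version]]
    # Anything else is assumed to be a tag.
    return [git + ["checkout", f"tags/{version}"]]


for cmd in checkout_plan("/app/waggle-inventory-tools", "dev", is_remote_branch=True):
    print(" ".join(cmd))
```

Keeping the decision separate from `run_subprocess` makes each path easy to unit-test. Note that the SHA test is a heuristic: a branch or tag whose name is seven or more hex characters (for example `deadbeef`) is classified as a commit, so it is checked out but never pulled.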
