The metaeffekt-scancode-service is a service that uses and extends the AboutCode ScanCode Toolkit. At its core, the metaeffekt-scancode-service package is a web service that can answer any number of scan requests after launch.
Using the ScanCode Toolkit from the command line incurs a bootstrapping penalty on every scan. To avoid this cost, a local service performs the bootstrapping once at startup and can then be used to execute scans on the local filesystem.
Furthermore, we activate/add some features of ScanCode that are otherwise not easily accessible:
- Including "All rights reserved" in copyright statements
- Preserving punctuation in copyright statements
This project uses a modern pyproject.toml. You can use it with any compatible packaging tool.
For example, use Python's build frontend for packaging. From the project directory, call
python -m build
or
python3 -m build
This builds a Python wheel that can be installed with pip, for example.
Note that on some systems Python 3 must be invoked with the python3 command rather than python.
It's recommended to install into a virtual environment. Either use
python -m venv scancode-extensions
source scancode-extensions/bin/activate
python -m pip install WHEEL_FILE
or if you have pipx installed, use
python -m pipx install WHEEL_FILE
or
pipx install WHEEL_FILE --force
The service requires a few environment variables to be set. At startup, it checks whether these variables exist; if they do not, the service refuses to start. The variables configure the paths for ScanCode Toolkit's cache, license index cache, and temporary files. To configure them, use
export SCANCODE_TEMP=/var/opt/scancode/temp
export SCANCODE_CACHE=/var/opt/scancode/cache
export SCANCODE_LICENSE_INDEX_CACHE=/var/opt/scancode/lcache
Recommended configuration on macOS:
export SCANCODE_TEMP=/var/tmp/scancode/temp
export SCANCODE_CACHE=/var/tmp/scancode/cache
export SCANCODE_LICENSE_INDEX_CACHE=/var/tmp/scancode/lcache
You can use any paths you want. We recommend using a different directory for each of them.
An environment variable configures the number of parallel processes used to scan the given input files. The following line configures the service to use 6 processes in parallel, which is the default.
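The startup check described above can be pictured roughly as follows. This is a minimal sketch and not the service's actual code: the helper name `missing_scancode_variables` is hypothetical, and only the three variable names are taken from this document.

```python
import os

# The three variable names come from this README; everything else is illustrative.
REQUIRED_VARIABLES = (
    "SCANCODE_TEMP",
    "SCANCODE_CACHE",
    "SCANCODE_LICENSE_INDEX_CACHE",
)


def missing_scancode_variables(environ=None):
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


if __name__ == "__main__":
    missing = missing_scancode_variables()
    if missing:
        print("Missing variables:", ", ".join(missing))
    else:
        print("All ScanCode path variables are set.")
```

Running such a check in the shell that will launch the service is a quick way to see which exports are still missing.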
export SCANCODE_SERVICE_PROCESSES=6
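As an illustration of the default behaviour, the following sketch reads the variable and falls back to 6 when it is unset. The function name `configured_processes` is an assumption made for this example; how the service itself parses the value is not shown in this document.

```python
import os

DEFAULT_PROCESSES = 6  # default stated in this README


def configured_processes(environ=None):
    """Return the process count from SCANCODE_SERVICE_PROCESSES, or the default (hypothetical helper)."""
    env = os.environ if environ is None else environ
    raw = env.get("SCANCODE_SERVICE_PROCESSES", "").strip()
    return int(raw) if raw else DEFAULT_PROCESSES


if __name__ == "__main__":
    print(configured_processes())
```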
ScanCode Toolkit can sometimes take an excessive amount of time to scan large files, which leads to long waits for the scan results.
To address this, ScanCode Toolkit provides a 'deadline' parameter that stops a scan after a certain amount of time.
The deadline is calculated by adding a time delta (in seconds) to the current timestamp. This delta is configured with the environment variable SCANCODE_SERVICE_DELTA_T.
Configure the delta as follows:
export SCANCODE_SERVICE_DELTA_T=48
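The deadline arithmetic is simple enough to show directly. The sketch below assumes the timestamp is a Unix time in seconds; the function name `scan_deadline` is hypothetical and not part of the service.

```python
import os
import time


def scan_deadline(delta_seconds, now=None):
    """Deadline = current timestamp + delta, as described above (hypothetical helper)."""
    current = time.time() if now is None else now
    return current + delta_seconds


if __name__ == "__main__":
    # Falls back to 48 (the value from the example above) if the variable is unset.
    delta = float(os.environ.get("SCANCODE_SERVICE_DELTA_T", "48"))
    print("Scan would be stopped at timestamp", scan_deadline(delta))
```

With a delta of 48 and a current timestamp of 1000.0, the resulting deadline is 1048.0.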
Build the image with
docker build -t scancode-service .
and start the container with
docker run -p 8000:8000 --mount type=bind,source=/metaeffekt-scancode-toolkit/tests/cluecode/data/,target=/metaeffekt-scancode-toolkit/tests/cluecode/data/ scancode-service
It is important to bind mount the scan directory to exactly the same location in the container as on the host.
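One way to avoid a mismatch between host and container paths is to generate the command from a single path. The following sketch does that; the helper name `docker_run_command` and the example path `/data/to/scan` are assumptions, while the image name and port are the ones used above.

```python
import shlex


def docker_run_command(scan_dir, image="scancode-service", port=8000):
    """Build a docker run command that mounts scan_dir at the identical path in the container."""
    # Source and target are deliberately the same string.
    mount = f"type=bind,source={scan_dir},target={scan_dir}"
    return ["docker", "run", "-p", f"{port}:{port}", "--mount", mount, image]


if __name__ == "__main__":
    # /data/to/scan is a placeholder; replace it with the directory you want to scan.
    print(shlex.join(docker_run_command("/data/to/scan")))
```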
To start the service, run
scancode-service
The API documentation is available at http://localhost:8000/docs. To initiate a scan, send a POST request to http://localhost:8000/scan. For the status of the service and an overview of the current scans, send a GET request to http://localhost:8000/scan.
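A status query can be sketched with the Python standard library alone. The field names `status` and `scans` are taken from the watchdog script later in this document; the helper names and the timeout are assumptions. The body of the POST request is not described here, so it is left out; see the /docs endpoint for it.

```python
import json
from urllib.request import urlopen

STATUS_URL = "http://localhost:8000/scan"


def summarize_status(payload):
    """Turn the JSON status response (str or bytes) into a short summary line."""
    info = json.loads(payload)
    return f"status={info.get('status')}, scans={info.get('scans')}"


def fetch_status(url=STATUS_URL, timeout=5):
    """Send a GET request to the service and summarize the answer."""
    with urlopen(url, timeout=timeout) as response:
        return summarize_status(response.read())


if __name__ == "__main__":
    # Requires a running service on localhost:8000.
    print(fetch_status())
```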
Assume Scancode Extensions is installed in /var/opt/scancode-service with permissions for the user scancode. The following example configuration can help to run it as a systemd service.
Create the file /etc/systemd/system/metaeffekt-scancode.service with the following content.
[Unit]
Description=metaeffekt scancode service
## make sure we only start the service after network is up
Wants=network-online.target
After=network-online.target
[Service]
## here we can set custom environment variables
Environment=SCANCODE_TEMP=/var/opt/scancode/temp
Environment=SCANCODE_CACHE=/var/opt/scancode/cache
Environment=SCANCODE_LICENSE_INDEX_CACHE=/var/opt/scancode/lcache
Environment=UVICORN_LOG_CONFIG=/var/opt/scancode/logging.yaml
Type=notify
ExecStart=/opt/metaeffekt/scancode/scancode-service.sh --log-config /opt/metaeffekt/scancode/logging.yaml
WatchdogSec=60
Restart=on-watchdog
User=scancode
NotifyAccess=all
# Useful during debugging; remove it once the service is working
StandardOutput=journal
[Install]
WantedBy=multi-user.target
Next, create /opt/metaeffekt/scancode/scancode-service.sh and insert the following.
#!/usr/bin/env bash
CONNECT_TIMEOUT=${HTTP_CONNECT_TIMEOUT:-1}
HTTP_ADDR=${FILEBEAT_HTTP_ADDR:-http://localhost:8000/scan}
REPORT_TIME=$(($WATCHDOG_USEC / 2000000))
SD_NOTIFY=${SD_NOTIFY_PATH:-/bin/systemd-notify}
set -euo pipefail
function watchdog() {
    READY=0
    # Give the service time to start before the first check
    sleep "$(echo "${REPORT_TIME}*1.5" | bc)"
    while true ; do
        # Testing curl in the if-condition keeps `set -e` from aborting on failure
        if info=$(curl -fs --connect-timeout "${CONNECT_TIMEOUT}" "${HTTP_ADDR}") ; then
            beat=$(echo "${info}" | jq -r .status)
            current_scans=$(echo "${info}" | jq -r .scans)
            if [[ $READY == 0 ]] ; then
                "${SD_NOTIFY}" --ready
                READY=1
            fi
            "${SD_NOTIFY}" WATCHDOG=1
            "${SD_NOTIFY}" STATUS="metaeffekt-scancode-service is ${beat}. Active scans: ${current_scans}."
        else
            "${SD_NOTIFY}" WATCHDOG=trigger
            "${SD_NOTIFY}" STATUS="Beat not responding"
            exit 1
        fi
        sleep "${REPORT_TIME}"
    done
}
watchdog &
exec /opt/metaeffekt/scancode/venv/bin/scancode-service "$@"
Start the service with sudo systemctl start metaeffekt-scancode. Running sudo systemctl status metaeffekt-scancode returns the current status of the service.
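The timing in the wrapper script follows from the unit file: systemd passes the WatchdogSec value to the service as WATCHDOG_USEC (in microseconds), and the script reports at half that interval after an initial delay of 1.5 report intervals. The following sketch reproduces the arithmetic; the function name `watchdog_timing` is hypothetical.

```python
def watchdog_timing(watchdog_sec):
    """Return (report interval, initial delay) in seconds for a given WatchdogSec value."""
    watchdog_usec = watchdog_sec * 1_000_000   # value systemd exports as WATCHDOG_USEC
    report_time = watchdog_usec // 2_000_000   # same integer division as in the shell script
    initial_delay = report_time * 1.5          # first sleep before the loop starts
    return report_time, initial_delay


if __name__ == "__main__":
    report_time, initial_delay = watchdog_timing(60)
    print(f"report every {report_time}s, first check after {initial_delay}s")
```

For WatchdogSec=60 this gives a report every 30 seconds and a first check after 45 seconds, so each keep-alive arrives well inside the 60-second window.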
The original ScanCode Toolkit code is licensed under the Apache License 2.0. The modifications, extensions, and configuration of the metaeffekt-scancode-service are also provided under the Apache License 2.0. Please see the LICENSE and NOTICE files for details.