A DS team repository for shared data ingestion utilities.
Depending on which APIs you use, the following environment variables should be set:
- For aaq:
AAQ_API_KEY="<secret>"
AAQ_API_BASE_URL="<url>"
- For content_repo:
CONTENT_REPO_API_KEY="<secret>"
CONTENT_REPO_BASE_URL="<url>"
- For flow_results:
FLOW_RESULTS_API_KEY="<secret>"
FLOW_RESULTS_API_BASE_URL="<url>"
- For rapidpro:
RAPIDPRO_API_KEY="<secret>"
RAPIDPRO_API_BASE_URL="<url>"
- For the turn_bq API:
TURN_BQ_API_KEY="<secret>"
TURN_BQ_API_BASE_URL="<url>"
To use the S3 utilities (which, among other things, read and write specific parquet files), set the following variables:
S3_KEY="<key>"
S3_SECRET="<secret>"
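Before running an ingestion script it can help to confirm that the variables listed above are actually set. The following is a minimal sketch, not part of the package: the variable names come from the lists above, while the `REQUIRED_VARS` grouping and the `missing_vars` helper are hypothetical names introduced for illustration.

```python
import os

# Variable names are copied from the lists above. The grouping by service
# and the helper below are illustrative assumptions, not package code.
REQUIRED_VARS = {
    "aaq": ["AAQ_API_KEY", "AAQ_API_BASE_URL"],
    "content_repo": ["CONTENT_REPO_API_KEY", "CONTENT_REPO_BASE_URL"],
    "flow_results": ["FLOW_RESULTS_API_KEY", "FLOW_RESULTS_API_BASE_URL"],
    "rapidpro": ["RAPIDPRO_API_KEY", "RAPIDPRO_API_BASE_URL"],
    "turn_bq": ["TURN_BQ_API_KEY", "TURN_BQ_API_BASE_URL"],
    "s3": ["S3_KEY", "S3_SECRET"],
}


def missing_vars(service, environ=None):
    """Return the variables for `service` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS[service] if not env.get(name)]


if __name__ == "__main__":
    # Report the status of every service against the current environment.
    for service in REQUIRED_VARS:
        missing = missing_vars(service)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{service}: {status}")
```

Only the services you intend to call need to report `ok`; the rest can be left unset.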
There are two ways to install the package:
- Versioned install from github:
rdw-ingestion-tools is public!
pip3 install git+https://github.com/praekeltfoundation/[email protected]
- From clone (with uv). This is recommended:
git clone git@github.com:praekeltfoundation/rdw-ingestion-tools.git
uv sync
For more examples of how to interact with particular API endpoints, see the examples directory. It
contains examples for each supported third-party service and its associated endpoints.
For instance, to get flows from the Flow Results Specification API, the example is as follows:
from api.flow_results import pyFlows
flows = pyFlows.flows.get_flows()
print(flows.keys())
To access some of the S3 utilities used in ingestion:
import os
from s3 import pyS3
bucket = os.environ["BUCKET_NAME"]
prefix = os.environ["PREFIX"]
pyS3.s3.get_filenames(bucket=bucket, prefix=prefix)
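A listing like the one above often needs narrowing to parquet files before reading. The sketch below assumes `get_filenames` returns an iterable of string keys, which this README does not confirm. The `parquet_only` helper and the sample keys are hypothetical.

```python
def parquet_only(filenames):
    """Keep only names ending in .parquet (case-insensitive)."""
    return [name for name in filenames if name.lower().endswith(".parquet")]


# Hypothetical listing, standing in for the result of
# pyS3.s3.get_filenames(bucket=bucket, prefix=prefix).
example = ["raw/2024/a.parquet", "raw/2024/_SUCCESS", "raw/2024/b.PARQUET"]
print(parquet_only(example))
```

If the real return type differs (for example a DataFrame or a list of dicts), the filter would need adjusting accordingly.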
To run the examples, first install the dependencies:
uv sync
or, for the Turn BQ and AAQV2 examples:
uv sync --extra polars
Then run an example with:
uv run --env-file .env examples/{path}
e.g.
uv run --env-file .env examples/turn_bq/cards.py
uv sync --dev --extra polars
uv run pytest -vv --cov
To release a new version of the package:
- Update the version number in pyproject.toml
- Run uv lock
- Post the changes to Slack for approval
- Once approved, push to main
- Repeat the above post-release, incrementing and adding .dev0 to the version number.
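The post-release bump in the last step can be sketched as a small helper. This assumes a plain MAJOR.MINOR.PATCH version and that the patch component is the one incremented; the steps above do not say which component to bump, so treat that as an assumption. `next_dev_version` is a hypothetical name, not part of the package.

```python
import re


def next_dev_version(released: str) -> str:
    """Return the post-release development version for a released version.

    Assumes MAJOR.MINOR.PATCH input and bumps the patch component.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", released)
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {released!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return f"{major}.{minor}.{patch + 1}.dev0"


# Illustrative version number only.
print(next_dev_version("1.2.3"))
```

The resulting string is what would go into pyproject.toml before running uv lock again.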