Thank you for your interest in contributing to gdrive_data_ingestion_framework.
By participating in this project, you agree to be respectful and constructive in all interactions.
- Read the project documentation in `gdrive_to_udl_documentation.md`.
- Check existing issues and pull requests to avoid duplicate work.
- Open an issue for major changes before implementation.
Use this setup to run and validate changes from your laptop or local VM.
- Python 3.8+
- Access to Google Drive OAuth credentials (client_id, client_secret, refresh_token, token_uri)
- Optional: AWS access to assume the target role for S3 testing
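Before running the CLI, you can sanity-check that your OAuth credentials carry the four fields listed above. The following is a minimal sketch. `validate_creds` is a hypothetical helper for local use and is not part of `gdrive_to_udl.py`. It deliberately avoids echoing any secret values in its error messages.

```python
import json

# Minimum credential contract (the four fields listed in the prerequisites).
REQUIRED_KEYS = ("client_id", "client_secret", "refresh_token", "token_uri")


def validate_creds(creds_json: str) -> dict:
    """Parse an OAuth credentials JSON string and check the minimum contract.

    Hypothetical helper, not part of the framework. Raises ValueError with
    messages that name missing keys but never include secret values.
    """
    try:
        creds = json.loads(creds_json)
    except json.JSONDecodeError as exc:
        # Report only the position of the error, not the content, to avoid leaking secrets.
        raise ValueError(
            f"credentials are not valid JSON (line {exc.lineno}, column {exc.colno})"
        ) from None
    if not isinstance(creds, dict):
        raise ValueError("credentials JSON must be an object")
    # A key counts as missing if it is absent or has an empty value.
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise ValueError(f"missing or empty keys: {', '.join(missing)}")
    return creds
```

Once `GDRIVE_CREDS_JSON` is exported, calling `validate_creds(os.environ["GDRIVE_CREDS_JSON"])` should return the parsed dictionary or raise a `ValueError` naming the missing fields.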
Create a virtual environment and install the dependencies:

```bash
cd gdrive_to_dbx_data_ingestion/gdrive_framework
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install google-api-python-client oauth2client boto3 pandas httplib2
```

Create a JSON value with this minimum contract:
```json
{
  "client_id": "...",
  "client_secret": "...",
  "refresh_token": "...",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```

Export it as an environment variable:
```bash
export GDRIVE_CREDS_JSON='{"client_id":"...","client_secret":"...","refresh_token":"...","token_uri":"https://oauth2.googleapis.com/token"}'
```

List file metadata:

```bash
python gdrive_to_udl.py list \
  --credentials_json "$GDRIVE_CREDS_JSON"
```

Folder-scoped recursive listing:
```bash
python gdrive_to_udl.py list \
  --credentials_json "$GDRIVE_CREDS_JSON" \
  --ids <folder_id>
```

Download to a local directory:

```bash
python gdrive_to_udl.py download \
  --credentials_json "$GDRIVE_CREDS_JSON" \
  --ids <file_or_folder_id> \
  --volume_path ./tmp_downloads
# Optional: force an export format for Google-native files
# --file_type csv
```

Download to S3 through an assumed IAM role:

```bash
python gdrive_to_udl.py download \
  --credentials_json "$GDRIVE_CREDS_JSON" \
  --ids <file_or_folder_id> \
  --s3_bucket <bucket_name> \
  --s3_prefix <prefix> \
  --s3_role_arn arn:aws:iam::<account_id>:role/<role_name>
# Optional: force an export format for Google-native files
# --file_type csv
```

For Databricks, use this setup when validating framework behavior in jobs or notebooks.
- Databricks Runtime (DBR) with Python 3.8+
- Access to a Unity Catalog Volume for output testing
- Secret scope configured for OAuth JSON and (optionally) AWS role ARN
In a notebook cell:

```python
%pip install google-api-python-client oauth2client boto3 pandas httplib2
```

Restart Python after the install if prompted (for example, with `dbutils.library.restartPython()`).
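After the restart, a quick way to confirm that every dependency is importable is to look each one up with `importlib.util.find_spec`. This is a minimal sketch. `missing_modules` is a hypothetical helper, and the list uses import names rather than pip names (`google-api-python-client` installs the `googleapiclient` package).

```python
import importlib.util

# Import names for the packages installed above (pip name and import name
# differ for google-api-python-client, which provides `googleapiclient`).
REQUIRED_MODULES = ("googleapiclient", "oauth2client", "boto3", "pandas", "httplib2")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules in `modules` that cannot be found on sys.path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

An empty list from `missing_modules()` means the install is visible to the current Python process; any names returned point to a failed install or a missed restart.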
Load the OAuth JSON from your secret scope, then import the framework and list metadata:

```python
credentials_json = dbutils.secrets.get(scope="<scope>", key="gdrive-creds-json")

import sys
sys.path.append("/Workspace/Repos/<user-or-org>/<repo>/gdrive_to_dbx_data_ingestion/gdrive_framework")

from gdrive_to_udl import create_drive_service, list_metadata

service = create_drive_service(credentials_json)
df = list_metadata(service, folder_id="<folder_id>")
display(df.head())
```

Output checks:

- Volume/local output: pass a valid `/Volumes/...` path and verify that the files are created.
- S3 output: provide a valid assume-role ARN and verify that the objects are created in the target bucket/prefix.
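The two output checks can be scripted. The sketch below assumes that "verify creation" means at least one non-empty file (locally) or object key (in S3) exists under the target. Both helper names are hypothetical, and `s3_client` is assumed to be a boto3 S3 client built from the assumed-role credentials.

```python
from pathlib import Path


def local_files(output_dir):
    """Return relative paths of non-empty files under a local or /Volumes directory.

    Zero-byte files are skipped on the assumption that they indicate an
    incomplete download; drop the size check if empty files are expected.
    """
    root = Path(output_dir)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_size > 0
    )


def s3_keys(s3_client, bucket, prefix):
    """Return object keys under bucket/prefix using a boto3 S3 client.

    The list_objects_v2 paginator follows continuation tokens, so prefixes
    holding more than 1,000 objects are fully listed.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects omit the "Contents" key entirely.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

For example, `local_files("./tmp_downloads")` after a local run, or `s3_keys(s3, "<bucket_name>", "<prefix>")` after an S3 run. An empty result in either case means nothing was written.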
Also confirm the following behavior:

- Metadata listing works for both file and folder IDs.
- Shared Drive access works (`supportsAllDrives=True` paths).
- Google-native exports use the expected output formats.
- Shortcut targets resolve correctly.
- Errors are logged clearly, and processing continues wherever the framework is designed to continue.
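The Shared Drive and shortcut checks depend on two Drive API v3 details: `files().list` only returns Shared Drive items when `supportsAllDrives` and `includeItemsFromAllDrives` are set, and shortcut items (MIME type `application/vnd.google-apps.shortcut`) carry their real target in `shortcutDetails`. The sketch below is useful when writing a test for either behavior. `folder_list_kwargs` and `resolve_shortcut` are hypothetical helpers and are not necessarily how `gdrive_to_udl.py` builds its requests.

```python
# MIME type Google Drive uses for shortcut items (Drive API v3).
SHORTCUT_MIME = "application/vnd.google-apps.shortcut"


def folder_list_kwargs(folder_id, page_token=None):
    """Build keyword arguments for a Drive v3 files().list call on one folder.

    supportsAllDrives and includeItemsFromAllDrives make Shared Drive items
    visible; corpora="allDrives" searches My Drive and all accessible Shared Drives.
    """
    kwargs = {
        "q": f"'{folder_id}' in parents and trashed = false",
        "fields": "nextPageToken, files(id, name, mimeType, shortcutDetails)",
        "supportsAllDrives": True,
        "includeItemsFromAllDrives": True,
        "corpora": "allDrives",
    }
    if page_token:
        # Continue from the previous page's nextPageToken.
        kwargs["pageToken"] = page_token
    return kwargs


def resolve_shortcut(item):
    """Return (id, mimeType) of the file a metadata item ultimately refers to.

    Shortcuts carry their target in shortcutDetails; other items refer to themselves.
    """
    if item.get("mimeType") == SHORTCUT_MIME:
        details = item.get("shortcutDetails", {})
        return details.get("targetId"), details.get("targetMimeType")
    return item["id"], item.get("mimeType")
```

Assuming `service` is a Drive v3 resource (such as the one `create_drive_service` is expected to return), `service.files().list(**folder_list_kwargs("<folder_id>")).execute()` returns one page of children, and `resolve_shortcut` can be applied to each entry in its `files` list.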
- Never commit OAuth JSON, tokens, or AWS secrets.
- Use environment variables locally and secret scopes in Databricks.
- Use role assumption for S3 access; avoid static long-lived AWS credentials.
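Role assumption through AWS STS returns temporary credentials that expire on their own, which is why it is preferred over static keys. The following is a minimal sketch built on the STS `assume_role` call. The helper name and the default session name are hypothetical, and the function takes the STS client as a parameter so it can be exercised without AWS access.

```python
def assumed_role_session_kwargs(sts_client, role_arn,
                                session_name="gdrive-ingestion-test",
                                duration_seconds=3600):
    """Assume an IAM role via STS and return keyword arguments for boto3.Session.

    The credentials are temporary and expire after duration_seconds, so no
    long-lived AWS secret needs to be stored anywhere.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,  # appears in CloudTrail for auditing
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

With boto3 installed, `boto3.Session(**assumed_role_session_kwargs(boto3.client("sts"), "arn:aws:iam::<account_id>:role/<role_name>")).client("s3")` yields an S3 client that is scoped to the assumed role.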
- Fork the repository and create a feature branch.
- Make focused changes with clear commit messages.
- Add or update tests and documentation where relevant.
- Ensure code quality checks pass locally.
- Submit a pull request with a clear summary of:
- What changed
- Why it changed
- How it was tested
- Keep pull requests small and reviewable.
- Reference related issue IDs in the PR description.
- Include sample commands or output when behavior changes.
- Update docs for any user-facing or configuration changes.
When filing a bug, please include:
- Environment details (Databricks runtime, Python version, cloud target)
- Reproduction steps
- Expected vs actual behavior
- Relevant logs or stack traces (without secrets)
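To gather environment details and scrub logs before pasting them into an issue, a small script like the one below can help. This is a sketch under stated assumptions. `DATABRICKS_RUNTIME_VERSION` is assumed to be set on Databricks clusters and absent locally. The redaction patterns only cover the OAuth fields from the credential contract (plus `access_token`) and AWS access key IDs, so review the output manually before posting. Both helper names are hypothetical.

```python
import os
import platform
import re

# (pattern, replacement) pairs applied in order. Extend these for any other
# secret formats that appear in your logs.
_REDACTIONS = [
    # JSON-style "key": "value" pairs for OAuth secrets
    (re.compile(r'("(?:client_secret|refresh_token|access_token)"\s*:\s*")[^"]*"'),
     r'\1<redacted>"'),
    # AWS access key IDs (AKIA... long-term, ASIA... temporary)
    (re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b"), "<redacted-aws-key-id>"),
]


def redact(text):
    """Replace known secret patterns in a log excerpt with placeholders."""
    for pattern, replacement in _REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


def environment_details():
    """Collect the environment fields requested in bug reports."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        # Assumed to be set on Databricks clusters; None when running locally.
        "databricks_runtime": os.environ.get("DATABRICKS_RUNTIME_VERSION"),
    }
```

The cloud target (for example, which S3 bucket region or Volume you wrote to) still needs to be stated by hand.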
Do not open public issues for potential security vulnerabilities. Report security concerns privately to the maintainers through your internal security process.
By submitting a contribution, you agree that your contributions are licensed under the project license described in `LICENSE.md`.