Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
008f009
feat: update Python versions and update GitHub Actions
ChasNelson1990 Dec 8, 2025
382553e
feat: rename requirements.txt file
ChasNelson1990 Dec 8, 2025
f3d58ca
fix: readd pip install for requirements in test workflow
ChasNelson1990 Dec 8, 2025
6f71610
Initial plan
Copilot Dec 8, 2025
f731792
fix: remove --user root option from container and fix lint error
Copilot Dec 8, 2025
bf6199f
fix: restore --user root and add harvester initdb command
Copilot Dec 8, 2025
b8a84e2
fix: add pyOpenSSL and cryptography version constraints
Copilot Dec 8, 2025
e75db77
fix: update pyOpenSSL version in requirements.txt
Copilot Dec 8, 2025
c428e1c
fix: remove deprecated urllib3.contrib.pyopenssl usage
Copilot Dec 8, 2025
57db72f
fix: remove pyopenssl mocks from tests
Copilot Dec 8, 2025
14ce765
Merge pull request #14 from fjelltopp/copilot/sub-pr-13
ChasNelson1990 Dec 8, 2025
16acdc4
Migrate to CKAN 2.11: Fix database initialization and test framework
A-Souhei Jan 7, 2026
bb45fec
Fix remote group creation for CKAN 2.11
A-Souhei Jan 7, 2026
3aa0e74
Fix group creation tests for CKAN 2.11 compatibility
A-Souhei Jan 7, 2026
62571ac
Fix test_update source_type typo
A-Souhei Jan 7, 2026
440db95
Fix harvest_job_create authorization for CKAN 2.11
A-Souhei Jan 7, 2026
af57609
Fix blueprint test authorization for CKAN 2.11
A-Souhei Jan 8, 2026
93fc378
Fix _get_logic_functions to handle Flask LocalProxy and modules witho…
A-Souhei Jan 8, 2026
ecea6c3
Add package_update auth override for CKAN 2.11 Flask compatibility
A-Souhei Jan 8, 2026
0287f83
Fix mutable default argument bug in _get_logic_functions
A-Souhei Jan 8, 2026
bd190c6
Fix flake8 linting errors
A-Souhei Jan 8, 2026
1c1ebb3
Refactor auth functions to eliminate code duplication and improve safety
A-Souhei Jan 8, 2026
56e70b9
fix(templates): update harvest route names for CKAN 2.11
A-Souhei Jan 22, 2026
326abbe
fix(auth): tighten sysadmin form-view bypass in package_create/update
A-Souhei Apr 17, 2026
ec615a8
fix(auth): use has_request_context() instead of LocalProxy truthiness
A-Souhei Apr 17, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 13 additions & 28 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,45 +2,28 @@ on: [pull_request]
jobs:
lint:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: [ 2.7, 3.9 ]
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
- uses: actions/checkout@v6
- uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python-version }}
python-version: "3.10"
- name: Install requirements
run: pip install flake8 pycodestyle
- name: Check syntax
run: |
flake8 . --count --max-line-length=127 --show-source --statistics

test:
name: CKAN
name: CKAN 2.11
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
include:
- ckan-container-version: "2.9-py2"
ckan-postgres-version: "2.9"
ckan-solr-version: "2.9-solr8"
- ckan-container-version: "2.9"
ckan-postgres-version: "2.9"
ckan-solr-version: "2.9-solr8"
- ckan-container-version: "2.10"
ckan-postgres-version: "2.10"
ckan-solr-version: "2.10"

container:
image: openknowledge/ckan-dev:${{ matrix.ckan-container-version }}
image: ckan/ckan-dev:2.11-py3.10
options: --user root
services:
solr:
image: ckan/ckan-solr:${{ matrix.ckan-solr-version }}
image: ckan/ckan-solr:2.11-solr9
postgres:
image: ckan/ckan-postgres-dev:${{ matrix.ckan-postgres-version }}
image: ckan/ckan-postgres-dev:2.11
env:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
Expand All @@ -56,16 +39,18 @@ jobs:
CKAN_REDIS_URL: redis://redis:6379/1

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v6
- name: Install requirements
Comment thread
A-Souhei marked this conversation as resolved.
run: |
pip install -r pip-requirements.txt
pip install -r requirements.txt
pip install -r dev-requirements.txt
pip install -e .
# Replace default path to CKAN core config file with the one on the container
sed -i -e 's/use = config:.*/use = config:\/srv\/app\/src\/ckan\/test-core.ini/' test.ini
- name: Setup extension
run: |
ckan -c test.ini db init
ckan -c test.ini harvester initdb
- name: Run tests
run: pytest --ckan-ini=test.ini --cov=ckanext.harvest --disable-warnings ckanext/harvest/tests
run: |
pytest --ckan-ini=test.ini --cov=ckanext.harvest --disable-warnings ckanext/harvest/tests/
11 changes: 4 additions & 7 deletions ckanext/harvest/harvesters/ckanharvester.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@
from requests.exceptions import HTTPError, RequestException

import datetime
from urllib3.contrib import pyopenssl

from six.moves.urllib.parse import urlencode
from ckan import model
Expand Down Expand Up @@ -41,8 +40,6 @@ def _get_content(self, url):
if api_key:
headers['Authorization'] = api_key

pyopenssl.inject_into_urllib3()

try:
http_request = requests.get(url, headers=headers)
except HTTPError as e:
Expand Down Expand Up @@ -442,12 +439,12 @@ def import_stage(self, harvest_object):
log.error('Could not get remote group %s', group_)
continue

for key in ['packages', 'created', 'users', 'groups', 'tags', 'extras', 'display_name']:
for key in ['packages', 'created', 'users', 'groups', 'tags', 'extras', 'display_name', 'id']:
group.pop(key, None)

get_action('group_create')(base_context.copy(), group)
log.info('Group %s has been newly created', group_)
validated_groups.append({'id': group['id'], 'name': group['name']})
created_group = get_action('group_create')(base_context.copy(), group)
log.info('Group %s has been newly created', created_group)
validated_groups.append({'id': created_group['id'], 'name': created_group['name']})

package_dict['groups'] = validated_groups

Expand Down
75 changes: 75 additions & 0 deletions ckanext/harvest/logic/auth/__init__.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,15 @@
from ckan.plugins import toolkit as pt
from ckanext.harvest import model as harvest_model

try:
from flask import request, has_request_context
except ImportError:
# Flask not available (shouldn't happen in CKAN 2.11+)
request = None

def has_request_context():
return False


def user_is_sysadmin(context):
'''
Expand All @@ -17,6 +26,72 @@ def user_is_sysadmin(context):
return user_obj.sysadmin


def load_user_from_flask_request(context):
"""
Load user from Flask request environ into context if not already present.

In CKAN 2.11, when accessing harvest forms, authorization checks happen
before the user is loaded into context, even though REMOTE_USER is set
in the Flask request environ. This helper ensures the user is loaded
for proper authorization checks.

Args:
context: CKAN context dict

Returns:
None (modifies context in-place)
"""
# Use has_request_context() instead of `if not request`: the Flask
# `request` LocalProxy raises RuntimeError when evaluated outside an
# active request context (e.g. background queue, CLI).
if not has_request_context():
return

user = context.get('user', '')
if not user:
try:
user = request.environ.get('REMOTE_USER', '')
if user:
context['user'] = user
# Load user object into context for sysadmin checks
model = context.get('model')
if model:
user_obj = model.User.get(user)
# Mirror CKAN's own _get_user contract: only inject active
# users so a deleted/blocked account whose name lingers in
# REMOTE_USER (e.g. from a still-valid session cookie) does
# not retain auth privileges.
if user_obj and getattr(user_obj, 'state', None) == 'active':
context['auth_user_obj'] = user_obj
except Exception:
# Intentionally ignore failures when loading user from request
# so that authorization can safely fall back to CKAN defaults
pass


def is_harvest_form_view():
"""
True only for GET requests serving the harvest creation/edit forms.

Used to safely identify the "form view" case in CKAN 2.11 where
package_create/package_update auth fires before the form renders with
a sparse data_dict. Anchored on the request path so unrelated callers
that happen to pass an empty data_dict do not bypass auth.
"""
if not has_request_context():
return False
try:
if request.method != 'GET':
return False
from ckanext.harvest import utils
path = (request.path or '').rstrip('/')
prefix = '/{0}/'.format(utils.DATASET_TYPE_NAME)
return path.endswith('/{0}/new'.format(utils.DATASET_TYPE_NAME)) \
or (prefix + 'edit/') in (path + '/')
except Exception:
return False


def _get_object(context, data_dict, name, class_name):
'''
return the named item if in the data_dict, or get it from
Expand Down
42 changes: 40 additions & 2 deletions ckanext/harvest/logic/auth/create.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,42 @@
from ckan.plugins import toolkit as pt
from ckanext.harvest.logic.auth import user_is_sysadmin
from ckanext.harvest.logic.auth import (
user_is_sysadmin,
load_user_from_flask_request,
is_harvest_form_view,
)
import ckan.logic.auth.create as create_auth


def package_create(context, data_dict):
"""
Authorization for creating packages (harvest and regular types).

In CKAN 2.11, this auth function is called before showing the creation form.
When viewing forms (empty data_dict), allow sysadmins to proceed.
"""
# Load user from Flask request if not already in context (CKAN 2.11 Flask issue)
load_user_from_flask_request(context)

package_type = data_dict.get('type', '')

# Check if user is sysadmin
try:
is_sysadmin = user_is_sysadmin(context)

# Allow sysadmins for harvest packages, or for the GET render of the
# harvest creation form (CKAN 2.11 calls this auth before the view
# renders with a sparse data_dict). The form-view case is gated on
# the request path so an empty data_dict from any other caller does
# not silently skip CKAN's normal package_create auth chain.
if is_sysadmin and (package_type == 'harvest' or is_harvest_form_view()):
return {'success': True}
except Exception:
# Intentionally ignore failures in sysadmin check so that authorization
# can fall back to CKAN's default package_create logic below
pass

# For non-sysadmins or non-harvest packages, use CKAN's default auth
return create_auth.package_create(context, data_dict)


def harvest_source_create(context, data_dict):
Expand Down Expand Up @@ -35,7 +72,8 @@ def harvest_job_create(context, data_dict):

context['package'] = pkg
try:
pt.check_access('package_update', context, data_dict)
# Pass the package id to package_update for authorization check
pt.check_access('package_update', context, {'id': pkg.id})
return {'success': True}
except pt.NotAuthorized:
return {'success': False,
Expand Down
39 changes: 38 additions & 1 deletion ckanext/harvest/logic/auth/update.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,42 @@
from ckan.plugins import toolkit as pt
from ckanext.harvest.logic.auth import user_is_sysadmin
from ckanext.harvest.logic.auth import (
user_is_sysadmin,
load_user_from_flask_request,
is_harvest_form_view,
)
import ckan.logic.auth.update as update_auth


def package_update(context, data_dict):
"""
Custom package_update authorization for CKAN 2.11 compatibility.

This function handles the same issue as package_create: in CKAN 2.11,
when viewing the edit form (/harvest/edit/<id>), Flask's authorization
check happens before the user is loaded into context, causing 403 errors.

See package_create in logic/auth/create.py for detailed explanation.
"""
# Load user from Flask request if not already in context (CKAN 2.11 Flask issue)
load_user_from_flask_request(context)

# For sysadmins viewing harvest edit forms, allow access
user_obj = context.get('auth_user_obj')
# Use getattr to safely check sysadmin attribute (auth_user_obj might be AnonymousUser)
if user_obj and getattr(user_obj, 'sysadmin', False):
package = context.get('package')
package_type = package.type if package else data_dict.get('type')

# Allow sysadmins for harvest packages, or for the GET render of a
# harvest edit form (CKAN 2.11 calls this auth before the view renders
# with a sparse data_dict). The form-view case is gated on the request
# path so a missing 'name' from any other caller does not silently
# skip CKAN's normal package_update auth chain.
if package_type == 'harvest' or is_harvest_form_view():
return {'success': True}

# Delegate to CKAN's default package_update auth for all other cases
return update_auth.package_update(context, data_dict)


def harvest_source_update(context, data_dict):
Expand Down
44 changes: 25 additions & 19 deletions ckanext/harvest/model/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,6 @@
from sqlalchemy.orm import backref, relation
from sqlalchemy.exc import InvalidRequestError

from ckan import model
from ckan.model.meta import metadata, mapper, Session
from ckan.model.types import make_uuid
from ckan.model.domain_object import DomainObject
Expand Down Expand Up @@ -47,55 +46,62 @@ def setup():
define_harvester_tables()
log.debug('Harvest tables defined in memory')

if not model.package_table.exists():
# Check if database is initialized by using Inspector
from ckan.model.meta import engine
try:
inspector = Inspector.from_engine(engine)
existing_tables = inspector.get_table_names()
except Exception as e:
log.debug('Harvest table creation deferred: %s', e)
return

if 'package' not in existing_tables:
log.debug('Harvest table creation deferred')
return

if not harvest_source_table.exists():
if 'harvest_source' not in existing_tables:

# Create each table individually rather than
# using metadata.create_all()
harvest_source_table.create()
harvest_job_table.create()
harvest_object_table.create()
harvest_gather_error_table.create()
harvest_object_error_table.create()
harvest_object_extra_table.create()
harvest_log_table.create()
harvest_source_table.create(bind=engine)
harvest_job_table.create(bind=engine)
harvest_object_table.create(bind=engine)
harvest_gather_error_table.create(bind=engine)
harvest_object_error_table.create(bind=engine)
harvest_object_extra_table.create(bind=engine)
harvest_log_table.create(bind=engine)

log.debug('Harvest tables created')
else:
from ckan.model.meta import engine
log.debug('Harvest tables already exist')
# Check if existing tables need to be updated
inspector = Inspector.from_engine(engine)

# Check if harvest_log table exist - needed for existing users
if 'harvest_log' not in inspector.get_table_names():
harvest_log_table.create()
if 'harvest_log' not in existing_tables:
harvest_log_table.create(bind=engine)

# Check if harvest_object has a index
index_names = [index['name'] for index in inspector.get_indexes("harvest_object")]
if "harvest_job_id_idx" not in index_names:
log.debug('Creating index for harvest_object')
Index("harvest_job_id_idx", harvest_object_table.c.harvest_job_id).create()
Index("harvest_job_id_idx", harvest_object_table.c.harvest_job_id).create(bind=engine)

if "harvest_source_id_idx" not in index_names:
log.debug('Creating index for harvest source')
Index("harvest_source_id_idx", harvest_object_table.c.harvest_source_id).create()
Index("harvest_source_id_idx", harvest_object_table.c.harvest_source_id).create(bind=engine)

if "package_id_idx" not in index_names:
log.debug('Creating index for package')
Index("package_id_idx", harvest_object_table.c.package_id).create()
Index("package_id_idx", harvest_object_table.c.package_id).create(bind=engine)

if "guid_idx" not in index_names:
log.debug('Creating index for guid')
Index("guid_idx", harvest_object_table.c.guid).create()
Index("guid_idx", harvest_object_table.c.guid).create(bind=engine)

index_names = [index['name'] for index in inspector.get_indexes("harvest_object_extra")]
if "harvest_object_id_idx" not in index_names:
log.debug('Creating index for harvest_object_extra')
Index("harvest_object_id_idx", harvest_object_extra_table.c.harvest_object_id).create()
Index("harvest_object_id_idx", harvest_object_extra_table.c.harvest_object_id).create(bind=engine)


class HarvestError(Exception):
Expand Down
Loading