Automation: Direction

Last generated: 2026-01-22T18:42:46.396Z
Provider: openai
Model: gpt-5.2

## Summary
Stabilize CI for this legacy Scrapy/Kafka exporter by (1) removing the committed `.bish*` artifact files, (2) running tests deterministically in GitHub Actions against a pinned Python matrix, and (3) adding lightweight quality gates (lint and packaging sanity). The goal is to reduce flaky builds and toil and to prevent repo bloat and regressions while keeping changes small.

## Direction (what and why)
1. **Stop tracking `.bish*` files**: The repo currently contains multiple `.bish.sqlite` / `.bish-index` files at root and inside packages. These look like local tooling artifacts and will create noise, large diffs, and potential CI issues.
2. **Make GitHub Actions the source of truth (replace Travis)**: `.travis.yml` exists but GH Actions is clearly in use. Add a single, simple CI workflow that runs `tox` (or `pytest`) on supported Pythons. This repo’s current automation set is huge (many “auto-*” workflows), but none appear to be the basic “run unit tests” gate.
3. **Add minimal, high-value gates**:
   - `python -m build` to ensure packaging remains valid.
   - `pip install -e .` + run tests to catch dependency issues early.
   - Optional: `ruff` (or `flake8`) to prevent obvious mistakes; keep it non-blocking initially if risk-averse.

## Plan (next 1-3 steps)
### 1) Remove `.bish*` artifacts and prevent reintroduction
- Delete committed files:
  - `/.bish-index`, `/.bish.sqlite`
  - `/.github/.bish.sqlite`
  - `/scrapy_kafka_export/.bish-index`, `/scrapy_kafka_export/.bish.sqlite`
  - `/tests/.bish-index`, `/tests/.bish.sqlite`
- Update `.gitignore` (root) to include:
  - `*.bish-index`
  - `*.bish.sqlite`
- Add a small CI check to fail if any `*.bish.sqlite` or `*.bish-index` is committed (simple `git ls-files` grep).
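The CI check described above could also be written as a small Python script instead of a shell one-liner. The following is a minimal sketch, not the project's existing tooling: the script location (for example `scripts/check_bish_artifacts.py`) and the function names are hypothetical, and the pattern assumes the only artifact names are the two listed in this step.

```python
import re
import subprocess
import sys

# Matches any path ending in `.bish-index` or `.bish.sqlite`
# (the same suffixes the suggested .gitignore entries cover).
BISH_ARTIFACT = re.compile(r"\.bish(-index|\.sqlite)$")


def find_bish_artifacts(paths):
    """Return the paths that look like `.bish*` artifacts."""
    return [p for p in paths if BISH_ARTIFACT.search(p)]


def tracked_files():
    """List files tracked by git in the current repository."""
    result = subprocess.run(
        ["git", "ls-files"], check=True, capture_output=True, text=True
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    offenders = find_bish_artifacts(tracked_files())
    for path in offenders:
        print(f"tracked artifact: {path}")
    # A non-zero exit code fails the CI step.
    sys.exit(1 if offenders else 0)
```

Keeping the path filter separate from the `git` call makes the check easy to unit test without a repository.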

### 2) Add a single canonical CI workflow that runs tests via tox
Create `.github/workflows/ci.yml`:
- Triggers: `push`, `pull_request`
- Use `actions/setup-python` with a small matrix (recommend: `3.9`, `3.10`, `3.11` unless project constraints require older). Quote the versions in the workflow YAML so that `3.10` is not parsed as the number `3.1`.
- Install `tox` and run `tox -q`.
- Cache pip.
- If `tox.ini` is already configured, keep it; otherwise add/update envlist.
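One way to choose the matrix is to derive it from the trove classifiers the package already declares. The sketch below is an illustration under assumptions: `supported_minor_versions` is a hypothetical helper, and the classifier list would have to be copied from `setup.py` (or read from installed package metadata), since this plan does not state what the project actually declares.

```python
PREFIX = "Programming Language :: Python :: "


def supported_minor_versions(classifiers):
    """Extract 'X.Y' Python versions from trove classifiers, sorted numerically."""
    versions = set()
    for classifier in classifiers:
        if not classifier.startswith(PREFIX):
            continue
        parts = classifier[len(PREFIX):].split(".")
        # Keep only 'major.minor' entries; skip '3', '3 :: Only',
        # 'Implementation :: CPython', and similar.
        if len(parts) == 2 and all(part.isdigit() for part in parts):
            versions.add((int(parts[0]), int(parts[1])))
    # Sort as integer pairs so 3.10 comes after 3.9.
    return [f"{major}.{minor}" for major, minor in sorted(versions)]
```

Sorting on integer pairs rather than strings avoids the common mistake of ordering `3.10` before `3.9`.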

Suggested `tox.ini` updates (if needed):
- Ensure `pytest` is declared in `deps`.
- Ensure tests run with `pytest -q`.
- If Kafka is required for integration tests, mark them and keep unit tests independent (see Step 3).

### 3) Add packaging sanity + (optional) lint in CI
In the same workflow:
- Run `python -m pip install build` and `python -m build` on one Python version (e.g., 3.11) to verify sdist/wheel.
- Optional lint gate:
  - Add `ruff` config in `pyproject.toml` (or `setup.cfg`) and run `ruff check .`
  - Start as non-blocking (`continue-on-error: true`) for one iteration, then enforce.
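For the packaging step, a follow-up check can confirm that the build actually produced both artifacts. This is a minimal sketch assuming the default `dist/` output directory of `python -m build`; the `check_dist` helper is hypothetical and not part of the repository.

```python
from pathlib import Path


def check_dist(dist_dir):
    """Return a list of problems found in a build output directory."""
    dist = Path(dist_dir)
    wheels = sorted(dist.glob("*.whl"))
    sdists = sorted(dist.glob("*.tar.gz"))
    problems = []
    if len(wheels) != 1:
        problems.append(f"expected 1 wheel, found {len(wheels)}")
    if len(sdists) != 1:
        problems.append(f"expected 1 sdist, found {len(sdists)}")
    return problems
```

In CI this could run right after `python -m build`, failing the job when `check_dist("dist")` returns a non-empty list.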

## Risks/unknowns
- **Python support range**: `setup.py`/dependencies may target older Python. If tests currently only pass on e.g. 3.8/3.9, adjust the matrix to match reality. Confirm via `classifiers` in `setup.py`.
- **Kafka dependency in tests**: If `tests/test_extension.py` relies on a running Kafka broker, CI may be flaky unless it uses mocks. Prefer mocking producer interactions; if integration coverage is required, run Kafka via `docker compose` or `services:` (but that’s a larger change).
- **Workflow sprawl**: The repository already has many automation workflows. Adding one more is fine, but keep it clearly named (`ci.yml`) and make it a required check in branch protection to avoid relying on the noisy “auto-*” set.
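To illustrate the mocking approach suggested for the Kafka risk, the sketch below tests a producer interaction without a broker. It does not reflect the actual `scrapy_kafka_export` API: `export_item` is a hypothetical stand-in for whatever code calls the producer, and the JSON serialization is an assumption made only for the example.

```python
import json
from unittest.mock import Mock


def export_item(producer, topic, item):
    """Hypothetical helper: serialize an item and hand it to the producer."""
    payload = json.dumps(item, sort_keys=True).encode("utf-8")
    producer.send(topic, payload)


def test_export_item_sends_serialized_payload():
    # Mock stands in for a real Kafka producer, so no broker is needed.
    producer = Mock()
    export_item(producer, "items", {"url": "https://example.com", "id": 1})
    producer.send.assert_called_once_with(
        "items", b'{"id": 1, "url": "https://example.com"}'
    )
```

Tests written this way stay deterministic in CI; real broker coverage can then live in a separately marked integration subset.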

## Suggested tests
1. **Unit test run**: `tox` (or `pytest -q`) on at least Python 3.9–3.11.
2. **Packaging**: `python -m build` (ensures README/metadata/setup config is consistent).
3. **Artifact check**: CI step:
   - `! git ls-files | grep -E '\.bish(-index|\.sqlite)$'`
4. (If Kafka interactions are non-mocked) **Integration test (optional)**:
   - Spin up Kafka in CI using a container and run a marked test subset, e.g. `pytest -m integration`.
   - Keep integration non-blocking initially if it’s flaky.

## Verification checklist (quick)
- [ ] No `*.bish*` files tracked by git.
- [ ] `ci.yml` runs on PRs and reports pass/fail deterministically.
- [ ] `tox` passes locally and in CI.
- [ ] `python -m build` succeeds and produces wheel/sdist artifacts.
