Extract endpoint attributes #10

Open: shhovhan wants to merge 2 commits into main from extract_endpoint_attributes

@shhovhan

This PR contains two commits:

  1. Scanner runner: TOML config, file output, error handling. Closes
    out the review feedback on #7 (Expand repository scanner:
    organization scan + per-endpoint scan) and lays the runner
    infrastructure that the #5 work (repository scanner - extract
    attributes) needed.
  2. Extract endpoint attributes — implements RusselSand's
    quality-report design and the Style-A parameter / example extraction
    pipeline.

Commit 1 — Scanner runner cleanup, config, output

PR #7 review feedback addressed

  • ScannerService.__init__ now wraps excluded_segments in a
    fresh frozenset. Every instance owns its own object, so no shared
    reference can leak between instances.
  • DocProvider.fetch_content gains a branch argument;
    GitHubDocProvider passes it as ?ref= to the Contents API. Aligns
    the signature with list_files / path_exists.
  • excluded_segments is now read from
    [scanner].excluded_segments in the new TOML config; the previously
    hardcoded set is gone. Pydantic field defaults in config.py are
    the single source of truth for OTC-specific values.
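
The frozenset change above can be illustrated with a minimal sketch. The class below is a reduced, hypothetical stand-in for the real ScannerService; the is_excluded helper and the segment names are assumptions made only for illustration.

```python
from typing import Iterable


class ScannerService:
    """Reduced stand-in for the real service (illustration only)."""

    def __init__(self, excluded_segments: Iterable[str] = ()) -> None:
        # Copy into an immutable frozenset so the instance never aliases
        # a mutable collection owned by the caller.
        self.excluded_segments = frozenset(excluded_segments)

    def is_excluded(self, path: str) -> bool:
        # Hypothetical helper: skip a path when any segment is excluded.
        return any(seg in self.excluded_segments for seg in path.split("/"))


shared = {"drafts", "internal"}  # hypothetical segment names
service = ScannerService(shared)
shared.add("api-ref")  # later mutation by the caller does not reach the service
```

Because the instance holds an immutable copy, changing the caller's set after construction has no effect on what the scanner excludes.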

Config from file

  • New scan-config.toml at the project root with [github],
    [scanner], [output], [logging] sections, loaded via
    pydantic-settings TomlConfigSettingsSource.
  • GITHUB_TOKEN stays out of TOML on purpose — env / .env only, so
    it can never be committed accidentally.
  • load_settings(config_path) supports --config <path> overriding
    the default TOML.
  • tomli added as a dependency for Python 3.10, where the
    standard-library tomllib is not available.
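
As a rough illustration of the config layout and the token rule, here is a sketch that parses a TOML document with the four sections and takes GITHUB_TOKEN from the environment only. It deliberately uses plain tomllib / tomli parsing rather than the pydantic-settings source the PR uses. The function name load_settings_sketch and every key except excluded_segments are assumptions, not the actual contents of scan-config.toml.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API, for Python 3.10

# Hypothetical file contents: only the section names and
# scanner.excluded_segments come from the PR description.
EXAMPLE_TOML = """
[github]
org = "example-org"
branch = "main"

[scanner]
excluded_segments = ["drafts", "internal"]

[output]
path = "scan-output.json"

[logging]
level = "INFO"
"""


def load_settings_sketch(toml_text: str, env: dict) -> dict:
    """Parse the TOML text and merge in the token from the environment."""
    settings = tomllib.loads(toml_text)
    token = env.get("GITHUB_TOKEN")
    if not token:
        # The token is never read from TOML, so a missing env value is fatal.
        raise ValueError("GITHUB_TOKEN is not set")
    settings.setdefault("github", {})["token"] = token  # in memory only
    return settings
```

In real use the env argument would be os.environ; passing it explicitly keeps the sketch easy to exercise without touching the process environment.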

CLI and output

  • main.py rewritten around argparse with --config, --output,
    --org, --branch, --stdout, -v / -q.
  • Scan report is written to a file (default scan-output.json)
    instead of stdout. --stdout prints to stdout in addition to the
    file. --output - skips the file and emits to stdout only.
  • Exit codes are explicit: EXIT_OK / EXIT_RUNTIME_ERROR /
    EXIT_USAGE_ERROR.
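
The flag set and the output rules can be sketched with argparse as follows. This is not the PR's main.py: the helper output_targets, the dest names, and the help strings are assumptions, and EXIT_OK = 0 is assumed (the description only states codes 1 and 2).

```python
import argparse
from typing import Optional, Tuple

EXIT_OK = 0             # assumed value
EXIT_RUNTIME_ERROR = 1  # stated in the PR for scan-time failures
EXIT_USAGE_ERROR = 2    # stated in the PR for config / usage failures


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="scanner")
    parser.add_argument("--config", default="scan-config.toml",
                        help="path to the TOML config")
    parser.add_argument("--output", default="scan-output.json",
                        help="report file, or '-' for stdout only")
    parser.add_argument("--org", help="override the organization")
    parser.add_argument("--branch", help="override the branch")
    parser.add_argument("--stdout", action="store_true",
                        help="also print the report to stdout")
    parser.add_argument("-v", dest="verbose", action="store_true")
    parser.add_argument("-q", dest="quiet", action="store_true")
    return parser


def output_targets(args: argparse.Namespace) -> Tuple[Optional[str], bool]:
    """Return (file path or None, whether to print to stdout)."""
    if args.output == "-":
        return None, True  # skip the file, emit to stdout only
    return args.output, args.stdout
```

With no flags the report goes to scan-output.json only; `--stdout` adds a copy on stdout; `--output -` writes no file at all.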

Error handling

  • Missing GITHUB_TOKEN, missing --config file, and invalid TOML
    now surface as clean one-line errors with exit code 2 instead of
    Python tracebacks.
  • Org-level RepositoryError (auth / rate-limit / network) during
    the scan emits a clean log message and exits 1. Per-document errors
    continue to be recorded inside the scan report.
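
One way to get one-line errors and stable exit codes is a thin wrapper around the scan entry point, sketched below. RepositoryError is named in the PR, but its definition here is a stand-in; ConfigError, run, and the two example callables are hypothetical, and EXIT_OK = 0 is assumed.

```python
import sys

EXIT_OK, EXIT_RUNTIME_ERROR, EXIT_USAGE_ERROR = 0, 1, 2


class ConfigError(Exception):
    """Hypothetical: missing token, missing config file, or invalid TOML."""


class RepositoryError(Exception):
    """Stand-in for the org-level auth / rate-limit / network error."""


def run(scan) -> int:
    """Call scan() and translate known failures into exit codes."""
    try:
        scan()
    except ConfigError as exc:
        print(f"error: {exc}", file=sys.stderr)  # one line, no traceback
        return EXIT_USAGE_ERROR
    except RepositoryError as exc:
        print(f"error: scan aborted: {exc}", file=sys.stderr)
        return EXIT_RUNTIME_ERROR
    return EXIT_OK


def missing_token():
    raise ConfigError("GITHUB_TOKEN is not set")


def rate_limited():
    raise RepositoryError("rate limit exceeded")
```

A caller would then end with `sys.exit(run(...))`. Per-document errors are not handled here, since the description says they stay inside the scan report.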

Commit 2 — Quality-report model and endpoint attribute extraction

Implements @RusselSand's #5 design with some refinements:

  • two-tier shape: a single gating failure_reason: Issue versus
    per-section SectionResult;
  • structured Issue(code, location, details);
  • extracted parameters / examples co-located with their field-level
    metrics;
  • computed overall_status / completeness / service / all_issues
    properties;
  • overall_status="unsupported" is distinct from "failed".
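
The two-tier shape can be sketched with dataclasses. Only failure_reason, Issue(code, location, details), SectionResult, overall_status, all_issues, and the four status values come from the description; the class name EndpointReport, the field names items / issues / sections, the status rules, and every issue code other than UNSUPPORTED_DOC_STYLE are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

UNSUPPORTED_DOC_STYLE = "UNSUPPORTED_DOC_STYLE"


@dataclass(frozen=True)
class Issue:
    code: str
    location: str = ""
    details: str = ""


@dataclass
class SectionResult:
    items: list = field(default_factory=list)   # extracted parameters or examples
    issues: list = field(default_factory=list)  # field-level problems


@dataclass
class EndpointReport:  # hypothetical name
    failure_reason: Optional[Issue] = None      # gating, document-level
    sections: dict = field(default_factory=dict)

    @property
    def all_issues(self) -> list:
        gating = [self.failure_reason] if self.failure_reason else []
        return gating + [i for s in self.sections.values() for i in s.issues]

    @property
    def overall_status(self) -> str:
        # Assumed rules: a gating issue decides failed vs unsupported,
        # otherwise any section issue downgrades ok to partial.
        if self.failure_reason is not None:
            unsupported = self.failure_reason.code == UNSUPPORTED_DOC_STYLE
            return "unsupported" if unsupported else "failed"
        return "partial" if self.all_issues else "ok"


ok = EndpointReport(sections={"parameters": SectionResult(items=["limit"])})
partial = EndpointReport(
    sections={"examples": SectionResult(issues=[Issue("MISSING_EXAMPLE", "examples")])}
)
unsupported = EndpointReport(failure_reason=Issue(UNSUPPORTED_DOC_STYLE))
failed = EndpointReport(failure_reason=Issue("PARSE_ERROR"))
```

Keeping the status computed rather than stored means it cannot drift out of sync with the recorded issues.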

New parser modules (infrastructure/parsers/)

Scanner + CLI

  • Scanner classifies doc style and dispatches: Style-A → parser,
    S3-compatible → failure_reason=Issue(UNSUPPORTED_DOC_STYLE),
    non-endpoint → RepoScanResult.non_endpoint_documents (no longer
    silently dropped). documents_by_version filters on
    overall_status in (ok, partial).
  • main.py summary uses quality_summary — per-overall-status
    counts, top issue codes, per-version distribution.
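
The dispatch and summary behaviour might look roughly like the sketch below. The DocStyle enum, the dict-based results, the parse_style_a stub, and the document paths are all hypothetical; the real scanner classifies documents itself and returns the quality-report model rather than plain dicts.

```python
from collections import Counter
from enum import Enum


class DocStyle(Enum):  # hypothetical classification result
    STYLE_A = "style-a"
    S3_COMPATIBLE = "s3-compatible"
    NON_ENDPOINT = "non-endpoint"


def parse_style_a(path: str) -> dict:
    # Stub standing in for the real Style-A parser.
    return {"path": path, "overall_status": "ok"}


def scan_documents(classified_docs):
    """Route each (path, style) pair; nothing is silently dropped."""
    results, non_endpoint_documents = [], []
    for path, style in classified_docs:
        if style is DocStyle.NON_ENDPOINT:
            non_endpoint_documents.append(path)
        elif style is DocStyle.S3_COMPATIBLE:
            results.append({"path": path, "overall_status": "unsupported",
                            "failure_reason": "UNSUPPORTED_DOC_STYLE"})
        else:
            results.append(parse_style_a(path))
    return results, non_endpoint_documents


docs = [
    ("api-ref/list_servers.rst", DocStyle.STYLE_A),
    ("api-ref/put_object.rst", DocStyle.S3_COMPATIBLE),
    ("api-ref/index.rst", DocStyle.NON_ENDPOINT),
]
results, non_endpoint = scan_documents(docs)

# Per-status counts, as the summary reports them.
status_counts = Counter(r["overall_status"] for r in results)
# Same filter as documents_by_version: keep only ok / partial.
usable = [r for r in results if r["overall_status"] in ("ok", "partial")]
```

Unsupported documents stay visible in the counts while being excluded from the per-version listing.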

Testing

New tests, no network access required:

Fixtures under tests/fixtures/ cover the format variations
documented in the upstream investigation:
