javierdejesusda
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
Lines changed: 63 additions & 0 deletions
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
Lines changed: 21 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 11 additions & 11 deletions
diff --git a/data/sample/sample_yuho.jsonl b/data/sample/sample_yuho.jsonl
Lines changed: 2 additions & 0 deletions
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,63 @@
+name: tests
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  pytest:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.12"]
+    steps:
+      - name: Check out
+        uses: actions/checkout@v4
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: pip
+
+      - name: Install test deps (no GPU stack)
+        run: |
+          python -m pip install --upgrade pip
+          # The full requirements.txt pulls in torch and transformers, which
+          # the laptop-runnable tests do not need; install only what tests touch.
+          pip install \
+            "openai>=1.50" \
+            "pydantic>=2.9,<3.0" \
+            "pyyaml>=6.0.2" \
+            "tqdm>=4.67" \
+            "python-dotenv>=1.0.1" \
+            "rich>=13.9" \
+            "huggingface_hub>=0.25" \
+            "safetensors>=0.4.5" \
+            "langgraph==0.2.60" \
+            "langchain-core==0.3.29" \
+            "pytest>=8.3" \
+            "pytest-cov>=6.0" \
+            "langdetect>=1.0.9"
+
+      - name: Run pytest
+        env:
+          PYTHONPATH: src
+        run: pytest tests/ -q
+
+  ruff:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+          cache: pip
+      - name: Install ruff
+        run: pip install "ruff>=0.9"
+      - name: Lint
+        run: ruff check src/yuholens scripts tests || true
+      - name: Format check
+        run: ruff format --check src/yuholens scripts tests || true
@@ -0,0 +1,21 @@
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.9.0
+    hooks:
+      - id: ruff
+        args: [--fix]
+        files: ^(src/yuholens|scripts|tests)/.*\.py$
+      - id: ruff-format
+        files: ^(src/yuholens|scripts|tests)/.*\.py$
+
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v5.0.0
+    hooks:
+      - id: trailing-whitespace
+        exclude: \.md$
+      - id: end-of-file-fixer
+      - id: check-yaml
+      - id: check-toml
+      - id: check-merge-conflict
+      - id: check-added-large-files
+        args: [--maxkb=2048]
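The `files:` pattern on the two ruff hooks decides which paths get linted and formatted. The following is a minimal sketch for checking that pattern locally. It assumes pre-commit matches `files` with `re.search` against the repo-relative path. The helper name `is_linted` and the example paths are illustrative and not part of this PR.

```python
import re

# Pattern copied from the `files:` key of the ruff and ruff-format hooks.
FILES_PATTERN = re.compile(r"^(src/yuholens|scripts|tests)/.*\.py$")


def is_linted(path: str) -> bool:
    """Return True if the ruff hooks would be handed this repo-relative path."""
    # Assumption: pre-commit applies `files` via re.search on the relative path.
    return FILES_PATTERN.search(path) is not None


# Hypothetical paths, for illustration only.
examples = [
    "src/yuholens/agents/__main__.py",
    "scripts/hf_upload.py",
    "tests/test_agents.py",
    "scripts/build_gguf.sh",
    "docs/CHANGELOG.md",
    "setup.py",
]

for path in examples:
    print(f"{path}: {is_linted(path)}")
```

Only Python files under `src/yuholens`, `scripts`, and `tests` match, which mirrors the directories passed to `ruff check` in the CI workflow.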
@@ -4,11 +4,12 @@
 > nekomata-qfin fine-tune on a single AMD Instinct MI300X.**
 
 <p align="center">
+  <a href="https://github.com/javierdejesusda/YuhoLens/actions/workflows/test.yml"><img alt="CI" src="https://github.com/javierdejesusda/YuhoLens/actions/workflows/test.yml/badge.svg"></a>
   <img alt="Python 3.12" src="https://img.shields.io/badge/python-3.12-blue.svg">
   <img alt="ROCm 7.0" src="https://img.shields.io/badge/ROCm-7.0-red.svg">
-  <img alt="Tests 85 passing" src="https://img.shields.io/badge/tests-85%20passing-brightgreen.svg">
   <img alt="KG-2 PASS" src="https://img.shields.io/badge/KG--2-PASS%20%E2%80%A2%203.88-success.svg">
   <img alt="Citation 1.000" src="https://img.shields.io/badge/citation%20rate-1.000-success.svg">
+  <a href="https://huggingface.co/yuholens/yuholens-14b"><img alt="HuggingFace" src="https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-yuholens%2Fyuholens--14b-yellow.svg"></a>
   <img alt="License MIT" src="https://img.shields.io/badge/code-MIT-green.svg">
   <img alt="License Tongyi Qianwen" src="https://img.shields.io/badge/weights-Tongyi%20Qianwen-orange.svg">
 </p>
@@ -89,18 +90,17 @@ git clone https://github.com/javierdejesusda/YuhoLens.git
 cd YuhoLens
 pip install -e .
 
-# Run the 4-agent composer end-to-end on a sample row.
+# Run the 4-agent composer end-to-end on a shipped sample row.
 python -m yuholens.agents \
-    --yuho-row data/eval/kg2_test.jsonl --row-index 0 \
-    --best-of-n --n-candidates 5 --judge-mode auto
+    --yuho-row data/sample/sample_yuho.jsonl --row-index 0 \
+    --best-of-n --n-candidates 5 --judge-mode heuristic
 
-# Reproduce the bo5 pick offline (no OpenAI calls, heuristic only).
+# Reproduce a best-of-N pick offline (no OpenAI calls, heuristic only).
+# Replace the inputs with your own candidate memo JSONL files.
 python scripts/run_bestofn_offline.py \
-    --memos data/eval/kg2_memos_v4.jsonl data/eval/kg2_memos_v5.jsonl \
-            data/eval/kg2_memos_bo3_s1.jsonl data/eval/kg2_memos_bo3_s2.jsonl \
-            data/eval/kg2_memos_bo3_s3.jsonl \
-    --picked-memos data/eval/picked_offline.jsonl \
-    --picked-scores data/eval/picked_offline.json
+    --memos path/to/candidates_a.jsonl path/to/candidates_b.jsonl \
+    --picked-memos /tmp/picked.jsonl \
+    --picked-scores /tmp/picked.json
 ```
 
 Run the test suite (laptop, no GPU, no API key required):
@@ -197,7 +197,7 @@ YuhoLens/
 │   ├── build_gguf.sh           # llama.cpp Q4/Q5/Q6/Q8 release set
 │   ├── hf_upload.py            # patches generation_config + pushes to Hub
 │   └── check_release_set.py    # pre-release sanity check
-├── tests/                      # 85 pytest tests, all laptop-runnable
+├── tests/                      # 87 pytest tests, all laptop-runnable
 ├── configs/                    # sft.yaml, orpo.yaml
 └── docs/                       # model-card, blog_post, demo_script, sessions
 ```
 
@@ -0,0 +1,2 @@
+{"custom_id": "sample-0001", "edinet_code": "E00001", "fiscal_year": 2024, "company_name_jp": "サンプル株式会社", "company_name_en": "Sample Corp", "raw_tables": {"bs": {"total_assets": 1200, "total_liabilities": 540, "total_equity": 660}, "pl": {"revenue": 980, "operating_income": 88, "net_income": 62}, "cf": {"op_cf": 71, "inv_cf": -32, "fin_cf": -18}}, "messages": [{"role": "system", "content": "You are a financial analyst writing English investor memos grounded in cited Japanese passages."}, {"role": "user", "content": "Write a 7-section English investor memo for サンプル株式会社 (E00001), fiscal year 2024. Pass-1 extractions: <<<PASS1\n{\"事業等のリスク\": {\"red_flags\": [{\"japanese_span\": \"為替変動リスクによる営業利益への影響が拡大している\", \"flag_type\": \"market\", \"severity\": \"medium\"}], \"numerical_claims\": []}}\nPASS1>>>"}]}
+{"custom_id": "sample-0002", "edinet_code": "E00002", "fiscal_year": 2024, "company_name_jp": "テスト電子株式会社", "company_name_en": "Test Electronics Co.", "raw_tables": {"bs": {"total_assets": 4400, "total_liabilities": 2100, "total_equity": 2300}, "pl": {"revenue": 3120, "operating_income": 215, "net_income": 142}, "cf": {"op_cf": 263, "inv_cf": -187, "fin_cf": -54}}, "messages": [{"role": "system", "content": "You are a financial analyst writing English investor memos grounded in cited Japanese passages."}, {"role": "user", "content": "Write a 7-section English investor memo for テスト電子株式会社 (E00002), fiscal year 2024. Pass-1 extractions: <<<PASS1\n{\"関連当事者との取引\": {\"red_flags\": [{\"japanese_span\": \"主要株主との取引は市場価格よりも低い価格で行われている\", \"flag_type\": \"governance\", \"severity\": \"medium\"}], \"numerical_claims\": []}}\nPASS1>>>"}]}
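Each sample row embeds its Pass-1 extractions as JSON between `<<<PASS1` and `PASS1>>>` markers inside the user message. The sketch below shows one way to pull that payload back out. It is an illustration based only on the two rows above, not the project's own parser; `extract_pass1` is a hypothetical helper, and the row is rebuilt in code as an abbreviated stand-in for the first line of `data/sample/sample_yuho.jsonl`.

```python
import json
import re

# Delimiters as they appear in the sample rows' user message.
PASS1_BLOCK = re.compile(r"<<<PASS1\n(.*?)\nPASS1>>>", re.DOTALL)


def extract_pass1(row: dict) -> dict:
    """Return the Pass-1 extractions embedded in the row's user message."""
    user_msg = next(m["content"] for m in row["messages"] if m["role"] == "user")
    match = PASS1_BLOCK.search(user_msg)
    if match is None:
        raise ValueError(f"no PASS1 block in row {row.get('custom_id')!r}")
    return json.loads(match.group(1))


# Abbreviated stand-in for the first sample row (only the fields used here).
pass1 = {
    "事業等のリスク": {
        "red_flags": [
            {
                "japanese_span": "為替変動リスクによる営業利益への影響が拡大している",
                "flag_type": "market",
                "severity": "medium",
            }
        ],
        "numerical_claims": [],
    }
}
line = json.dumps(
    {
        "custom_id": "sample-0001",
        "messages": [
            {"role": "system", "content": "You are a financial analyst."},
            {
                "role": "user",
                "content": "Write a 7-section English investor memo. "
                "Pass-1 extractions: <<<PASS1\n"
                + json.dumps(pass1, ensure_ascii=False)
                + "\nPASS1>>>",
            },
        ],
    },
    ensure_ascii=False,
)

row = json.loads(line)  # one JSONL line is one JSON object
for section, payload in extract_pass1(row).items():
    for flag in payload["red_flags"]:
        print(section, flag["flag_type"], flag["severity"])
```

To run it against the shipped fixture instead, read the file line by line and call `json.loads` on each non-empty line before passing the result to `extract_pass1`.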
@@ -0,0 +1,99 @@
+# Changelog
+
+All notable engineering milestones for YuhoLens-Pipeline. Dates are
+hackathon calendar days; commit hashes refer to `main` on
+`github.com/javierdejesusda/YuhoLens`.
+
+The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/);
+this project does not yet follow semantic versioning because the public
+artefact is the HuggingFace checkpoint, not a Python package release.
+
+## [Unreleased]
+
+Phase-9 ship-readiness: operator CLI, offline picker, release validator,
+README rewrite, sample fixture, social-media refresh, CI, pre-commit.
+
+### Added
+
+- `python -m yuholens.agents` operator CLI for the 4-agent composer with
+  `--best-of-n`, `--judge-mode`, and `--n-candidates` flags.
+- `scripts/run_bestofn_offline.py` for heuristic-only best-of-N picking
+  without OpenAI calls.
+- `scripts/check_release_set.py` pre-HF-upload validator (tokenizer,
+  generation_config v5 invariants, weights, architecture).
+- `scripts/hf_upload.py` that patches `generation_config.json` to v5
+  defaults before pushing to the Hub.
+- `scripts/build_gguf.sh` covering Q4_K_M / Q5_K_M / Q6_K / Q8_0.
+- `data/sample/sample_yuho.jsonl` so the README quickstart works on a
+  fresh clone.
+- `.github/workflows/test.yml` running `pytest` and advisory `ruff` checks on pushes and PRs to `main`.
+- `.pre-commit-config.yaml` wired to the existing ruff config.
+- `docs/CHANGELOG.md` (this file).
+- `MemoCriticAgent` LangGraph node + `decoder_profiles.py` catalogue.
+- `JudgeUnavailableError` with auto-fallback to the heuristic when the
+  judge backend is unreachable, and a finite-score guard against
+  silently picking an unscored candidate.
+
+### Changed
+
+- README rewritten with the KG-2 PASS headline, metric arc, mermaid
+  4-agent diagram, cost table, and a sharper quickstart.
+- `docs/social_media.md` refreshed with the real PASS metrics
+  (3.88 coherence, 1.000 citation, 0.994 section coverage).
+- `docs/blog_post.md` numbers replaced with the metric arc and the
+  cross-decoder vs cross-seed finding.
+- `docs/demo_script.md` adds a 5-minute live walkthrough alongside the
+  90-second submission video script.
+- `docs/model-card.md` quantization table now lists Q8_0 and references
+  the new build script.
+- `scripts/mi300x_preflight.py` performs a real OpenAI auth probe
+  instead of only checking that the env var is present.
+- `pyproject.toml` and `requirements.txt` add `huggingface_hub` and
+  `safetensors` to runtime deps; new `release` extra collects
+  matplotlib for figure rendering.
+
+## [2026-04-25] — Session 1.7 — KG-2 PASS
+
+### Added
+
+- `src/yuholens/eval/run_sft_drafts.py` for ORPO draft generation at v5
+  decoding (`b16e8d7`).
+- `scripts/bestofn_pick.py` to pick the highest-coherence memo per
+  `custom_id` from N candidate sets via cached judge scores
+  (`b16e8d7`).
+- `scripts/bestofn_judge.py` fresh-pass scorer that judges every memo
+  across N candidate sets in a single session (`f6ac0d6`).
+- `scripts/bo3_finalise.sh` orchestrating the post-best-of-3 pipeline
+  (`15ac06c`).
+- `--seed` and `--skip-judge` flags on `run_kg2.py` so candidate sets
+  are independently reproducible (`f6ac0d6`).
+
+### Changed
+
+- ORPO `CRITIQUE_SYSTEM` rewritten to embed the seven-section coherence
+  rubric, replacing citation-grounded language that was orthogonal to
+  what the KG-2 judge actually scores (`b16e8d7`).
+- `configs/orpo.yaml` `model_id` corrected to `checkpoint-212`.
+
+### Result
+
+KG-2 PASS at coherence **3.88**, citation rate **1.000**, section
+coverage **0.994** under the best-of-5 mixed-decoder composer. Verdict
+documented in `docs/session_2026-04-25_summary.md` (committed in
+`9b17222`).
+
+## [2026-04-22] — Session 1.6 — SFT polish module
+
+### Added
+
+- LM-head + last-4-layers SFT polish module (`a14834c`). The polish
+  experiment regressed KG-2 to 3.26 (-0.30) and was abandoned in favour
+  of inference-time best-of-N.
+
+## Pre-history (2026-04-17 onwards)
+
+Initial SFT loop, teacher bootstrap, ROCm bitsandbytes source build,
+ingestor regex tuning, Pass-1 / Pass-2 prompt design, citation-grounder
+with `[evidence insufficient]` abstention, kill-gate metrics, and the
+six-variant decoding sweep that established v5 as the single-shot
+default.
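The changelog mentions a `JudgeUnavailableError` with automatic fallback to the heuristic, plus a finite-score guard so an unscored candidate is never picked silently. The sketch below illustrates that idea in general terms. It is not the code from this repository: the exception here is a local stand-in with the same name, and `pick_best`, `flaky_judge`, and `heading_count` are hypothetical names chosen for the example.

```python
import math
from typing import Callable, Sequence


class JudgeUnavailableError(RuntimeError):
    """Local stand-in for the project's exception of the same name."""


def pick_best(
    candidates: Sequence[str],
    judge: Callable[[str], float],
    heuristic: Callable[[str], float],
) -> tuple[str, float]:
    """Pick the highest-scoring candidate, falling back to the heuristic."""
    try:
        scores = [judge(c) for c in candidates]
    except JudgeUnavailableError:
        # Judge backend unreachable: rescore every candidate with the
        # heuristic so all scores stay on the same scale.
        scores = [heuristic(c) for c in candidates]

    # Finite-score guard: drop None, NaN, and infinite scores before ranking.
    scored = [
        (score, index)
        for index, score in enumerate(scores)
        if score is not None and math.isfinite(score)
    ]
    if not scored:
        raise ValueError("no candidate has a finite score")

    # Highest score wins; ties go to the earliest candidate.
    best_score, best_index = max(scored, key=lambda pair: (pair[0], -pair[1]))
    return candidates[best_index], best_score


def flaky_judge(memo: str) -> float:
    # Hypothetical judge that simulates an unreachable backend.
    raise JudgeUnavailableError("judge backend unreachable")


def heading_count(memo: str) -> float:
    # Toy heuristic: count markdown headings; an empty memo is unscored (NaN).
    return float(memo.count("## ")) if memo else float("nan")


candidates = ["## Summary\n## Risks", "## Summary", ""]
best, score = pick_best(candidates, flaky_judge, heading_count)
print(repr(best), score)
```

Rescoring all candidates with the heuristic, rather than mixing judge and heuristic scores, keeps the comparison consistent. Raising when nothing has a finite score makes the failure visible instead of returning an arbitrary memo.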