feat: subscription historical filters + PDF supply-price extraction + local PDF save #15
Open
roughian wants to merge 2 commits into
…orical filters

get_apt_subscription_info:
- Add rcrit_pblanc_de_from/_to (YYYY-MM-DD) and mvn_prearnge_ym_from (YYYYMM) filters, mapped to odcloud cond[] syntax for server-side filtering of past notices.
- Add only_pending_occupancy flag that uses the stricter (i.e. later) of the current year-month and the user-supplied mvn_prearnge_ym_from.
- Enrich items with is_pre_occupancy and expected_move_in_year_month derived fields from MVN_PREARNGE_YM.
- Echo applied_filters back in the response for LLM traceability.

get_apt_subscription_supply_prices (new):
- Resolve PBLANC_URL via HOUSE_MANAGE_NO lookup or accept it directly.
- Download the notice PDF with a 25 MB cap and PDF magic-byte check.
- Extract per-unit-type supply prices (평형별 분양가) via pdfplumber: tables first (header keyword detection), regex fallback on price-keyword pages. KRW values are normalized to units of 10,000 KRW (만원) with a 100-million-KRW (1억) threshold heuristic, deduplicated by (unit_type, exclusive_area_sqm).

Common utilities:
- New pdf_parser.py with extract_text() / extract_supply_prices() and a SupplyPrice dataclass. Guards: 25 MB / 200 pages / OCR-required.

Tests:
- 26 new subscription test cases (filters, derived fields, supply-price tool) plus 7 pdf_parser unit tests using a mocked pdfplumber.

Docs:
- README-ko.md and CLAUDE.md updated with the new tool, filters, and utility module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
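For reviewers, the two download guards named in this commit (size cap and magic-byte check) could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's `_download_pdf`: the function name, the error type, and the reading of "25 MB" as 25 × 1024 × 1024 bytes are all hypothetical, and the network step is replaced by a plain iterable of byte chunks.

```python
from typing import Iterable

# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes; the PR may count differently.
MAX_PDF_BYTES = 25 * 1024 * 1024
PDF_MAGIC = b"%PDF-"  # signature at the start of a PDF file


class PdfGuardError(ValueError):
    """Hypothetical error type for rejected downloads."""


def read_pdf_with_guards(chunks: Iterable[bytes], max_bytes: int = MAX_PDF_BYTES) -> bytes:
    """Accumulate streamed chunks, enforcing a size cap and a magic-byte check."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_bytes:
            # Stop as soon as the cap is crossed instead of buffering the whole body.
            raise PdfGuardError(f"PDF exceeds {max_bytes} bytes")
    if not buf.startswith(PDF_MAGIC):
        # An HTML error page served with a PDF content type would be caught here.
        raise PdfGuardError("response is not a PDF (missing %PDF- signature)")
    return bytes(buf)
```

Checking the cap inside the loop means an oversized response is abandoned early, which matters more than the final size check when the body is streamed.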
New MCP tool that saves the subscription (청약) notice PDF to a user-specified local
directory (Claude Desktop stdio mode). Designed so the LLM asks the user
for `save_dir` before calling instead of guessing a path.
- download_subscription_pdf(save_dir, house_manage_no=None,
pblanc_url=None, filename=None, overwrite=False)
- Reuses _download_pdf for the network step (15s timeout, 25 MB cap).
- New helpers in _helpers.py:
  - _sanitize_filename_component: strips path separators, ../, and
    Windows-unsafe characters; collapses whitespace to underscores.
  - _resolve_save_dir: expands ~ and resolves to absolute path.
  - _next_available_path: appends _01.._99 suffix on collision.
- Filename defaults to "{HOUSE_NM}_{HOUSE_MANAGE_NO}.pdf" when looked up.
- Directory is auto-created (mkdir parents=True, exist_ok=True).
Interactive guidance (per Codex consultation, Claude Desktop has no
Skill/AskUserQuestion equivalent):
- Tool docstring instructs the LLM to ask the user for save_dir before
calling rather than silently picking a default.
- resources/custom-instructions-ko.md gains a section on the same
interaction policy with example prompts.
Tests:
- 11 new cases covering missing inputs, default filename, lookup +
filename composition, path-traversal sanitisation, suffix collision,
overwrite=True, and auto-mkdir.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
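The filename helpers described in this commit can be sketched as follows. This is a minimal illustration based only on the commit text, not the code in `_helpers.py`: the function names drop the leading underscore to mark them as hypothetical, the exact unsafe-character set is an assumption, and the `"download"` fallback for an empty result is invented for the example.

```python
import re
from pathlib import Path

# Assumption: path separators, control characters, and characters Windows
# rejects in filenames. The PR's actual character set may differ.
_UNSAFE = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def sanitize_filename_component(name: str) -> str:
    """Drop unsafe characters and '..' runs, then collapse whitespace to '_'."""
    cleaned = _UNSAFE.sub("", name)
    cleaned = cleaned.replace("..", "")
    cleaned = re.sub(r"\s+", "_", cleaned.strip())
    return cleaned or "download"  # hypothetical fallback when nothing survives


def next_available_path(path: Path, max_suffix: int = 99) -> Path:
    """Return `path` if it is free, else the first free 'stem_NN.ext' (NN = 01..99)."""
    if not path.exists():
        return path
    for n in range(1, max_suffix + 1):
        candidate = path.with_name(f"{path.stem}_{n:02d}{path.suffix}")
        if not candidate.exists():
            return candidate
    raise FileExistsError(f"no free name for {path} up to _{max_suffix:02d}")
```

Note that the existence check and the later write are separate steps, so two concurrent saves could still pick the same name; for a single-user stdio tool that is probably acceptable.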
Owner
Hello, thank you for your contribution.
Summary
Three related enhancements for the subscription (청약) tool group:

Historical / pending-occupancy filters on `get_apt_subscription_info`
- `rcrit_pblanc_de_from` / `rcrit_pblanc_de_to` (YYYY-MM-DD): server-side filter on the recruitment notice date (모집공고일) via odcloud `cond[]` syntax. Lets users pull past notices (e.g. "notices up to July 2021").
- `mvn_prearnge_ym_from` (YYYYMM) + `only_pending_occupancy` flag: filter to complexes with a future MVN_PREARNGE_YM (i.e. scheduled for move-in, with actual transaction prices not yet established).
- `is_pre_occupancy` and `expected_move_in_year_month` derived fields.
- `applied_filters` echo in the response for LLM traceability.

New tool `get_apt_subscription_supply_prices`
- Accepts `house_manage_no` (looks up PBLANC_URL via odcloud) or `pblanc_url` directly.
- Extracts per-unit-type supply prices from the notice PDF via `pdfplumber`.
- Results are deduplicated by `(unit_type, exclusive_area_sqm)`.

New tool `download_subscription_pdf` (added in follow-up commit)
- `save_dir` is required and `~`-expanded; the directory is auto-created.
- `filename` is sanitised (path separators, `..`, control chars removed; whitespace collapsed to `_`).
- Filename defaults to `{HOUSE_NM}_{HOUSE_MANAGE_NO}.pdf`.
- `overwrite=False` (default) appends a numeric suffix (`_01`, `_02`, ...) on collision; `overwrite=True` replaces the existing file.
- The tool docstring and `resources/custom-instructions-ko.md` instruct the LLM to ask the user for `save_dir` before calling, rather than guessing a default. (Claude Desktop has no Skill/AskUserQuestion UI; this is the idiomatic equivalent.)

New utility module `src/real_estate/common_utils/pdf_parser.py`
- `extract_text()` / `extract_supply_prices()` / `SupplyPrice` dataclass. Guards: 25 MB / 200 pages / OCR-required detection.

New helpers in `_helpers.py`
- `_download_pdf`: streaming GET with 15 s timeout, 25 MB cap, content-type / magic-byte check.
- `_sanitize_filename_component`, `_resolve_save_dir`, `_next_available_path`: used by `download_subscription_pdf`.
- `_current_year_month`, `_validate_pblanc_date`, `_validate_year_month`: used by the historical filter.

Dependencies
- `pdfplumber==0.11.4` added (pure-Python; pulls in pdfminer.six and pypdfium2 transitively). No system packages required, friendly for Docker/HTTP-mode deployments.

Test plan
- `uv run pytest tests/mcp_server/test_subscription.py tests/common_utils/test_pdf_parser.py`: 54 tests pass
- `uv run ruff check`: clean on changed files
- `uv run pyright` on changed files: 0 errors
- Manual verification of `download_subscription_pdf` in Claude Desktop stdio mode

Notes / follow-ups
- `_download_pdf` accepts arbitrary URLs (via `pblanc_url`). A follow-up patch is recommended to add a host allowlist and private-IP blocklist for SSRF defence-in-depth before deploying in HTTP mode.
- The regex fallback pattern `\d{2,3}[A-Z]?T?` can collide with numbers that are not unit types (평형) when tables are absent. Recommend adding regression fixtures from real PDFs once available.
- The `save_dir` prompt is policy-based (docstring + Project Instructions), not hard-enforced. If stricter UX becomes important, splitting into a two-step propose/confirm tool pair is an option.

🤖 Generated with Claude Code
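As a starting point for the SSRF follow-up mentioned in the notes, a host allowlist plus private-IP blocklist could be sketched like this. Everything here is an assumption for illustration: the function names, the injected `resolve` parameter, and especially `ALLOWED_HOSTS`, whose placeholder entry stands in for the real notice-PDF hosts, which this PR does not list.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts that actually serve notice PDFs.
ALLOWED_HOSTS = {"example.com"}


def is_public_ip(addr: str) -> bool:
    """True only for addresses outside private, loopback, link-local, and similar ranges."""
    ip = ipaddress.ip_address(addr)
    return not (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )


def check_download_url(url, allowed_hosts=ALLOWED_HOSTS, resolve=socket.getaddrinfo):
    """Raise ValueError unless the URL is http(s), allowlisted, and resolves to public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    host = parts.hostname  # already lower-cased by urlsplit
    if host is None or host not in allowed_hosts:
        raise ValueError(f"host not allowlisted: {host!r}")
    for info in resolve(host, None):
        addr = info[4][0]  # sockaddr tuple starts with the IP string
        if not is_public_ip(addr):
            raise ValueError(f"{host} resolves to non-public address {addr}")
```

Because the check resolves the hostname separately from the eventual request, it does not by itself close a DNS-rebinding window; connecting to the already-validated address would be needed for that.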