feat: add vibecoded but working convert_json_jsonld.py #23
Conversation
|
Hi @rmfranken, I can see the conversion for |
|
Here are the conversion commands for each type - they are pretty much the same: An organization: The script auto-detects what kind of entity it is dealing with and processes it accordingly. The base-url handling is maybe a bit too cautious - but we can see how easy it is to inject it into the call that we build. |
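For readers without the script at hand, a minimal sketch of what such a command line could look like. Only the `to-json` subcommand, the positional input/output paths, and `--base-url` appear in this PR; the `to-jsonld` subcommand name and the `build_parser` helper are assumptions for illustration, not the script's actual interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI layout: one subcommand per conversion direction."""
    parser = argparse.ArgumentParser(prog="convert_json_jsonld.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # "to-json" is documented in the PR; "to-jsonld" is an assumed counterpart.
    for name in ("to-jsonld", "to-json"):
        cmd = sub.add_parser(name)
        cmd.add_argument("input", help="path to the file to convert")
        cmd.add_argument("output", help="path to write the converted file")
        if name == "to-jsonld":
            # The base URL is only needed when @id values are generated.
            cmd.add_argument("--base-url", default=None)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this layout the same call shape works for every entity type, since type detection would happen inside the script rather than on the command line.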
|
I think I want to make the URIs hashed rather than leave them as blank nodes. Tentris is not dealing super well with blank nodes - not sure why - will investigate tomorrow. |
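One possible way to get hashed URIs instead of blank nodes, as a rough sketch only: hash a canonical serialization of the node and append a short digest to the base URL. The function name `hashed_iri`, the `kind` path segment, the 16-character digest length, and the `https://example.org` base are all assumptions, not what the script does.

```python
import hashlib
import json


def hashed_iri(base_url: str, kind: str, payload: dict) -> str:
    """Build a deterministic IRI from a node's content (hypothetical helper)."""
    # Serialize with sorted keys so key order does not change the hash.
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    # Truncated SHA-256 hex digest; the length is an arbitrary choice here.
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{base_url.rstrip('/')}/{kind}/{digest}"


if __name__ == "__main__":
    print(hashed_iri("https://example.org", "person", {"name": "Ada"}))
```

Because the IRI depends only on the content, the same entity gets the same `@id` across runs, which blank nodes do not guarantee.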
…ion.py pydantic stuff so we assign good author IRI's
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Cursor Bugbot has reviewed your changes and found 4 potential issues.
| # Gemini CLI | ||
| # Please login outside of the container and copy your credentials to ~/.gemini/... | ||
| RUN curl -fsSL https://deb.nodesource.com/setup_24.x | sudo -E bash - && sudo apt-get install -y nodejs | ||
| RUN npm install -g @google/gemini-cli |
Gemini CLI added to shared devcontainer Dockerfile
Low Severity
The devcontainer Dockerfile now installs Node.js 24 and @google/gemini-cli globally. This appears to be personal development tooling (the comment references copying personal credentials). It adds significant image size and an external dependency to the shared devcontainer that isn't related to the project's core functionality.
| # Academic catalog enrichment models (NEW) | ||
| ACADEMIC_CATALOG_ENRICHMENT_MODELS='[ | ||
| linked_entities_ENRICHMENT_MODELS='[ |
Environment variable uses inconsistent casing convention
Low Severity
The environment variable linked_entities_ENRICHMENT_MODELS mixes lowercase and uppercase, breaking the standard SCREAMING_SNAKE_CASE convention for environment variables. Grep confirmed this same name is used in the actual source code at src/llm/model_config.py. All other env vars in this file follow SCREAMING_SNAKE_CASE (e.g., USER_ENRICHMENT_MODELS, ORG_ENRICHMENT_MODELS).
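If the variable were renamed, one way to avoid breaking existing `.env` files is to read the new name first and fall back to the old one. This is a sketch under assumptions: `LINKED_ENTITIES_ENRICHMENT_MODELS` is a guessed replacement name, `load_linked_entities_models` is a hypothetical helper, and the value is assumed to be a JSON list as the `='[` in the diff suggests. It does not reflect the actual code in src/llm/model_config.py.

```python
import json
import os

PREFERRED_NAME = "LINKED_ENTITIES_ENRICHMENT_MODELS"  # assumed new name
LEGACY_NAME = "linked_entities_ENRICHMENT_MODELS"     # name used in this PR


def load_linked_entities_models(environ=None) -> list:
    """Read the model list from the environment, preferring the new name."""
    env = os.environ if environ is None else environ
    raw = env.get(PREFERRED_NAME) or env.get(LEGACY_NAME)
    if not raw:
        return []
    # The value is assumed to be a JSON-encoded list of model configs.
    return json.loads(raw)


if __name__ == "__main__":
    print(load_linked_entities_models())
```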
| style A fill:#f9f,stroke:#333,stroke-width:2px | ||
| style Z fill:#bfa,stroke:#333,stroke-width.md:2px | ||
| style Y fill:#bfa,stroke:#333,stroke-width.md:2px | ||
| classDef agentNode fill:#dff,stroke:#333,stroke-width.md:2px |
Mermaid diagram broken by stroke-width.md typo
Low Severity
The mermaid diagram style definitions use stroke-width.md:2px instead of stroke-width:2px on three lines. The .md suffix was erroneously inserted (likely from the filename), making the CSS property invalid. This will cause the mermaid renderer to fail to apply the intended styling to nodes Z, Y, and the agentNode class definition.
| # Convert JSON-LD to JSON | ||
| python scripts/convert_json_jsonld.py to-json input.jsonld output.json | ||
| ``` |
PR's featured script convert_json_jsonld.py missing from commit
High Severity
The PR title is "feat: add vibecoded but working convert_json_jsonld.py" and this commit adds extensive documentation for scripts/convert_json_jsonld.py (a full CLI guide in docs/JSON_JSONLD_CONVERSION_CLI.md, usage examples in scripts/README.md and docs/JSONLD_MAPPING_UPDATE.md), but the actual script file is absent from the commit. Grep confirms the file does not exist anywhere in the repository — it only appears as a referenced path in documentation. Anyone following the docs will get a "file not found" error.
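A check along these lines could catch this class of problem before merge. It is a minimal sketch, not part of the PR: the `missing_script_refs` helper and the regular expression are assumptions, and it only looks for `scripts/...py` paths mentioned in a piece of documentation text.

```python
import re
from pathlib import Path

# Matches paths such as scripts/convert_json_jsonld.py inside prose or commands.
SCRIPT_REF = re.compile(r"scripts/[\w./-]+\.py")


def missing_script_refs(doc_text: str, repo_root: Path) -> list[str]:
    """Return documented script paths that do not exist under repo_root."""
    refs = sorted(set(SCRIPT_REF.findall(doc_text)))
    return [ref for ref in refs if not (repo_root / ref).is_file()]


if __name__ == "__main__":
    text = "python scripts/convert_json_jsonld.py to-json input.jsonld output.json"
    print(missing_script_refs(text, Path(".")))
```

Running it over the docs touched by a commit and failing on a non-empty result would flag documentation that points at a file that was never added.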


Note
Medium Risk
Moderate risk due to contract changes to core models (`emails`, `structuredAffiliation`, `linkedEntities`) and pipeline behavior/documented execution order, which can break cached data and downstream API consumers if not migrated.

Overview
Introduces a new `scripts/convert_json_jsonld.py` CLI (and `scripts/README.md`) to convert extractor outputs between Pydantic JSON and JSON-LD, including basic model-type detection (repo/user/org) and optional `--base-url` for `@id` generation; adds a Tentris upload test script.

Refactors/renames academic-catalog enrichment to linked entities across docs and rules, and documents an atomic multi-stage LLM pipeline with optional author-level linked-entity enrichment and a new union-field reconciliation guideline.

Updates data-model guidance to support structured `Affiliation` objects with provenance, switches `Person.email` to `Person.emails` with automatic email anonymization, simplifies `Organization` fields, and bumps project baseline to Python `>=3.10`; also fixes `just` test commands to run `tests/` with `PYTHONPATH=src` and refreshes devcontainer onboarding/docs.

Written by Cursor Bugbot for commit 2a8dcc8. This will update automatically on new commits.