
feat: add vibecoded but working convert_json_jsonld.py #23

Merged
caviri merged 23 commits into develop from json-to-rdf on Feb 16, 2026
caviri (Member) commented Nov 12, 2025

Note

Medium Risk
Moderate risk due to contract changes to core models (emails, structured Affiliation, linkedEntities) and pipeline behavior/documented execution order, which can break cached data and downstream API consumers if not migrated.

Overview
Introduces a new scripts/convert_json_jsonld.py CLI (and scripts/README.md) to convert extractor outputs between Pydantic JSON and JSON-LD, including basic model-type detection (repo/user/org) and optional --base-url for @id generation; adds a Tentris upload test script.

Refactors/renames academic-catalog enrichment to linked entities across docs and rules, and documents an atomic multi-stage LLM pipeline with optional author-level linked-entity enrichment and a new union-field reconciliation guideline.

Updates data-model guidance to support structured Affiliation objects with provenance, switches Person.email to Person.emails with automatic email anonymization, simplifies Organization fields, and bumps project baseline to Python >=3.10; also fixes just test commands to run tests/ with PYTHONPATH=src and refreshes devcontainer onboarding/docs.

Written by Cursor Bugbot for commit 2a8dcc8.
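The risk note mentions that cached data can break when `Person.email` becomes `Person.emails`. A minimal migration sketch for cached records follows. The anonymization scheme (hashing the local part with SHA-256) and the function names `anonymize_email` and `migrate_person` are assumptions for illustration; the PR does not show how the real anonymization works.

```python
import hashlib


def anonymize_email(email: str) -> str:
    """Replace the local part with a short SHA-256 digest (assumed scheme)."""
    local, sep, domain = email.strip().lower().partition("@")
    if not sep:
        # No "@": hash the whole value rather than keep it readable.
        return hashlib.sha256(local.encode("utf-8")).hexdigest()[:12]
    digest = hashlib.sha256(local.encode("utf-8")).hexdigest()[:12]
    return f"{digest}@{domain}"


def migrate_person(record: dict) -> dict:
    """Return a copy of a cached person record using `emails` instead of `email`."""
    migrated = dict(record)
    emails = list(migrated.get("emails") or [])
    legacy = migrated.pop("email", None)
    if legacy:
        emails.append(legacy)
    # Anonymize, then de-duplicate while preserving order.
    migrated["emails"] = list(dict.fromkeys(anonymize_email(e) for e in emails))
    return migrated
```

This sketch assumes cached values are still raw addresses; running it over already anonymized values would hash them a second time.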

Comment thread scripts/convert_json_jsonld.py
caviri (Member, Author) commented Nov 12, 2025

Hi @rmfranken, I can see the conversion for repositories. What about the conversion for users and organizations?

rmfranken (Contributor) commented:

Here are the conversion commands for each type - they are pretty much the same:
A person:
python scripts/convert_json_jsonld.py to-jsonld MalloryWittwer.json output.jsonld --base-url https://github.com/MalloryWittwer

An organization:
python scripts/convert_json_jsonld.py to-jsonld sdsc-ordes.json sdsc-ordes.jsonld --base-url https://github.com/sdsc-ordes

A software repository:
python scripts/convert_json_jsonld.py to-jsonld DeepLabCutDeepLabCut.json DeepLabCutDeepLabCut.jsonld --base-url https://github.com/DeepLabCut/DeepLabCut

The script auto-detects what kind of entity it is dealing with and processes it accordingly. The --base-url option is maybe a bit too safe, but we can see how easy it is to inject it into the call that we build.
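As a rough illustration of the auto-detection and `--base-url` behaviour described above, here is a minimal sketch. The marker fields in `_TYPE_MARKERS`, the schema.org type mapping, and the function names are hypothetical; the real rules in scripts/convert_json_jsonld.py are not shown in this thread.

```python
from typing import Any, Optional

# Hypothetical marker fields per entity type (assumption, not the script's rules).
_TYPE_MARKERS = {
    "repository": ("codeRepository", "programmingLanguage"),
    "user": ("githubHandle", "login"),
    "organization": ("organizationType", "members"),
}

# Assumed mapping from detected entity type to a schema.org type.
_SCHEMA_TYPES = {
    "repository": "SoftwareSourceCode",
    "user": "Person",
    "organization": "Organization",
}


def detect_entity_type(record: dict[str, Any]) -> Optional[str]:
    """Return the first entity type whose marker fields appear in the record."""
    for entity_type, markers in _TYPE_MARKERS.items():
        if any(key in record for key in markers):
            return entity_type
    return None


def to_jsonld(record: dict[str, Any], base_url: Optional[str] = None) -> dict[str, Any]:
    """Wrap a plain JSON record in a minimal JSON-LD envelope."""
    entity_type = detect_entity_type(record)
    doc: dict[str, Any] = {
        "@context": "https://schema.org/",
        "@type": _SCHEMA_TYPES.get(entity_type, "Thing"),
    }
    if base_url:
        # The base URL doubles as the node identifier.
        doc["@id"] = base_url.rstrip("/")
    doc.update(record)
    return doc
```

For example, `to_jsonld({"members": []}, base_url="https://github.com/sdsc-ordes")` would be tagged as an organization and receive that URL as its `@id`.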

rmfranken (Contributor) commented:

I think I want to make the URIs hashed rather than blank nodes. Tentris is not dealing well with blank nodes. I am not sure why and will investigate tomorrow.
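One way to replace blank nodes with hashed URIs is to derive a deterministic identifier from each nested object's content. The sketch below is an assumption about how this could look, not the approach the script takes; `hashed_uri` and `assign_ids` are hypothetical names, and the `base#digest` fragment layout is one choice among several.

```python
import hashlib
import json


def hashed_uri(node: dict, base_url: str) -> str:
    """Build a stable URI from a canonical JSON dump of the node."""
    canonical = json.dumps(node, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{base_url.rstrip('/')}#{digest}"


def assign_ids(node, base_url: str):
    """Recursively give every object without an @id a hashed URI."""
    if isinstance(node, list):
        return [assign_ids(item, base_url) for item in node]
    # Scalars and JSON-LD value objects never become nodes.
    if not isinstance(node, dict) or "@value" in node:
        return node
    result = {
        key: value if key == "@context" else assign_ids(value, base_url)
        for key, value in node.items()
    }
    # Hash the original content so the URI does not depend on child @ids.
    result.setdefault("@id", hashed_uri(node, base_url))
    return result
```

Because the dump sorts keys, two objects with the same content receive the same URI regardless of key order, so repeated uploads do not create new nodes for unchanged data.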

Comment thread .devcontainer/devcontainer.json Outdated
@Imaging-Plaza Imaging-Plaza deleted a comment from cursor bot Feb 16, 2026
@Imaging-Plaza Imaging-Plaza deleted a comment from cursor bot Feb 16, 2026

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@Imaging-Plaza Imaging-Plaza deleted a comment from cursor bot Feb 16, 2026
@caviri caviri merged commit 0b4e035 into develop Feb 16, 2026
3 checks passed
@caviri caviri deleted the json-to-rdf branch February 16, 2026 20:23

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 4 potential issues.


Comment thread .devcontainer/Dockerfile
# Gemini CLI
# Please login outside of the container and copy your credentials to ~/.gemini/...
RUN curl -fsSL https://deb.nodesource.com/setup_24.x | sudo -E bash - && sudo apt-get install -y nodejs
RUN npm install -g @google/gemini-cli

Gemini CLI added to shared devcontainer Dockerfile

Low Severity

The devcontainer Dockerfile now installs Node.js 24 and @google/gemini-cli globally. This appears to be personal development tooling (the comment references copying personal credentials). It adds significant image size and an external dependency to the shared devcontainer that isn't related to the project's core functionality.



# Academic catalog enrichment models (NEW)
ACADEMIC_CATALOG_ENRICHMENT_MODELS='[
linked_entities_ENRICHMENT_MODELS='[

Environment variable uses inconsistent casing convention

Low Severity

The environment variable linked_entities_ENRICHMENT_MODELS mixes lowercase and uppercase, breaking the standard SCREAMING_SNAKE_CASE convention for environment variables. Grep confirmed this same name is used in the actual source code at src/llm/model_config.py. All other env vars in this file follow SCREAMING_SNAKE_CASE (e.g., USER_ENRICHMENT_MODELS, ORG_ENRICHMENT_MODELS).
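A rename to SCREAMING_SNAKE_CASE can be made without breaking existing deployments by reading the new name first and falling back to the old one. This is a minimal sketch: the corrected name `LINKED_ENTITIES_ENRICHMENT_MODELS`, the `load_models` helper, and the assumption that the value is a JSON array are all illustrative and not taken from src/llm/model_config.py.

```python
import json
import os

PREFERRED = "LINKED_ENTITIES_ENRICHMENT_MODELS"  # hypothetical corrected name
LEGACY = "linked_entities_ENRICHMENT_MODELS"     # name currently in the env file


def load_models(environ=None) -> list:
    """Read the model list, preferring the corrected variable name."""
    env = os.environ if environ is None else environ
    raw = env.get(PREFERRED) or env.get(LEGACY)
    if not raw:
        return []
    # Assumes the variable holds a JSON array, as the opening '[' suggests.
    return json.loads(raw)
```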


Comment thread docs/AGENT_STRATEGY.md
style A fill:#f9f,stroke:#333,stroke-width:2px
style Z fill:#bfa,stroke:#333,stroke-width.md:2px
style Y fill:#bfa,stroke:#333,stroke-width.md:2px
classDef agentNode fill:#dff,stroke:#333,stroke-width.md:2px

Mermaid diagram broken by stroke-width.md typo

Low Severity

The mermaid diagram style definitions use stroke-width.md:2px instead of stroke-width:2px on three lines. The .md suffix was erroneously inserted (likely from the filename), making the CSS property invalid. This will cause the mermaid renderer to fail to apply the intended styling to nodes Z, Y, and the agentNode class definition.
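The typo is mechanical, so it can be repaired with a single substitution. The helper below is a hypothetical sketch, not part of the repository; it only rewrites `stroke-width.md` when it is used as a property name (followed by a colon).

```python
import re

# Match the mistaken property name only where a value follows it.
_BAD_PROPERTY = re.compile(r"stroke-width\.md(?=\s*:)")


def fix_stroke_width(text: str) -> str:
    """Replace the invalid `stroke-width.md` property with `stroke-width`."""
    return _BAD_PROPERTY.sub("stroke-width", text)
```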


Comment thread scripts/README.md

```
# Convert JSON-LD to JSON
python scripts/convert_json_jsonld.py to-json input.jsonld output.json
```

PR's featured script convert_json_jsonld.py missing from commit

High Severity

The PR title is "feat: add vibecoded but working convert_json_jsonld.py" and this commit adds extensive documentation for scripts/convert_json_jsonld.py (a full CLI guide in docs/JSON_JSONLD_CONVERSION_CLI.md, usage examples in scripts/README.md and docs/JSONLD_MAPPING_UPDATE.md), but the actual script file is absent from the commit. Grep confirms the file does not exist anywhere in the repository — it only appears as a referenced path in documentation. Anyone following the docs will get a "file not found" error.
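A small check could catch this class of problem before merge by scanning the documentation for referenced script paths and confirming each exists. The sketch below is an assumption about how such a check might look; `missing_scripts` and the default glob patterns are hypothetical and not part of the repository.

```python
import re
from pathlib import Path

# Paths such as scripts/convert_json_jsonld.py mentioned in documentation.
_SCRIPT_REF = re.compile(r"\bscripts/[\w./-]+\.py\b")


def missing_scripts(repo_root, doc_globs=("docs/**/*.md", "scripts/*.md", "README.md")):
    """Map each referenced but absent script to the docs that mention it."""
    root = Path(repo_root)
    missing: dict[str, list[str]] = {}
    for pattern in doc_globs:
        for doc in sorted(root.glob(pattern)):
            text = doc.read_text(encoding="utf-8")
            # De-duplicate references within one document.
            for ref in sorted(set(_SCRIPT_REF.findall(text))):
                if not (root / ref).is_file():
                    missing.setdefault(ref, []).append(doc.relative_to(root).as_posix())
    return missing
```

Run from the repository root, a non-empty result would list each missing script together with the documents that reference it.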

