feat: new api endpoints #5

Open: caviri wants to merge 2 commits into develop from api-refactoring

caviri commented Sep 24, 2025

Proposed Changes

This PR refactors and enhances the RDF transformation API by splitting input modalities into two explicit endpoints, adding native raw Turtle support (without JSON wrapping), enforcing an explicit output serialization parameter, and improving OpenAPI clarity.

Summary of Key Improvements

  1. Introduced two endpoints for clearer API usage and documentation:
    • POST /v1/things — Raw body (JSON-LD object/array or Turtle string).
    • POST /v1/things/upload — Multipart file upload.
  2. Added support for sending plain Turtle directly (Content-Type: text/turtle or text/plain) without wrapping it in JSON.
  3. Made serialization a mandatory parameter (jsonld | turtle) across both endpoints.
  4. Preserved and documented fuzzy matching parameters:
    • fuzzy (bool, default false)
    • fuzzy_threshold (int, 0–100, default 90)
  5. Added explicit request body documentation covering multiple media types for the raw endpoint (JSON-LD + Turtle).
  6. Implemented resilient lazy ontology loading (continues with an empty label map if ontologies fail to load).
  7. Separated multipart handling to avoid Swagger/OpenAPI collapsing all input options into a single multipart/form-data representation.
  8. Improved developer ergonomics with clearer error responses (400 for empty bodies or malformed content).
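
The resilient lazy loading described in item 6 could look roughly like the sketch below. It is not the PR's code: `LazyLabelMap`, the loader callables, and the choice to cache the empty map after a failure (rather than retry on the next call) are all assumptions made for illustration.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger("ontology")


class LazyLabelMap:
    """Loads a label -> IRI map on first use; falls back to {} on failure."""

    def __init__(self, loader: Callable[[], Dict[str, str]]) -> None:
        self._loader = loader  # hypothetical callable that fetches the ontologies
        self._labels: Optional[Dict[str, str]] = None

    def get(self) -> Dict[str, str]:
        if self._labels is None:  # the first access triggers the load
            try:
                self._labels = self._loader()
            except Exception:
                # Log the failure and continue with an empty map
                # instead of aborting the transformation.
                logger.exception("Ontology load failed; continuing without labels")
                self._labels = {}
        return self._labels


def failing_loader() -> Dict[str, str]:
    # Simulates an unreachable ontology source.
    raise ConnectionError("ontology source unreachable")


def working_loader() -> Dict[str, str]:
    return {"person": "https://schema.org/Person"}


broken = LazyLabelMap(failing_loader)
healthy = LazyLabelMap(working_loader)
```

With this shape, `broken.get()` yields an empty dict and logs the error once, while `healthy.get()` returns the loaded labels; whether a later request should retry the load is a design decision this sketch does not settle.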

Rationale

Previously, a single endpoint attempted to multiplex raw JSON-LD, Turtle, and file uploads. An OpenAPI operation carries a single requestBody, and FastAPI derives that body's media type from the handler's parameters, so the presence of a File parameter degraded the generated docs: Swagger showed only multipart/form-data, obscuring raw JSON-LD/Turtle usage. Splitting the endpoints yields explicit, self-documenting API surfaces and reduces client ambiguity.

Backward Compatibility

  • The old combined behavior (file + raw in one call) is replaced by two dedicated endpoints.
  • Existing clients using file upload must update the path to: POST /v1/things/upload.
  • Raw JSON-LD/Turtle clients gain clarity and no longer need to guess correct body formats.

Types of Changes

  • A new feature (non-breaking change which adds functionality). Use MR tag feature.
  • A bug fix
  • A breaking change
  • A non-productive update

(Arguably a minor breaking path change for file uploads if they previously used /v1/things; if your release policy treats endpoint path changes as breaking, tick breaking instead.)


Checklist


API Specification (Delta)

1. Raw Transformation Endpoint

POST /v1/things

Accepts one of:

  • application/ld+json — JSON-LD object/array
  • application/json — Interpreted as JSON-LD
  • text/turtle — Turtle string
  • text/plain — Treated as Turtle if parsable

Query Parameters:

| Name | Type | Required | Values | Description |
|---|---|---|---|---|
| serialization | string | yes | jsonld \| turtle | Output format |
| fuzzy | boolean | no | true \| false (default false) | Enable fuzzy label matching |
| fuzzy_threshold | integer | no | 0–100 (default 90) | Minimum fuzzy score to accept |

Successful Responses:

  • 200 OK — RDF graph serialized as:
    • application/ld+json if serialization=jsonld
    • text/turtle if serialization=turtle

Error Responses:

  • 400 Bad Request — Empty body, parse failure, or transformation error.
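
As a client-side illustration of the raw endpoint, the sketch below builds (but does not send) a request using only the standard library. The base URL `http://localhost:8000`, the helper name `build_raw_request`, and the client-side validation are assumptions; only the path, query parameters, and media types come from the spec above.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Response media type implied by each serialization value (from the spec above).
RESPONSE_MEDIA_TYPES = {"jsonld": "application/ld+json", "turtle": "text/turtle"}


def build_raw_request(base_url: str, body: bytes, content_type: str,
                      serialization: str, fuzzy: bool = False,
                      fuzzy_threshold: int = 90) -> Request:
    """Build (but do not send) a POST /v1/things request."""
    if serialization not in RESPONSE_MEDIA_TYPES:
        raise ValueError("serialization must be 'jsonld' or 'turtle'")
    if not 0 <= fuzzy_threshold <= 100:
        raise ValueError("fuzzy_threshold must be between 0 and 100")
    if not body:
        raise ValueError("body must not be empty (the server would answer 400)")
    query = urlencode({
        "serialization": serialization,
        "fuzzy": str(fuzzy).lower(),
        "fuzzy_threshold": fuzzy_threshold,
    })
    return Request(
        url=f"{base_url}/v1/things?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )


# Plain Turtle goes in the body as-is, with no JSON wrapping.
turtle = b'@prefix ex: <http://example.org/> .\nex:a ex:label "thing" .\n'
req = build_raw_request("http://localhost:8000", turtle, "text/turtle", "jsonld")
```

Passing `req` to `urllib.request.urlopen` would send it; `RESPONSE_MEDIA_TYPES[serialization]` gives the Content-Type a client can expect on a 200 response.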

2. Multipart Upload Endpoint

POST /v1/things/upload

Consumes: multipart/form-data

Form Fields:

| Name | Type | Required | Description |
|---|---|---|---|
| file | file | yes | RDF file (JSON-LD or Turtle) |
| serialization | string | yes | jsonld \| turtle |
| fuzzy | boolean | no | Enable fuzzy label matching |
| fuzzy_threshold | integer | no | 0–100, default 90 |

Responses: Same as raw endpoint.
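
For the upload endpoint, a client has to send the same options as form fields next to the file. The sketch below encodes the multipart body by hand with the standard library; the helper name `build_upload_request`, the base URL, and the `application/octet-stream` part type are assumptions, and a real client would more likely rely on an HTTP library's multipart support.

```python
import uuid
from urllib.request import Request


def build_upload_request(base_url: str, filename: str, file_bytes: bytes,
                         serialization: str, fuzzy: bool = False,
                         fuzzy_threshold: int = 90) -> Request:
    """Build (but do not send) a POST /v1/things/upload multipart request."""
    boundary = uuid.uuid4().hex
    fields = {
        "serialization": serialization,
        "fuzzy": str(fuzzy).lower(),
        "fuzzy_threshold": str(fuzzy_threshold),
    }
    chunks = []
    # One part per plain form field.
    for name, value in fields.items():
        chunks.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    # The file part carries the raw RDF bytes under the field name "file".
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    chunks.append(file_header + file_bytes + b"\r\n")
    # The closing boundary terminates the multipart body.
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return Request(
        url=f"{base_url}/v1/things/upload",
        data=b"".join(chunks),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


upload = build_upload_request(
    "http://localhost:8000", "data.ttl", b"<urn:a> <urn:b> <urn:c> .", "jsonld"
)
```

Note that here `serialization`, `fuzzy`, and `fuzzy_threshold` travel in the form body, not in the query string as on the raw endpoint.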


OpenAPI Snippet (Representative)

paths:
  /v1/things:
    post:
      summary: Transform RDF (raw body)
      parameters:
        - in: query
          name: serialization
          required: true
          schema: { type: string, enum: [jsonld, turtle] }
        - in: query
          name: fuzzy
          required: false
          schema: { type: boolean, default: false }
        - in: query
          name: fuzzy_threshold
          required: false
          schema: { type: integer, minimum: 0, maximum: 100, default: 90 }
      requestBody:
        required: true
        content:
          application/ld+json:
            schema:
              type: object
          application/json:
            schema:
              type: object
          text/turtle:
            schema:
              type: string
          text/plain:
            schema:
              type: string
      responses:
        '200':
          description: Transformed RDF
          content:
            application/ld+json:
              schema: { type: object }
            text/turtle:
              schema: { type: string }
        '400':
          description: Invalid input or processing error
  /v1/things/upload:
    post:
      summary: Transform RDF (file upload)
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              type: object
              required: [file, serialization]
              properties:
                file:
                  type: string
                  format: binary
                serialization:
                  type: string
                  enum: [jsonld, turtle]
                fuzzy:
                  type: boolean
                  default: false
                fuzzy_threshold:
                  type: integer
                  minimum: 0
                  maximum: 100
                  default: 90
      responses:
        '200':
          description: Transformed RDF
          content:
            application/ld+json:
              schema: { type: object }
            text/turtle:
              schema: { type: string }
        '400':
          description: Invalid input or processing error

Internal Implementation Notes

  • Refactored the raw handler to read request.body() (byte-level), allowing flexible media types.
  • Heuristic fallback for format detection: a leading { or [ means JSON-LD; anything else is treated as Turtle.
  • serialization drives only the output format; the input format is autodetected.
  • A centralized _transform_bytes() pipeline ensures consistent logic across both endpoints.
  • Logging captures ontology load failures without aborting the transformation.
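
The detection heuristic from the notes above can be sketched as follows. The function name `detect_format` and the decision to skip leading whitespace before inspecting the first character are assumptions; the PR only states the leading `{` or `[` rule.

```python
def detect_format(body: bytes) -> str:
    """Return 'jsonld' if the payload starts with { or [, otherwise 'turtle'."""
    # A UnicodeDecodeError raised here is a ValueError subclass, so callers
    # can map both failure modes to 400 Bad Request with one except clause.
    text = body.decode("utf-8").lstrip()
    if not text:
        raise ValueError("empty body")
    return "jsonld" if text[0] in "{[" else "turtle"


kind_json = detect_format(b'  {"@id": "http://example.org/a"}')
kind_turtle = detect_format(b"@prefix ex: <http://example.org/> .")
```

One caveat with this heuristic: Turtle permits a document to begin with a blank node property list such as `[ a ex:Thing ] .`, which would be classified as JSON-LD. Falling back to the Turtle parser when JSON parsing fails, or trusting an explicit `text/turtle` Content-Type first, would cover that case.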

Testing Recommendations (Follow-Up PR)

  • Add tests for:
    • Raw JSON-LD → Turtle output
    • Raw Turtle → JSON-LD output
    • Multipart JSON-LD & Turtle flows
    • Fuzzy match ON vs OFF (with mock label map)
  • Mock OntologyManager in unit tests to avoid external SPARQL dependencies.
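
A fuzzy ON vs OFF test against a mock label map might look like the sketch below. The PR does not say which fuzzy-matching library is used, so `difflib.SequenceMatcher` stands in as the scorer here; `match_label`, `MOCK_LABELS`, and the test names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional

# Mock label map standing in for what OntologyManager would provide.
MOCK_LABELS: Dict[str, str] = {
    "person": "https://schema.org/Person",
    "organization": "https://schema.org/Organization",
}


def match_label(term: str, labels: Dict[str, str], fuzzy: bool = False,
                fuzzy_threshold: int = 90) -> Optional[str]:
    """Resolve a term to an IRI: exact match first, then optional fuzzy match."""
    key = term.strip().lower()
    if key in labels:
        return labels[key]
    if not fuzzy:
        return None
    best_iri, best_score = None, -1.0
    for label, iri in labels.items():
        # Stand-in scorer on a 0-100 scale (assumption, not the PR's scorer).
        score = 100 * SequenceMatcher(None, key, label).ratio()
        if score > best_score:
            best_iri, best_score = iri, score
    return best_iri if best_score >= fuzzy_threshold else None


def test_fuzzy_off_requires_exact_label() -> None:
    assert match_label("Person", MOCK_LABELS) == "https://schema.org/Person"
    assert match_label("persn", MOCK_LABELS) is None


def test_fuzzy_on_accepts_close_label() -> None:
    # "persn" vs "person" scores 10/11, about 90.9, so it passes the default 90.
    assert match_label("persn", MOCK_LABELS, fuzzy=True) == "https://schema.org/Person"


def test_threshold_rejects_weaker_match() -> None:
    assert match_label("persn", MOCK_LABELS, fuzzy=True, fuzzy_threshold=95) is None
```

These functions run under pytest as written; the same cases could be driven through the HTTP endpoints once OntologyManager is mocked.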

caviri requested a review from rmfranken on September 24, 2025 07:12

rmfranken left a comment

Looks good! The only thing I'm unsure about is whether it makes sense for fuzzy to be false by default; I would expect the opposite.
I also quickly tested your code locally with a Turtle document, and it seems to work as expected 👍.
