13priyaank/multi-modal-prompt-refinement

Multi-Modal Prompt Refinement System

Overview

The Multi-Modal Prompt Refinement System is a Python-based command-line tool that translates unstructured, human-provided inputs into a clean, structured, machine-ready Master Prompt in JSON format.

The system acts as a translator between messy real-world inputs (such as text notes, documents, and images with descriptions) and downstream AI systems or engineering workflows that require stable, predictable input formats.

This project is not a UI application and not a model training pipeline. Its primary focus is on deterministic reasoning, validation, and explainability.


Why This System Exists

In practical AI and product development workflows, requirements are often:

  • Scattered across multiple formats
  • Incomplete or ambiguous
  • Difficult to reuse reliably

This system addresses those challenges by:

  • Consolidating information from multiple input modalities
  • Structuring it into a predictable, schema-validated JSON format
  • Deterministically detecting intent and extracting requirements
  • Explicitly surfacing missing or unclear information
  • Preventing silent assumptions or hallucinated details

The output is a transparent and auditable Master Prompt that can be safely consumed by downstream systems.


Installation

  1. Clone the repository:

    git clone <repository-url>
    cd multi-modal-prompt-refinement

  2. (Recommended) Create and activate a virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate

  3. Install dependencies:

    pip install -r requirements.txt

How to Run

The system is executed from the command line. You may provide one or more input files along with optional metadata.

python main.py [INPUT_FILES...] [--output-file OUTPUT_PATH] [--image-desc IMAGE_FILE DESCRIPTION]

Arguments

  • INPUT_FILES...
    One or more input files (.txt, .pdf, .docx, .png, .jpg)

  • --output-file (optional)
    Path where the refined JSON prompt will be saved
    Default: outputs/refined_prompt.json

  • --image-desc (optional)
    Associates a human-provided description with an image file
    This flag may be used multiple times


Example Usage

Text-Only Input

python main.py inputs/example_text.txt --output-file outputs/text_example.json

Mixed Inputs (Text, Document, and Image)

python main.py inputs/example_text.txt inputs/example_doc.docx inputs/example_image.png \
  --image-desc inputs/example_image.png "Sketch of the homepage layout" \
  --output-file outputs/mixed_example.json
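
For a mixed invocation like this one, each file has to be routed to a processor for its modality. The sketch below shows one way that routing might work by file extension. The mapping and the names `MODALITY_BY_EXTENSION` and `group_inputs` are assumptions for illustration; only the supported extensions and the three modalities (text, document, image) are taken from this README.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Assumed mapping from the documented extensions to the three modalities.
MODALITY_BY_EXTENSION = {
    ".txt": "text",
    ".pdf": "document",
    ".docx": "document",
    ".png": "image",
    ".jpg": "image",
}


def group_inputs(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group input paths by modality; unknown extensions go to 'unsupported'."""
    groups: Dict[str, List[str]] = {}
    for path in paths:
        # Compare extensions case-insensitively so "IMAGE.PNG" still matches.
        modality = MODALITY_BY_EXTENSION.get(Path(path).suffix.lower(), "unsupported")
        groups.setdefault(modality, []).append(path)
    return groups
```

Keeping unsupported files in their own group, rather than dropping them, fits the stated goal of surfacing problems explicitly instead of ignoring them silently.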

Handling Irrelevant Input

If an input does not describe a buildable task or product, the system rejects it with a clear explanation.

python main.py inputs/irrelevant.txt --output-file outputs/irrelevant_example.json

The resulting output explicitly indicates the rejection and the reason for it.
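
A deterministic relevance check of this kind could look like the sketch below. The keyword list, the function name `check_relevance`, and the shape of the returned record are all hypothetical; the README does not document the real rule set or the rejection format.

```python
import re
from typing import Any, Dict

# Hypothetical hints that an input describes something buildable.
ACTION_HINTS = (
    "build", "create", "design", "implement", "develop",
    "app", "website", "system", "feature",
)


def check_relevance(text: str) -> Dict[str, Any]:
    """Return an accept/reject record with an explicit reason."""
    stripped = text.strip()
    if not stripped:
        return {"status": "rejected", "reason": "Input is empty."}
    # Tokenize into lowercase words so matching is whole-word and repeatable.
    words = set(re.findall(r"[a-z]+", stripped.lower()))
    matched = sorted(words.intersection(ACTION_HINTS))
    if not matched:
        return {
            "status": "rejected",
            "reason": "No buildable task or product was described.",
        }
    return {"status": "accepted", "matched_terms": matched}
```

Because the check uses only fixed rules, the same input always yields the same decision and the same stated reason.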


How Inputs Become a Refined Prompt

  1. Processing
    Each input file is read using a modality-specific processor (text, document, or image metadata).

  2. Consolidation
    Extracted content and image descriptions are combined into a unified text representation while preserving source attribution.

  3. Relevance Checks
    Inputs that are empty or non-actionable are flagged and excluded from refinement.

  4. Refinement

    • Intent Detection: A deterministic, rule-based process identifies the primary product or task intent.
    • Extraction: Rule-based logic extracts functional requirements and technical constraints.

  5. Prompt Construction
    The extracted information is assembled into a structured JSON object conforming to config/prompt_schema.json.
    Missing or unclear information is explicitly recorded.

  6. Validation
    The final output is validated against the schema to ensure structural correctness and stability.

  7. Output
    The validated Master Prompt is written to disk. If validation fails, an .invalid.json file is produced for inspection.
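
The steps above can be sketched end to end in a few small functions. This is an illustration under stated assumptions, not the project's implementation: the intent rules, the field names, and `REQUIRED_KEYS` are hypothetical, and the required-key check is a simplified stand-in for validation against config/prompt_schema.json, whose contents are not shown here. The `.invalid.json` naming is likewise one plausible reading of the output step.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical intent rules and required fields (assumptions for this sketch).
INTENT_RULES = {
    "web_application": ("website", "homepage", "web app"),
    "mobile_application": ("android", "ios", "mobile app"),
}
REQUIRED_KEYS = ("intent", "sources", "missing_information")


def consolidate(chunks: List[Tuple[str, str]]) -> str:
    """Join (source, text) pairs, tagging each chunk with its source."""
    return "\n".join(f"[source: {source}] {text.strip()}" for source, text in chunks)


def detect_intent(text: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the text, else None."""
    lowered = text.lower()
    # Dicts preserve insertion order, so rule precedence is deterministic.
    for intent, keywords in INTENT_RULES.items():
        if any(keyword in lowered for keyword in keywords):
            return intent
    return None


def build_prompt(chunks: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Assemble the prompt and record what could not be determined."""
    text = consolidate(chunks)
    intent = detect_intent(text)
    missing = [] if intent else ["primary intent could not be determined"]
    return {
        "intent": intent,
        "sources": [source for source, _ in chunks],
        "consolidated_text": text,
        "missing_information": missing,
    }


def missing_keys(prompt: Dict[str, Any]) -> List[str]:
    """Simplified structural check: list required keys that are absent."""
    return [key for key in REQUIRED_KEYS if key not in prompt]


def output_target(prompt: Dict[str, Any], output_path: Path) -> Path:
    """Choose the normal path, or an .invalid.json sibling if the check fails."""
    if missing_keys(prompt):
        return output_path.with_suffix(".invalid.json")
    return output_path
```

Recording an undetermined intent under `missing_information`, rather than guessing one, mirrors the README's emphasis on surfacing gaps explicitly.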


Documentation

Detailed design rationale, architectural decisions, and trade-offs are documented in:

  • docs/DESIGN_EXPLANATION.md

Author

Name: Priyank Wadiwala
Institute: Sardar Vallabhbhai Institute of Technology, Vasad

About

A deterministic system that converts messy multi-modal inputs (text, documents, images) into clean, structured, machine-ready prompts. Built to demonstrate reasoning, validation, and explainable system design.
