Multi-Modal Prompt Refinement System

Overview

The Multi-Modal Prompt Refinement System is a Python-based command-line tool that translates unstructured, human-provided inputs into a clean, structured, machine-ready Master Prompt in JSON format.

The system acts as a translator between messy real-world inputs (text notes, documents, and images with descriptions) and downstream AI systems or engineering workflows that require stable, predictable input formats.

This project is not a UI application and not a model training pipeline. Its primary focus is on deterministic reasoning, validation, and explainability.

Why This System Exists

In practical AI and product development workflows, requirements are often:

- Scattered across multiple formats
- Incomplete or ambiguous
- Difficult to reuse reliably

This system addresses those challenges by:

- Consolidating information from multiple input modalities
- Structuring it into a predictable, schema-validated JSON format
- Deterministically detecting intent and extracting requirements
- Explicitly surfacing missing or unclear information
- Preventing silent assumptions or hallucinated details

The output is a transparent and auditable Master Prompt that can be safely consumed by downstream systems.

Installation

Clone the repository:

git clone <repository-url>
cd multi-modal-prompt-refiner

(Recommended) Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```

How to Run

The system is executed from the command line. You may provide one or more input files along with optional metadata.

python main.py [INPUT_FILES...] [--output-file OUTPUT_PATH] [--image-desc IMAGE_FILE DESCRIPTION]

Arguments

- INPUT_FILES...
  - One or more input files (.txt, .pdf, .docx, .png, .jpg)
- --output-file (optional)
  - Path where the refined JSON prompt will be saved
  - Default: outputs/refined_prompt.json
- --image-desc (optional)
  - Associates a human-provided description with an image file
  - This flag may be used multiple times
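
For readers who want to see how such a command line can be parsed, here is a minimal sketch using the standard-library argparse module. It is an illustration only, not the project's actual main.py: the function names build_parser and parse_cli are hypothetical, and it assumes at least one input file is required and that --image-desc takes exactly two values per use.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Refine multi-modal inputs into a structured JSON prompt.",
    )
    # Assumption: one or more input files are required.
    parser.add_argument("input_files", nargs="+", type=Path, metavar="INPUT_FILES")
    parser.add_argument(
        "--output-file",
        type=Path,
        default=Path("outputs/refined_prompt.json"),
    )
    # nargs=2 with action="append" lets the flag repeat, one (file, text) pair per use.
    parser.add_argument(
        "--image-desc",
        nargs=2,
        action="append",
        default=[],
        metavar=("IMAGE_FILE", "DESCRIPTION"),
    )
    return parser


def parse_cli(argv):
    """Return (input files, output path, {image path: description})."""
    args = build_parser().parse_args(argv)
    descriptions = {Path(image): text for image, text in args.image_desc}
    return args.input_files, args.output_file, descriptions
```

Collecting the descriptions into a dictionary keyed by image path makes it straightforward to look up the description when the matching image file is processed.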

Example Usage

Text-Only Input

python main.py inputs/example_text.txt --output-file outputs/text_example.json

Mixed Inputs (Text, Document, and Image)

python main.py inputs/example_text.txt inputs/example_doc.docx inputs/example_image.png \
  --image-desc inputs/example_image.png "Sketch of the homepage layout" \
  --output-file outputs/mixed_example.json

Handling Irrelevant Input

If an input does not describe a buildable task or product, the system rejects it with a clear explanation.

python main.py inputs/irrelevant.txt --output-file outputs/irrelevant_example.json

The resulting output explicitly indicates the rejection and the reason for it.
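
As a rough illustration of how a deterministic relevance check might produce such a rejection, consider the sketch below. The cue words, the check_relevance function, and the "status" and "reason" fields are all assumptions made for this example; the project's real rules and output fields are defined by its own code and by config/prompt_schema.json.

```python
import re

# Hypothetical cue words that suggest a buildable task; the real rules may differ.
ACTION_CUES = {"build", "create", "design", "implement", "develop", "add", "support"}


def check_relevance(text: str) -> dict:
    """Accept or reject an input using fixed rules, always stating the reason."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if not words:
        return {"status": "rejected", "reason": "Input is empty."}
    if not words & ACTION_CUES:
        return {
            "status": "rejected",
            "reason": "No buildable task or product was described.",
        }
    return {"status": "accepted", "reason": None}
```

Because the check relies only on fixed rules, the same input always yields the same decision and the same stated reason.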

How Inputs Become a Refined Prompt

- Processing: Each input file is read using a modality-specific processor (text, document, or image metadata).
- Consolidation: Extracted content and image descriptions are combined into a unified text representation while preserving source attribution.
- Relevance Checks: Inputs that are empty or non-actionable are flagged and excluded from refinement.
- Refinement:
  - Intent Detection: A deterministic, rule-based process identifies the primary product or task intent.
  - Extraction: Rule-based logic extracts functional requirements and technical constraints.
- Prompt Construction: The extracted information is assembled into a structured JSON object conforming to config/prompt_schema.json. Missing or unclear information is explicitly recorded.
- Validation: The final output is validated against the schema to ensure structural correctness and stability.
- Output: The validated Master Prompt is written to disk. If validation fails, an .invalid.json file is produced for inspection.
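
The stages above can be sketched end to end in a few dozen lines. This is a simplified illustration under stated assumptions, not the project's implementation: the keyword rules, the field names in REQUIRED_KEYS, and every function name are hypothetical, and the required-key check is only a stand-in for real validation against config/prompt_schema.json.

```python
import json
import re
from pathlib import Path
from typing import Optional

# Hypothetical keyword rules; the first matching rule wins, so results are deterministic.
INTENT_RULES = [
    ("web_application", ("website", "homepage", "web app")),
    ("mobile_application", ("android", "ios", "mobile app")),
    ("api_service", ("api", "endpoint", "backend")),
]
# Hypothetical required fields, standing in for the real JSON schema.
REQUIRED_KEYS = ("intent", "functional_requirements", "missing_information", "sources")


def consolidate(chunks: dict) -> str:
    """Join extracted text, tagging each chunk with the file it came from."""
    return "\n".join(f"[source: {name}] {text.strip()}" for name, text in chunks.items())


def detect_intent(text: str) -> Optional[str]:
    """Return the first intent whose keyword appears as a whole word or phrase."""
    lowered = text.lower()
    for intent, keywords in INTENT_RULES:
        if any(re.search(rf"\b{re.escape(k)}\b", lowered) for k in keywords):
            return intent
    return None


def extract_requirements(text: str) -> list:
    """Keep sentences containing an obligation word such as 'must' or 'should'."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pattern = re.compile(r"\b(must|should|shall)\b", re.IGNORECASE)
    return [s.strip() for s in sentences if pattern.search(s)]


def build_prompt(chunks: dict) -> dict:
    """Assemble the prompt and record gaps explicitly instead of guessing."""
    body = " ".join(chunks.values())
    intent = detect_intent(body)
    requirements = extract_requirements(body)
    missing = []
    if intent is None:
        missing.append("Primary intent could not be determined.")
    if not requirements:
        missing.append("No functional requirements were stated.")
    return {
        "intent": intent,
        "functional_requirements": requirements,
        "missing_information": missing,
        "sources": sorted(chunks),
        "consolidated_text": consolidate(chunks),
    }


def validate(prompt: dict) -> list:
    """Return a list of structural errors; an empty list means the prompt is valid."""
    return [f"missing key: {key}" for key in REQUIRED_KEYS if key not in prompt]


def write_output(prompt: dict, output_path: Path) -> Path:
    """Write the prompt to disk, switching to an .invalid.json name on failure."""
    errors = validate(prompt)
    target = output_path if not errors else output_path.with_suffix(".invalid.json")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(prompt, indent=2), encoding="utf-8")
    return target
```

In this sketch, intent detection and extraction run on the raw chunk text rather than on the consolidated string, so the source tags added during consolidation cannot trigger a keyword rule by accident.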

Documentation

Detailed design rationale, architectural decisions, and trade-offs are documented in:

docs/DESIGN_EXPLANATION.md

Author

Name: Priyank Wadiwala
Institute: Sardar Vallabhbhai Institute of Technology, Vasad
