A 2024-25 Executive Project by Akshat Bharara, Atharva Rege and Rudra Gandhi under the mentorship of Aakarsh Bansal, Abhishek Srinivas and Raajan Wankhade.

IEEE-NITK/text-guided-image-inpainting

The Brush of Spells: Text-guided Image Inpainting

A deep learning project inspired by the research paper "MMFL: Multimodal Fusion Learning for Text-Guided Image Inpainting", enabling users to restore and fill images guided by natural language descriptions. This repository provides an interactive interface for text-guided image inpainting, allowing users to mask regions and describe how they should be filled in.

Description

The Brush of Spells is a text-guided image inpainting tool that leverages multimodal fusion learning to restore or edit images based on user-provided text prompts. Inspired by the MMFL paper, this project fuses image and text features to generate contextually accurate and semantically meaningful inpainted results. The system is designed for artists, researchers, and developers interested in advanced image editing using natural language.


Solution Overview

The project implements a multimodal approach for image inpainting, integrating both visual information (the masked image) and textual guidance (user prompt). The core idea, as proposed in MMFL, is to imitate a painter’s conjecture process: the model uses the text description to provide abundant guidance for image restoration, fusing multimodal features to generate plausible and context-aware inpainting results.
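The fusion idea described above can be illustrated with a small attention-based sketch. This is not the architecture from the MMFL paper or from this repository; it only shows one common way to let each image region attend to the words of a prompt. The names fuse, image_feats and word_feats are hypothetical, and the feature vectors are toy values standing in for encoder outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse(image_feats, word_feats):
    """Fuse each image-region feature with an attention-weighted text summary.

    image_feats: list of region vectors (hypothetical image-encoder output)
    word_feats:  list of word vectors (hypothetical text-encoder output)
    Both must share the same vector length so dot products are defined.
    """
    scale = math.sqrt(len(word_feats[0]))
    fused = []
    for region in image_feats:
        # Score how relevant each word is to this image region.
        weights = softmax([dot(region, w) / scale for w in word_feats])
        # Weighted average of the word vectors: the text context for the region.
        context = [
            sum(wt * w[i] for wt, w in zip(weights, word_feats))
            for i in range(len(region))
        ]
        # Concatenate visual and textual information for a downstream decoder.
        fused.append(region + context)
    return fused


# Toy inputs: 2 image regions, 3 words, 2-dimensional features.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
word_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = fuse(image_feats, word_feats)
```

Each fused vector keeps the original region feature and appends a text summary weighted toward the words most similar to that region. A real model would learn projections for the queries, keys and values rather than using raw features.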

Workflow:

  • User uploads an image and gives the width and height of the mask region to be edited.
  • User provides a text prompt describing the desired content for the masked area.
  • The system fuses the image and text features using a multimodal deep learning model.
  • The masked region is filled in according to the prompt, producing a visually coherent and semantically relevant output.
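The masking and fill steps of this workflow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's implementation: the mask is assumed to be a centered rectangle (the source only says the user supplies its width and height), images are plain nested lists of grayscale values, and centered_mask and composite are hypothetical helper names. The generated image is a stand-in for model output.

```python
def centered_mask(img_w, img_h, mask_w, mask_h):
    """Return rows of 0/1 with a centered mask_w x mask_h region set to 1.

    1 marks pixels to regenerate, 0 marks pixels to keep.
    Centering the rectangle is an assumption made for this sketch.
    """
    if not (0 < mask_w <= img_w and 0 < mask_h <= img_h):
        raise ValueError("mask must be non-empty and fit inside the image")
    left = (img_w - mask_w) // 2
    top = (img_h - mask_h) // 2
    return [
        [1 if left <= x < left + mask_w and top <= y < top + mask_h else 0
         for x in range(img_w)]
        for y in range(img_h)
    ]


def composite(original, generated, mask):
    """Keep original pixels where mask is 0, take generated ones where it is 1."""
    return [
        [g if m else o for o, g, m in zip(orow, grow, mrow)]
        for orow, grow, mrow in zip(original, generated, mask)
    ]


# Toy 4x4 grayscale example with a 2x2 masked region.
original = [[10] * 4 for _ in range(4)]
generated = [[99] * 4 for _ in range(4)]  # stand-in for model output
mask = centered_mask(4, 4, 2, 2)
result = composite(original, generated, mask)
```

Compositing this way guarantees that pixels outside the mask are untouched, so only the user-selected region changes regardless of what the model produces elsewhere.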

Features

  • Text-guided inpainting: Restore or edit images by describing changes in natural language.
  • Interactive interface: Upload images, set the mask size, and enter prompts in a user-friendly web app.
  • Multimodal fusion: Combines visual and textual cues for context-aware restoration.
  • Research-grounded design: Builds on MMFL and recent advances in diffusion-based inpainting models.

Interface & Results

Below are screenshots and sample results from the interactive interface:

Original Image | Masked Image | Prompt | Inpainted Result
image | image | "A striking Baltimore Oriole perched on a branch, showcasing its vibrant orange and black plumage." | image
image | image | "A vibrant Sun Conure parrot with an orange head and chest, yellow back and wings, and green-blue tail feathers is perched on a light-colored cylindrical bar against a softly blurred beige background." | image
image | image | "A bright red Northern Cardinal with a pointed crest and orange beak, perched on a branch against a softly blurred green background." | image

Interface Screenshot:

image image image image

Installation

  1. Clone this repository:
git clone https://github.com/IEEE-NITK/text-guided-image-inpainting.git
cd text-guided-image-inpainting
  2. Install dependencies:
pip install -r demo/requirements.txt

Usage

To launch the interactive inpainting interface locally:

python demo/app.py

How to use:

  • Upload an image.
  • Select the required height and width of the mask.
  • Enter a text prompt describing the desired content.
  • Click "Generate Inpainted Image" to generate the result.

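The interface is built with Streamlit for ease of use and rapid prototyping. Streamlit apps are normally started with "streamlit run demo/app.py", so try that form if the command above does not open the interface.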


References

  • Lin, Qing, et al. "MMFL: Multimodal Fusion Learning for Text-Guided Image Inpainting." Proceedings of the 28th ACM International Conference on Multimedia, 2020.

Contributions and feedback are welcome!


“This paper imitates the process of painters' conjecture, and proposes to introduce the text description into the image inpainting task for the first time, which provides abundant guidance information for image restoration through the fusion of multimodal features.”
