A deep learning project inspired by the research paper "MMFL: Multimodal Fusion Learning for Text-Guided Image Inpainting". This repository provides an interactive interface for text-guided image inpainting: users mask a region of an image and describe, in natural language, how it should be filled in.
The Brush of Spells fuses image and text features to restore or edit images based on user-provided text prompts, generating contextually accurate and semantically meaningful inpainted results. The system is designed for artists, researchers, and developers interested in advanced image editing using natural language.
The project implements a multimodal approach for image inpainting, integrating both visual information (the masked image) and textual guidance (user prompt). The core idea, as proposed in MMFL, is to imitate a painter’s conjecture process: the model uses the text description to provide abundant guidance for image restoration, fusing multimodal features to generate plausible and context-aware inpainting results.
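To make the fusion idea concrete, the sketch below shows one way a text-image fusion block could look in PyTorch. This is illustrative only and is not MMFL's exact architecture; the module name `TextImageFusion`, the feature dimensions, and the residual cross-attention design are assumptions chosen to show how textual guidance can steer image features.

```python
import torch
import torch.nn as nn

class TextImageFusion(nn.Module):
    """Illustrative cross-attention block (not MMFL's exact module):
    image features attend to word-level text features so the prompt
    can guide reconstruction of the masked region."""

    def __init__(self, img_dim=256, txt_dim=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=img_dim, kdim=txt_dim, vdim=txt_dim,
            num_heads=n_heads, batch_first=True,
        )
        self.norm = nn.LayerNorm(img_dim)

    def forward(self, img_feats, txt_feats):
        # img_feats: (B, H*W, img_dim) flattened spatial features
        # txt_feats: (B, T, txt_dim) per-word text embeddings
        fused, _ = self.attn(query=img_feats, key=txt_feats, value=txt_feats)
        return self.norm(img_feats + fused)  # residual fusion
```

In a full inpainting network, a block like this would sit inside the generator so that each spatial location can attend to the most relevant words in the prompt.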
Workflow:
- User uploads an image and specifies the width and height of the mask region to be edited.
- User provides a text prompt describing the desired content for the masked area.
- The system fuses the image and text features using a multimodal deep learning model.
- The masked region is filled in according to the prompt, producing a visually coherent and semantically relevant output (see the illustrative sketch after this list).
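As a concrete illustration of this workflow, here is a minimal sketch built on an off-the-shelf diffusion inpainting backend from Hugging Face `diffusers`. This backend, the checkpoint name, the mask dimensions, and the prompt are assumptions for the example; the actual model used in this repository may differ.

```python
import torch
from PIL import Image, ImageDraw
from diffusers import StableDiffusionInpaintPipeline

# Load a pretrained text-guided inpainting model (assumed backend for this sketch).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("input.jpg").convert("RGB").resize((512, 512))

# Build a binary mask from user-supplied width/height: white = region to fill.
mask_w, mask_h = 200, 150  # hypothetical user inputs
mask = Image.new("L", image.size, 0)
left = (image.width - mask_w) // 2
top = (image.height - mask_h) // 2
ImageDraw.Draw(mask).rectangle([left, top, left + mask_w, top + mask_h], fill=255)

# Fuse image + text: the prompt guides what the masked region becomes.
result = pipe(prompt="a vase of sunflowers on the table",
              image=image, mask_image=mask).images[0]
result.save("inpainted.png")
```

Centering the mask rectangle is an arbitrary choice for this sketch; the app derives the mask region from the user-supplied width and height.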
Key Features:
- Text-guided inpainting: Restore or edit images by describing changes in natural language.
- Interactive interface: Upload images, draw masks, and enter prompts in a user-friendly web app.
- Multimodal fusion: Combines visual and textual cues for context-aware restoration.
- Research-grounded approach: Builds on MMFL and recent advances in diffusion-based inpainting models.
Below are screenshots and sample results from the interactive interface:
Interface Screenshot:
Installation:
- Clone this repository:

```bash
git clone https://github.com/IEEE-NITK/text-guided-image-inpainting.git
cd text-guided-image-inpainting
```

- Install dependencies:

```bash
pip install -r demo/requirements.txt
```
To launch the interactive inpainting interface locally:
```bash
python demo/app.py
```
How to use:
- Upload an image.
- Select the required height and width of the mask.
- Enter a text prompt describing the desired content.
- Click "Generate Inpainted Image" to generate the result.
The interface is built with Streamlit for ease of use and rapid prototyping.
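For reference, a Streamlit front end for these steps might be sketched as follows. The widget labels and the `run_inpainting` stub are hypothetical and not the exact code in `demo/app.py`:

```python
import streamlit as st
from PIL import Image

def run_inpainting(image, mask_w, mask_h, prompt):
    # Hypothetical placeholder: the real app would call the
    # multimodal inpainting model here.
    return image

st.title("The Brush of Spells: Text-Guided Image Inpainting")

# 1. Upload an image.
uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])

# 2. Select mask dimensions.
mask_w = st.slider("Mask width (px)", 16, 512, 128)
mask_h = st.slider("Mask height (px)", 16, 512, 128)

# 3. Describe the desired content for the masked region.
prompt = st.text_input("Describe what should appear in the masked region")

# 4. Run the model and show the result.
if uploaded and prompt and st.button("Generate Inpainted Image"):
    image = Image.open(uploaded).convert("RGB")
    result = run_inpainting(image, mask_w, mask_h, prompt)
    st.image(result, caption="Inpainted result")
```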
Reference:
- Lin, Qing, et al. "MMFL: Multimodal Fusion Learning for Text-Guided Image Inpainting." Proceedings of the 28th ACM International Conference on Multimedia, 2020.
Contributions and feedback are welcome!
“This paper imitates the process of painters' conjecture, and proposes to introduce the text description into the image inpainting task for the first time, which provides abundant guidance information for image restoration through the fusion of multimodal features.”