34 commits
c7e843d
Initial post-fork commit
Leob000 Mar 14, 2025
abc8bf3
Update Python version requirement in README
Leob000 Mar 14, 2025
97fd041
Add new dependencies to requirements.txt
Leob000 Mar 14, 2025
5c64717
Add imageio dependency to requirements.txt
Leob000 Mar 14, 2025
0dc078e
Improve device selection logic for compatibility with CUDA and MPS
Leob000 Mar 14, 2025
208e2c5
Cuda agnostic code
Leob000 Mar 14, 2025
a909ede
format
Leob000 Mar 15, 2025
0698ca0
h5 pickle bug correction
Leob000 Mar 15, 2025
c38a2ec
Remove colon from filename for windows compatibility
Leob000 Mar 15, 2025
32d194f
Merge branch 'master' into dev_leo2
Leob000 Mar 15, 2025
c038c6d
iteration change
Leob000 Mar 15, 2025
61e662a
Merge branch 'master' into dev_leo2
Leob000 Mar 15, 2025
84ae010
streamlit app body
QuetinT Mar 16, 2025
744b420
Update README.md: change bash to sh for script execution and add trai…
Leob000 Mar 16, 2025
0655d3d
added time flags
Leob000 Mar 16, 2025
25788fe
Update training script parameters for improved model performance
Leob000 Mar 16, 2025
3704481
format changes
Leob000 Mar 16, 2025
e79e6a8
Update README.md: add instructions for modifying script parameters an…
Leob000 Mar 16, 2025
5d0bef2
Update requirements.txt: add tqdm dependency for progress tracking
Leob000 Mar 16, 2025
253a9c9
Add run_model.sh script to execute the model with specified parameters
Leob000 Mar 16, 2025
4fba96c
Correct version bugs to make the model work
Leob000 Mar 16, 2025
3e5f9fd
Linting correction
Leob000 Mar 16, 2025
2654cfe
Update README.md: enhance project todo list with detailed tasks and s…
Leob000 Mar 16, 2025
3607327
Update requirements.txt: add watchdog dependency for file system moni…
Leob000 Mar 16, 2025
1f51a3f
Update .gitignore to include model files and clean up streamlit app code
QuetinT Mar 16, 2025
13e3579
Basic streamlit implementation, questions with the large model pr…
Leob000 Mar 16, 2025
1936968
Merge branch 'master' into branche-tristan
QuetinT Mar 16, 2025
a4c8d87
Add OS-specific Python interpreter path and error handling if no mode…
QuetinT Mar 17, 2025
1ef71de
Refactor streamlit app: enhance error handling, prepare for visualiza…
QuetinT Mar 17, 2025
c2ed2f7
Allow attention visualization for a single example, add attention vis…
QuetinT Mar 17, 2025
186f9c7
forgotten return in run_single_example
QuetinT Mar 17, 2025
0c3fc8e
Update run_model.py
MorganScalabrino Mar 23, 2025
82581c3
Update Large pre-trained model.py
MorganScalabrino Mar 23, 2025
78519a1
Update requirements.txt
MorganScalabrino Mar 23, 2025
8 changes: 7 additions & 1 deletion .gitignore
@@ -1,12 +1,18 @@
# Data files
data
data/
my_stuff/
testing.py

# Model files
models/

# Experiment files
exp
scripts/dev

# Image files
img/cst
img/attention_visualizations/

# Editor files
*.DS_Store
28 changes: 28 additions & 0 deletions Hello.py
@@ -0,0 +1,28 @@
import streamlit as st

st.set_page_config(
    page_title="Hello",
    page_icon="👋",
)

st.write("# Welcome to Streamlit! 👋")

st.sidebar.success("Select a demo above.")

st.markdown(
"""
Streamlit is an open-source app framework built specifically for
Machine Learning and Data Science projects.
**👈 Select a demo from the sidebar** to see some examples
of what Streamlit can do!
### Want to learn more?
- Check out [streamlit.io](https://streamlit.io)
- Jump into our [documentation](https://docs.streamlit.io)
- Ask a question in our [community
forums](https://discuss.streamlit.io)
### See more complex demos
- Use a neural net to [analyze the Udacity Self-driving Car Image
Dataset](https://github.com/streamlit/demo-self-driving)
- Explore a [New York City rideshare dataset](https://github.com/streamlit/demo-uber-nyc-pickups)
"""
)
109 changes: 43 additions & 66 deletions README.md
@@ -1,79 +1,56 @@
# FiLM: Visual Reasoning with a General Conditioning Layer

## Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville

This code implements a Feature-wise Linear Modulation approach to Visual Reasoning - answering multi-step questions on images. This codebase reproduces results from the AAAI 2018 paper "FiLM: Visual Reasoning with a General Conditioning Layer" (citation [here](https://github.com/ethanjperez/film#film)), which extends prior work "Learning Visual Reasoning Without Strong Priors" presented at ICML's MLSLP workshop. Please see the [retrospective paper](https://ml-retrospectives.github.io/neurips2019/accepted_retrospectives/2019/film/) (citation [here](https://github.com/ethanjperez/film#retrospective-for-film)) for an honest reflection on FiLM after the work that followed, including when to (and not to) use FiLM and tips-and-tricks for effectively training a network with FiLM layers.

### Code Outline

This code is a fork from the code for "Inferring and Executing Programs for Visual Reasoning" available [here](https://github.com/facebookresearch/clevr-iep).

Our FiLM Generator is located in [vr/models/film_gen.py](https://github.com/ethanjperez/film/blob/master/vr/models/film_gen.py), and our FiLMed Network and FiLM layer implementation is located in [vr/models/filmed_net.py](https://github.com/ethanjperez/film/blob/master/vr/models/filmed_net.py).

We inserted a new model mode, "FiLM", which integrates into the forked code for [CLEVR baselines](https://arxiv.org/abs/1612.06890) and the [Program Generator + Execution Engine model](https://arxiv.org/abs/1705.03633). Throughout the code, our FiLM Generator acts in place of the "program generator", generating the FiLM parameters for the FiLMed Network, i.e. the "execution engine". In some sense, FiLM parameters can loosely be thought of as a "soft program", but we use this denotation in the code to integrate better with the forked models.
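The FiLM layer itself is a small computation: a per-channel scale (gamma) and shift (beta) applied to convolutional feature maps. A minimal sketch consistent with the paper's definition (not necessarily the exact code in `vr/models/filmed_net.py`):

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: per-channel affine transform of features."""

    def forward(self, x: torch.Tensor, gammas: torch.Tensor, betas: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W); gammas, betas: (N, C), predicted by the FiLM Generator
        gammas = gammas.unsqueeze(2).unsqueeze(3).expand_as(x)
        betas = betas.unsqueeze(2).unsqueeze(3).expand_as(x)
        return gammas * x + betas
```

The question network thus never touches pixels directly; it only emits one (gamma, beta) pair per channel per FiLM layer, which is what makes the parameters readable as a "soft program".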

### Setup and Training

Because of this integration, setup instructions for the FiLM model are nearly the same as for "Inferring and Executing Programs for Visual Reasoning." We will soon post more detailed, step-by-step instructions for using our code in particular. For now, the guidelines below should give substantial direction to those interested.

First, follow the virtual environment setup [instructions](https://github.com/facebookresearch/clevr-iep#setup).

Second, follow the CLEVR data preprocessing [instructions](https://github.com/facebookresearch/clevr-iep/blob/master/TRAINING.md#preprocessing-clevr).

Lastly, model training details are similar at a high level (though adapted for FiLM and our repo) to [these](https://github.com/facebookresearch/clevr-iep/blob/master/TRAINING.md#training-on-clevr) for the Program Generator + Execution Engine model, though our model only uses one step of training, rather than a 3-step training procedure.

The below script has the hyperparameters and settings to reproduce FiLM CLEVR results:
```bash
sh scripts/train/film.sh
```

# Todo
- [-] Report
  - Streamlit
- [ ] Sphinx documentation
- [ ] Large pre-trained model
  - [x] Obtain the weights
  - [ ] Streamlit: ask questions about an image
  - [ ] Visualization of the gamma/beta histograms
  - [ ] t-SNE visualization
  - [ ] Visualization of what the MLP "sees"
- [ ] Small model, trained on CPU
  - Also provide a reduced preprocessing step?
  - How to keep training time short? Reduce the architecture? Reduce the train/val dataset?
  - [ ] Streamlit training
  - [ ] Streamlit questions
- Bonus:
  - Zero-shot
  - Performance comparison graph on standard datasets

# Requirements
- Python 3.12
- Other dependencies listed in `requirements.txt`

# References
- The code in this repo is heavily inspired by the [film](https://github.com/ethanjperez/film) and [clevr-iep](https://github.com/facebookresearch/clevr-iep) repositories
- [Distill: Feature-wise transformations](https://distill.pub/2018/feature-wise-transformations/)
- [arXiv: FiLM: Visual Reasoning with a General Conditioning Layer](https://arxiv.org/pdf/1709.07871)

# Get the data
For each script, check the associated `.sh` and/or `.py` file to modify parameters.
To download the data, run:
```bash
mkdir data
wget https://dl.fbaipublicfiles.com/clevr/CLEVR_v1.0.zip -O data/CLEVR_v1.0.zip
unzip data/CLEVR_v1.0.zip -d data
```


For CLEVR-Humans, data preprocessing instructions are [here](https://github.com/facebookresearch/clevr-iep/blob/master/TRAINING.md#preprocessing-clevr-humans).
The below script has the hyperparameters and settings to reproduce FiLM CLEVR-Humans results:
```bash
sh scripts/train/film_humans.sh
```

To preprocess the data from PNGs to an h5 file for each train/val/test set, run the script below. By default the data is kept as raw pixels; there are options to extract features instead with `--model resnet101` (1024x14x14 output), or to cap the number of processed images with `--max_images X` (check `extract_features.py`).
```bash
sh scripts/extract_features.sh
```


Training a CLEVR-CoGenT model is very similar to training a normal CLEVR model. Training a model from pixels requires modifying the preprocessing with scripts included in the repo to preprocess pixels. The scripts to reproduce our results are also located in the scripts/train/ folder.

We tried not to break existing models from the CLEVR codebase with our modifications, but we haven't tested their code after our changes. We recommend using the CLEVR and "Inferring and Executing Programs for Visual Reasoning" code directly.

Training a solid FiLM CLEVR model should only take ~12 hours on a good GPU (See training curves in the paper appendix).

### Running models

We added an interactive command line tool, invoked as below. It's genuinely fun to play with trained models, and it's great for gaining intuition about what various trained models have or have not learned and how they tackle reasoning questions.
```bash
python run_model.py --program_generator <FiLM Generator filepath> --execution_engine <FiLMed Network filepath>
```

To preprocess the questions, execute this script:
```bash
sh scripts/preprocess_questions.sh
```

By default, the command runs on [this CLEVR image](https://github.com/ethanjperez/film/blob/master/img/CLEVR_val_000017.png) in our repo, but you may modify which image to use via command line flag to test on any CLEVR image.

CLEVR vocab is enforced by default, but for CLEVR-Humans models, for example, you may append the flag `--enforce_clevr_vocab 0` to ask any string of characters you please.

In addition, an easy way to try out zero-shot behavior with FiLM is to run a trained model with run_model.py with the debug command line flag enabled, so you can manipulate the FiLM parameters modulating the FiLMed network during the forward computation. For example, `--debug_every -1` will stop the program after the model generates FiLM parameters but before the FiLMed network carries out its forward pass using FiLM layers.
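As an illustration of what such manipulation amounts to (the names below are illustrative, not the repo's debug API): zeroing a channel's gamma and beta silences that feature map entirely in the FiLMed block.

```python
import torch

# Illustrative shapes: one FiLMed block with 4 channels on a 14x14 feature map
gammas = torch.randn(1, 4)
betas = torch.randn(1, 4)
features = torch.randn(1, 4, 14, 14)

# "Silence" channel 0: with gamma = beta = 0, the modulated activation there is zero
gammas[:, 0] = 0.0
betas[:, 0] = 0.0

# Broadcast (N, C) parameters over the (N, C, H, W) feature map
modulated = gammas[..., None, None] * features + betas[..., None, None]
```

Scaling or swapping gammas and betas between questions in the same way is one cheap route to probing zero-shot behavior without retraining.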

Thanks for stopping by, and we hope you enjoy playing around with FiLM!

### Bibtex

#### FiLM
```
@InProceedings{perez2018film,
  title={FiLM: Visual Reasoning with a General Conditioning Layer},
  author={Ethan Perez and Florian Strub and Harm de Vries and Vincent Dumoulin and Aaron C. Courville},
  booktitle={AAAI},
  year={2018}
}
```

#### Retrospective for FiLM
```
@misc{perez2019retrospective,
  author = {Perez, Ethan},
  title = {{Retrospective for: "FiLM: Visual Reasoning with a General Conditioning Layer"}},
  year = {2019},
  howpublished = {\url{https://ml-retrospectives.github.io/published_retrospectives/2019/film/}},
}
```

To train the model:
```bash
sh scripts/train/film.sh
```

To run the model (on `CLEVR_val_000017.png` by default):
```bash
sh scripts/run_model.sh
```
Binary file added docs/projet_DL_slidespres.pdf
Binary file not shown.
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
105 changes: 105 additions & 0 deletions pages/Large pre-trained model.py
@@ -0,0 +1,105 @@
import streamlit as st
import subprocess
import platform
import os
import time
import plotly_express as px
import numpy as np
import torch

# Choose the python interpreter path
current_os = platform.system()
if current_os == "Windows":
    python_executable = r".venv\Scripts\python.exe"
else:
    python_executable = ".venv/bin/python"

st.title("Feature-wise Linear Modulations")

tab1, tab2 = st.tabs(["Visualizing", "Training"])

with tab1:
    # Display an error message if data/best.pt is missing
    if not os.path.exists("data/best.pt"):
        st.error('No model found at "data/best.pt". Please train or download the model')

    # Select and display the image, defaulting to image 17
    img_number = st.selectbox(
        "Select an image number:", [str(i) for i in range(10, 20)], index=7
    )
    st.image(
        f"img/CLEVR_val_0000{img_number}.png",
        caption=f"CLEVR_val_0000{img_number}.png",
        # use_container_width=True,
        width=400,
    )

    # Checkbox to visualize attention
    visualize = st.checkbox("Visualize attention")

    # Create a form so that hitting Enter submits the input
    with st.form(key="question_form"):
        user_input = st.text_input("Enter your question:")
        submit_button = st.form_submit_button("Submit")

    if submit_button:
        # Launch the process (adjust parameters as needed)
        process = subprocess.Popen(
            [
                python_executable,
                "scripts/run_model.py",
                "--image",
                f"img/CLEVR_val_0000{img_number}.png",
                "--streamlit",
                "True",
                "--visualize_attention",
                str(visualize),
            ],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )

        # Send the user input to the process and capture the output
        output, error = process.communicate(input=user_input)
        output = output.strip()  # Remove leading/trailing whitespace

        # Display the output
        st.subheader("Model Response:")
        st.write(output)

        # Optionally display any error messages
        # if error:
        #     st.subheader("Errors:")
        #     st.write(error)

        # Display the image with attention, if requested
        if visualize:
            attention_img_path = f"img/attention_visualizations/{user_input} {output}/pool-feature-locations.png"
            # Wait for the image to be created
            while not os.path.exists(attention_img_path):
                time.sleep(1)
            st.image(attention_img_path, caption="Image with attention", width=400)

        # Import and process the FiLM parameter values for the three resblocks
        # (path assumed relative to the repo root; written by the run above)
        parameters = torch.load("img/params.pt")
        beta = []
        gamma = []
        for i in range(3):
            beta.extend(parameters[0][i][0:128].tolist())
            gamma.extend(parameters[0][i][128:256].tolist())

        # Plot the histograms with Plotly
        hist_gammas = px.histogram(gamma, nbins=70, marginal="rug")
        hist_gammas.update_layout(
            title="Histogram of gamma values of the 3 resblocks",
            xaxis_title="Value",
            yaxis_title="Frequency",
        )
        st.plotly_chart(hist_gammas)
        hist_betas = px.histogram(beta, nbins=70, marginal="rug")
        hist_betas.update_layout(
            title="Histogram of beta values of the 3 resblocks",
            xaxis_title="Value",
            yaxis_title="Frequency",
        )
        st.plotly_chart(hist_betas)

with tab2:
    epoch = st.slider("Epoch", 1, 20, 1)
    model_choice = st.selectbox("Model", ["resnet", "raw"])
    if st.button("Train"):
        st.write(f"Training started with {model_choice} for {epoch} epochs")
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -0,0 +1,2 @@
[tool.ruff]
ignore = ["F401","E402"]
87 changes: 80 additions & 7 deletions requirements.txt
@@ -1,7 +1,80 @@
http://download.pytorch.org/whl/cu80/torch-0.1.11.post5-cp35-cp35m-linux_x86_64.whl
numpy
Pillow
scipy
torchvision
h5py
tqdm
altair==5.5.0
asttokens==3.0.0
attrs==25.1.0
blinker==1.9.0
cachetools==5.5.1
certifi==2024.12.14
charset-normalizer==3.4.1
click==8.1.8
contourpy==1.3.1
cycler==0.12.1
decorator==5.2.1
executing==2.2.0
filelock==3.16.1
fonttools==4.55.3
fsspec==2024.12.0
gitdb==4.0.12
GitPython==3.1.44
h5py==3.13.0
idna==3.10
imageio==2.37.0
ipdb==0.13.13
ipython==9.0.2
ipython_pygments_lexers==1.1.1
jedi==0.19.2
Jinja2==3.1.5
joblib==1.4.2
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
kiwisolver==1.4.8
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.10.0
matplotlib-inline==0.1.7
mdurl==0.1.2
mpmath==1.3.0
narwhals==1.24.0
networkx==3.4.2
numpy==1.26.4
packaging==24.2
pandas==2.2.3
parso==0.8.4
pexpect==4.9.0
pillow==11.1.0
prompt_toolkit==3.0.50
protobuf==5.29.3
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==19.0.0
pydeck==0.9.1
Pygments==2.19.1
pyparsing==3.2.1
python-dateutil==2.9.0.post0
pytz==2024.2
referencing==0.36.2
requests==2.32.3
rich==13.9.4
rpds-py==0.22.3
scikit-learn==1.6.1
scipy==1.15.1
setuptools==76.0.0
six==1.17.0
smmap==5.0.2
stack-data==0.6.3
streamlit==1.41.1
sympy==1.13.1
tenacity==9.0.0
termcolor==2.5.0
threadpoolctl==3.5.0
toml==0.10.2
torch==2.6.0
torchvision==0.21.0
tornado==6.4.2
tqdm==4.67.1
traitlets==5.14.3
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
watchdog==6.0.0
wcwidth==0.2.13
plotly_express==0.4.0