34 commits
c7e843d
Initial post-fork commit
Leob000 Mar 14, 2025
abc8bf3
Update Python version requirement in README
Leob000 Mar 14, 2025
97fd041
Add new dependencies to requirements.txt
Leob000 Mar 14, 2025
5c64717
Add imageio dependency to requirements.txt
Leob000 Mar 14, 2025
0dc078e
Improve device selection logic for compatibility with CUDA and MPS
Leob000 Mar 14, 2025
208e2c5
Cuda agnostic code
Leob000 Mar 14, 2025
a909ede
format
Leob000 Mar 15, 2025
0698ca0
h5 pickle bug correction
Leob000 Mar 15, 2025
c38a2ec
Remove colon from filename for windows compatibility
Leob000 Mar 15, 2025
32d194f
Merge branch 'master' into dev_leo2
Leob000 Mar 15, 2025
c038c6d
iteration change
Leob000 Mar 15, 2025
61e662a
Merge branch 'master' into dev_leo2
Leob000 Mar 15, 2025
84ae010
streamlit app body
QuetinT Mar 16, 2025
744b420
Update README.md: change bash to sh for script execution and add trai…
Leob000 Mar 16, 2025
0655d3d
added time flags
Leob000 Mar 16, 2025
25788fe
Update training script parameters for improved model performance
Leob000 Mar 16, 2025
3704481
format changes
Leob000 Mar 16, 2025
e79e6a8
Update README.md: add instructions for modifying script parameters an…
Leob000 Mar 16, 2025
5d0bef2
Update requirements.txt: add tqdm dependency for progress tracking
Leob000 Mar 16, 2025
253a9c9
Add run_model.sh script to execute the model with specified parameters
Leob000 Mar 16, 2025
4fba96c
Correct version bugs to make the model work
Leob000 Mar 16, 2025
3e5f9fd
Linting correction
Leob000 Mar 16, 2025
2654cfe
Update README.md: enhance project todo list with detailed tasks and s…
Leob000 Mar 16, 2025
3607327
Update requirements.txt: add watchdog dependency for file system moni…
Leob000 Mar 16, 2025
1f51a3f
Update .gitignore to include model files and clean up streamlit app code
QuetinT Mar 16, 2025
13e3579
Basic streamlit implementation, questions with the large model pr…
Leob000 Mar 16, 2025
1936968
Merge branch 'master' into branche-tristan
QuetinT Mar 16, 2025
a4c8d87
Add OS-specific Python interpreter path and error handling if no mode…
QuetinT Mar 17, 2025
1ef71de
Refactor streamlit app: enhance error handling, prepare for visualiza…
QuetinT Mar 17, 2025
c2ed2f7
Allow attention visualization for a single example, add attention vis…
QuetinT Mar 17, 2025
186f9c7
forgotten return in run_single_example
QuetinT Mar 17, 2025
0c3fc8e
Update run_model.py
MorganScalabrino Mar 23, 2025
82581c3
Update Large pre-trained model.py
MorganScalabrino Mar 23, 2025
78519a1
Update requirements.txt
MorganScalabrino Mar 23, 2025
8 changes: 7 additions & 1 deletion .gitignore
@@ -1,12 +1,18 @@
# Data files
data
data/
my_stuff/
testing.py

# Model files
models/

# Experiment files
exp
scripts/dev

# Image files
img/cst
img/attention_visualizations/

# Editor files
*.DS_Store
28 changes: 28 additions & 0 deletions Hello.py
@@ -0,0 +1,28 @@
import streamlit as st

st.set_page_config(
    page_title="Hello",
    page_icon="👋",
)

st.write("# Welcome to Streamlit! 👋")

st.sidebar.success("Select a demo above.")

st.markdown(
"""
Streamlit is an open-source app framework built specifically for
Machine Learning and Data Science projects.
**👈 Select a demo from the sidebar** to see some examples
of what Streamlit can do!
### Want to learn more?
- Check out [streamlit.io](https://streamlit.io)
- Jump into our [documentation](https://docs.streamlit.io)
- Ask a question in our [community
forums](https://discuss.streamlit.io)
### See more complex demos
- Use a neural net to [analyze the Udacity Self-driving Car Image
Dataset](https://github.com/streamlit/demo-self-driving)
- Explore a [New York City rideshare dataset](https://github.com/streamlit/demo-uber-nyc-pickups)
"""
)
109 changes: 43 additions & 66 deletions README.md
@@ -1,79 +1,56 @@
# FiLM: Visual Reasoning with a General Conditioning Layer

## Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville

This code implements a Feature-wise Linear Modulation approach to Visual Reasoning - answering multi-step questions on images. This codebase reproduces results from the AAAI 2018 paper "FiLM: Visual Reasoning with a General Conditioning Layer" (citation [here](https://github.com/ethanjperez/film#film)), which extends prior work "Learning Visual Reasoning Without Strong Priors" presented at ICML's MLSLP workshop. Please see the [retrospective paper](https://ml-retrospectives.github.io/neurips2019/accepted_retrospectives/2019/film/) (citation [here](https://github.com/ethanjperez/film#retrospective-for-film)) for an honest reflection on FiLM after the work that followed, including when to (and not to) use FiLM and tips-and-tricks for effectively training a network with FiLM layers.

### Code Outline

This code is a fork from the code for "Inferring and Executing Programs for Visual Reasoning" available [here](https://github.com/facebookresearch/clevr-iep).

Our FiLM Generator is located in [vr/models/film_gen.py](https://github.com/ethanjperez/film/blob/master/vr/models/film_gen.py), and our FiLMed Network and FiLM layer implementation is located in [vr/models/filmed_net.py](https://github.com/ethanjperez/film/blob/master/vr/models/filmed_net.py).

We inserted a new model mode, "FiLM", which integrates into the forked code for [CLEVR baselines](https://arxiv.org/abs/1612.06890) and the [Program Generator + Execution Engine model](https://arxiv.org/abs/1705.03633). Throughout the code, our FiLM Generator acts in place of the "program generator", generating the FiLM parameters for the FiLMed Network, i.e. the "execution engine". In some sense, FiLM parameters can loosely be thought of as a "soft program", but we use this denotation in the code to integrate better with the forked models.
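The FiLM layer itself is a small computation: a per-channel scale (gamma) and shift (beta) applied to convolutional feature maps. A minimal sketch consistent with the paper's definition (not necessarily the exact code in `vr/models/filmed_net.py`):

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: per-channel affine transform of features."""

    def forward(self, x: torch.Tensor, gammas: torch.Tensor, betas: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W); gammas, betas: (N, C), predicted by the FiLM Generator
        gammas = gammas.unsqueeze(2).unsqueeze(3).expand_as(x)
        betas = betas.unsqueeze(2).unsqueeze(3).expand_as(x)
        return gammas * x + betas
```

The question network thus never touches pixels directly; it only emits one (gamma, beta) pair per channel per FiLM layer, which is what makes the parameters readable as a "soft program".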

### Setup and Training

Because of this integration, setup instructions for the FiLM model are nearly the same as for "Inferring and Executing Programs for Visual Reasoning." We will soon post more detailed, step-by-step instructions for using our code in particular. For now, the guidelines below should give substantial direction to those interested.

First, follow the virtual environment setup [instructions](https://github.com/facebookresearch/clevr-iep#setup).

Second, follow the CLEVR data preprocessing [instructions](https://github.com/facebookresearch/clevr-iep/blob/master/TRAINING.md#preprocessing-clevr).

Lastly, model training details are similar at a high level (though adapted for FiLM and our repo) to [these](https://github.com/facebookresearch/clevr-iep/blob/master/TRAINING.md#training-on-clevr) for the Program Generator + Execution Engine model, though our model only uses one step of training, rather than a 3-step training procedure.

The below script has the hyperparameters and settings to reproduce FiLM CLEVR results:
```bash
sh scripts/train/film.sh
```

# Todo
- [-] Report
  - Streamlit
- [ ] Sphinx documentation
- [ ] Large pre-trained model
  - [x] Obtain the weights
  - [ ] Streamlit: ask questions about an image
  - [ ] Visualization of the gamma/beta histograms
  - [ ] t-SNE visualization
  - [ ] Visualization of what the MLP "sees"
- [ ] Small model, trained on CPU
  - Also provide a reduced preprocessing step?
  - How to keep training time short? Reduce the architecture? Reduce the train/val dataset?
  - [ ] Streamlit training
  - [ ] Streamlit questions
- Bonus:
  - Zero-shot
  - Performance comparison graph on standard datasets

# Requirements
- Python 3.12
- Other dependencies listed in `requirements.txt`

# References
- The code in this repo is heavily inspired by the [film](https://github.com/ethanjperez/film) and [clevr-iep](https://github.com/facebookresearch/clevr-iep) repositories
- [Distill: Feature-wise transformations](https://distill.pub/2018/feature-wise-transformations/)
- [arXiv: FiLM: Visual Reasoning with a General Conditioning Layer](https://arxiv.org/pdf/1709.07871)

# Get the data
For each script, check the associated `.sh` and/or `.py` file to modify parameters.
To download the data, run:
```bash
mkdir data
wget https://dl.fbaipublicfiles.com/clevr/CLEVR_v1.0.zip -O data/CLEVR_v1.0.zip
unzip data/CLEVR_v1.0.zip -d data
```


For CLEVR-Humans, data preprocessing instructions are [here](https://github.com/facebookresearch/clevr-iep/blob/master/TRAINING.md#preprocessing-clevr-humans).
The below script has the hyperparameters and settings to reproduce FiLM CLEVR-Humans results:
```bash
sh scripts/train/film_humans.sh
```

To preprocess the data from PNGs to an h5 file for each train/val/test set, run the script below. By default the data is kept as raw pixels; there are options to extract features instead with `--model resnet101` (1024x14x14 output), or to cap the number of processed images with `--max_images X` (check `extract_features.py`).
```bash
sh scripts/extract_features.sh
```


Training a CLEVR-CoGenT model is very similar to training a normal CLEVR model. Training a model from pixels requires modifying the preprocessing with scripts included in the repo to preprocess pixels. The scripts to reproduce our results are also located in the scripts/train/ folder.

We tried not to break existing models from the CLEVR codebase with our modifications, but we haven't tested their code after our changes. We recommend using the CLEVR and "Inferring and Executing Programs for Visual Reasoning" code directly.

Training a solid FiLM CLEVR model should only take ~12 hours on a good GPU (See training curves in the paper appendix).

### Running models

We added an interactive command line tool, invoked as below. It's genuinely fun to play with trained models, and it's great for gaining intuition about what various trained models have or have not learned and how they tackle reasoning questions.
```bash
python run_model.py --program_generator <FiLM Generator filepath> --execution_engine <FiLMed Network filepath>
```

To preprocess the questions, execute this script:
```bash
sh scripts/preprocess_questions.sh
```

By default, the command runs on [this CLEVR image](https://github.com/ethanjperez/film/blob/master/img/CLEVR_val_000017.png) in our repo, but you may modify which image to use via command line flag to test on any CLEVR image.

CLEVR vocab is enforced by default, but for CLEVR-Humans models, for example, you may append the flag `--enforce_clevr_vocab 0` to ask any string of characters you please.

In addition, an easy way to try out zero-shot behavior with FiLM is to run a trained model with run_model.py with the debug command line flag enabled, so you can manipulate the FiLM parameters modulating the FiLMed network during the forward computation. For example, `--debug_every -1` will stop the program after the model generates FiLM parameters but before the FiLMed network carries out its forward pass using FiLM layers.
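As an illustration of what such manipulation amounts to (the names below are illustrative, not the repo's debug API): zeroing a channel's gamma and beta silences that feature map entirely in the FiLMed block.

```python
import torch

# Illustrative shapes: one FiLMed block with 4 channels on a 14x14 feature map
gammas = torch.randn(1, 4)
betas = torch.randn(1, 4)
features = torch.randn(1, 4, 14, 14)

# "Silence" channel 0: with gamma = beta = 0, the modulated activation there is zero
gammas[:, 0] = 0.0
betas[:, 0] = 0.0

# Broadcast (N, C) parameters over the (N, C, H, W) feature map
modulated = gammas[..., None, None] * features + betas[..., None, None]
```

Scaling or swapping gammas and betas between questions in the same way is one cheap route to probing zero-shot behavior without retraining.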

Thanks for stopping by, and we hope you enjoy playing around with FiLM!

### Bibtex

#### FiLM
```
@InProceedings{perez2018film,
  title={FiLM: Visual Reasoning with a General Conditioning Layer},
  author={Ethan Perez and Florian Strub and Harm de Vries and Vincent Dumoulin and Aaron C. Courville},
  booktitle={AAAI},
  year={2018}
}
```

#### Retrospective for FiLM
```
@misc{perez2019retrospective,
  author = {Perez, Ethan},
  title = {{Retrospective for: "FiLM: Visual Reasoning with a General Conditioning Layer"}},
  year = {2019},
  howpublished = {\url{https://ml-retrospectives.github.io/published_retrospectives/2019/film/}},
}
```

To train the model:
```bash
sh scripts/train/film.sh
```

To run the model (on `CLEVR_val_000017.png` by default):
```bash
sh scripts/run_model.sh
```
Binary file added docs/projet_DL_slidespres.pdf
Binary file not shown.
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
105 changes: 105 additions & 0 deletions pages/Large pre-trained model.py
@@ -0,0 +1,105 @@
import streamlit as st
import subprocess
import platform
import os
import time
import plotly_express as px
import numpy as np
import torch

# Choose the python interpreter path
current_os = platform.system()
if current_os == "Windows":
    python_executable = r".venv\Scripts\python.exe"
else:
    python_executable = ".venv/bin/python"

st.title("Feature-wise Linear Modulations")

tab1, tab2 = st.tabs(["Visualizing", "Training"])

with tab1:
    # Display an error message if data/best.pt is missing
    if not os.path.exists("data/best.pt"):
        st.error('No model found at "data/best.pt". Please train or download the model')

    # Select and display the image, defaulting to image 17
    img_number = st.selectbox(
        "Select an image number:", [str(i) for i in range(10, 20)], index=7
    )
    st.image(
        f"img/CLEVR_val_0000{img_number}.png",
        caption=f"CLEVR_val_0000{img_number}.png",
        # use_container_width=True,
        width=400,
    )

    # Checkbox to visualize attention
    visualize = st.checkbox("Visualize attention")

    # Create a form so that hitting Enter submits the input
    with st.form(key="question_form"):
        user_input = st.text_input("Enter your question:")
        submit_button = st.form_submit_button("Submit")

    if submit_button:
        # Launch the process (adjust parameters as needed)
        process = subprocess.Popen(
            [
                python_executable,
                "scripts/run_model.py",
                "--image",
                f"img/CLEVR_val_0000{img_number}.png",
                "--streamlit",
                "True",
                "--visualize_attention",
                str(visualize),
            ],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )

        # Send the user input to the process and capture the output
        output, error = process.communicate(input=user_input)
        output = output.strip()  # Remove leading/trailing whitespace

        # Display the output
        st.subheader("Model Response:")
        st.write(output)

        # Optionally display any error messages
        # if error:
        #     st.subheader("Errors:")
        #     st.write(error)

        # Display the image with attention, if requested
        if visualize:
            attention_img_path = f"img/attention_visualizations/{user_input} {output}/pool-feature-locations.png"
            # Wait for the image to be created
            while not os.path.exists(attention_img_path):
                time.sleep(1)
            st.image(attention_img_path, caption="Image with attention", width=400)

        # Import and process the FiLM parameter values for the three resblocks
        # (path assumed relative to the repo root; written by the run above)
        parameters = torch.load("img/params.pt")
        beta = []
        gamma = []
        for i in range(3):
            beta.extend(parameters[0][i][0:128].tolist())
            gamma.extend(parameters[0][i][128:256].tolist())

        # Plot the histograms with Plotly
        hist_gammas = px.histogram(gamma, nbins=70, marginal="rug")
        hist_gammas.update_layout(
            title="Histogram of gamma values of the 3 resblocks",
            xaxis_title="Value",
            yaxis_title="Frequency",
        )
        st.plotly_chart(hist_gammas)
        hist_betas = px.histogram(beta, nbins=70, marginal="rug")
        hist_betas.update_layout(
            title="Histogram of beta values of the 3 resblocks",
            xaxis_title="Value",
            yaxis_title="Frequency",
        )
        st.plotly_chart(hist_betas)

with tab2:
    epoch = st.slider("Epoch", 1, 20, 1)
    model_choice = st.selectbox("Model", ["resnet", "raw"])
    if st.button("Train"):
        st.write(f"Training started with {model_choice} for {epoch} epochs")
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -0,0 +1,2 @@
[tool.ruff]
ignore = ["F401","E402"]
87 changes: 80 additions & 7 deletions requirements.txt
@@ -1,7 +1,80 @@
http://download.pytorch.org/whl/cu80/torch-0.1.11.post5-cp35-cp35m-linux_x86_64.whl
numpy
Pillow
scipy
torchvision
h5py
tqdm
altair==5.5.0
asttokens==3.0.0
attrs==25.1.0
blinker==1.9.0
cachetools==5.5.1
certifi==2024.12.14
charset-normalizer==3.4.1
click==8.1.8
contourpy==1.3.1
cycler==0.12.1
decorator==5.2.1
executing==2.2.0
filelock==3.16.1
fonttools==4.55.3
fsspec==2024.12.0
gitdb==4.0.12
GitPython==3.1.44
h5py==3.13.0
idna==3.10
imageio==2.37.0
ipdb==0.13.13
ipython==9.0.2
ipython_pygments_lexers==1.1.1
jedi==0.19.2
Jinja2==3.1.5
joblib==1.4.2
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
kiwisolver==1.4.8
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.10.0
matplotlib-inline==0.1.7
mdurl==0.1.2
mpmath==1.3.0
narwhals==1.24.0
networkx==3.4.2
numpy==1.26.4
packaging==24.2
pandas==2.2.3
parso==0.8.4
pexpect==4.9.0
pillow==11.1.0
prompt_toolkit==3.0.50
protobuf==5.29.3
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==19.0.0
pydeck==0.9.1
Pygments==2.19.1
pyparsing==3.2.1
python-dateutil==2.9.0.post0
pytz==2024.2
referencing==0.36.2
requests==2.32.3
rich==13.9.4
rpds-py==0.22.3
scikit-learn==1.6.1
scipy==1.15.1
setuptools==76.0.0
six==1.17.0
smmap==5.0.2
stack-data==0.6.3
streamlit==1.41.1
sympy==1.13.1
tenacity==9.0.0
termcolor==2.5.0
threadpoolctl==3.5.0
toml==0.10.2
torch==2.6.0
torchvision==0.21.0
tornado==6.4.2
tqdm==4.67.1
traitlets==5.14.3
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
watchdog==6.0.0
wcwidth==0.2.13
plotly_express==0.4.0