Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
- Critics are the bottleneck. FIRM is built around the idea that RL for visual generation only works when the reward model is faithful, stable, and hard to hack.
- Two task-specific data pipelines. FIRM-Edit uses a difference-first scoring pipeline, while FIRM-Gen uses a plan-then-score pipeline to reduce MLLM hallucinations.
- One benchmark for both sides. FIRM-Bench provides a human-annotated test bed for editing and generation critics.
- RL reward shaping that actually holds up. CME and QMA are designed to prevent the shortcut behavior that appears when rewards are naively combined.
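As a toy illustration of the reward-combination bullet above (this is not the actual CME/QMA formulation), a naive average lets a sample that games one easy reward outscore a sample that is solid on every criterion, while a min-style aggregate does not:

```python
def naive_average(rewards):
    """Naive combination: strong rewards can compensate for failed ones."""
    return sum(rewards) / len(rewards)


def min_aggregate(rewards):
    """Bottleneck combination: the weakest criterion dominates the score."""
    return min(rewards)


# Hypothetical (instruction-following, identity-preservation) scores.
shortcut = [1.0, 0.2]    # nails the easy criterion, fails the hard one
faithful = [0.55, 0.55]  # moderately good on both

print(naive_average(shortcut) > naive_average(faithful))  # True: shortcut wins
print(min_aggregate(shortcut) > min_aggregate(faithful))  # False: faithful wins
```

CME and QMA are the project's actual answer to this failure mode; the snippet only shows why naive combination is gameable.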
| Component |
|---|
| FIRM-Edit-370K |
| FIRM-Gen-293K |
| FIRM-Edit-8B / FIRM-Gen-8B |
| FIRM-Bench |
| FIRM-Qwen-Edit / FIRM-SD-3.5 |
```
TrustYourCritic/
├── generation/   # GenerationRL training and reward serving
└── editing/      # EditRL training, reward serving, reproduction scripts
```
- To avoid Python package conflicts, install and run GenRL/EditRL in separate environments.
```bash
cd generation
conda create -n FIRM-Gen python=3.10 -y
conda activate FIRM-Gen
pip install -e .
```

Start the reward model server:

```bash
python generation/flow_grpo/reward_model_server.py
```
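For debugging, a reward server like this can be queried over HTTP. The endpoint path, port, and payload field names below are illustrative assumptions, not the actual API of `reward_model_server.py`; check that file for the real schema:

```python
import base64
import json
from urllib import request


def build_reward_request(prompt: str, image_path: str) -> dict:
    """Package a prompt and image into a JSON payload.

    The field names ("prompt", "image_base64") are assumptions; consult
    reward_model_server.py for the actual request schema.
    """
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {"prompt": prompt, "image_base64": image_b64}


def query_reward(payload: dict, url: str = "http://127.0.0.1:8000/score") -> float:
    """POST the payload and return a scalar reward (hypothetical endpoint)."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["reward"]
```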
Available generation configs:

- generation/config/nft_flux2_klein.py
- generation/config/nft_qwen_image.py
- generation/config/nft_zimage_turbo.py
- generation/config/nft.py
```bash
bash generation/scripts/train_sd35_sharegpt_qwenvl.sh
```

Set up the editing environment:

```bash
cd editing
conda create -n FIRM-Edit python=3.10 -y
conda activate FIRM-Edit
pip install -e .
```

Start the reward server (change the default IP and port to your preference):

```bash
python editing/reward_server/reward_server_qwen3_vl_8b_sft.py
```

Available editing configs:

- editing/config/kontext_nft_qwen3vl_8b_sft.py
- editing/config/kontext_nft_qwen3vl_8b.py
- editing/config/kontext_nft_qwen25vl_32b_non_logits.py
```bash
bash editing/examples/train_qwen_image_edit.sh
```

The expected JSON file looks like:
```json
[
  {"input_prompt": "A cinematic portrait of a fox in snow."}
]
```

Expected dataset layout:

```
dataset-root/
├── images/
├── train_metadata.jsonl
└── test_metadata.jsonl
```

Each JSONL line looks like:

```json
{"prompt": "make the sky sunset orange", "image": "images/example.jpg", "requirement": "preserve identity"}
```

The code and data for FIRM-Bench are hosted on Hugging Face.
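The metadata format above can be sanity-checked before training; a sketch assuming the layout shown (whether `requirement` is mandatory depends on your setup, so only `prompt` and `image` are treated as required here):

```python
import json
from pathlib import Path

# "requirement" is left optional here; adjust if it is mandatory in your setup.
REQUIRED_KEYS = {"prompt", "image"}


def validate_metadata(jsonl_path: str, dataset_root: str) -> list[str]:
    """Return a list of problems found in a train/test metadata JSONL file."""
    root = Path(dataset_root)
    problems = []
    for lineno, line in enumerate(Path(jsonl_path).read_text().splitlines(), 1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc})")
            continue
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append(f"line {lineno}: missing keys {sorted(missing)}")
        elif not (root / record["image"]).is_file():
            problems.append(f"line {lineno}: image not found: {record['image']}")
    return problems
```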
We provide inference and evaluation scripts for FIRM-Bench. We recommend deploying the model with vLLM for inference.
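Conceptually, the `edit_mae.py` and `gen_mae.py` evaluations below reduce to a mean absolute error between human-annotated scores and critic predictions. A minimal sketch (the actual scripts may normalize, bucket, or align scores differently):

```python
def mean_absolute_error(human_scores, model_scores):
    """MAE between human-annotated scores and critic predictions."""
    assert len(human_scores) == len(model_scores)
    return sum(abs(h - m) for h, m in zip(human_scores, model_scores)) / len(human_scores)


# e.g. human ratings vs critic predictions on the same samples
print(mean_absolute_error([8, 3, 5], [7, 4, 5]))  # ≈ 0.667
```

Lower MAE means the critic's scores track human judgment more closely.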
Run inference on FIRM-Bench-Edit:

```bash
python FIRM-Bench-Edit/vllm_infer.py \
    --input FIRM-Bench-Edit/bench_v1.jsonl \
    --output FIRM-Bench-Edit/result/xxx.jsonl \
    --image-root FIRM-Bench-Edit/ \
    --api-url xxxxx
```

Evaluate against the human annotations:

```bash
python FIRM-Bench-Edit/edit_mae.py \
    --gt FIRM-Bench-Edit/result/human_bench_v1.jsonl \
    --pred FIRM-Bench-Edit/result/xxx.jsonl
```

Run inference on FIRM-Bench-Gen:

```bash
python FIRM-Bench-Gen/vllm_infer.py \
    --input FIRM-Bench-Gen/bench_v1.jsonl \
    --output FIRM-Bench-Gen/result/xxx.jsonl \
    --image-root FIRM-Bench-Gen/ \
    --api-url xxxxx
```

Evaluate against the human annotations:

```bash
python FIRM-Bench-Gen/gen_mae.py \
    --gt FIRM-Bench-Gen/result/human_bench_v1.jsonl \
    --pred FIRM-Bench-Gen/result/xxx.jsonl
```

This repository was shaped by several open-source projects that pushed RL for image generation and image editing forward:
