14 changes: 14 additions & 0 deletions text_diffuser/a.py
@@ -0,0 +1,14 @@


# Quick check: fill target_list with at most 5 items by cycling through target_images.
target_list = []
target_images = [1, 2, 3, 4]

while len(target_list) < 5:
    for img in target_images:
        if len(target_list) == 5:
            break
        target_list.append(img)

print(target_list)  # -> [1, 2, 3, 4, 1]
21 changes: 21 additions & 0 deletions text_diffuser/clipscore/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2021 Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, Yejin Choi

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
175 changes: 175 additions & 0 deletions text_diffuser/clipscore/README.md
@@ -0,0 +1,175 @@
# What's in here?

This repo contains the code for our EMNLP 2021 paper: [CLIPScore: A
Reference-free Evaluation Metric for Image
Captioning](https://arxiv.org/abs/2104.08718). CLIPScore is a metric
that you can use to evaluate the quality of an automatic image
captioning system. In our paper, we show that CLIPScore achieves high
correlation with human judgment on literal image captioning
tasks. However, unlike BLEU or CIDEr, CLIPScore doesn't require
reference captions.

If you find the paper or this code useful, please consider citing:

```
@inproceedings{hessel2021clipscore,
title={{CLIPScore:} A Reference-free Evaluation Metric for Image Captioning},
author={Hessel, Jack and Holtzman, Ari and Forbes, Maxwell and Bras, Ronan Le and Choi, Yejin},
booktitle={EMNLP},
year={2021}
}
```

# How do I run the code?

## Command Line

Example usage
```
> python clipscore.py example/good_captions.json example/images/
...
CLIPScore: 0.8584
```

If you optionally include some references, you will also see RefCLIPScore, alongside the usual set of
caption generation evaluation metrics.

```
> python clipscore.py example/good_captions.json example/images/ --references_json example/refs.json
...
BLEU-1: 0.6667
BLEU-2: 0.4899
BLEU-3: 0.3469
BLEU-4: 0.0000
METEOR: 0.3444
ROUGE: 0.4280
CIDER: 0.5637
SPICE: 0.4000
CLIPScore: 0.8584
RefCLIPScore: 0.8450
```

Worse captions should get lower scores:
```
> python clipscore.py example/bad_captions.json example/images/ --references_json example/refs.json
...
BLEU-1: 0.4815
BLEU-2: 0.2404
BLEU-3: 0.1359
BLEU-4: 0.0000
METEOR: 0.1861
ROUGE: 0.3121
CIDER: 0.2790
SPICE: 0.1500
CLIPScore: 0.7153
RefCLIPScore: 0.7253
```

You can treat/report CLIPScore and RefCLIPScore similarly to the other
evaluation metrics. See the paper for more details about CLIPScore and
RefCLIPScore. Full usage options are available via `python clipscore.py -h`.
An example set of inputs, including a candidate json, an image
directory, and a references json, is given in this repo under `example/`.
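
To make these numbers a bit more concrete, here is a minimal numpy sketch of the definitions from the paper: CLIPScore is `w * max(cos(image, candidate), 0)` with `w = 2.5`, and RefCLIPScore is the harmonic mean of CLIPScore and the best candidate-reference cosine similarity. The sketch assumes you already have CLIP embeddings as numpy arrays; it is an illustration of the formulas, not the code in `clipscore.py`.

```
import numpy as np

def clip_score(image_emb, cand_emb, w=2.5):
    # CLIPScore: scaled, clipped cosine similarity between image and candidate.
    image_emb = image_emb / np.linalg.norm(image_emb)
    cand_emb = cand_emb / np.linalg.norm(cand_emb)
    return w * max(float(image_emb @ cand_emb), 0.0)

def ref_clip_score(image_emb, cand_emb, ref_embs, w=2.5):
    # RefCLIPScore: harmonic mean of CLIPScore and the best (clipped)
    # cosine similarity between the candidate and any reference.
    clip_s = clip_score(image_emb, cand_emb, w=w)
    cand = cand_emb / np.linalg.norm(cand_emb)
    ref_s = max(
        max(float(cand @ (r / np.linalg.norm(r))), 0.0) for r in ref_embs
    )
    if clip_s == 0.0 or ref_s == 0.0:
        return 0.0
    return 2 * clip_s * ref_s / (clip_s + ref_s)
```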

The input files are formatted as follows:

The candidates json should be a dictionary that maps from
`{"string_image_identifier": "candidate"}`, e.g.,

```
{"image1": "an orange cat and a grey cat are lying together.",
 "image2": "a black dog looks at the camera.",
 ...}
```

The image directory should be a directory containing the images that
act as the keys in the candidates json, e.g.,

```
images/
├── image1.jpg
└── image2.jpg
```

and, finally, the references json should be a dictionary that maps from
`{"string_image_identifier": ["list", "of", "references"]}`, e.g.,

```
{"image1": ["two cats are sleeping next to each other.",
"a grey cat is cuddling with an orange cat on a blanket.",
"the orange cat is happy that the black cat is close to it."],
"image2": ["a dog is wearing ear muffs as it lies on a carpet.",
"a black dog and an orange cat are looking at the photographer.",
"headphones are placed on a dogs ears."]}
```
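
If you are building these input files yourself, a minimal sketch with Python's `json` module looks like the following; the filenames `my_candidates.json` and `my_refs.json` are just placeholders.

```
import json

# Keys must match the image filenames (without extension) in the image directory.
candidates = {
    "image1": "an orange cat and a grey cat are lying together.",
    "image2": "a black dog looks at the camera.",
}
references = {
    "image1": ["two cats are sleeping next to each other.",
               "a grey cat is cuddling with an orange cat on a blanket."],
    "image2": ["a dog is wearing ear muffs as it lies on a carpet.",
               "headphones are placed on a dog's ears."],
}

with open("my_candidates.json", "w") as f:
    json.dump(candidates, f, indent=2)
with open("my_refs.json", "w") as f:
    json.dump(references, f, indent=2)

# Then, assuming images/image1.jpg and images/image2.jpg exist:
#   python clipscore.py my_candidates.json images/ --references_json my_refs.json
```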

## MSCOCO dataset in pycocoevalcap

If you're running on the MSCOCO dataset and using the standard
evaluation toolkit, you can use our version of
[pycocoevalcap](https://github.com/jmhessel/pycocoevalcap) to evaluate.
You won't even need to download the original MSCOCO images, thanks to
a bit of magic :-)

To use `pycocoevalcap` on the MSCOCO dataset in the MSCOCO format, you
can simply use:

```
pip install git+https://github.com/jmhessel/pycocoevalcap.git
```

There is an example evaluation in that repo under
`examples/eval.py`. After pip installing, if you clone the
`pycocoevalcap` repo and run

```
python eval.py
```

then, after a bit of time, the output should be:
```
Bleu_1: 0.579
Bleu_2: 0.404
Bleu_3: 0.279
Bleu_4: 0.191
METEOR: 0.195
ROUGE_L: 0.396
CIDEr: 0.600
SPICE: 0.133
CLIPScore: 0.528
RefCLIPScore: 0.605
```

## Reproducibility notes

- CLIPScore can run on either CPU or GPU, but there are slight
differences due to floating point precision. As discussed
[here](https://github.com/openai/CLIP/issues/30#issuecomment-771099118),
on CPU, all operations run in `float32`, but on GPU, some operations
run in `float16`. The differences are generally small (e.g., for the
example run above, with `example/good_captions.json` captions and
`example/images/` images, on CPU, the output is `CLIPScore: 0.8585`,
but on GPU, the output is `CLIPScore: 0.8584`.) *All experiments in the
paper were run on GPU, and this code will raise a warning if you're not
using a GPU.*

- Because CLIPScore depends on the image pixels themselves, resizing,
compressing, etc. can all cause slight differences in the computed
CLIPScore. Even saving a jpg twice can result in different
compression, because that format is lossy! To this end, we release
the checksums of the images we used for the paper; see `checksums/`
for more info. For the pycocoevalcap repo, we have also included the
checksums for MSCOCO --- see
[here](https://github.com/jmhessel/pycocoevalcap/tree/master/clipscore)
for more info.

- The prompt we used for the text side of CLIP, as mentioned in the
paper, is "A photo depicts". This is hard-coded into this repo. Other
prompts will result in slightly different results, and we don't
recommend them for the sake of reproducibility.
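
For reference, here is a minimal sketch of what this prompting looks like with the OpenAI `clip` package; the model name `ViT-B/32` is shown for illustration, and the exact spacing/handling in `clipscore.py` may differ slightly.

```
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

candidate = "an orange cat and a grey cat are lying together."
# The hard-coded prompt is prepended to the candidate before encoding.
tokens = clip.tokenize("A photo depicts " + candidate).to(device)
with torch.no_grad():
    text_features = model.encode_text(tokens)
```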

## Acknowledgment

The authors would like to thank Jungo Kasai for being the first to use
this repo. Thanks to Jungo, we fixed a few issues, and added some
information about reproducibility that was missing before.
8 changes: 8 additions & 0 deletions text_diffuser/clipscore/checksums/README.md
@@ -0,0 +1,8 @@
### What's in here?

We include the checksums of the images we used to compute CLIPScore
for the paper. Because the evaluation metric depends on the image
itself, things like jpg compression, resizing, etc. can impact
reproducibility. We downloaded the datasets from the original sources
and didn't modify them, but if you're having trouble matching these
checksums, definitely get in touch!
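
If you want to compare your local images against these checksums programmatically, something like the following sketch works; the hash algorithm (md5) and the example path are assumptions, so check the files in this directory for the exact algorithm and format used.

```
import hashlib

def file_checksum(path, algo="md5"):
    # Hash the raw bytes of one image file; md5 here is an assumption,
    # check the released checksum files for the actual algorithm/format.
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print(file_checksum("example/images/image1.jpg"))
```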