GitHub - davide-serramazza/Geometry-dataset: A synthetic dataset for tree transduction

Code for generating a synthetic dataset for image captioning. Each data element consists of:

- an image of shapes (circles and rectangles) of different colours, each of which may recursively contain other figures;
- an XML tree describing the hierarchy of the shapes composing the image (maximum depth is 3);
- a sentence in English describing the image at different levels of detail: figures in the first two levels of the hierarchy are listed with their shape and colour, while figures in the third level are described only by their number.
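To make the three-level structure concrete, here is a minimal sketch of how a shape hierarchy and its caption could be produced. It is not the repository's `generate.py`: the tag name `figure`, the attributes `shape` and `colour`, the colour palette, the helper names, and the caption wording are all assumptions chosen for illustration. Only the two shapes, the maximum depth of 3, and the rule "list the first two levels, count the third" come from the description above.

```python
import random
import xml.etree.ElementTree as ET

SHAPES = ["circle", "rectangle"]              # the two shapes named in the description
COLOURS = ["red", "green", "blue", "yellow"]  # assumed palette, for illustration only
MAX_DEPTH = 3                                 # maximum hierarchy depth stated above


def random_figure(rng, depth=1, max_children=2):
    """Build one figure element, nesting children until MAX_DEPTH is reached."""
    node = ET.Element("figure", shape=rng.choice(SHAPES), colour=rng.choice(COLOURS))
    if depth < MAX_DEPTH:
        for _ in range(rng.randint(0, max_children)):
            node.append(random_figure(rng, depth + 1, max_children))
    return node


def random_image_tree(rng, max_top=2):
    """Wrap one or more top-level figures in a hypothetical <image> root."""
    root = ET.Element("image")
    for _ in range(rng.randint(1, max_top)):
        root.append(random_figure(rng))
    return root


def describe(fig):
    """Name a figure by colour and shape, e.g. 'a red circle'."""
    return f"a {fig.get('colour')} {fig.get('shape')}"


def caption(root):
    """List levels 1 and 2 by shape and colour; report level 3 only as a count."""
    parts = []
    for top in root:
        text = describe(top)
        inner = []
        for child in top:
            item = describe(child)
            n = len(child)  # number of third-level figures inside this child
            if n:
                item += f" with {n} figure{'s' if n > 1 else ''} inside"
            inner.append(item)
        if inner:
            text += " containing " + " and ".join(inner)
        parts.append(text)
    return "The image shows " + "; ".join(parts) + "."


if __name__ == "__main__":
    tree = random_image_tree(random.Random(0))
    print(ET.tostring(tree, encoding="unicode"))
    print(caption(tree))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible, which is convenient when the same seed must yield a matching image, tree, and sentence.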

This dataset serves several purposes. Besides the standard image captioning task, the problem can be tackled as a tree transduction task: input trees labelled with CNN information can be obtained from the images and the XML trees. The XML trees can also be used as targets, which defines an isomorphic tree transduction problem. Finally, the sentences can be processed with suitable software to obtain parse trees or dependency trees; using these trees as targets defines a non-isomorphic tree transduction problem.
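The difference between the two transduction settings can be illustrated with a small sketch. It assumes one common reading of "isomorphic", namely that the input and target trees share the same skeleton and differ only in their node labels. The `Node` class, the `same_skeleton` helper, the placeholder feature vectors, and the toy parse tree are all hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    """A labelled, ordered tree node (hypothetical helper type)."""
    label: Any
    children: List["Node"] = field(default_factory=list)


def same_skeleton(a: Node, b: Node) -> bool:
    """Return True when both trees have the same shape, ignoring labels."""
    return len(a.children) == len(b.children) and all(
        same_skeleton(x, y) for x, y in zip(a.children, b.children)
    )


# Input tree: the lists stand in for CNN features attached to each figure.
source = Node([0.1, 0.7], [Node([0.3, 0.2]), Node([0.9, 0.4])])

# Target built from the XML hierarchy: same skeleton, different labels.
xml_target = Node("red circle", [Node("blue rectangle"), Node("green circle")])

# Hand-written toy parse tree for "a red circle": a different skeleton.
parse_target = Node("NP", [Node("DT"), Node("JJ"), Node("NN")])

print(same_skeleton(source, xml_target))    # True: isomorphic setting
print(same_skeleton(source, parse_target))  # False: non-isomorphic setting
```

In the isomorphic case a model only has to relabel nodes, whereas in the non-isomorphic case it also has to generate a new tree structure.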

Repository files (30 commits):

- README.md
- color.py
- dump_results.py
- evaluate.py
- generate.py
- geometry dataset.iml
- helper_functions.py
- main.py
- parser_nltk.py
- stanfor parser.py
