Extracted figures are overwritten

Hi,

I'm new to unstructured. When I run the sample code to partition several PDF/DOC files, the extracted images are saved to a separate folder called `figures`. The naming convention seems to be `figure-{page_number}-{#}`. As a result, images extracted from different documents that appear on the same page number overwrite each other, and the metadata in the resulting JSON points to the wrong file.

For instance, the link to `figures/figure-3-2.jpg` is included in three separate JSON files.
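To make the collision concrete, here is a minimal sketch of what I think is happening. It does not use the library's actual code: `naive_figure_path` and `namespaced_figure_path` are hypothetical helpers, the file names are made up, and the `.jpg` extension and `figures` folder are taken from what I see on disk. The first helper mimics the observed naming scheme; the second shows one possible way to avoid the clash by putting each document's figures under a folder named after the document.

```python
from pathlib import Path


def naive_figure_path(page_number: int, index: int) -> Path:
    # Mimics the observed scheme: the name depends only on page and index,
    # so it is identical for every input document.
    return Path("figures") / f"figure-{page_number}-{index}.jpg"


def namespaced_figure_path(doc_path: str, page_number: int, index: int) -> Path:
    # Hypothetical alternative: add a per-document subfolder (the file stem)
    # so figures from different documents cannot share a path.
    return Path("figures") / Path(doc_path).stem / f"figure-{page_number}-{index}.jpg"


# Three made-up input documents, each with a second figure on page 3.
docs = ["report_a.pdf", "report_b.pdf", "notes.docx"]

naive = {naive_figure_path(3, 2) for _ in docs}
namespaced = {namespaced_figure_path(doc, 3, 2) for doc in docs}

print(len(naive))       # number of distinct paths with the observed scheme
print(len(namespaced))  # number of distinct paths with per-document folders
```

With the observed scheme all three documents map to the same path, which would explain why the last one written wins. Note that namespacing by file stem alone would still collide for two inputs with the same stem in different folders or with different extensions.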

Here is my code:
```python
Pipeline.from_configs(
    context=ProcessorConfig(),
    indexer_config=LocalIndexerConfig(input_path=str(INPUT_DIR)),
    downloader_config=LocalDownloaderConfig(),
    source_connection_config=LocalConnectionConfig(),
    partitioner_config=PartitionerConfig(
        ocr_languages=["eng"],
        strategy="hi_res",
        partition_by_api=False,
        additional_partition_args={"extract_image_block_types": ["Image", "Table"]},
    ),
    # chunker_config=ChunkerConfig(
    #     chunking_strategy="by_title",
    #     chunk_max_characters=512,
    #     chunk_combine_text_under_n_chars=200,
    # ),
    # embedder_config=EmbedderConfig(embedding_provider="huggingface"),
    uploader_config=LocalUploaderConfig(output_dir=str(OUTPUT_DIR)),
).run()
```

Is this a bug, or am I doing something wrong?

Extracted figures are overwritten #396
