{%- capture title -%} Florence2Transformer {%- endcapture -%}

{%- capture description -%} Florence2Transformer can load Florence-2 models for a wide variety of vision and vision-language tasks using prompt-based inference.

Florence-2 is a vision foundation model from Microsoft that uses a prompt-based approach to handle tasks such as image captioning, object detection, segmentation, and OCR. It is trained on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, to support multi-task learning. Its sequence-to-sequence architecture lets it perform well in both zero-shot and fine-tuned settings.

Pretrained models can be loaded with the `pretrained` method of the companion object:

    val florence2 = Florence2Transformer.pretrained()
      .setInputCols("image_assembler")
      .setOutputCol("answer")

If no name is provided, the default model is `"florence2_base_ft_int4"`.

For available pretrained models, please see the Models Hub.

==Supported Tasks==

Florence-2 selects the task from a prompt token. The following prompt tokens can be used:

- `<CAPTION>`: Image captioning
- `<DETAILED_CAPTION>`: Detailed image captioning
- `<MORE_DETAILED_CAPTION>`: Paragraph-level captioning
- `<CAPTION_TO_PHRASE_GROUNDING>`: Phrase grounding from caption (requires additional text input)
- `<OD>`: Object detection
- `<DENSE_REGION_CAPTION>`: Dense region captioning
- `<REGION_PROPOSAL>`: Region proposal
- `<OCR>`: Optical Character Recognition (plain text extraction)
- `<OCR_WITH_REGION>`: OCR with region information
- `<REFERRING_EXPRESSION_SEGMENTATION>`: Segmentation for a referred phrase (requires additional text input)
- `<REGION_TO_SEGMENTATION>`: Polygon mask for a region (requires additional text input)
- `<OPEN_VOCABULARY_DETECTION>`: Open vocabulary detection for a phrase (requires additional text input)
- `<REGION_TO_CATEGORY>`: Category of a region (requires additional text input)
- `<REGION_TO_DESCRIPTION>`: Description of a region (requires additional text input)
- `<REGION_TO_OCR>`: OCR for a region (requires additional text input)
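
As a rough illustration of how these tokens might be turned into prompt strings, the sketch below records which tasks need additional text and builds the prompt accordingly. `FLORENCE2_TASKS` and `build_prompt` are hypothetical helpers, not part of Spark NLP. The sketch also assumes that the extra text is appended directly after the token, which is the convention used in the upstream Florence-2 examples; check it against the model you load.

```python
from typing import Optional

# Task tokens for Florence-2. True marks tasks that need additional text input.
FLORENCE2_TASKS = {
    "<CAPTION>": False,                           # image captioning
    "<DETAILED_CAPTION>": False,
    "<MORE_DETAILED_CAPTION>": False,
    "<CAPTION_TO_PHRASE_GROUNDING>": True,
    "<OD>": False,                                # object detection
    "<DENSE_REGION_CAPTION>": False,
    "<REGION_PROPOSAL>": False,
    "<OCR>": False,                               # plain text extraction
    "<OCR_WITH_REGION>": False,
    "<REFERRING_EXPRESSION_SEGMENTATION>": True,
    "<REGION_TO_SEGMENTATION>": True,
    "<OPEN_VOCABULARY_DETECTION>": True,
    "<REGION_TO_CATEGORY>": True,
    "<REGION_TO_DESCRIPTION>": True,
    "<REGION_TO_OCR>": True,
}


def build_prompt(task: str, text: Optional[str] = None) -> str:
    """Return the prompt string for a task token (hypothetical helper)."""
    if task not in FLORENCE2_TASKS:
        raise ValueError(f"Unknown Florence-2 task token: {task}")
    needs_text = FLORENCE2_TASKS[task]
    if needs_text and not text:
        raise ValueError(f"{task} requires additional text input")
    if not needs_text and text:
        raise ValueError(f"{task} does not take additional text input")
    # Assumption: the extra text follows the token with no separator.
    return task + text if needs_text else task
```

The returned string could then be used as the value of the `text` column, for example `lit(build_prompt("<OD>"))` in place of a hard-coded prompt.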

{%- endcapture -%}

{%- capture input_anno -%} IMAGE {%- endcapture -%}

{%- capture output_anno -%} DOCUMENT {%- endcapture -%}

{%- capture python_example -%}
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
from pyspark.sql.functions import lit

spark = sparknlp.start()

images_path = "path/to/your/images"  # Replace with your image path
image_df = spark.read.format("image").load(path=images_path)

# The "text" column carries the task prompt; "<OD>" requests object detection
test_df = image_df.withColumn("text", lit("<OD>"))

imageAssembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

florence2 = Florence2Transformer.pretrained() \
    .setInputCols(["image_assembler"]) \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([imageAssembler, florence2])

result = pipeline.fit(test_df).transform(test_df)
result.select("image_assembler.origin", "answer.result").show(truncate=False)
{%- endcapture -%}

{%- capture scala_example -%}
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

val imageFolder = "path/to/your/images" // Replace with your image path

val imageDF: DataFrame = spark.read
  .format("image")
  .option("dropInvalid", value = true)
  .load(imageFolder)

// The "text" column carries the task prompt; "<OD>" requests object detection
val testDF: DataFrame = imageDF.withColumn("text", lit("<OD>"))

val imageAssembler: ImageAssembler = new ImageAssembler()
  .setInputCol("image")
  .setOutputCol("image_assembler")

val florence2 = Florence2Transformer.pretrained()
  .setInputCols("image_assembler")
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(
  imageAssembler,
  florence2
))

val result = pipeline.fit(testDF).transform(testDF)

result.select("image_assembler.origin", "answer.result").show(false)
{%- endcapture -%}

{%- capture api_link -%} Florence2Transformer {%- endcapture -%}

{%- capture python_api_link -%} Florence2Transformer {%- endcapture -%}

{%- capture source_link -%} Florence2Transformer {%- endcapture -%}

{% include templates/anno_template.md title=title description=description input_anno=input_anno output_anno=output_anno python_example=python_example scala_example=scala_example api_link=api_link python_api_link=python_api_link source_link=source_link %}
