{%- capture title -%} Florence2Transformer {%- endcapture -%}
{%- capture description -%} Florence2Transformer can load Florence-2 models for a wide variety of vision and vision-language tasks using prompt-based inference.
Florence-2 is an advanced vision foundation model from Microsoft that uses a prompt-based approach to handle tasks like image captioning, object detection, segmentation, OCR, and more. The model leverages the FLD-5B dataset, containing 5.4 billion annotations across 126 million images, to master multi-task learning. Its sequence-to-sequence architecture enables it to excel in both zero-shot and fine-tuned settings.
Pretrained models can be loaded with pretrained of the companion object:
val florence2 = Florence2Transformer.pretrained()
.setInputCols("image_assembler")
.setOutputCol("answer")The default model is "florence2_base_ft_int4", if no name is provided.
For available pretrained models please see the Models Hub.
==Supported Tasks==
Florence-2 supports a variety of tasks through prompt engineering. The following prompt tokens can be used:
- : Image captioning
- <DETAILED_CAPTION>: Detailed image captioning
- <MORE_DETAILED_CAPTION>: Paragraph-level captioning
- <CAPTION_TO_PHRASE_GROUNDING>: Phrase grounding from caption (requires additional text input)
- : Object detection
- <DENSE_REGION_CAPTION>: Dense region captioning
- <REGION_PROPOSAL>: Region proposal
- : Optical Character Recognition (plain text extraction)
- <OCR_WITH_REGION>: OCR with region information
- <REFERRING_EXPRESSION_SEGMENTATION>: Segmentation for a referred phrase (requires additional text input)
- <REGION_TO_SEGMENTATION>: Polygon mask for a region (requires additional text input)
- <OPEN_VOCABULARY_DETECTION>: Open vocabulary detection for a phrase (requires additional text input)
- <REGION_TO_CATEGORY>: Category of a region (requires additional text input)
- <REGION_TO_DESCRIPTION>: Description of a region (requires additional text input)
- <REGION_TO_OCR>: OCR for a region (requires additional text input)
{%- endcapture -%}
{%- capture input_anno -%} IMAGE {%- endcapture -%}
{%- capture output_anno -%} DOCUMENT {%- endcapture -%}
{%- capture python_example -%} import sparknlp from sparknlp.base import * from sparknlp.annotator import * from pyspark.ml import Pipeline from pyspark.sql.functions import lit
image_df = spark.read.format("image").load(path=images_path) # Replace with your image path test_df = image_df.withColumn("text", lit(""))
imageAssembler = ImageAssembler()
.setInputCol("image")
.setOutputCol("image_assembler")
florence2 = Florence2Transformer.pretrained()
.setInputCols(["image_assembler"])
.setOutputCol("answer")
pipeline = Pipeline().setStages([ imageAssembler, florence2 ])
result = pipeline.fit(test_df).transform(test_df) result.select("image_assembler.origin", "answer.result").show(False) {%- endcapture -%}
{%- capture scala_example -%} import spark.implicits._ import com.johnsnowlabs.nlp.base._ import com.johnsnowlabs.nlp.annotator._ import org.apache.spark.ml.Pipeline import org.apache.spark.sql.DataFrame import org.apache.spark.sql.functions.lit
val imageFolder = "path/to/your/images" // Replace with your image path
val imageDF: DataFrame = spark.read .format("image") .option("dropInvalid", value = true) .load(imageFolder)
val testDF: DataFrame = imageDF.withColumn("text", lit(""))
val imageAssembler: ImageAssembler = new ImageAssembler() .setInputCol("image") .setOutputCol("image_assembler")
val florence2 = Florence2Transformer.pretrained() .setInputCols("image_assembler") .setOutputCol("answer")
val pipeline = new Pipeline().setStages(Array( imageAssembler, florence2 ))
val result = pipeline.fit(testDF).transform(testDF)
result.select("image_assembler.origin", "answer.result").show(false) {%- endcapture -%}
{%- capture api_link -%} Florence2Transformer {%- endcapture -%}
{%- capture python_api_link -%} Florence2Transformer {%- endcapture -%}
{%- capture source_link -%} Florence2Transformer {%- endcapture -%}
{% include templates/anno_template.md title=title description=description input_anno=input_anno output_anno=output_anno python_example=python_example scala_example=scala_example api_link=api_link python_api_link=python_api_link source_link=source_link %}