diff --git a/docs.json b/docs.json
index 56b3402a..cc02562f 100644
--- a/docs.json
+++ b/docs.json
@@ -482,6 +482,13 @@
             "self-hosted/gcp/overview",
             "self-hosted/gcp/onboard"
           ]
+        },
+        {
+          "group": "Plugins",
+          "pages": [
+            "self-hosted/plugins/overview",
+            "self-hosted/plugins/tutorial"
+          ]
+        }
       ]
     },
diff --git a/img/ui/Plugins-DAG.png b/img/ui/Plugins-DAG.png
new file mode 100644
index 00000000..68b18697
Binary files /dev/null and b/img/ui/Plugins-DAG.png differ
diff --git a/self-hosted/plugins/overview.mdx b/self-hosted/plugins/overview.mdx
new file mode 100644
index 00000000..237adb22
--- /dev/null
+++ b/self-hosted/plugins/overview.mdx
@@ -0,0 +1,39 @@
---
title: Unstructured self-hosted plugins overview
sidebarTitle: Overview
---

In Unstructured, a _plugin_ is a self-contained unit of code that can be used to add, change, or use data within the context of an Unstructured ETL+ workflow. Every
node in a workflow is itself a plugin. You can also create your own plugins to extend your organization's workflow capabilities.

Developing, deploying, and running your own custom plugins is available only for
the [Unstructured user interface](/ui/overview) (UI) that has already been deployed to
infrastructure that you maintain in your
[Amazon Web Services (AWS)](/self-hosted/aws/overview), [Azure](/self-hosted/azure/overview), or
[Google Cloud Platform (GCP)](/self-hosted/gcp/overview) account.

If you do not already have a self-hosted deployment of the Unstructured UI,
contact your Unstructured sales representative, email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io), or fill out the
[contact form](https://unstructured.io/contact) on the Unstructured website, and a member of the Unstructured sales or support teams
will get back to you as soon as possible to discuss self-hosting options.

## Concepts

Plugins are straightforward in that they accept a named input and emit a named output. The following diagram illustrates this concept:

![Conceptual programmatic flow of plugins](/img/ui/Plugins-DAG.png)

In the preceding diagram:

- The blue boxes represent the default plugins that come with Unstructured.
- The yellow circles describe what each default plugin does.
- The green box represents the indexer that gathers all of the source files.
- The red box represents the destination location.
- The arrows represent the flow of data between the plugins.
- The words within the arrows represent the programmatic names of the inputs and outputs of the plugins. For example,
  the **Partitioner** plugin accepts its input, represented by the programmatic name `doc_path`, from the previous plugin.
  The **Partitioner** plugin emits its output, represented by the programmatic name `element_dicts`, to the next plugin.

## Getting started

To get started with developing, deploying, and running your own custom plugins, try out the [tutorial](/self-hosted/plugins/tutorial).
\ No newline at end of file
diff --git a/self-hosted/plugins/tutorial.mdx b/self-hosted/plugins/tutorial.mdx
new file mode 100644
index 00000000..01be8e5b
--- /dev/null
+++ b/self-hosted/plugins/tutorial.mdx
@@ -0,0 +1,731 @@
---
title: Self-hosted plugins tutorial
sidebarTitle: Tutorial
---

This hands-on tutorial shows how to use the Unstructured self-hosted plugin framework to create a sample [plugin](/self-hosted/plugins/overview).
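As the [overview](/self-hosted/plugins/overview) explains, every plugin receives a named input from the previous node in a workflow's DAG and emits a named output to the next node; for the plugin built in this tutorial, both are `element_dicts`. The following minimal sketch shows that contract in outline form. It is illustrative only; the real class, its imports, and its `Response` return type are generated for you by the plugin tooling in a later step:

```python
# Illustrative outline of a plugin's runtime contract (not the generated code itself).
class Plugin:
    def run(self, element_dicts: list[dict]) -> list[dict]:
        # Inspect or enrich each Unstructured element, then pass the list along
        # so that the next node in the workflow receives it as its input.
        for element in element_dicts:
            element.setdefault("metadata", {})  # each element carries text plus metadata
        return element_dicts  # the generated project wraps this in a Response object
```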
This sample plugin uses a [VertexAI](https://cloud.google.com/vertex-ai) model from Google to perform sentiment analysis
on the text that Unstructured extracts from documents. For example,
given the following custom prompt:

````text
Given a piece of text, classify the following types of information:
- toxic or non-toxic
- emotion if it conveys, such as happiness, or anger
- intent such as "finding information", "making a reservation", or "placing an order"

text:
  {text}

Output the results as JSON and nothing else. For example:
```json
{{
  "toxicity": "non-toxic",
  "emotion": "neutral",
  "intent": "making a reservation"
}}
```
````

And the following text:

```text
Hi, can you please book a table for two at Juan for May 1?
```

The model returns a sentiment analysis in this format:

```json
{
  "toxicity": "non-toxic",
  "emotion": "neutral",
  "intent": "making a reservation"
}
```

## Requirements

- A [self-hosted](/self-hosted/overview) deployment of the Unstructured UI and Unstructured API into infrastructure that you maintain in your
  Amazon Web Services (AWS), Azure, or Google Cloud Platform (GCP) account. If you do not have a self-hosted deployment, stop and contact your
  Unstructured sales representative, email Unstructured Sales at [sales@unstructured.io](mailto:sales@unstructured.io), or fill out the
  [contact form](https://unstructured.io/contact) on the Unstructured website first.
- A local development machine with [Docker Desktop](https://docs.docker.com/desktop/) and the Python package and project manager
  [uv](https://docs.astral.sh/uv/getting-started/installation/) installed.
- For sending requests to the plugin through Docker locally, the [curl](https://curl.se/) utility installed on the development machine.
- For deploying the plugin to your self-hosted Unstructured UI, you must have access to a container registry that is compliant with the Open Container Initiative (OCI) and
  that is also reachable from your AWS, Azure, or GCP account. For example:

  - For AWS accounts, [Amazon Elastic Container Registry](https://aws.amazon.com/ecr/) (Amazon ECR).
  - For Azure accounts, [Azure Container Registry](https://azure.microsoft.com/products/container-registry).
  - For GCP accounts, [Google Artifact Registry](https://cloud.google.com/artifact-registry) (GAR).

  You must also have the related command-line interface installed and configured on the development machine:

  - For AWS accounts, the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html).
  - For Azure accounts, the [Azure CLI](https://learn.microsoft.com/cli/azure/install-azure-cli).
  - For GCP accounts, the [Google Cloud CLI](https://cloud.google.com/sdk/docs/install).

- To call the VertexAI portions of this tutorial:

  - A [Google Cloud account](https://console.cloud.google.com).
  - The **Vertex AI API** enabled in the Google Cloud account. [Learn how](https://cloud.google.com/apis/docs/getting-started#enabling_apis).
  - Within the Google Cloud account, a Google Cloud service account and its related `credentials.json` key file or its contents in JSON format.
    [Create a service account](https://developers.google.com/workspace/guides/create-credentials#create_a_service_account).
    [Create credentials for a service account](https://developers.google.com/workspace/guides/create-credentials#create_credentials_for_a_service_account).
+ - A single-line string that contains the contents of the downloaded `credentials.json` key file for the service account (and not the service account key file itself). + To print this single-line string without line breaks, suitable for copying, you can run one of the following commands from your Terminal or Command Prompt. + In this command, replace `` with the path to the `credentials.json` key file that you downloaded by following the preceding instructions. + + - For macOS or Linux: + + ```text + tr -d '\n' < + ``` + + - For Windows: + + ```text + (Get-Content -Path "" -Raw).Replace("`r`n", "").Replace("`n", "") + ``` + +## Getting started + +In this section, you set up the local development environment for this tutorial's plugin. +This includes creating a directory for overall plugin development, +creating a virtual environment to isolate and version Python and various code dependencies, installing the Unstructured plugin development tools and their dependencies, +and creating and initializing the code project for this tutorial's plugin. + + + + We recommend creating or using a centralized directory on your local development machine to use for developing this and other plugins. If you create a new directory, be + sure to switch to it after you create it. This tutorial uses a directory named `plugins` within the + current working directory. For example: + + ```bash + mkdir plugins + cd plugins + ``` + + + + Use `uv` to create a virtual environment within the directory that you want to use for overall plugin development. After you + create the virtual environment, activate it. + + This tutorial uses a virtual environment named + `plugins_3_12_9`. This virtual environment uses Python 3.12.9. If this Python version is not installed + on the system, `uv` installs it first. For example: + + ```bash + uv venv --python 3.12.9 --prompt "plugins_3_12_9" + source .venv/bin/activate + ``` + + + + Use `uv` to install the Unstructured plugin development tools and their dependencies into this virtual environment. These tools and their + dependencies will be the same for all plugins that you develop that use this virtual environment. + + ```bash + uv pip install utic-dev-tools cookiecutter + ``` + + The dependent `cookiecutter` package is a command-line utility that uses techniques such as wizards along with Python project templates to + initialize new projects based on user input. + + + + 1. Use the `unstructured-plugins new` command to create the starter code for this tutorial's plugin development project. This command starts a wizard that is used + to create a new directory for developing this plugin and then creates the plugin's starter files and subdirectories within that directory: + + ```bash + unstructured-plugins new + ``` + + 2. When propmpted, enter some display name for the plugin, and then press `Enter`. This tutorial uses `Sentiment Analysis` as the plugin's display name: + + ```bash + [1/3] name (My First Plugin): Sentiment Analysis + ``` + + 3. Next, enter the plugin's type, and then press `Enter`. This tutorial uses `sentiment` as the plugin's type: + + ```bash + [2/3] type (): sentiment + ``` + + 4. Next, enter the plugin's subtype, and then press `Enter`. This tutorial uses `analysis` as the plugin's subtype: + + ```bash + [3/3] subtype (): analysis + ``` + + 5. A project folder is created within the centralized `plugins` directory. The project folder is named `plugin-` followed by the + plugin's type, another dash, and the plugin's subtype. 
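       With this tutorial's values, the generated project typically looks something like the following layout. Treat this as a sketch; the exact files depend on the version of the plugin tooling:

       ```text
       plugin-sentiment-analysis/
       ├── Makefile
       ├── pyproject.toml
       ├── src/
       │   └── plugin_sentiment_analysis/
       │       └── __init__.py
       └── tests/
           └── test_plugin.py
       ```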
For this tutorial, the project folder is named `plugin-sentiment-analysis`.

       Switch to the plugin's project folder and then use `uv` to install and update this project's specific code dependencies:

       ```bash
       cd plugin-sentiment-analysis
       uv sync
       ```

## Write the plugin

In this section, you write the plugin's runtime logic. This tutorial's logic is primarily within the project's `src/plugin_sentiment_analysis/__init__.py` file.

    In this step, you add the user interface (UI) settings for the plugin. The UI settings are the fields that
    users see when they add the plugin as a node to a workflow's visual DAG designer in the UI. The UI settings are defined in the `__init__.py` file
    of the plugin project's `src/plugin__` subfolder. These settings are specified in the
    `__init__.py` file's `PluginSettings` class, which is a subclass of
    the [Pydantic](https://docs.pydantic.dev/latest/) `BaseModel` class. The `BaseModel` class provides a Pydantic implementation of various type validation,
    data parsing, and serialization functionality.

    1. In the project's `src` directory, under the `plugin_sentiment_analysis` subdirectory, open the `__init__.py` file.
    2. In the `__init__.py` file, add the necessary imports to capture VertexAI settings that the user sets in the UI. To do this, add the following `from...import` statements to the top of the file:

       ```python
       from typing import Literal
       from pydantic import SecretStr
       ```

       `Literal` is a type hint in Python that restricts a field to specific literal values (such as strings, numbers, or booleans).
       It enforces that the input must match one of the specified options.

       `SecretStr` is a specialized string type in Pydantic for sensitive data (such as passwords and API keys). It masks the value in
       fields by displaying `*****`.

    3. In the `__init__.py` file's `PluginSettings` class, replace the sample `string_field` setting definition with settings for the `location`, `credentials`, and
       `model` fields. The class definition should now look as follows:

       ```python
       class PluginSettings(BaseModel):
           """
           Settings used to configure running instances of the plugin.

           These are what can be configured by the user and what will be
           available in the UI.
           """

           location: Literal[
               "us-east5", "us-south1", "us-central1", "us-east1", "us-east4", "us-west1"
           ] = Field(title="API Location")
           credentials: SecretStr = Field(title="Credentials JSON")
           model: Literal["gemini-1.5-flash"] = Field("gemini-1.5-flash", title="Model")
       ```

       - The `location` field specifies the location of the VertexAI API. The field in the UI's help pane for the plugin node
         will display the title of **API Location**.
       - The `credentials` field specifies the JSON credentials for the VertexAI API. The field in the UI will have the title of
         **Credentials JSON**. Specifying the `SecretStr` type displays the field's text with asterisks.
       - The `model` field specifies the model for VertexAI to use. The field in the UI will have the title of **Model**. The default
         value for this field is `gemini-1.5-flash`.
       - At run time, the `PluginSettings` class reads these fields' values from the UI and writes them as a JSON dictionary into a `settings.json` file
         in the project's root for the plugin to read from later.

    1. Add the necessary VertexAI dependencies:

       ```bash
       uv pip install google-cloud-aiplatform google-auth
       ```

    2. 
At the top of the `__init__.py` file, add the necessary import statements for calling the VertexAI API and for standard Python logging and JSON parsing: + + ```python + from google.cloud import aiplatform + from google.oauth2 import service_account + from vertexai.generative_models import GenerativeModel + import logging, json + ``` + + 3. In the `__init__.py` file's `Plugin` class, replace the `__post_init__` function body with the following definition: + + ```python + def __post_init__(self): + try: + with open(self.env_settings.job_settings_file) as f: + self.plugin_settings = PluginSettings.model_validate_json(f.read()) + + credentials_json = json.loads( + self.plugin_settings.credentials.get_secret_value() + ) + credentials = service_account.Credentials.from_service_account_info( + credentials_json + ) + + aiplatform.init( + location=self.plugin_settings.location, + project=credentials_json["project_id"], + credentials=credentials, + ) + + self.model = GenerativeModel(self.plugin_settings.model) + except Exception as e: + print(f"Plugin initialization failed: {e}") + raise + ``` + + - The `__post_init__` function is called after the `Plugin` class is initialized. The function reads in the UI field values from the `settings.json` file + that the `PluginSettings` class wrote to earlier. + - The function then prepares the authorization credentials that were provided in the UI to be used by VertexAI. + - The `aiplatform.init` function initializes the VertexAI API with the specified location, project ID, and authorization credentials. + - The `GenerativeModel` class gets the model to be used that was specified in the UI. + + 4. In the `__init__.py` file's `Plugin` class, just before the `run` function, add the prompt text to be sent to VertexAI. + At run time, this prompt, along with a piece of text that Unstructured extracts from the document, is sent to VertexAI for sentiment analysis: + + ```python + PROMPT = """ + Given a piece of text, classify the following types of information: + - toxic or non-toxic + - emotion if it conveys, such as happiness, or anger + - intent such as "finding information", "making a reservation", or "placing an order" + + text: + {text} + + Output the results as JSON and nothing else. For example: + ```json + {{ + "toxicity": "non-toxic", + "emotion": "neutral", + "intent": "making a reservation" + }} + ``` + """ + ``` + + 5. In the `__init__.py` file's `Plugin` class, replace the `run` function body with the following definition: + + ```python + def run(self, element_dicts: list[dict]) -> Response: + """ + This method is called once for every file that is processed. + + element_dicts is a list of elements: + + See https://docs.unstructured.io/open-source/concepts/document-elements + """ + for element in element_dicts: + element: ElementDict + prompt_text = self.PROMPT.format(text=element["text"]) + response_text = self.model.generate_content(prompt_text).text + try: + data = json.loads(response_text.strip().strip("```").lstrip("json")) + except json.JSONDecodeError: + logging.basicConfig(level=logging.INFO) + logging.getLogger().error(f"Failed to parse response: {response_text}") + data = {} + element["metadata"].update(data) + + return Response(element_dicts=element_dicts) + ``` + + - The `run` function is called once for every file that is processed. The function takes a list of the elements that Unstructured generated from the file as input. 
+ - Each element in the list of elements is a dictionary that contains the text extracted from the document and its related metadata. + - The function sends the prompt and the element's text to the model. + - The function then adds the sentiment analysis output to the element's `metadata` field. + - After the last element's sentiment analysis is output into the last element's `metadata` field, the enitre updated list's contents are given as input into the + next node in the workflow's DAG. + + + + +## Run plugin tests locally with pytest + +In this section, you manually run the plugin's tests locally using `pytest` to make sure that the plugin's logic is working as expected before further +testing in Docker and eventual deployment for use in the UI. + +In practice, you would typically use a continuous integration and continuous deployment (CI/CD) pipeline to automate running these tests. +If any of the tests fail, the pipeline should stop and notify you of the failure. If all of the tests pass, the pipeline should then +continue by [running the plugin in Docker](#run-the-plugin-in-docker-locally) as a further test. + + + + 1. Add the necessary `pytest` dependencies. Also add a dependency on the `dotenv` package, which is used to read environment variables from a local `.env` file: + + ```bash + uv pip install pytest dotenv + ``` + + 2. In the project's `test` directory, at the top of the `test_plugin.py` file, add the following import statements to enable reading local environment variables. + Also, call the `load_dotenv` function to load the environment variables from the `.env` file: + + ```python + import os + from dotenv import load_dotenv + + load_dotenv() + ``` + + 3. In the `test_plugin.py` file, update the following `from...import` statement to find the specified classes that are defined in the `src/plugin_sentiment_analyis` folder: + + ```python + from src.plugin_sentiment_analysis import Plugin, PluginSettings + ``` + + 4. In the root of the project's `test` directory, add a blank `__init__.py` file. This file is required to allow the + `src` directory to be seen by the `test` directory to enable the preceding `from...import` statement to work. + + + + 1. In the `test_plugin.py` file, replace the `plugin` function body with the following definition: + + ```python + @pytest.fixture + def plugin(tmp_path): + credentials = os.getenv("VERTEXAI_CREDENTIALS") + if not credentials: + raise ValueError("VERTEXAI_CREDENTIALS env var must be set to run test") + + settings_filepath: Path = tmp_path / "settings.json" + settings = {"location": "us-east1", "credentials": credentials} + settings_filepath.write_text(json.dumps(settings)) + + yield Plugin( + env_settings=EnvSettings( + shared_filepath=tmp_path, + job_settings_file=str(settings_filepath), + ) + ) + ``` + + - The `plugin` function is a fixture that sets up the plugin's infrastructure for the `test_plugin` test function that follows. + - The function reads the `VERTEXAI_CREDENTIALS` environment variable from the `.env` file that you will create next. + - Instead of using the `settings.json` file that would normally be used by the `PluginSettings` class, the function creates a temporary + `settings.json` file just for these tests. This temporary file contains sample values for the **API Location** and **Credentials JSON** + fields that users would have + otherwise specified when using the plugin in the UI. + + 2. In the project's root, create a file named `.env`. 
In this file, add an environment variable named `VERTEXAI_CREDENTIALS`, and set it to the single-line representation of the `credentials.json` file that
       you generated in this tutorial's requirements:

       ```text
       VERTEXAI_CREDENTIALS=""
       ```

       If you plan to publish this plugin's source code to an external repository such as GitHub, do not include the `.env` file in the repository, as it
       can expose sensitive information publicly, such as your credentials for the VertexAI API.

       To help prevent this file from accidentally being included in the repository, add a `.env` entry to a `.gitignore` file in the root of the project.

    3. In the `test_plugin.py` file, replace the `test_plugin` function body with the following definition:

       ```python
       def test_plugin(plugin: Plugin, elements: list[dict]):

           elements[0]["text"] = "Hi, can you please book a table for two at Juan for May 1?"

           output = plugin.run(elements)
           output_elements = output.element_dicts

           assert len(output_elements) == 1
           metadata = output_elements[0]["metadata"]

           assert metadata["toxicity"] == "non-toxic"
           assert metadata["emotion"] == "neutral"
           assert metadata["intent"] == "making a reservation"
       ```

       - The `test_plugin` function is a test case that uses the `plugin` fixture to run the plugin's logic.
       - The function takes a list of Unstructured-formatted elements as input. The first element in the list contains the text that is used to test the plugin.
       - The function then runs the plugin's logic and checks that the output is as expected.
       - The function checks that the output contains the expected values for the `toxicity`, `emotion`, and `intent` fields that are returned. If the expected values
         match, the test passes. Otherwise, the test fails.

    To run the test, use the following command to run `pytest` through the `test` target in the file named `Makefile` in the root of the project:

    ```bash
    make test
    ```

    If the test passes, you should see something similar to the following:

    ```bash
    tests/test_plugin.py .

    1 passed
    ```

## Run the plugin in Docker locally

In this section, you proceed with testing by manually running the plugin in Docker locally.
This allows you to more fully test the plugin's logic in an isolated environment before you deploy it into your self-hosted UI.

In practice, you would typically use a CI/CD pipeline to automate running the plugin in Docker and
testing the output against an expected result. If the plugin's output does not match the expected result, the pipeline should stop and notify you of the failure.
If the plugin's output matches the expected result, the pipeline should then continue by
[deploying the plugin to the staging version of your self-hosted Unstructured UI](#deploy-the-plugin-to-your-self-hosted-ui).

    In your local machine's home directory, create a hidden file named `.vertex-plugin-settings.json`. This file contains
    information that your local installation of Docker passes into the running container. In this file, add the following JSON content:

    ```json
    {
      "location": "",
      "credentials": ""
    }
    ```

    In the preceding JSON:

    - Replace `` with the location of the VertexAI API that you want to use, for example, `us-east1`.
    - Replace `` with the single-line representation of the `credentials.json` file that
      you generated in this tutorial's requirements.
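    If you would rather not paste the single-line credentials string by hand, you can generate this file with a short helper script. The following is an illustrative sketch only; the script name, the path to `credentials.json`, and the chosen location are assumptions to adjust for your setup:

    ```python
    # make_vertex_settings.py: hypothetical helper for local Docker testing only.
    # Builds ~/.vertex-plugin-settings.json from a downloaded service account key file.
    import json
    from pathlib import Path

    key_file = Path("credentials.json")  # path to your downloaded service account key
    settings = {
        "location": "us-east1",  # any VertexAI API location that the plugin supports
        # Collapse the key file to a single-line JSON string, as the plugin expects.
        "credentials": json.dumps(json.loads(key_file.read_text())),
    }

    target = Path.home() / ".vertex-plugin-settings.json"
    target.write_text(json.dumps(settings, indent=2))
    print(f"Wrote {target}")
    ```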
+ + + This `.vertex-plugin-settings.json` file contains sensitive information and + is intended for local Docker testing only. Do not check in this file with your plugin's source code. + + + + + 1. In the file named `Makefile` in the root of the project, replace the `.PHONY: run-docker` definition with the following definition: + + ```text + .PHONY: run-docker + run-docker: docker-build-local + docker run -it --rm \ + -v $(PWD):/shared \ + -v $(HOME)/.vertex-plugin-settings.json:/settings.json \ + -e JOB_SETTINGS_FILE=/settings.json \ + -p 8000:8000 \ + "${IMAGE_REPOSITORY}:${VERSION}" + ``` + + The `run-docker` target builds the Docker image locally and then runs it as a container representing the plugin. + + 2. Start Docker Desktop on your local machine, if it is not already running. + 3. Run the following command to call the `run-docker` target, which builds the Docker image and then runs the resulting container, representing the plugin: + + ```bash + make run-docker + ``` + + You must leave this terminal window open and running while you are testing the plugin locally within the running Docker container. If you interrupt the running process here or close this + terminal window, the Docker container stops running, and the plugin stops working. + + + + 1. In a new terminal window, use the following `curl` command to send a request to the plugin that is running in the Docker container. + The request contains some sample text that you want VertexAI to perform sentiment analysis on along with some pretend metadata in the format that is + typically generated by Unstructured during processing. + + ```bash + curl --location 'localhost:8000/invoke' \ + --header 'Content-Type: application/json' \ + --data '{ + "element_dicts": [ + { + "type": "NarrativeText", + "element_id": "1453c80530ef11712374570a086dbd64", + "text": "Hi, can you please book a table for two at Juan for May 1?", + "metadata": { + "languages": [ + "eng" + ], + "filetype": "text/plain", + "data_source": { + "record_locator": { + "path": "/path/to/file.txt" + }, + "permissions_data": [ + { + "mode": 33188 + } + ] + } + } + } + ] + }' + ``` + + 2. If successful, the output should look similar to the following. Notice that the `toxicity`, `emotion`, and `intent` fields were added to the + element's `metadata` field (JSON formatting has been applied here for better readability): + + ```json + { + "usage": [], + "status_code": 200, + "filedata_meta": { + "terminate_current": false, + "new_records": [] + }, + "status_code_text": null, + "output": { + "element_dicts": [ + { + "type": "NarrativeText", + "element_id": "1453c80530ef11712374570a086dbd64", + "text": "Hi, can you please book a table for two at Juan for May 1?", + "metadata": { + "languages": [ + "eng" + ], + "filetype": "text/plain", + "data_source": { + "record_locator": { + "path": "/path/to/file.txt" + }, + "permissions_data": [ + { + "mode": 33188 + } + ] + }, + "toxicity": "non-toxic", + "emotion": "neutral", + "intent": "making a reservation" + } + } + ] + }, + "message_channels": { + "infos": [], + "warnings": [] + } + } + ``` + + 3. When you are done testing, you can stop the plugin by interrupting or closing the terminal window where the Docker container is running. + + + +## Deploy the plugin to your self-hosted UI + +In this section, you manually deploy the successfully-tested plugin for your users to add to their workflows' DAGs within your self-hosted Unstructured UI. 
+This section describes how to deploy the plugin from your local development machine directly into your existing container registry. + +In practice, you would typically use a CI/CD pipeline to automate deploying the plugin. + + + + In the file named `Makefile` in the root of the project, set the `IMAGE_REGISTRY` variable, replacing `REGISTRY_NAME_REPLACE_ME` with the name of your container registry. + + ```text + IMAGE_REGISTRY=REGISTRY_NAME_REPLACE_ME + ``` + + To get the name of your container registry if you do not already know it, run the command that is appropriate for your container registry. For example: + + - For AWS ECR, run the AWS CLI command [aws ecr describe-repositories](https://docs.aws.amazon.com/cli/latest/reference/ecr/describe-repositories.html) with the appropriate command-line options. + - For Azure Container Registry, run the Azure CLI command [az acr list](https://learn.microsoft.com/en-us/cli/azure/acr?view=azure-cli-latest#az-acr-list) with the appropriate command-line options. + - For GAR, run the Google Cloud CLI command [gcloud artifacts repositories list](https://cloud.google.com/sdk/gcloud/reference/artifacts/repositories/list) with the appropriate command-line options. + + The container registry name typically takes the following format: + + - For AWS ECR, `.dkr.ecr..amazonaws.com` + - For Azure Container Registry, `.azurecr.io` + - For GAR, `-docker.pkg.dev//` + + + + Set the following environment variables to the appropriate username and password for access to your container registry: + + - `PLUGIN_REGISTRY_USERNAME` + - `PLUGIN_REGISTRY_PASSWORD` + + For example: + + ```bash + # For macOS and Linux: + export PLUGIN_REGISTRY_USERNAME="" + + export PLUGIN_REGISTRY_PASSWORD="" + + # For Windows: + set PLUGIN_REGISTRY_USERNAME="" + + set PLUGIN_REGISTRY_PASSWORD="" + ``` + + In the preceding commands, for ``, run the command that is appropriate for your container registry. For example: + + - For AWS ECR, you do not run a separate login command here. + - For Azure Container Registry, run the Azure CLI command [az acr login](https://learn.microsoft.com/en-us/cli/azure/acr?view=azure-cli-latest#az-acr-login) with the appropriate command-line options. + - For GAR, run the Google Cloud CLI command [gcloud auth configure-docker](https://cloud.google.com/sdk/gcloud/reference/auth/configure-docker) with the appropriate command-line options. + + In the preceding commands, to get the value for ``, run the command that is appropriate for your container registry. For example: + + - For AWS ECR, run the AWS CLI command [aws ecr get-login-password](https://docs.aws.amazon.com/cli/latest/reference/ecr/get-login-password.html) with the appropriate command-line options. + - For Azure Container Registry, run the Azure CLI command [az acr credential show](https://learn.microsoft.com/en-us/cli/azure/acr/credential?view=azure-cli-latest#az-acr-credential-show) with the appropriate command-line options. + - For GAR, run the Google Cloud CLI command [gcloud auth print-access-token](https://cloud.google.com/sdk/gcloud/reference/auth/print-access-token) with the appropriate command-line options. + + + + Run the following commands, one command at a time, to build the plugin's container, deploy it to your container registry, and make the plugin + available for use in the staging version of your self-hosted Unstructured UI: + + ```bash + make docker-build + make docker-push + make publish-plugin + make promote-plugin-to-staging + ``` + + + + +## Test the plugin in your UI + + + + 1. 
Sign in to the staging version of your self-hosted Unstructured UI.
    2. Create a new workflow or open an existing workflow.
    3. In the workflow's visual DAG designer, click the `+` icon anywhere between a **Chunker** node and a **Destination** node,
       and select **Plugins > Sentiment Analysis**.
    4. Click the **Sentiment Analysis** node to open its settings pane.
    5. In the settings pane, enter the required settings for the plugin.
       For example, enter the location of the VertexAI API, the single-string version of the `credentials.json` file's contents for accessing the
       VertexAI API, and the model for VertexAI to use.
    6. Run the workflow.
    7. When the workflow is finished, go to the destination location, and look for the `toxicity`, `emotion`, and `intent` values that the plugin adds to the `metadata` field for each element that Unstructured generated based on the source files' contents.

    If you need to make any changes to the plugin, you can do so by returning to the previous section titled [Write the plugin](#write-the-plugin).

    Make the necessary code changes and then:

    1. [Run plugin tests locally with pytest](#run-plugin-tests-locally-with-pytest).
    2. [Run the plugin in Docker locally](#run-the-plugin-in-docker-locally).
    3. Increment the plugin's version number. To do this, in the project's `src/plugin_sentiment_analysis/__init__.py` file,
       update the value of `version` in the `PLUGIN_MANIFEST` variable, for example from `0.0.1` to `0.0.2`. Then save this file.
    4. [Deploy the plugin again to the staging version of your self-hosted Unstructured UI](#deploy-the-plugin-to-your-self-hosted-ui).
    5. Test the updated plugin again in staging.

    Keep repeating this loop until you are satisfied with the plugin's performance in staging.

    After you have tested the plugin in your staging UI and are satisfied with its performance, you can promote it from staging to production.
    To do this, run the following command:

    ```bash
    make promote-plugin-to-production
    ```

    Of course, you should immediately sign in to the production version of your self-hosted Unstructured UI and test the plugin there
    before you start advertising its availability to your users.

Congratulations! You have successfully created, tested, and deployed your first custom plugin into your self-hosted Unstructured UI. Your users
can now add it to their workflow DAGs to unlock new capabilities and insights for their files and data!
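As an optional final check, you can spot-check the enriched output programmatically instead of browsing the destination by hand. The following sketch assumes that your destination holds Unstructured's standard JSON element files and that you have copied them to a local folder; adjust the path and file pattern to match your setup:

```python
# check_sentiment_output.py: illustrative only; adjust the path to your destination output.
import json
from pathlib import Path

output_dir = Path("destination-output")  # local copy of the destination's JSON files
for path in sorted(output_dir.glob("*.json")):
    for element in json.loads(path.read_text()):
        metadata = element.get("metadata", {})
        print(path.name, metadata.get("toxicity"), metadata.get("emotion"), metadata.get("intent"))
```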