Commit 58577a9

General troubleshooters (#787)
1 parent d38c1a7 commit 58577a9

9 files changed

+280
-2
lines changed

docs.json

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -536,7 +536,14 @@
536536
"group": "Issues",
537537
"pages": [
538538
"support/issues/overview",
539-
"support/issues/api-connector-secrets"
539+
"support/issues/api-connector-secrets",
540+
"support/issues/authorization-permissions",
541+
"support/issues/configuration-resource",
542+
"support/issues/quota-billing-rate-limiting",
543+
"support/issues/network-connection-timeout",
544+
"support/issues/data-format-schema-validation",
545+
"support/issues/document-processing",
546+
"support/issues/internal-file-handling"
540547
]
541548
}
542549
]
Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
---
2+
title: Authorization and permissions issues
3+
---
4+
5+
## Issues
6+
7+
When you try to connect Unstructured to a specific source or destination, you get one of the following error types:
8+
9+
- `PermissionError`: For example, for Amazon S3, an `Access Denied` error.
10+
- `ClientAuthenticationError`: For example, for Azure Blob Storage, an `AuthenticationFailed` or `Signature not valid` error.
11+
- `AuthError`: For example, for Dropbox, an `expired_access_token` error.
12+
- `ApiPermissionError`: For example, for Confluence, a `permission to view content` error.
13+
- `ClientRequestException`: For example, for OneDrive, an `AccessDenied` error.
14+
- `ValueError`: For example, for Outlook, an `invalid_grant` or `Conditional Access policies` error. For Google Drive, a `File not found related to auth` error.
15+
- `UserError`: For example, for Box, an `Access denied - insufficient permission` error.
16+
- `HttpResponseError`: For example, for Azure Blob Storage, an `AccountIsDisabled` error.
17+
18+
## Possible causes
19+
20+
Unstructured could not access the specified source or destination system due to invalid credentials, insufficient permissions, or restrictive policies. This error can still occur even though the related connector passed Unstructured's connection test.
21+
22+
Possible causes include:
23+
24+
- An incorrect API key, token, password, service account credential, tenant ID, or client ID was specified.
25+
- The specified credentials or access token have expired.
26+
- Insufficient permissions were granted; for example, read or list access is needed for indexing data, or write access is required for uploading data.
27+
- The source's or destination's associated account is disabled or inactive.
28+
- For Microsoft Entra ID, Conditional Access Policies are blocking the authentication flow.
29+
- The specified credentials might be valid, but an incorrect configuration points to a resource that is not authorized for those credentials.
30+
31+
## Possible solutions
32+
33+
- **Verify the credentials**: Double-check all authentication details—keys, secrets, tokens, usernames, passwords, and IDs—for typos and accuracy.
34+
- **Check the expiration**: Ensure that the specified token or key has not expired. Regenerate it if needed.
35+
- **Review permissions**: Confirm that the credentials have the required permissions for the operation—for example, `s3:ListBucket` and `s3:GetObject` for Amazon S3 indexing; `Files.Read.All` and `Sites.Read.All` for OneDrive or SharePoint; write permissions for upload destinations—and grant the necessary roles and permissions in the source and destination systems.
36+
- **Check the status of the account and resource**: Ensure that the user account or service principal is active, and the target resource—such as the Azure Storage Account—is enabled.
37+
- **Check the Conditional Access Policies (for Microsoft Entra ID)**: Review the Microsoft Entra ID Conditional Access Policies. They might be blocking non-interactive sign-ins or requiring specific compliance. Adjust the policies, or exclude the Unstructured service principal if it is appropriate and secure to do so.
38+
- **Reconfigure the connector**: Delete and recreate the source or destination connector configuration in Unstructured with verified credentials.
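
The expiration check above can sometimes be done locally. The following is a minimal sketch, assuming the access token is a JSON Web Token (JWT) that carries a standard `exp` claim; opaque tokens cannot be inspected this way. The function names `jwt_expiry` and `is_expired` are illustrative and are not part of Unstructured.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[float]:
    """Return the `exp` claim (seconds since the epoch) of a JWT, or None if absent.

    The signature is NOT verified; this only reads the payload segment.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated segments")
    payload_b64 = parts[1]
    # Base64url payloads omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload.get("exp")


def is_expired(token: str, now: Optional[float] = None) -> bool:
    """Return True if the token's `exp` claim is at or before `now`."""
    exp = jwt_expiry(token)
    if exp is None:
        # No expiry claim: expiration cannot be determined from the token alone.
        return False
    current = time.time() if now is None else now
    return current >= exp
```

If `is_expired` returns `True`, regenerate the token in the source or destination system before you update the connector.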
39+
40+
## Additional resources
41+
42+
To ask questions or get additional help with this issue, see [requesting support](/support/request).
Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
---
2+
title: Configuration and resource issues
3+
---
4+
5+
## Issues
6+
7+
When you try to connect Unstructured to a specific source or destination, you get one of the following error types:
8+
9+
- `FileNotFoundError`: For example, for Amazon S3, a `path not found` error.
10+
- `ClientRequestException`: For example, for OneDrive, an `itemNotFound` error.
11+
- `ValueError`: For example, for Google Drive, a `File not found` error. For Amazon S3, an `Invalid endpoint` error.
12+
- `UserError`: For example, for Azure Blob Storage, a `DeploymentNotFound` error.
13+
- `ParamValidationError`: For example, for Amazon S3, an `Invalid bucket name` error.
14+
- `EndpointResolutionError`: For example, for Amazon S3, a `Custom endpoint not valid URI` error.
15+
- `ProgrammingError`: For example, for Snowflake, a `No active warehouse selected` error.
16+
- `UnboundLocalError`: For example, for SharePoint, a `cannot access 'site_drive_item'` error.
17+
- `HTTPError`: For example, for Confluence, a `404 Not Found` error for a specific page or attachment URL.
18+
- `KeyError`: For example, for Jira, an error containing the word `total`. For Amazon S3, an error containing the word `Key`.
19+
20+
## Possible causes
21+
22+
- Unstructured is configured to interact with a resource (such as a file, path, deployment, endpoint, or database object) that doesn't exist
23+
  or is misnamed, or the configuration itself is invalid.
24+
- There is a typo in a bucket name, folder path, file ID, deployment name, hostname, site path, or database name.
25+
- The specified resource has been deleted or moved.
26+
- An endpoint URL is incorrectly formatted; for example, it is missing `https://` or contains invalid characters.
27+
- An Amazon S3 bucket name is not formatted correctly.
28+
- A required configuration is missing in the source or destination connector; for example, no active Snowflake warehouse is specified.
29+
- An Azure OpenAI deployment name is mismatched, failed, or does not exist.
30+
- An invalid URL is specified for an attachment or a link within a source document.
31+
- There is a specific configuration issue with a connector; for example, the specified SharePoint path does not lead to a valid drive.
32+
33+
## Possible solutions
34+
35+
- **Verify names and paths**: Carefully check all configured names, IDs, and paths—such as for buckets, folders, files, sites, deployments, and endpoint URLs—for accuracy. Ensure they exist in the source and destination system. Case sensitivity often matters.
36+
- **Check formatting**: Ensure that URLs, bucket names, and other parameters adhere to the required format.
37+
- **Verify that the resource exists**: Confirm that the target file, folder, deployment, or other resource exists and has not been moved or deleted.
38+
- **Check configuration dependencies**: Ensure that the necessary configurations are set in the source or destination. For example, for Snowflake, select a warehouse by running the `USE WAREHOUSE` command first.
39+
- For **Azure OpenAI**: Double-check that the Deployment Name matches a successful deployment in your Azure portal.
40+
- For **Confluence**: If a `404` error occurs during download, check if the page or attachment link is valid within Confluence itself.
41+
- For **SharePoint**: Verify the Site Path leads to a valid location containing document libraries.
42+
- **Reconfigure the connector**: Review and, as needed, correct any misconfigured settings in the source or destination connector.
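
The formatting checks above can be partly automated before you save a connector. The following is a minimal sketch that covers only a subset of the Amazon S3 bucket naming rules (length, allowed characters, adjacent dots, and IP-address-like names) plus a basic endpoint URL check. The function names are illustrative and are not part of Unstructured.

```python
import re
from typing import List
from urllib.parse import urlparse

# 3-63 characters: lowercase letters, digits, dots, and hyphens,
# beginning and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
_IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def bucket_name_problems(name: str) -> List[str]:
    """Return a list of problems found in an S3 bucket name (empty if none)."""
    problems = []
    if not _BUCKET_RE.match(name):
        problems.append(
            "must be 3-63 lowercase letters, digits, dots, or hyphens, "
            "starting and ending with a letter or digit"
        )
    if ".." in name:
        problems.append("must not contain two adjacent dots")
    if _IPV4_RE.match(name):
        problems.append("must not be formatted like an IP address")
    return problems


def endpoint_url_problems(url: str) -> List[str]:
    """Return a list of problems found in a custom endpoint URL (empty if none)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("missing or unsupported scheme; expected http:// or https://")
    if not parsed.netloc:
        problems.append("missing host name")
    if any(ch.isspace() for ch in url):
        problems.append("contains whitespace")
    return problems
```

An empty list means that none of these checks failed. It does not prove that the bucket or endpoint exists, so still confirm the resource in the source or destination system.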
43+
44+
## Additional resources
45+
46+
To ask questions or get additional help with this issue, see [requesting support](/support/request).
Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,38 @@
1+
---
2+
title: Data format, schema, and validation issues
3+
---
4+
5+
## Issues
6+
7+
When Unstructured tries to process or transfer data, you get one of the following error types:
8+
9+
- `ValidationError`: For example, `S3ConnectionConfig`, `PluginResponse`, or `filedata_meta input type` errors.
10+
- `ClientResponseError`: For example, an `HTTP 422 Unprocessable Entity` error.
11+
- `ListConversionException`: For example, for Pinecone, an `Expected list, got None` error.
12+
- `ServerOperationError`: For example, for Databricks, an `UNRESOLVED_COLUMN` error.
13+
- `WeaviateDeleteManyError`: For example, a `no such prop with name 'record_id'` error.
14+
- `CollectionInsertManyException`: For example, for Astra DB, a `vector dimension mismatch` error.
15+
- `TypeError`: For example, for Pinecone, a `PineconeUploader missing argument` error.
16+
- `UserError`: For example, an error wrapping a schema validation failure.
17+
18+
## Possible causes
19+
20+
- The data being processed or transferred doesn't match the expected structure, format, or validation rules.
21+
- The data provided in configuration or during processing is not valid.
22+
- The data being sent to a destination doesn't match the destination's schema—for example, the data has missing fields, wrong data types, or an incorrect number of vector dimensions.
23+
- Incorrect configuration values were provided; for example, a non-string token was provided where a string was expected.
24+
- There is a schema mismatch between the data generated by Unstructured and the destination schema—for example, missing columns or properties, or wrong data types.
25+
- The specified embedding model generates vectors of a different dimension than the destination index or collection is configured for.
26+
- The data was generated in an unexpected format.
27+
28+
## Possible solutions
29+
30+
- **Verify the data configuration**: Double-check configuration parameters against the relevant documentation for the correct data format and type.
31+
- **Verify the destination schema**: Ensure that the schema—including columns, properties, types, and vector dimensions—in the destination's system matches the data being sent. You might need to update or recreate the destination schema.
32+
- **Check the data fields**: Check if fields such as `record_id`, `element_id`, `text`, `embeddings`, and `metadata` are expected and present.
33+
- **Verify the embedding dimensions**: Confirm that the specified embedding model produces vectors of the dimension expected by the destination collection or index.
34+
- **Contact Unstructured Support**: For `HTTP 422 Unprocessable Entity` errors where the cause isn't clear, `ListConversionException` errors, `TypeError` errors, or persistent schema mismatch issues after verification, [contact Unstructured Support](/support/request).
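
The field and dimension checks above can be run against a sample of output before you send data to a destination. The following is a minimal sketch, assuming each element is a flat dictionary and that all five fields named above are required. Both are assumptions for illustration; which fields a given destination actually needs depends on its schema. The function name `find_element_problems` is hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Field names taken from the list above; treating all of them as required
# is an assumption made for this sketch.
REQUIRED_FIELDS = ("record_id", "element_id", "text", "embeddings", "metadata")


def find_element_problems(
    elements: Iterable[Dict[str, Any]], expected_dim: int
) -> List[str]:
    """Report missing fields and embedding vectors whose length is not expected_dim."""
    problems = []
    for i, element in enumerate(elements):
        missing = [field for field in REQUIRED_FIELDS if field not in element]
        if missing:
            problems.append(f"element {i}: missing fields {missing}")
        vector = element.get("embeddings")
        if vector is not None and len(vector) != expected_dim:
            problems.append(
                f"element {i}: embedding has {len(vector)} dimensions, "
                f"expected {expected_dim}"
            )
    return problems
```

Set `expected_dim` to the dimension that the destination index or collection was created with. A mismatch reported here is the same condition that surfaces as a `vector dimension mismatch` error at upload time.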
35+
36+
## Additional resources
37+
38+
To ask questions or get additional help with this issue, see [requesting support](/support/request).
Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
---
2+
title: Document processing issues
3+
---
4+
5+
## Issues
6+
7+
When Unstructured tries to partition or chunk a document, you get one of the following error types:
8+
9+
- `TooManyPageFailuresException`
10+
- `ControllerException`: For example, an error wrapping partitioning or chunking errors such as code `512`.
11+
12+
## Possible causes
13+
14+
- There are issues with the document's underlying content or structure.
15+
- Unstructured encountered too many errors while processing individual pages or elements within a specific document, exceeding an internal failure threshold.
16+
- The document is corrupted or malformed. This is especially the case for some complex PDFs.
17+
- The document has a highly unusual underlying structure or has highly unusual content, which the specified partitioning model cannot handle.
18+
- The document is encrypted or password-protected.
19+
- Underlying issues are causing page-level failures. These can sometimes be related to quotas if a vision language model (VLM) is called for each page.
20+
21+
## Possible solutions
22+
23+
- **Inspect the document**: Examine the specific document. Make sure that it opens correctly, does not look unusual, and does not appear to be corrupted or encrypted.
24+
- **Test a simpler document**: Try processing a known-good, simple document of the same type, to see if the error is document-specific.
25+
- **Check quotas (if VLM related)**: If the underlying errors mention quotas, see [Quota, billing, and rate limiting issues](/support/issues/quota-billing-rate-limiting).
26+
- **Report the problematic document to Unstructured Support**: If the issue seems specific to an otherwise seemingly valid document, [report it to Unstructured Support](/support/request). If possible, provide the document for Unstructured to further investigate.
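
For PDFs, part of the inspection above can be approximated with a quick byte-level check. The following is a heuristic sketch only: it looks for the `%PDF-` header, the `%%EOF` marker, and an `/Encrypt` entry. It does not parse the file, so it can miss problems or flag harmless files. The function name `pdf_warnings` is illustrative.

```python
from typing import List


def pdf_warnings(data: bytes) -> List[str]:
    """Return heuristic warnings about the raw bytes of a PDF (empty if none)."""
    warnings = []
    # A PDF normally begins with a "%PDF-" header; allow a few leading junk bytes.
    if b"%PDF-" not in data[:1024]:
        warnings.append("no %PDF- header near the start; the file may not be a PDF")
    # The end-of-file marker normally appears close to the end of the file.
    if b"%%EOF" not in data[-2048:]:
        warnings.append("no %%EOF marker near the end; the file may be truncated")
    # Encrypted PDFs reference an /Encrypt dictionary.
    if b"/Encrypt" in data:
        warnings.append("found /Encrypt; the file may be encrypted or password-protected")
    return warnings
```

To use it, read the file in binary mode, for example `pdf_warnings(open(path, "rb").read())`. Any warning is a reason to open the document in a PDF viewer and confirm it manually.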
27+
28+
## Additional resources
29+
30+
To ask questions or get additional help with this issue, see [requesting support](/support/request).
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
---
2+
title: Internal issues
3+
---
4+
5+
## Issues
6+
7+
Unstructured returns one of the following error types:
8+
9+
- `NotFoundException`: For example, an error retrieving intermediate file data from internal storage, with a pattern such as `/f8c6.../element_dicts/...json`.
10+
- `FileNotFoundError`: For example, an error with internal system paths that use a pattern such as `/home/etl/node/staged/...` or `/home/etl/node/downloads/...`.
11+
- `ValueError`: For example, a `Failed to decrypt secret` error.
12+
- `AttributeError`: For example, a `'TimeoutError' object has no attribute 'status_code'` error.
13+
- `asyncio.exceptions.CancelledError`
14+
15+
## Possible causes
16+
17+
- There is an error in a previous workflow or ingestion pipeline stage within Unstructured's environment that is preventing intermediate files from being created correctly.
18+
- There are file system or storage issues within Unstructured's environment.
19+
- There are problems with Unstructured's internal management of secret keys or configurations.
20+
- There are issues with Unstructured's internal error handling or service lifecycle management.
21+
- There are some other general issues with Unstructured's internal workings, often related to managing temporary files or internal state.
22+
23+
## Possible solutions
24+
25+
- **Retry the job**: Retry the entire job, as the issue might have been temporary.
26+
- **Contact Unstructured Support**: These errors typically indicate problems inside Unstructured's environment that customers cannot fix themselves.
27+
  When you [contact Unstructured Support](/support/request), include the full error message, timestamp, and job details.
28+
29+
## Additional resources
30+
31+
To ask questions or get additional help with this issue, see [requesting support](/support/request).
Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
1+
---
2+
title: Network, connection, and timeout issues
3+
---
4+
5+
## Issues
6+
7+
When you try to establish or maintain a network connection between Unstructured and an external system, you get one of the following error types:
8+
9+
- `ReadTimeout`
10+
- `TimeoutError`
11+
- `ServerDisconnectedError`
12+
- `ClientConnectorError`: For example, a `cannot connect to localhost` error.
13+
- `ConnectionError`: For example, a `Failed to resolve 'model-registry-api...` error.
14+
- `ServiceResponseError`: For example, for Azure, a `Timeout on reading data from socket` error.
15+
- `ClientPayloadError`: For example, a `Response payload not completed` or `Connection reset by peer` error.
16+
- `OSError`: For example, for Amazon S3 during a file upload, an `[Errno 22] The request body terminated unexpectedly` error.
17+
18+
## Possible causes
19+
20+
- Unstructured failed to connect to a required system.
21+
- The connection timed out while waiting for a response.
22+
- An established connection was unexpectedly closed.
23+
- Transient network issues were encountered with the local network, with the Internet, or with a cloud provider.
24+
- Firewalls are blocking connections.
25+
- Some services are temporarily unavailable, slow, or unresponsive.
26+
- The processing of very large files is causing long-running operations, which lead to exceeding default timeouts.
27+
- DNS resolution issues were encountered.
28+
29+
## Possible solutions
30+
31+
- **Retry the job**: Many network errors are transient. Wait a few minutes and then retry the job.
32+
- **Check network connectivity**: Ensure stable network connectivity with the related source and destination systems.
33+
- **Check firewalls**: Verify that no firewall is blocking the necessary connections to source or destination APIs, blob storage, databases, vector stores, or ports.
34+
- **Process smaller batches or files**: If timeouts occur during processing or uploading very large files, try using smaller files or batches.
35+
- **Contact Unstructured Support**: For persistent `ClientConnectorError` or `ConnectionError` errors, or for other frequent timeouts or disconnects that retries do not resolve, [contact Unstructured Support](/support/request). These issues can sometimes point to internal Unstructured system problems.
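
If you start jobs from your own script, the retry advice above can be sketched as a small wrapper with exponential backoff. This is an illustration under stated assumptions, not Unstructured's retry mechanism: `operation` stands for whatever callable you use to start or poll a job, and the default `retry_on` tuple uses Python's built-in `TimeoutError` and `ConnectionError`, which might differ from the exception classes that your HTTP client raises.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on the given exceptions with doubling delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

For job-level retries, a `base_delay` of a minute or more is closer to the "wait a few minutes" guidance than the one-second default. Errors that still occur after the final attempt are the persistent cases worth reporting to support.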
36+
37+
## Additional resources
38+
39+
- [Unstructured Status](https://status.unstructured.io) dashboard
40+
- [Request Unstructured support](/support/request)

support/issues/overview.mdx

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ Each issue contains the following sections:
1111
there might be multiple causes, or the causes might be indirect or dependent on other factors. In these cases, the title will be
1212
labeled as **Possible causes** instead.
1313
- **Solution**: A solution to the issue. In some cases, there might be solutions that are dependent on other factors. In these cases,
14-
the title will be labelled as **Possible solutions** instead.
14+
the title will be labeled as **Possible solutions** instead.
1515
- **Additional resources**: Any usage notes, links, or other secondary information that might also help you to resolve the issue.
1616

1717
To ask questions or get additional help with these issues, see [requesting support](/support/request).
