Commit ef038ad

add section for configuring document limits in README
1 parent 840ca7d commit ef038ad

2 files changed

Lines changed: 306 additions & 0 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ All work produced is open source. More information can be found in the GitHub re
| [Testing Your PDF Accessibility Solution](#testing-your-pdf-accessibility-solution) | User guide for the working solution |
| [PDF-to-PDF Remediation Solution](#pdf-to-pdf-remediation-solution) | PDF format preservation solution details |
| [PDF-to-HTML Remediation Solution](#pdf-to-html-remediation-solution) | HTML conversion solution details |
| [Configuring Limits](docs/CONFIGURING_LIMITS.md) | How to modify document limits, quotas, and defaults |
| [Monitoring](#monitoring) | System monitoring and observability |
| [Troubleshooting](#troubleshooting) | Common issues and solutions |
| [Contributing](#contributing) | How to contribute to the project |

docs/CONFIGURING_LIMITS.md

Lines changed: 305 additions & 0 deletions
@@ -0,0 +1,305 @@
# Configuring Document Limits and Defaults

This guide explains the configurable limits across the PDF Accessibility solution and how to modify them. Limits exist at two levels:

1. **User-Facing Limits (UI):** Per-user quotas for file uploads, page counts, and file size, managed through Cognito custom attributes in the [PDF_accessability_UI](https://github.com/ASUCICREPO/PDF_accessability_UI) repository.
2. **Infrastructure Limits (Backend):** Resource-level settings such as Lambda timeouts, memory, chunk sizes, and concurrency, managed in this repository.

---

## Table of Contents

| Section | Description |
|---|---|
| [User-Facing Limits (UI)](#user-facing-limits-ui) | File upload quotas, page limits, and size limits per user |
| [Modifying User Limits via Cognito Console](#modifying-user-limits-via-the-cognito-console) | How to change limits for an individual user |
| [Modifying Default Group Limits in Code](#modifying-default-group-limits-in-code) | How to change the defaults assigned to new users |
| [Infrastructure Limits (Backend)](#infrastructure-limits-backend) | Lambda, ECS, Step Functions, and processing settings |
| [PDF-to-HTML Processing Defaults](#pdf-to-html-processing-defaults) | Configuration defaults for the PDF-to-HTML pipeline |

---

## User-Facing Limits (UI)

When the [PDF Accessibility UI](https://github.com/ASUCICREPO/PDF_accessability_UI) is deployed, each user is assigned limits via Cognito custom attributes. These limits control what users can upload through the web interface.

### Custom Cognito Attributes

| Attribute | Description | Default (DefaultUsers) |
|---|---|---|
| `custom:max_files_allowed` | Maximum number of files a user can upload | `8` |
| `custom:max_pages_allowed` | Maximum number of pages per PDF | `10` |
| `custom:max_size_allowed_MB` | Maximum file size in MB | `25` |
| `custom:total_files_uploaded` | Current upload count (tracked automatically) | `0` |
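
To make the role of each attribute concrete, here is a minimal sketch of how an upload could be checked against these four attributes. The `UserLimits` class and `check_upload` function are hypothetical names introduced for illustration; the actual enforcement logic lives in the UI repository and may differ.

```python
from dataclasses import dataclass


@dataclass
class UserLimits:
    """Hypothetical container for the Cognito custom attributes (stored as strings)."""
    max_files_allowed: int
    max_pages_allowed: int
    max_size_allowed_mb: int
    total_files_uploaded: int

    @classmethod
    def from_attributes(cls, attrs: dict) -> "UserLimits":
        # Cognito returns attribute values as strings, so convert them here.
        return cls(
            max_files_allowed=int(attrs["custom:max_files_allowed"]),
            max_pages_allowed=int(attrs["custom:max_pages_allowed"]),
            max_size_allowed_mb=int(attrs["custom:max_size_allowed_MB"]),
            total_files_uploaded=int(attrs["custom:total_files_uploaded"]),
        )


def check_upload(limits: UserLimits, page_count: int, size_mb: float) -> list:
    """Return a list of reasons the upload would be rejected (empty if allowed)."""
    problems = []
    if limits.total_files_uploaded >= limits.max_files_allowed:
        problems.append("file quota exhausted")
    if page_count > limits.max_pages_allowed:
        problems.append("too many pages")
    if size_mb > limits.max_size_allowed_mb:
        problems.append("file too large")
    return problems


# Defaults for the DefaultUsers group, taken from the table above.
default_attrs = {
    "custom:max_files_allowed": "8",
    "custom:max_pages_allowed": "10",
    "custom:max_size_allowed_MB": "25",
    "custom:total_files_uploaded": "0",
}
limits = UserLimits.from_attributes(default_attrs)
print(check_upload(limits, page_count=12, size_mb=5.0))
```

Returning a list of reasons rather than a single boolean lets a caller report every violated limit at once.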

### Default Limits by User Group

The UI creates three Cognito user groups, each with different default limits:

| Attribute | DefaultUsers | AmazonUsers | AdminUsers |
|---|---|---|---|
| `max_files_allowed` | 8 | 15 | 100 |
| `max_pages_allowed` | 10 | 10 | 2500 |
| `max_size_allowed_MB` | 25 | 25 | 1000 |

These defaults are set when a user first signs up and is automatically assigned to a group. Users with an `@amazon.com` email are assigned to **AmazonUsers**; all others go to **DefaultUsers**. Administrators can move users to **AdminUsers** manually through the Cognito console.
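
The assignment rule above can be sketched as a small function. This is an illustration only: `choose_group` is a hypothetical helper, and the real post-confirmation Lambda may normalize or validate email addresses differently.

```python
DEFAULT_GROUP = "DefaultUsers"
AMAZON_GROUP = "AmazonUsers"


def choose_group(email: str) -> str:
    """Pick the initial Cognito group from the email domain (assumed rule)."""
    # Compare the full domain so that e.g. "notamazon.com" does not match.
    domain = email.rsplit("@", 1)[-1].strip().lower()
    return AMAZON_GROUP if domain == "amazon.com" else DEFAULT_GROUP


for address in ("jane@amazon.com", "jane@example.org"):
    print(address, "->", choose_group(address))
```

**AdminUsers** is deliberately absent from this function, since the text above states that admin membership is granted manually.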

---

## Modifying User Limits via the Cognito Console

To change limits for a **specific user** without redeploying:

1. Open the [Amazon Cognito Console](https://console.aws.amazon.com/cognito/).
2. Select the user pool named **`PDF-Accessability-User-Pool`**.
3. Navigate to **Users** and search for the user by email or username.
4. Select the user and scroll to **User attributes**.
5. Click **Edit** and modify any of the following attributes:
   - `custom:max_files_allowed`: set the new file upload limit
   - `custom:max_pages_allowed`: set the new page limit per PDF
   - `custom:max_size_allowed_MB`: set the new file size limit in MB
   - `custom:total_files_uploaded`: reset to `0` to restore a user's quota
6. Click **Save changes**.

The updated limits take effect immediately on the user's next upload attempt.

> **Note:** Changing a user's group membership (e.g., moving them from DefaultUsers to AdminUsers) will automatically apply that group's default limits via an EventBridge-triggered Lambda function.
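
The console steps above can also be scripted. The sketch below uses the boto3 `cognito-idp` client's `admin_update_user_attributes` call; the helper names, the user pool ID, and the username in the commented usage line are placeholders, and the caller is assumed to have IAM permission for that action.

```python
def build_user_attributes(limits: dict) -> list:
    """Convert {'max_files_allowed': 20} into Cognito's Name/Value list.

    Cognito stores custom attribute values as strings, hence str(value).
    """
    return [
        {"Name": f"custom:{name}", "Value": str(value)}
        for name, value in limits.items()
    ]


def update_user_limits(user_pool_id: str, username: str, limits: dict, region_name=None):
    """Apply new limits to one user (hypothetical helper around boto3)."""
    # Imported lazily so build_user_attributes works without the AWS SDK installed.
    import boto3

    client = boto3.client("cognito-idp", region_name=region_name)
    client.admin_update_user_attributes(
        UserPoolId=user_pool_id,
        Username=username,
        UserAttributes=build_user_attributes(limits),
    )


# Example (placeholder pool ID and username; not executed here):
# update_user_limits("us-east-1_EXAMPLE", "jane@example.org",
#                    {"max_files_allowed": 20, "total_files_uploaded": 0})
print(build_user_attributes({"max_files_allowed": 20, "total_files_uploaded": 0}))
```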

---

## Modifying Default Group Limits in Code

To change the **default limits** that are assigned to all new users, you need to update two Lambda functions in the [PDF_accessability_UI](https://github.com/ASUCICREPO/PDF_accessability_UI) repository and redeploy.

### File 1: Post-Confirmation Lambda

**Path:** `cdk_backend/lambda/postConfirmation/index.py`

This Lambda runs when a new user signs up and sets their initial attributes. Edit the `group_attributes` dictionary:

```python
group_attributes = {
    DEFAULT_GROUP: {
        'custom:first_sign_in': 'true',
        'custom:total_files_uploaded': '0',
        'custom:max_files_allowed': '8',      # Change this value
        'custom:max_pages_allowed': '10',     # Change this value
        'custom:max_size_allowed_MB': '25'    # Change this value
    },
    AMAZON_GROUP: {
        'custom:first_sign_in': 'true',
        'custom:total_files_uploaded': '0',
        'custom:max_files_allowed': '15',     # Change this value
        'custom:max_pages_allowed': '10',     # Change this value
        'custom:max_size_allowed_MB': '25'    # Change this value
    },
    ADMIN_GROUP: {
        'custom:first_sign_in': 'true',
        'custom:total_files_uploaded': '0',
        'custom:max_files_allowed': '100',    # Change this value
        'custom:max_pages_allowed': '2500',   # Change this value
        'custom:max_size_allowed_MB': '1000'  # Change this value
    }
}
```

### File 2: Update Attributes Groups Lambda

**Path:** `cdk_backend/lambda/UpdateAttributesGroups/index.py`

This Lambda runs when a user is moved between groups (via EventBridge) and applies the new group's limits. Edit the `GROUP_LIMITS` dictionary:

```python
GROUP_LIMITS = {
    'DefaultUsers': {
        'custom:max_files_allowed': '3',      # Change this value
        'custom:max_pages_allowed': '10',     # Change this value
        'custom:max_size_allowed_MB': '25'    # Change this value
    },
    'AmazonUsers': {
        'custom:max_files_allowed': '5',      # Change this value
        'custom:max_pages_allowed': '10',     # Change this value
        'custom:max_size_allowed_MB': '25'    # Change this value
    },
    'AdminUsers': {
        'custom:max_files_allowed': '500',    # Change this value
        'custom:max_pages_allowed': '1500',   # Change this value
        'custom:max_size_allowed_MB': '1000'  # Change this value
    }
}
```

> **Important:** Make sure the values in both files are consistent for each group. The sample values shown above are not: for example, `custom:max_files_allowed` for DefaultUsers is `8` in the post-confirmation Lambda but `3` in `GROUP_LIMITS`, so a user moved into a group would receive different limits than a user who signed up into it. After editing, redeploy the UI stack for changes to take effect. Only **newly registered users** or **users whose group changes** will receive the updated defaults. Existing users retain their current attribute values unless manually updated via the Cognito console.
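
A quick way to verify that the two dictionaries agree is to compare them key by key. The sketch below is a hypothetical check, not part of either Lambda; it copies the limit values shown in the two listings above and assumes the group constants resolve to the group names `DefaultUsers`, `AmazonUsers`, and `AdminUsers`.

```python
LIMIT_KEYS = (
    "custom:max_files_allowed",
    "custom:max_pages_allowed",
    "custom:max_size_allowed_MB",
)

# Values copied from the post-confirmation listing (limit keys only).
post_confirmation = {
    "DefaultUsers": {"custom:max_files_allowed": "8", "custom:max_pages_allowed": "10", "custom:max_size_allowed_MB": "25"},
    "AmazonUsers": {"custom:max_files_allowed": "15", "custom:max_pages_allowed": "10", "custom:max_size_allowed_MB": "25"},
    "AdminUsers": {"custom:max_files_allowed": "100", "custom:max_pages_allowed": "2500", "custom:max_size_allowed_MB": "1000"},
}

# Values copied from the GROUP_LIMITS listing.
group_limits = {
    "DefaultUsers": {"custom:max_files_allowed": "3", "custom:max_pages_allowed": "10", "custom:max_size_allowed_MB": "25"},
    "AmazonUsers": {"custom:max_files_allowed": "5", "custom:max_pages_allowed": "10", "custom:max_size_allowed_MB": "25"},
    "AdminUsers": {"custom:max_files_allowed": "500", "custom:max_pages_allowed": "1500", "custom:max_size_allowed_MB": "1000"},
}


def find_mismatches(first: dict, second: dict) -> list:
    """Return (group, key, first_value, second_value) for every differing limit."""
    mismatches = []
    for group in sorted(set(first) | set(second)):
        for key in LIMIT_KEYS:
            left = first.get(group, {}).get(key)
            right = second.get(group, {}).get(key)
            if left != right:
                mismatches.append((group, key, left, right))
    return mismatches


for group, key, left, right in find_mismatches(post_confirmation, group_limits):
    print(f"{group}: {key} is {left} at sign-up but {right} on group change")
```

Running a check like this before redeploying makes it harder for the two files to drift apart.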

### Redeploying After Changes

From the `PDF_accessability_UI` repository root:

```bash
cd cdk_backend
npx cdk deploy
```

Or re-run the deployment script:

```bash
chmod +x deploy.sh
./deploy.sh
```

---

## Infrastructure Limits (Backend)

The following limits are configured in this repository's infrastructure code (`app.py`) and affect processing capacity. Modifying these requires a redeployment of the backend stack.

### Lambda Function Limits

| Lambda Function | Timeout | Memory | File |
|---|---|---|---|
| PDF Splitter | 900s (15 min) | 1024 MB | `app.py` |
| PDF Merger | 900s (15 min) | 1024 MB | `app.py` |
| Title Generator | 900s (15 min) | 1024 MB | `app.py` |
| Pre-Remediation Checker | 900s (15 min) | 512 MB | `app.py` |
| Post-Remediation Checker | 900s (15 min) | 512 MB | `app.py` |

To modify, edit the `timeout` and `memory_size` parameters in `app.py`. For example:

```python
pdf_splitter_lambda = lambda_.Function(
    self, 'PdfChunkSplitterLambda',
    runtime=lambda_.Runtime.PYTHON_3_12,
    handler='main.lambda_handler',
    code=lambda_.Code.from_docker_build("lambda/pdf-splitter-lambda"),
    timeout=Duration.seconds(900),  # Maximum is 900 seconds (15 min)
    memory_size=1024                # In MB, range: 128–10240
)
```
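
Since every function in the table already sits at the 900-second ceiling, it can help to validate new values before deploying. The following is a small sketch of such a check; `validate_lambda_settings` is a hypothetical helper, and the bounds are the Lambda limits quoted in the code comments above.

```python
LAMBDA_MAX_TIMEOUT_S = 900    # 15 minutes, the Lambda maximum
LAMBDA_MIN_MEMORY_MB = 128
LAMBDA_MAX_MEMORY_MB = 10240


def validate_lambda_settings(timeout_s: int, memory_mb: int) -> list:
    """Return a list of problems with the proposed settings (empty if valid)."""
    problems = []
    if not 1 <= timeout_s <= LAMBDA_MAX_TIMEOUT_S:
        problems.append(f"timeout {timeout_s}s is outside 1-{LAMBDA_MAX_TIMEOUT_S}s")
    if not LAMBDA_MIN_MEMORY_MB <= memory_mb <= LAMBDA_MAX_MEMORY_MB:
        problems.append(
            f"memory {memory_mb} MB is outside "
            f"{LAMBDA_MIN_MEMORY_MB}-{LAMBDA_MAX_MEMORY_MB} MB"
        )
    return problems


print(validate_lambda_settings(timeout_s=900, memory_mb=1024))
print(validate_lambda_settings(timeout_s=1200, memory_mb=64))
```

A step that needs more than 15 minutes cannot be fixed by raising `timeout`; it would have to run on a different compute option, such as the ECS tasks described in this guide.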

### ECS Task Limits

| Task | Memory | CPU | File |
|---|---|---|---|
| Adobe AutoTag | 1024 MiB | 256 (0.25 vCPU) | `app.py` |
| Alt Text Generator | 1024 MiB | 256 (0.25 vCPU) | `app.py` |

To modify, edit the `memory_limit_mib` and `cpu` parameters in `app.py`. Fargate only accepts certain CPU and memory pairings, so change the two values together:

```python
adobe_autotag_task_def = ecs.FargateTaskDefinition(
    self, "AdobeAutotagTaskDefinition",
    memory_limit_mib=1024,  # Must pair with cpu; for cpu=256: 512, 1024, or 2048
    cpu=256,                # Supported values: 256, 512, 1024, 2048, 4096
    ...
)
```
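
Fargate rejects task definitions whose CPU and memory values do not form a supported pair. The lookup below is a sketch of that rule for the CPU sizes listed above; `is_valid_fargate_size` is a hypothetical helper, the table reflects the pairings published in the AWS Fargate documentation at the time of writing, and larger CPU sizes are omitted.

```python
# CPU units -> allowed memory values in MiB (Linux Fargate tasks).
FARGATE_MEMORY_BY_CPU = {
    256: (512, 1024, 2048),
    512: tuple(range(1024, 4096 + 1, 1024)),     # 1 GB to 4 GB in 1 GB steps
    1024: tuple(range(2048, 8192 + 1, 1024)),    # 2 GB to 8 GB
    2048: tuple(range(4096, 16384 + 1, 1024)),   # 4 GB to 16 GB
    4096: tuple(range(8192, 30720 + 1, 1024)),   # 8 GB to 30 GB
}


def is_valid_fargate_size(cpu: int, memory_mib: int) -> bool:
    """Check whether a cpu/memory pair appears in the table above."""
    return memory_mib in FARGATE_MEMORY_BY_CPU.get(cpu, ())


print(is_valid_fargate_size(256, 1024))   # the current setting in app.py
print(is_valid_fargate_size(256, 4096))   # needs at least cpu=512
```

For example, raising a task to 4096 MiB of memory also requires raising `cpu` to at least 512.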

### Step Functions Limits

| Setting | Value | File |
|---|---|---|
| State Machine Timeout | 150 minutes | `app.py` |
| Map State Max Concurrency | 100 | `app.py` |

To modify:

```python
# State Machine overall timeout
pdf_remediation_state_machine = sfn.StateMachine(
    self, "PdfAccessibilityRemediationWorkflow",
    definition=parallel_accessibility_workflow,
    timeout=Duration.minutes(150),  # Change this value
    ...
)

# Maximum parallel chunk processing
pdf_chunks_map_state = sfn.Map(
    self, "ProcessPdfChunksInParallel",
    max_concurrency=100,  # Change this value
    ...
)
```

### PDF Chunk Size (Pages Per Chunk)

The PDF splitter Lambda splits uploaded PDFs into chunks for parallel processing. The number of pages per chunk is set in `lambda/pdf-splitter-lambda/main.py`:

```python
# Line 146 in lambda/pdf-splitter-lambda/main.py
chunks = split_pdf_into_pages(pdf_file_content, pdf_file_key, s3, bucket_name, 200)
```

The last argument (`200`) is the number of pages per chunk. To process in smaller or larger batches, change this value.
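
The chunk size interacts with the Map state's `max_concurrency` of 100 described under Step Functions Limits. The arithmetic can be sketched as follows; `chunk_plan` is a hypothetical helper, and the "batches" figure is only a rough estimate, since Step Functions starts a new iteration as soon as a slot frees up rather than waiting for a whole batch to finish.

```python
import math

PAGES_PER_CHUNK = 200   # last argument of split_pdf_into_pages
MAX_CONCURRENCY = 100   # max_concurrency of the Map state


def chunk_plan(total_pages: int,
               pages_per_chunk: int = PAGES_PER_CHUNK,
               max_concurrency: int = MAX_CONCURRENCY) -> tuple:
    """Return (number of chunks, approximate number of concurrent batches)."""
    if total_pages < 0 or pages_per_chunk <= 0 or max_concurrency <= 0:
        raise ValueError("page counts and limits must be positive")
    chunks = math.ceil(total_pages / pages_per_chunk)
    batches = math.ceil(chunks / max_concurrency)
    return chunks, batches


print(chunk_plan(2500))                      # an AdminUsers-sized document
print(chunk_plan(2500, pages_per_chunk=10))  # a much smaller chunk size
```

With the default of 200 pages per chunk, a 2500-page document yields 13 chunks, well under the concurrency limit. Reducing the chunk size to 10 pages produces 250 chunks, which exceeds 100 concurrent iterations, so some chunks would wait for a free slot.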

### Image Size Limit

The maximum image size for Bedrock model invocation is set in `pdf2html/content_accessibility_utility_on_aws/remediate/services/bedrock_client.py`:

```python
MAX_IMAGE_SIZE = 4_000_000  # 4 MB: maximum allowed image size in bytes
```

Images exceeding this limit are automatically resized before being sent to Bedrock.

### Redeploying After Infrastructure Changes

After modifying any values in `app.py` or Lambda source code, redeploy the backend:

```bash
cdk deploy
```

Or re-run the deployment script:

```bash
chmod +x deploy.sh
./deploy.sh
```

---

## PDF-to-HTML Processing Defaults

The PDF-to-HTML pipeline has its own set of configurable defaults defined in `pdf2html/content_accessibility_utility_on_aws/utils/config_defaults.yaml`:

```yaml
pdf:
  extract_images: true
  image_format: "png"
  embed_fonts: false
  single_file: false
  continuous: true
  embed_images: false
  exclude_images: false
  cleanup_bda_output: false

audit:
  severity_threshold: "minor"
  detailed_context: true
  skip_automated_checks: false

remediate:
  severity_threshold: "minor"
  model_id: "us.amazon.nova-lite-v1:0"

aws:
  region: null
  create_bda_project: false
```

These defaults can be overridden in three ways (in order of precedence, highest first):

1. **Command-line arguments:** When using the CLI directly
2. **Configuration file:** Pass a custom YAML file with `--config my-config.yaml`
3. **Environment variables:** Prefix with `DOC_ACCESS_` (e.g., `DOC_ACCESS_PDF_IMAGE_FORMAT=jpg`)

For details on CLI options and configuration file format, see the [pdf2html README](../pdf2html/README.md#configuration).
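
The precedence order above amounts to merging layers from lowest to highest priority. The sketch below illustrates the idea only; `env_overrides` and `merge_layers` are hypothetical helpers rather than the pipeline's actual loader, and the mapping from `DOC_ACCESS_<SECTION>_<KEY>` to a section and key is an assumption inferred from the single example given.

```python
def env_overrides(environ: dict, prefix: str = "DOC_ACCESS_") -> dict:
    """Turn DOC_ACCESS_PDF_IMAGE_FORMAT=jpg into {'pdf': {'image_format': 'jpg'}}.

    Assumes the first word after the prefix is the section name. Values stay
    strings; a real loader would also need to coerce booleans and numbers.
    """
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        section, _, key = name[len(prefix):].lower().partition("_")
        if key:
            overrides.setdefault(section, {})[key] = value
    return overrides


def merge_layers(*layers: dict) -> dict:
    """Merge section dictionaries; later layers win over earlier ones."""
    merged = {}
    for layer in layers:
        for section, values in layer.items():
            merged.setdefault(section, {}).update(values)
    return merged


defaults = {"pdf": {"image_format": "png", "extract_images": True}}
env = env_overrides({"DOC_ACCESS_PDF_IMAGE_FORMAT": "jpg", "PATH": "/usr/bin"})
config_file = {"pdf": {"image_format": "png"}}   # as if loaded from --config
cli_args = {}                                    # nothing passed on the CLI

# Lowest precedence first: defaults, environment, config file, CLI.
resolved = merge_layers(defaults, env, config_file, cli_args)
print(resolved["pdf"]["image_format"])
```

Here the environment variable alone would switch the format to `jpg`, but the configuration file takes precedence and sets it back to `png`.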

---

## Support

For questions or assistance with configuration:

- **Email**: ai-cic@amazon.com
- **Issues**: [GitHub Issues](https://github.com/ASUCICREPO/PDF_Accessibility/issues)
