Commit 85a7abe

Add Modal example for marker deployment
2 parents 1f62954 + 25a6ace commit 85a7abe

2 files changed

Lines changed: 491 additions & 0 deletions

File tree

examples/README_MODAL.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
## Usage Examples

This directory contains examples of running `marker` in different contexts.

### Usage with Modal

We have a [self-contained example](./marker_modal_deployment.py) that shows how to deploy `marker` on [Modal](https://modal.com): it provisions a container with a GPU and exposes it through an API, so you can submit PDFs for conversion into Markdown, HTML, or JSON.

It's a limited example that you can extend to other use cases.

#### Prerequisites

Make sure you have the `modal` client installed by [following their instructions here](https://modal.com/docs/guide#getting-started).

Modal's [Starter Plan](https://modal.com/pricing) includes $30 of free compute each month.
Modal is [serverless](https://arxiv.org/abs/1902.03383), so you only pay for resources while you are using them.

#### Running the example

Once `modal` is configured, you can deploy the example to your workspace by running:

> modal deploy marker_modal_deployment.py

Notes:
- `marker` relies on a few models. By default, the endpoint checks whether these models are loaded and downloads them if not, so the first request will be slow. You can avoid this by running the command below, which creates a [`Modal Volume`](https://modal.com/docs/guide/Volumes) to store the models for reuse:

> modal run marker_modal_deployment.py::download_models

Once the deploy is finished, you can:
- Test a file upload locally through your CLI using an `invoke_conversion` command we expose through Modal's [`local_entrypoint`](https://modal.com/docs/reference/modal.App#local_entrypoint)
- Get the URL of your endpoint and make a request through a client of your choice.

**Test from your CLI with `invoke_conversion`**

If your endpoint is live, simply run this command:

```
$ modal run marker_modal_deployment.py::invoke_conversion --pdf-file <PDF_FILE_PATH> --output-format markdown
```

This automatically detects the URL of your new endpoint using [`.get_web_url()`](https://modal.com/docs/guide/webhook-urls#determine-the-url-of-a-web-endpoint-from-code), makes sure it's healthy, submits your file, and stores the output on your machine (in the same directory).

**Making a request using your own client**

If you want to make requests from elsewhere, e.g. with cURL or a client like Insomnia, you'll need the endpoint URL.

When your `modal deploy` command from earlier finishes, it prints your endpoint URL at the end. For example:

```
$ modal deploy marker_modal_deployment.py
...
✓ Created objects.
├── 🔨 Created mount /marker/examples/marker_modal_deployment.py
├── 🔨 Created function download_models.
├── 🔨 Created function MarkerModalDemoService.*.
└── 🔨 Created web endpoint for MarkerModalDemoService.fastapi_app => <YOUR_ENDPOINT_URL>
✓ App deployed in 149.877s! 🎉
```

If you accidentally close your terminal session, you can always go to Modal's dashboard and:
- Find the app (default name: `datalab-marker-modal-demo`)
- Click on `MarkerModalDemoService`
- Find your endpoint URL

Once you have your URL, make a request to `{YOUR_ENDPOINT_URL}/convert` like this (you can also use Insomnia, etc.):
```
curl --request POST \
  --url {YOUR_ENDPOINT_URL}/convert \
  --header 'Content-Type: multipart/form-data' \
  --form file=@/Users/cooldev/sample.pdf \
  --form output_format=html
```

You should get a response like this:

```
{
  "success": true,
  "filename": "sample.pdf",
  "output_format": "html",
  "json": null,
  "html": "<YOUR_RESPONSE_CONTENT>",
  "markdown": null,
  "images": {},
  "metadata": {... page level metadata ...},
  "page_count": 2
}
```

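If you would rather call the endpoint from Python than from cURL, the request and response handling above can be sketched with the standard library alone. This is a minimal illustration, not part of the example script: the helper names `build_multipart`, `convert_pdf`, and `extract_content` are hypothetical, and the code assumes only what the cURL call and sample response show, namely a `/convert` route that takes `file` and `output_format` form fields and returns JSON with the converted document under the key named by `output_format`.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields: dict[str, str], file_field: str, file_path: Path) -> tuple[bytes, str]:
    """Encode text fields plus one file as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    Sketch only: field names and file names are not escaped.
    """
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    mime = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
    header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{file_path.name}"\r\nContent-Type: {mime}\r\n\r\n'
    )
    parts.append(header.encode() + file_path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def convert_pdf(base_url: str, pdf_path: str, output_format: str = "html") -> dict:
    """POST a PDF to <base_url>/convert and return the decoded JSON response."""
    body, content_type = build_multipart({"output_format": output_format}, "file", Path(pdf_path))
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/convert",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_content(result: dict):
    """Return the converted document stored under the requested output format."""
    if not result.get("success"):
        raise RuntimeError(f"conversion failed: {result}")
    return result[result["output_format"]]
```

With your endpoint URL in hand, `extract_content(convert_pdf("<YOUR_ENDPOINT_URL>", "sample.pdf"))` would return the HTML string from a response shaped like the one above.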
[Modal](https://modal.com) makes deploying and scaling models and inference workloads much easier.

If you're interested in Datalab's managed API or on-prem document intelligence solution, check out [our platform here](https://datalab.to/?utm_source=gh-marker).
