
Commit a613ac8

first commit
0 parents

6 files changed (+221, -0 lines)

.github/workflows/docker-ghcr.yml

+35
@@ -0,0 +1,35 @@
name: Build and Push to GHCR

on:
  push:
    branches:
      - main

permissions:
  contents: read
  packages: write
  actions: read

jobs:
  build-and-push:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Log in to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v3
        with:
          context: .
          file: ./Dockerfile
          push: true
          tags: ghcr.io/${{ github.repository }}:latest

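
The `tags` value interpolates `github.repository` (`owner/name`) directly. Docker image repository names must be lowercase, and `github.repository` keeps the repository's original casing, so this works for `bitovi/markitdown_api` but would fail for a repository with capital letters in its name. A minimal Python sketch of the normalization; `ghcr_image_ref` is a hypothetical helper, not part of the workflow or of any GitHub action:

```python
def ghcr_image_ref(repository: str, tag: str = "latest") -> str:
    """Build a GHCR image reference from a GitHub 'owner/name' string.

    Hypothetical helper mirroring `ghcr.io/${{ github.repository }}:latest`,
    but lowercasing the owner and name, since image repository names must be
    lowercase. Tags may contain uppercase letters, so the tag is left as-is.
    """
    owner, sep, name = repository.partition("/")
    if not sep or not owner or not name:
        raise ValueError(f"expected 'owner/name', got {repository!r}")
    return f"ghcr.io/{owner.lower()}/{name.lower()}:{tag}"


print(ghcr_image_ref("bitovi/markitdown_api"))
print(ghcr_image_ref("Bitovi/MarkItDown_API", "v1"))
```

The first call reproduces the reference the workflow already pushes; the second shows a mixed-case repository name being lowered while the tag keeps its case.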

Dockerfile

+23
@@ -0,0 +1,23 @@
FROM python:3.12-slim

WORKDIR /app

# Install system dependencies that might be required by MarkItDown
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first for better caching
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY app.py .

# Expose the port the app runs on
EXPOSE 5000

# Run the application
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]
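
Apart from `--bind`, this `CMD` runs gunicorn with its defaults: one synchronous worker and a 30-second worker timeout, so a single slow conversion of a large document can block other requests or get the worker killed mid-conversion. Gunicorn also reads `./gunicorn.conf.py` from its working directory by default, so one way to tune this is a config file like the sketch below. It is not part of this commit: the worker formula is the `(2 x CPU cores) + 1` starting point suggested in the gunicorn docs, the 120-second timeout is an assumed value, and the Dockerfile would also need a `COPY gunicorn.conf.py .` line.

```python
# gunicorn.conf.py (hypothetical; not included in this commit)
import multiprocessing

# Same address and port as the Dockerfile's CMD. A --bind flag on the
# command line still takes precedence over this value.
bind = "0.0.0.0:5000"

# Starting point suggested by the gunicorn docs: (2 x CPU cores) + 1.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds a worker may stay silent before gunicorn kills and restarts it
# (the default is 30). 120 is an assumption for slow, large documents.
timeout = 120
```

Note that `multiprocessing.cpu_count()` reports the host's CPUs rather than any container CPU limit, so a fixed worker count may suit resource-constrained deployments better.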

LICENSE

+21
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Bitovi

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

+80
@@ -0,0 +1,80 @@
# MarkItDown HTTP API Wrapper

This is a simple HTTP API wrapper for the MarkItDown package that lets you convert various document formats to Markdown through a REST API.

## Files Included

- `app.py`: The Flask application that provides the HTTP API
- `requirements.txt`: Python dependencies
- `Dockerfile`: Instructions for building the Docker image

## Getting Started

### Running with docker-compose.yml

Save the following as `docker-compose.yml` (the top-level `version` key is omitted because Docker Compose V2 treats it as obsolete):

```yaml
services:
  markitdown_api:
    image: ghcr.io/bitovi/markitdown_api:latest
    container_name: markitdown_api
    ports:
      - "5000:5000"
```

Then pull the published image and start the service in the background:

```sh
docker compose up -d
```

The service will be available at <http://localhost:5000>.

### API Endpoints

#### Health Check

```http
GET /health
```

Returns `200 OK` with the JSON body `{"status": "healthy"}` while the service is running.
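
A client can poll this endpoint before sending work, or use it to back a Docker `HEALTHCHECK` (the `python:3.12-slim` base image does not include `curl` by default). A minimal standard-library sketch; the `is_healthy` name, the default URL, and the 3-second timeout are assumptions, not part of the project:

```python
import urllib.error
import urllib.request


def is_healthy(url: str = "http://localhost:5000/health", timeout: float = 3.0) -> bool:
    """Return True only if GET `url` answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a 4xx/5xx reply (urlopen raises
        # HTTPError, a URLError subclass, for those statuses).
        return False
```

For a Docker health check, a script could end with `raise SystemExit(0 if is_healthy() else 1)`, since `HEALTHCHECK` treats exit code 0 as healthy and 1 as unhealthy.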

#### Convert a File

```http
POST /convert
```

Parameters:

- `file`: The file to convert, sent as a `multipart/form-data` form field

Example usage with curl (replace `/path/to/document.pdf` with the file you want to convert; `-F` already sends a `multipart/form-data` POST with the correct boundary, so no `Content-Type` header is needed):

```bash
curl -F "file=@/path/to/document.pdf" -H "Accept: application/json" http://localhost:5000/convert
```
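
The same upload can be made from Python without third-party packages. This is a sketch rather than a client shipped with the project: `build_multipart` and `convert` are hypothetical helper names, the default base URL matches the compose setup above, and the encoder assumes a plain-ASCII filename without quotes and a random boundary that does not occur in the file bytes.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode a single file part as multipart/form-data.

    Returns (body, content_type). Assumes `filename` is plain ASCII without
    quotes and that the random boundary does not occur inside `data`.
    """
    boundary = uuid.uuid4().hex
    part_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {part_type}\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert(path: str, base_url: str = "http://localhost:5000") -> dict:
    """Upload the file at `path` as the `file` field of POST /convert; return the JSON reply."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(path), fh.read())
    request = urllib.request.Request(
        f"{base_url}/convert",
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for the API's 400/500 replies.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage against a running container would be `convert("/path/to/document.pdf")["markdown"]`. The field name `file` matches what `app.py` reads from `request.files`.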

Example response:

```json
{
  "markdown": "Extracted text from the document..."
}
```
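
Per `app.py` in this commit, a successful call returns HTTP 200 with a `markdown` key (plus `metadata` when the result object has that attribute), while failures return `{"error": "..."}` with HTTP 400 (no file, or an empty filename) or HTTP 500 (conversion error). A small hypothetical helper that turns a raw reply into either the Markdown string or an exception:

```python
import json


def markdown_from_response(status: int, body: bytes) -> str:
    """Return the Markdown from a /convert reply, or raise if the server reported an error.

    Assumes the reply shapes used by app.py: {"markdown": ...} with HTTP 200,
    and {"error": ...} with HTTP 400 or 500. Extra keys such as "metadata"
    are ignored here.
    """
    payload = json.loads(body)
    if status != 200:
        # Fall back to the whole payload if the server omitted the "error" key.
        detail = payload.get("error", payload) if isinstance(payload, dict) else payload
        raise RuntimeError(f"/convert failed with HTTP {status}: {detail}")
    return payload["markdown"]
```

With `urllib`, 4xx/5xx replies raise `urllib.error.HTTPError`; its `.code` and `.read()` supply the two arguments.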

## Development

To modify the application:

1. Edit `app.py` as needed.
2. Rebuild and restart the container. The compose file above only references the published image, so `--build` has no effect until you add `build: .` under the `markitdown_api` service; with that in place, run:

   ```bash
   docker compose down
   docker compose up --build -d
   ```

## Extending the API

You can extend the API by adding more endpoints to `app.py` or more features to the existing `/convert` endpoint.

app.py

+59
@@ -0,0 +1,59 @@
from flask import Flask, request, jsonify
import os
import tempfile
from markitdown import MarkItDown
from werkzeug.utils import secure_filename

app = Flask(__name__)

@app.route('/health', methods=['GET'])
def health_check():
    return jsonify({"status": "healthy"}), 200

@app.route('/convert', methods=['POST'])
def convert_file():
    # Check if file is present in the request
    if 'file' not in request.files:
        return jsonify({"error": "No file provided"}), 400

    file = request.files['file']

    # Check if the file has a name
    if file.filename == '':
        return jsonify({"error": "No file selected"}), 400

    # Save the upload in a new temp dir; secure_filename() may return '', so fall back
    temp_dir = tempfile.mkdtemp()
    file_path = os.path.join(temp_dir, secure_filename(file.filename) or "upload")

    try:
        # Save the file temporarily
        file.save(file_path)

        # Process with MarkItDown
        md = MarkItDown()
        result = md.convert(file_path)

        # Prepare the response
        response = {
            "markdown": result.text_content
        }

        # Add any other relevant data from the result object if needed
        # Example: if result has metadata, add it to the response
        if hasattr(result, 'metadata'):
            response["metadata"] = result.metadata

        return jsonify(response), 200

    except Exception as e:
        return jsonify({"error": str(e)}), 500

    finally:
        # Clean up the temporary file
        if os.path.exists(file_path):
            os.remove(file_path)

        # Clean up the temporary directory
        if os.path.exists(temp_dir):
            os.rmdir(temp_dir)
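
The `try`/`finally` block above removes the saved file and the directory by hand. A sketch of the same flow using `tempfile.TemporaryDirectory`, which deletes the directory and its contents on exit even when conversion raises. `with_temp_upload` is a hypothetical helper; inside Flask, `file.save(path)` would replace the `open`/`write` step, and `process` could be `lambda p: MarkItDown().convert(p).text_content`. The `"upload"` fallback covers `secure_filename()` returning an empty string, for example for a filename made only of non-ASCII characters.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_upload(safe_name: str, data: bytes, process: Callable[[str], T]) -> T:
    """Save `data` in a fresh temporary directory, run `process(path)`, then clean up.

    `safe_name` is meant to be the output of werkzeug's secure_filename(),
    which can be empty; "upload" is an assumed fallback so the path always
    names a file inside the directory rather than the directory itself.
    """
    # TemporaryDirectory removes the directory and everything in it when the
    # block exits, whether process() returns normally or raises.
    with tempfile.TemporaryDirectory() as temp_dir:
        path = os.path.join(temp_dir, safe_name or "upload")
        with open(path, "wb") as fh:
            fh.write(data)
        return process(path)
```

This keeps cleanup in one place and avoids the manual `os.remove`/`os.rmdir` calls, at the cost of a nested `with` block in the route.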

requirements.txt

+3
@@ -0,0 +1,3 @@
flask>=2.3.3
markitdown[all]
gunicorn>=20.1.0
