FastAPI-based service for extracting structured data (tables, infoboxes, citations, etc.) from Wikipedia articles.
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Start the FastAPI server:

  ```bash
  uvicorn Application:application --reload --host 0.0.0.0 --port 8000
  ```

The API will be available at:
- API: http://localhost:8000
- Interactive API docs: http://localhost:8000/docs
- Alternative docs: http://localhost:8000/redoc
Run the client script to interact with the API:

```bash
python client.py
```

The client script includes:
- Example requests for extracting tables from Wikipedia articles
- Interactive mode for testing different articles and languages
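The contents of `client.py` aren't shown here, but a minimal sketch of such a client might look like the following — the helper names and the interactive loop are assumptions, and only the `/extract-tables` endpoint and its request body come from the docs below:

```python
# Hypothetical sketch of a client for the /extract-tables endpoint.
# Uses only the standard library; the real client.py may use requests instead.
import json
import urllib.request

API_URL = "http://localhost:8000/extract-tables"  # assumed local server address


def make_payload(page_title: str, language: str = "english") -> dict:
    """Build the JSON request body expected by /extract-tables."""
    return {"page_title": page_title, "language": language}


def extract_tables(page_title: str, language: str = "english") -> dict:
    """POST to the API and return the parsed JSON response."""
    data = json.dumps(make_payload(page_title, language)).encode()
    req = urllib.request.Request(
        API_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def interactive_mode() -> None:
    """Prompt for articles and languages until the user enters a blank title."""
    while True:
        title = input("Article title (blank to quit): ").strip()
        if not title:
            break
        lang = input("Language [english]: ").strip() or "english"
        print(json.dumps(extract_tables(title, lang), indent=2))
```

Separating `make_payload` from the HTTP call keeps the request shape easy to test without a running server.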
`POST /extract-tables`: Extract all tables from a Wikipedia article.
Request Body:

```json
{
  "page_title": "Python (programming language)",
  "language": "english"
}
```

Response:

```json
{
  "article_title": "Python (programming language)",
  "language": "english",
  "number_of_tables": 2,
  "tables": [
    {
      "table_id": "table_1",
      "headers": ["Column1", "Column2"],
      "rows": [["Value1", "Value2"]],
      "caption": "Table caption"
    }
  ],
  "table_details": {
    "1": "5 rows and 3 columns",
    "2": "10 rows and 4 columns"
  }
}
```

Extract the infobox from a Wikipedia article (JSON input).
Parse tables from article JSON (legacy endpoint).
Extract citations from a Wikipedia article.
Extract headers from a Wikipedia article.
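The `table_details` and `number_of_tables` fields in the response above follow directly from the shapes of the extracted tables. A small sketch of how they could be derived — `summarize_tables` is a hypothetical helper, not necessarily how the service computes them:

```python
# Derive the summary fields of the /extract-tables response from the tables
# list. The table shapes below are illustrative, not from a real article.

def summarize_tables(tables: list[dict]) -> dict:
    """Build number_of_tables and table_details from extracted tables."""
    details = {
        str(i): f"{len(t['rows'])} rows and {len(t['headers'])} columns"
        for i, t in enumerate(tables, start=1)
    }
    return {"number_of_tables": len(tables), "table_details": details}


tables = [
    {"table_id": "table_1", "headers": ["A", "B", "C"],
     "rows": [["1", "2", "3"]] * 5},
    {"table_id": "table_2", "headers": ["A", "B", "C", "D"],
     "rows": [["w", "x", "y", "z"]] * 10},
]
summary = summarize_tables(tables)
# summary["table_details"]["1"] == "5 rows and 3 columns"
```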
```python
import requests

response = requests.post(
    "http://localhost:8000/extract-tables",
    json={
        "page_title": "Python (programming language)",
        "language": "english"
    }
)
print(response.json())
```

```bash
curl -X POST "http://localhost:8000/extract-tables" \
  -H "Content-Type: application/json" \
  -d '{"page_title": "Python (programming language)", "language": "english"}'
```

Supported languages:

- english (en)
- spanish (es)
- german (de)
- dutch (nl)
- Or any ISO 639-1 two-letter language code
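Since the API accepts both full names and two-letter codes, a normalization step like the following is one plausible way to resolve them to the Wikipedia subdomain — the mapping and function name are assumptions, not the service's actual implementation:

```python
# Hypothetical normalization from accepted language names to ISO 639-1 codes.
LANGUAGE_CODES = {"english": "en", "spanish": "es", "german": "de", "dutch": "nl"}


def to_language_code(language: str) -> str:
    """Map a full language name to its two-letter code; pass codes through."""
    lang = language.strip().lower()
    if lang in LANGUAGE_CODES:
        return LANGUAGE_CODES[lang]
    if len(lang) == 2 and lang.isalpha():
        return lang  # assume it is already an ISO 639-1 code
    raise ValueError(f"Unsupported language: {language}")


# The code selects the Wikipedia subdomain for the article fetch,
# e.g. https://de.wikipedia.org for German.
subdomain = f"https://{to_language_code('german')}.wikipedia.org"
```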