A tool for extracting clean article text from web pages.
Designed for AI pipelines that need structured article content for summarization and embeddings.

Features:
- extract title
- extract main article text
- remove navigation and ads
- normalize formatting
- return structured output
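
As a rough illustration of the steps listed above, here is a minimal Python sketch that uses only the standard library. It is not this tool's actual implementation or API: the names `extract_article`, `Article`, `SKIP_TAGS`, and `BLOCK_TAGS` are hypothetical, and the tag-based heuristic for dropping navigation and ads is an assumption made for the example.

```python
import re
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Assumption: content inside these tags is boilerplate (navigation, ads, scripts).
SKIP_TAGS = {"nav", "aside", "footer", "header", "script", "style", "form"}
# Assumption: these tags delimit blocks of article text.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "li", "blockquote"}


@dataclass
class Article:
    """Structured output: a title plus a list of cleaned text blocks."""
    title: str = ""
    paragraphs: list = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n\n".join(self.paragraphs)


def _normalize(text: str) -> str:
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


class _ArticleParser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.blocks = []
        self._skip_depth = 0   # > 0 while inside a boilerplate tag
        self._in_title = False
        self._current = None   # text pieces of the block being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self.flush()
            self._current = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "title":
            self._in_title = False
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0 and self._current is not None:
            self._current.append(data)

    def flush(self):
        # Close the current block, keeping it only if it has visible text.
        if self._current is not None:
            text = _normalize("".join(self._current))
            if text:
                self.blocks.append(text)
            self._current = None


def extract_article(html: str) -> Article:
    """Extract the title and main text blocks from an HTML string."""
    parser = _ArticleParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep a trailing block that was never closed
    title = _normalize("".join(parser.title_parts))
    return Article(title=title, paragraphs=parser.blocks)


if __name__ == "__main__":
    page = (
        "<html><head><title> My  Post </title></head><body>"
        "<nav><p>Home</p></nav>"
        "<article><p>First   paragraph.</p><p>Second\nparagraph.</p></article>"
        "<footer><p>Ad</p></footer></body></html>"
    )
    article = extract_article(page)
    print(article.title)
    print(article.text)
```

A fixed tag list like this is a deliberately simple choice. Real pages often need stronger heuristics, such as text-density scoring, to separate article content from boilerplate.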

Use cases:
- LLM summarization pipelines
- knowledge ingestion systems
- content indexing
- research automation