TroodInc/article-extractor

Article Extractor

A tool for extracting clean article text from web pages.

Designed for AI pipelines that need structured article content for summarization and embeddings.


Features

extract title

extract main article text

remove navigation and ads

normalize formatting

return structured output
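
The README does not show the tool's interface, so the following is only a minimal sketch of how the features above could fit together, using nothing but the Python standard library. The names `ArticleParser`, `extract_article`, `SKIP_TAGS`, and `BLOCK_TAGS` are hypothetical and are not taken from this repository; the choice of which tags count as navigation or ads is likewise an assumption.

```python
import re
from html.parser import HTMLParser

# Assumption: content inside these tags is navigation, ads, or other non-article material.
SKIP_TAGS = {"script", "style", "nav", "aside", "footer", "form"}
# Assumption: these tags carry the article's readable text.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "li", "blockquote"}


class ArticleParser(HTMLParser):
    """Hypothetical parser that collects the <title> and block-level text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.blocks = []
        self._skip_depth = 0     # > 0 while inside any SKIP_TAGS element
        self._in_title = False
        self._block_tag = None   # tag that opened the block being read
        self._current = None     # text pieces of that block, or None

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in BLOCK_TAGS and self._skip_depth == 0 and self._current is None:
            self._block_tag = tag
            self._current = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == self._block_tag and self._current is not None:
            # Normalize formatting: collapse runs of whitespace into one space.
            text = re.sub(r"\s+", " ", "".join(self._current)).strip()
            if text:
                self.blocks.append(text)
            self._block_tag = None
            self._current = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._current is not None and self._skip_depth == 0:
            self._current.append(data)


def extract_article(html: str) -> dict:
    """Return structured output: a dict with 'title' and 'text' keys."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {
        "title": re.sub(r"\s+", " ", parser.title).strip(),
        "text": "\n\n".join(parser.blocks),
    }
```

Calling `extract_article(page_html)` on a page's HTML string returns a dictionary whose `text` field joins the kept paragraphs with blank lines, a shape that is easy to pass to a summarizer or an embedding step. A tag-based filter like this is deliberately simple; real pages often need content-density or readability heuristics to separate the article from the surrounding page.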


Use cases

LLM summarization pipelines

knowledge ingestion systems

content indexing

research automation
