Lemon8 Feeds Scraper collects posts, images, videos, comments, and engagement analytics from Lemon8 feeds across multiple categories and regions. It’s built for teams that need reliable, repeatable feed intelligence for research, trend tracking, and content monitoring—without manual scrolling and copying. Use Lemon8 Feeds Scraper to turn fast-moving feed data into structured datasets you can analyze and automate.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for lemon8-feeds-scraper you've just found your team — Let’s Chat. 👆👆
This project extracts structured feed data from Lemon8, including post metadata, media assets, comment threads, and post-level statistics. It solves the problem of capturing large, scroll-based feeds and converting them into consistent, machine-readable output for analysis. It’s for developers, analysts, and growth teams who want searchable, exportable feed data for reporting, monitoring, and downstream pipelines.
- Supports 22 feed categories (IDs 0–21) to target specific content verticals (e.g., Food, Fashion, Tech, Education).
- Works across 10+ regions using region codes to localize results (e.g., us, au, jp, th, sg, ca).
- Handles infinite scrolling behavior to collect large volumes of posts beyond initial page loads.
- Extracts post-level analytics (likes, saves, comments) for trend scoring and performance comparisons.
- Optionally fetches full post details and deep comment threads (including replies) for richer analysis.
| Feature | Description |
|---|---|
| 22 Feed Categories | Target specific feed categories using category (0–21) for focused data collection. |
| 10+ Regions | Localize scraping with region codes to capture regional content and trends. |
| Infinite Scrolling Capture | Continuously scrolls and collects posts until limits are reached or content is exhausted. |
| Full Post Data | Extracts titles, captions/content previews, hashtags, author metadata, URLs, and media flags. |
| Post Analytics | Captures key engagement stats (likes, saves, comments) for performance tracking. |
| Comment Extraction | Pulls comment threads including replies for sentiment, themes, and community insights. |
| Detail Fetch Mode | getDetails enables deeper post extraction; detailsLimit controls how many posts get full details. |
| Media Downloads to KVS | Optional saving of images/videos via saveImages and saveVideos. |
| Anti-Bot Strategy | Uses a stealth-capable fetching approach to reduce blocks and improve stability. |
| Proxy Support | Accepts an optional proxy configuration for higher success rates at scale. |
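The options above combine into a single run configuration. The sketch below is illustrative only: the field names (category, region, limit, getDetails, detailsLimit, saveImages, saveVideos, proxy) follow the descriptions in this document, but the exact input schema — especially the shape of the proxy object — is an assumption and may differ.

```json
{
  "category": 2,
  "region": "us",
  "limit": 50,
  "getDetails": true,
  "detailsLimit": 10,
  "saveImages": false,
  "saveVideos": false,
  "proxy": { "useProxy": true }
}
```

With this configuration, 50 feed posts would be collected, the first 10 expanded in detail mode, and no media persisted.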
| Field Name | Field Description |
|---|---|
| posts | Array of extracted posts from the selected feed. |
| posts[].id | Unique post identifier. |
| posts[].author | Author object for the post. |
| posts[].author.name | Display name of the author. |
| posts[].author.profileUrl | Link to the author profile. |
| posts[].author.profileImageUrl | URL to the author avatar/profile image. |
| posts[].title | Post title (when available). |
| posts[].content | Content preview/caption snippet. |
| posts[].postUrl | Direct URL to the post. |
| posts[].statistics | Engagement metrics object for the post. |
| posts[].statistics.savedCount | Number of saves/bookmarks for the post. |
| posts[].statistics.likesCount | Number of likes for the post. |
| posts[].statistics.commentsCount | Number of comments for the post. |
| posts[].images | Array of extracted image URLs / metadata. |
| posts[].isVideo | Boolean indicating whether the post contains video. |
| posts[].category | Category name (e.g., "Food"). |
| posts[].categoryId | Category ID used for extraction (0–21). |
| posts[].details | Optional deep details object when getDetails=true. |
| posts[].allComments | Optional list of comments (and replies) when comment extraction is enabled. |
| posts[].commentStats | Optional derived comment metrics (counts, reply depth, etc.). |
| metadata | Run-level metadata about what was collected and how. |
| metadata.feedsUrl | Feed URL used for extraction. |
| metadata.category | Category name used for the run. |
| metadata.categoryId | Category ID used for the run. |
| metadata.region | Region code used for the run. |
| metadata.totalScraped | Total number of posts collected. |
| metadata.scrollsPerformed | Number of scrolling cycles executed. |
| metadata.videoPostsFound | Count of video posts detected during extraction. |
| metadata.detailedPostsScraped | Number of posts fully expanded via detail mode. |
```json
{
  "posts": [
    {
      "id": "7412987407534162437",
      "author": {
        "name": "Author Name",
        "profileUrl": "https://...",
        "profileImageUrl": "https://..."
      },
      "title": "Post Title",
      "content": "Content preview...",
      "postUrl": "https://...",
      "statistics": {
        "savedCount": "0",
        "likesCount": "6437",
        "commentsCount": "0"
      },
      "images": [
        "https://..."
      ],
      "isVideo": false,
      "category": "Food",
      "categoryId": 2,
      "details": {
        "hashtags": [
          "#food",
          "#recipe"
        ],
        "publishedAt": "2025-12-10T12:34:56Z"
      },
      "allComments": [
        {
          "id": "c_001",
          "author": "User A",
          "text": "Looks amazing!",
          "likes": 12,
          "replies": [
            {
              "id": "r_001",
              "author": "User B",
              "text": "Agree!",
              "likes": 2
            }
          ]
        }
      ],
      "commentStats": {
        "totalComments": 1,
        "totalReplies": 1,
        "maxThreadDepth": 2
      }
    }
  ],
  "metadata": {
    "feedsUrl": "https://...",
    "category": "Food",
    "categoryId": 2,
    "region": "us",
    "totalScraped": 50,
    "scrollsPerformed": 15,
    "videoPostsFound": 5,
    "detailedPostsScraped": 10
  }
}
```
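Because the output is plain JSON, downstream analysis needs no special tooling. Here is a minimal Python sketch, assuming the schema shown above (including the string-typed statistics values in the sample), that ranks posts by total engagement:

```python
import json

def engagement(post: dict) -> int:
    """Sum likes, saves, and comments; the sample output stores these as strings."""
    stats = post.get("statistics", {})
    return sum(int(stats.get(k, 0)) for k in ("likesCount", "savedCount", "commentsCount"))

def rank_posts(raw: str) -> list[tuple[str, int]]:
    """Return (postUrl, engagement) pairs sorted by engagement, highest first."""
    data = json.loads(raw)
    return sorted(((p["postUrl"], engagement(p)) for p in data.get("posts", [])),
                  key=lambda pair: pair[1], reverse=True)

# Usage with a trimmed-down version of the sample output above:
raw = json.dumps({"posts": [
    {"postUrl": "https://example/post/1",
     "statistics": {"savedCount": "0", "likesCount": "6437", "commentsCount": "0"}},
    {"postUrl": "https://example/post/2",
     "statistics": {"savedCount": "3", "likesCount": "10", "commentsCount": "2"}},
]})
print(rank_posts(raw)[0])  # highest-engagement post first
```

The cast to int matters because the sample emits engagement counts as strings; if your export already uses numbers, the cast is a harmless no-op to add defensively.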
```
Lemon8 Feeds Scraper/
├── src/
│   ├── main.py
│   ├── runner.py
│   ├── settings.py
│   ├── clients/
│   │   ├── __init__.py
│   │   ├── stealth_fetcher.py
│   │   └── session_manager.py
│   ├── scraping/
│   │   ├── __init__.py
│   │   ├── feed_scroller.py
│   │   ├── post_parser.py
│   │   ├── details_extractor.py
│   │   └── comments_extractor.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── post.py
│   │   ├── author.py
│   │   ├── comment.py
│   │   └── metadata.py
│   ├── storage/
│   │   ├── __init__.py
│   │   ├── kvs_media_store.py
│   │   └── dataset_writer.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── throttling.py
│   │   ├── retries.py
│   │   ├── validators.py
│   │   └── logging_config.py
│   └── constants/
│       ├── __init__.py
│       ├── categories.py
│       └── regions.py
├── tests/
│   ├── test_categories.py
│   ├── test_regions.py
│   ├── test_post_parser.py
│   └── test_comments_extractor.py
├── examples/
│   ├── input.sample.json
│   └── output.sample.json
├── scripts/
│   ├── run_local.sh
│   └── export_dataset.py
├── .env.example
├── .gitignore
├── pyproject.toml
├── requirements.txt
└── README.md
```
- Content researchers use it to collect category-specific posts and comments, so they can analyze themes, sentiment, and creator patterns.
- Growth teams use it to monitor engagement analytics across regions, so they can spot rising trends and optimize content strategy faster.
- Data analysts use it to build structured datasets from infinite feeds, so they can run dashboards, scoring models, and weekly reporting.
- Brand monitoring teams use it to track content mentions and comment discussions, so they can catch reputation risks early and respond with context.
- Media archiving workflows use it to download images/videos and preserve post metadata, so they can maintain searchable archives for audits or review.
**How do I choose the right category and region?**
Use category (0–21) to select the feed vertical you want and region (e.g., us, au, jp, th, sg, ca) to localize results. If you’re validating coverage, start with a lower limit (e.g., 25–50) and increase once results match your expectations.
**What’s the difference between limit and detailsLimit?**
limit controls how many posts you collect from the feed overall. detailsLimit controls how many of those posts are expanded into full detail mode when getDetails=true. This lets you keep a broad feed sample while only deep-extracting the top N posts.
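The relationship between the two settings can be sketched in a few lines of Python; plan_run and its arguments are hypothetical stand-ins for the scraper's internals, not its actual API:

```python
def plan_run(feed_posts: list, limit: int, get_details: bool, details_limit: int):
    """Illustrates how limit caps the feed sample while detailsLimit caps deep extraction."""
    collected = feed_posts[:limit]  # limit: how many posts are kept overall
    to_detail = collected[:details_limit] if get_details else []  # detailsLimit: subset expanded
    return collected, to_detail

posts = [f"post_{i}" for i in range(100)]
collected, detailed = plan_run(posts, limit=50, get_details=True, details_limit=10)
print(len(collected), len(detailed))  # 50 10
```

In other words, detail mode never widens the sample; it only deepens extraction for the first detailsLimit posts already collected.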
**When should I enable media downloads?**
Turn on saveImages and/or saveVideos when you need local persistence of media for audits, archives, or offline analysis. If your goal is purely analytics, keeping downloads off will reduce bandwidth usage and speed up runs.
**Why might runs slow down or collect fewer posts than expected?**
Feed loading behavior, rate limits, and dynamic content can reduce throughput. Using proxy configuration and keeping getDetails/comment extraction limited (via detailsLimit) generally improves stability and keeps runs consistent.
- **Primary Metric:** A typical run collects ~120–220 feed posts per minute in list-only mode (details/comments disabled), depending on region latency and scroll load time.
- **Reliability Metric:** With proxy enabled and conservative throttling, successful extraction completion commonly exceeds 95% across repeated runs on the same category/region.
- **Efficiency Metric:** Detail mode increases per-post cost; limiting detailsLimit to 10–20 usually keeps overall runtime within 1.5–2.5× of list-only collection at the same limit.
- **Quality Metric:** Post-level fields (id, url, author, category, basic statistics) are typically near-complete; comment depth completeness improves when fewer detailed posts are requested, reducing timeouts and partial thread loads.
