Lemon8 Feeds Scraper collects posts, images, videos, comments, and engagement analytics from Lemon8 feeds across multiple categories and regions. It’s built for teams that need reliable, repeatable feed intelligence for research, trend tracking, and content monitoring—without manual scrolling and copying. Use Lemon8 Feeds Scraper to turn fast-moving feed data into structured datasets you can analyze and automate.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for lemon8-feeds-scraper you've just found your team — Let’s Chat. 👆👆
This project extracts structured feed data from Lemon8, including post metadata, media assets, comment threads, and post-level statistics. It solves the problem of capturing large, scroll-based feeds and converting them into consistent, machine-readable output for analysis. It’s for developers, analysts, and growth teams who want searchable, exportable feed data for reporting, monitoring, and downstream pipelines.
- Supports 22 feed categories (IDs 0–21) to target specific content verticals (e.g., Food, Fashion, Tech, Education).
- Works across 10+ regions using region codes to localize results (e.g., us, au, jp, th, sg, ca).
- Handles infinite scrolling behavior to collect large volumes of posts beyond initial page loads.
- Extracts post-level analytics (likes, saves, comments) for trend scoring and performance comparisons.
- Optionally fetches full post details and deep comment threads (including replies) for richer analysis.
| Feature | Description |
|---|---|
| 22 Feed Categories | Target specific feed categories using category (0–21) for focused data collection. |
| 10+ Regions | Localize scraping with region codes to capture regional content and trends. |
| Infinite Scrolling Capture | Continuously scrolls and collects posts until limits are reached or content is exhausted. |
| Full Post Data | Extracts titles, captions/content previews, hashtags, author metadata, URLs, and media flags. |
| Post Analytics | Captures key engagement stats (likes, saves, comments) for performance tracking. |
| Comment Extraction | Pulls comment threads including replies for sentiment, themes, and community insights. |
| Detail Fetch Mode | getDetails enables deeper post extraction; detailsLimit controls how many posts get full details. |
| Media Downloads to KVS | Optional saving of images/videos via saveImages and saveVideos. |
| Anti-Bot Strategy | Uses a stealth-capable fetching approach to reduce blocks and improve stability. |
| Proxy Support | Accepts an optional proxy configuration for higher success rates at scale. |
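The options above combine into a single run configuration. The sketch below is illustrative only: the field names (category, region, limit, getDetails, detailsLimit, saveImages, saveVideos, proxy) follow the descriptions in this document, but the exact input schema — especially the shape of the proxy object — is an assumption and may differ.

```json
{
  "category": 2,
  "region": "us",
  "limit": 50,
  "getDetails": true,
  "detailsLimit": 10,
  "saveImages": false,
  "saveVideos": false,
  "proxy": { "useProxy": true }
}
```

With this configuration, 50 feed posts would be collected, the first 10 expanded in detail mode, and no media persisted.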
| Field Name | Field Description |
|---|---|
| posts | Array of extracted posts from the selected feed. |
| posts[].id | Unique post identifier. |
| posts[].author | Author object for the post. |
| posts[].author.name | Display name of the author. |
| posts[].author.profileUrl | Link to the author profile. |
| posts[].author.profileImageUrl | URL to the author avatar/profile image. |
| posts[].title | Post title (when available). |
| posts[].content | Content preview/caption snippet. |
| posts[].postUrl | Direct URL to the post. |
| posts[].statistics | Engagement metrics object for the post. |
| posts[].statistics.savedCount | Number of saves/bookmarks for the post. |
| posts[].statistics.likesCount | Number of likes for the post. |
| posts[].statistics.commentsCount | Number of comments for the post. |
| posts[].images | Array of extracted image URLs / metadata. |
| posts[].isVideo | Boolean indicating whether the post contains video. |
| posts[].category | Category name (e.g., "Food"). |
| posts[].categoryId | Category ID used for extraction (0–21). |
| posts[].details | Optional deep details object when getDetails=true. |
| posts[].allComments | Optional list of comments (and replies) when comment extraction is enabled. |
| posts[].commentStats | Optional derived comment metrics (counts, reply depth, etc.). |
| metadata | Run-level metadata about what was collected and how. |
| metadata.feedsUrl | Feed URL used for extraction. |
| metadata.category | Category name used for the run. |
| metadata.categoryId | Category ID used for the run. |
| metadata.region | Region code used for the run. |
| metadata.totalScraped | Total number of posts collected. |
| metadata.scrollsPerformed | Number of scrolling cycles executed. |
| metadata.videoPostsFound | Count of video posts detected during extraction. |
| metadata.detailedPostsScraped | Number of posts fully expanded via detail mode. |
```json
{
  "posts": [
    {
      "id": "7412987407534162437",
      "author": {
        "name": "Author Name",
        "profileUrl": "https://...",
        "profileImageUrl": "https://..."
      },
      "title": "Post Title",
      "content": "Content preview...",
      "postUrl": "https://...",
      "statistics": {
        "savedCount": "0",
        "likesCount": "6437",
        "commentsCount": "0"
      },
      "images": [
        "https://..."
      ],
      "isVideo": false,
      "category": "Food",
      "categoryId": 2,
      "details": {
        "hashtags": [
          "#food",
          "#recipe"
        ],
        "publishedAt": "2025-12-10T12:34:56Z"
      },
      "allComments": [
        {
          "id": "c_001",
          "author": "User A",
          "text": "Looks amazing!",
          "likes": 12,
          "replies": [
            {
              "id": "r_001",
              "author": "User B",
              "text": "Agree!",
              "likes": 2
            }
          ]
        }
      ],
      "commentStats": {
        "totalComments": 1,
        "totalReplies": 1,
        "maxThreadDepth": 2
      }
    }
  ],
  "metadata": {
    "feedsUrl": "https://...",
    "category": "Food",
    "categoryId": 2,
    "region": "us",
    "totalScraped": 50,
    "scrollsPerformed": 15,
    "videoPostsFound": 5,
    "detailedPostsScraped": 10
  }
}
```
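Because the output is plain JSON, downstream analysis needs no special tooling. Here is a minimal Python sketch, assuming the schema shown above (including the string-typed statistics values in the sample), that ranks posts by total engagement:

```python
import json

def engagement(post: dict) -> int:
    """Sum likes, saves, and comments; the sample output stores these as strings."""
    stats = post.get("statistics", {})
    return sum(int(stats.get(k, 0)) for k in ("likesCount", "savedCount", "commentsCount"))

def rank_posts(raw: str) -> list[tuple[str, int]]:
    """Return (postUrl, engagement) pairs sorted by engagement, highest first."""
    data = json.loads(raw)
    return sorted(((p["postUrl"], engagement(p)) for p in data.get("posts", [])),
                  key=lambda pair: pair[1], reverse=True)

# Usage with a trimmed-down version of the sample output above:
raw = json.dumps({"posts": [
    {"postUrl": "https://example/post/1",
     "statistics": {"savedCount": "0", "likesCount": "6437", "commentsCount": "0"}},
    {"postUrl": "https://example/post/2",
     "statistics": {"savedCount": "3", "likesCount": "10", "commentsCount": "2"}},
]})
print(rank_posts(raw)[0])  # highest-engagement post first
```

The cast to int matters because the sample emits engagement counts as strings; if your export already uses numbers, the cast is a harmless no-op to add defensively.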
```
Lemon8 Feeds Scraper/
├── src/
│   ├── main.py
│   ├── runner.py
│   ├── settings.py
│   ├── clients/
│   │   ├── __init__.py
│   │   ├── stealth_fetcher.py
│   │   └── session_manager.py
│   ├── scraping/
│   │   ├── __init__.py
│   │   ├── feed_scroller.py
│   │   ├── post_parser.py
│   │   ├── details_extractor.py
│   │   └── comments_extractor.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── post.py
│   │   ├── author.py
│   │   ├── comment.py
│   │   └── metadata.py
│   ├── storage/
│   │   ├── __init__.py
│   │   ├── kvs_media_store.py
│   │   └── dataset_writer.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── throttling.py
│   │   ├── retries.py
│   │   ├── validators.py
│   │   └── logging_config.py
│   └── constants/
│       ├── __init__.py
│       ├── categories.py
│       └── regions.py
├── tests/
│   ├── test_categories.py
│   ├── test_regions.py
│   ├── test_post_parser.py
│   └── test_comments_extractor.py
├── examples/
│   ├── input.sample.json
│   └── output.sample.json
├── scripts/
│   ├── run_local.sh
│   └── export_dataset.py
├── .env.example
├── .gitignore
├── pyproject.toml
├── requirements.txt
└── README.md
```
- Content researchers use it to collect category-specific posts and comments, so they can analyze themes, sentiment, and creator patterns.
- Growth teams use it to monitor engagement analytics across regions, so they can spot rising trends and optimize content strategy faster.
- Data analysts use it to build structured datasets from infinite feeds, so they can run dashboards, scoring models, and weekly reporting.
- Brand monitoring teams use it to track content mentions and comment discussions, so they can catch reputation risks early and respond with context.
- Media archiving workflows use it to download images/videos and preserve post metadata, so they can maintain searchable archives for audits or review.
**How do I choose the right category and region?**
Use category (0–21) to select the feed vertical you want and region (e.g., us, au, jp, th, sg, ca) to localize results. If you’re validating coverage, start with a lower limit (e.g., 25–50) and increase once results match your expectations.
**What’s the difference between limit and detailsLimit?**
limit controls how many posts you collect from the feed overall. detailsLimit controls how many of those posts are expanded into full detail mode when getDetails=true. This lets you keep a broad feed sample while only deep-extracting the top N posts.
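The relationship between the two settings can be sketched in a few lines of Python; plan_run and its arguments are hypothetical stand-ins for the scraper's internals, not its actual API:

```python
def plan_run(feed_posts: list, limit: int, get_details: bool, details_limit: int):
    """Illustrates how limit caps the feed sample while detailsLimit caps deep extraction."""
    collected = feed_posts[:limit]  # limit: how many posts are kept overall
    to_detail = collected[:details_limit] if get_details else []  # detailsLimit: subset expanded
    return collected, to_detail

posts = [f"post_{i}" for i in range(100)]
collected, detailed = plan_run(posts, limit=50, get_details=True, details_limit=10)
print(len(collected), len(detailed))  # 50 10
```

In other words, detail mode never widens the sample; it only deepens extraction for the first detailsLimit posts already collected.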
**When should I enable media downloads?**
Turn on saveImages and/or saveVideos when you need local persistence of media for audits, archives, or offline analysis. If your goal is purely analytics, keeping downloads off will reduce bandwidth usage and speed up runs.
**Why might runs slow down or collect fewer posts than expected?**
Feed loading behavior, rate limits, and dynamic content can reduce throughput. Using proxy configuration and keeping getDetails/comment extraction limited (via detailsLimit) generally improves stability and keeps runs consistent.
- **Primary Metric:** A typical run collects ~120–220 feed posts per minute in list-only mode (details/comments disabled), depending on region latency and scroll load time.
- **Reliability Metric:** With proxy enabled and conservative throttling, successful extraction completion commonly exceeds 95% across repeated runs on the same category/region.
- **Efficiency Metric:** Detail mode increases per-post cost; limiting detailsLimit to 10–20 usually keeps overall runtime within 1.5–2.5× of list-only collection at the same limit.
- **Quality Metric:** Post-level fields (id, url, author, category, basic statistics) are typically near-complete; comment depth completeness improves when fewer detailed posts are requested, reducing timeouts and partial thread loads.
