beverly-benson/lemon8-feeds-scraper
Lemon8 Feeds Scraper

Lemon8 Feeds Scraper collects posts, images, videos, comments, and engagement analytics from Lemon8 feeds across multiple categories and regions. It’s built for teams that need reliable, repeatable feed intelligence for research, trend tracking, and content monitoring—without manual scrolling and copying. Use Lemon8 Feeds Scraper to turn fast-moving feed data into structured datasets you can analyze and automate.


Telegram   WhatsApp   Gmail   Website

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you're looking for a Lemon8 feeds scraper, you've just found your team. Let's Chat. 👆👆

Introduction

This project extracts structured feed data from Lemon8, including post metadata, media assets, comment threads, and post-level statistics. It solves the problem of capturing large, scroll-based feeds and converting them into consistent, machine-readable output for analysis. It’s for developers, analysts, and growth teams who want searchable, exportable feed data for reporting, monitoring, and downstream pipelines.

Feed Intelligence Across Categories & Regions

  • Supports 22 feed categories (IDs 0–21) to target specific content verticals (e.g., Food, Fashion, Tech, Education).
  • Works across 10+ regions using region codes to localize results (e.g., us, au, jp, th, sg, ca).
  • Handles infinite scrolling behavior to collect large volumes of posts beyond initial page loads.
  • Extracts post-level analytics (likes, saves, comments) for trend scoring and performance comparisons.
  • Optionally fetches full post details and deep comment threads (including replies) for richer analysis.
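The bullets above map onto a small set of run parameters. Here is a hypothetical input sketch; the key names (`category`, `region`, `limit`) follow this README's terminology rather than a verified input schema, so check the actor's documentation for the authoritative spelling:

```python
# Hypothetical run input -- key names follow this README's wording,
# not a confirmed schema. Category 2 corresponds to "Food" in the
# sample output below.
run_input = {
    "category": 2,     # feed category ID, 0-21
    "region": "us",    # e.g. us, au, jp, th, sg, ca
    "limit": 50,       # start small while validating coverage
}

# Basic sanity checks on the chosen values
assert 0 <= run_input["category"] <= 21
assert run_input["region"].isalpha() and len(run_input["region"]) == 2
```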

Features

| Feature | Description |
| --- | --- |
| 22 Feed Categories | Target specific feed categories using `category` (0–21) for focused data collection. |
| 10+ Regions | Localize scraping with region codes to capture regional content and trends. |
| Infinite Scrolling Capture | Continuously scrolls and collects posts until limits are reached or content is exhausted. |
| Full Post Data | Extracts titles, captions/content previews, hashtags, author metadata, URLs, and media flags. |
| Post Analytics | Captures key engagement stats (likes, saves, comments) for performance tracking. |
| Comment Extraction | Pulls comment threads, including replies, for sentiment, themes, and community insights. |
| Detail Fetch Mode | `getDetails` enables deeper post extraction; `detailsLimit` controls how many posts get full details. |
| Media Downloads to KVS | Optionally saves images/videos via `saveImages` and `saveVideos`. |
| Anti-Bot Strategy | Uses a stealth-capable fetching approach to reduce blocks and improve stability. |
| Proxy Support | Accepts an optional proxy configuration for higher success rates at scale. |
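Putting the feature flags together, a full run configuration might look like the sketch below. Parameter names are taken from the table above; the shape of the `proxy` object is an assumption and should be checked against the actor's input schema:

```python
# Illustrative configuration combining the documented flags.
# The "proxy" object's shape is assumed, not verified.
run_input = {
    "category": 2,
    "region": "us",
    "limit": 100,          # posts to collect from the feed overall
    "getDetails": True,    # expand a subset of posts into detail mode
    "detailsLimit": 10,    # only the first 10 posts get the expensive fetch
    "saveImages": False,   # leave media downloads off for analytics-only runs
    "saveVideos": False,
    "proxy": {"useProxy": True},  # assumed shape; consult the actor docs
}

# detailsLimit should never exceed the overall post limit
assert run_input["detailsLimit"] <= run_input["limit"]
```

Keeping `saveImages`/`saveVideos` off and `detailsLimit` small is the cheapest starting point; widen both once a small run looks correct.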

What Data This Scraper Extracts

| Field Name | Field Description |
| --- | --- |
| `posts` | Array of extracted posts from the selected feed. |
| `posts[].id` | Unique post identifier. |
| `posts[].author` | Author object for the post. |
| `posts[].author.name` | Display name of the author. |
| `posts[].author.profileUrl` | Link to the author profile. |
| `posts[].author.profileImageUrl` | URL of the author's avatar/profile image. |
| `posts[].title` | Post title (when available). |
| `posts[].content` | Content preview/caption snippet. |
| `posts[].postUrl` | Direct URL to the post. |
| `posts[].statistics` | Engagement metrics object for the post. |
| `posts[].statistics.savedCount` | Number of saves/bookmarks for the post. |
| `posts[].statistics.likesCount` | Number of likes for the post. |
| `posts[].statistics.commentsCount` | Number of comments for the post. |
| `posts[].images` | Array of extracted image URLs/metadata. |
| `posts[].isVideo` | Boolean indicating whether the post contains video. |
| `posts[].category` | Category name (e.g., "Food"). |
| `posts[].categoryId` | Category ID used for extraction (0–21). |
| `posts[].details` | Optional deep-details object when `getDetails=true`. |
| `posts[].allComments` | Optional list of comments (and replies) when comment extraction is enabled. |
| `posts[].commentStats` | Optional derived comment metrics (counts, reply depth, etc.). |
| `metadata` | Run-level metadata about what was collected and how. |
| `metadata.feedsUrl` | Feed URL used for extraction. |
| `metadata.category` | Category name used for the run. |
| `metadata.categoryId` | Category ID used for the run. |
| `metadata.region` | Region code used for the run. |
| `metadata.totalScraped` | Total number of posts collected. |
| `metadata.scrollsPerformed` | Number of scroll cycles executed. |
| `metadata.videoPostsFound` | Count of video posts detected during extraction. |
| `metadata.detailedPostsScraped` | Number of posts fully expanded via detail mode. |

Example Output

{
	"posts": [
		{
			"id": "7412987407534162437",
			"author": {
				"name": "Author Name",
				"profileUrl": "https://...",
				"profileImageUrl": "https://..."
			},
			"title": "Post Title",
			"content": "Content preview...",
			"postUrl": "https://...",
			"statistics": {
				"savedCount": "0",
				"likesCount": "6437",
				"commentsCount": "0"
			},
			"images": [
				"https://..."
			],
			"isVideo": false,
			"category": "Food",
			"categoryId": 2,
			"details": {
				"hashtags": [
					"#food",
					"#recipe"
				],
				"publishedAt": "2025-12-10T12:34:56Z"
			},
			"allComments": [
				{
					"id": "c_001",
					"author": "User A",
					"text": "Looks amazing!",
					"likes": 12,
					"replies": [
						{
							"id": "r_001",
							"author": "User B",
							"text": "Agree!",
							"likes": 2
						}
					]
				}
			],
			"commentStats": {
				"totalComments": 1,
				"totalReplies": 1,
				"maxThreadDepth": 2
			}
		}
	],
	"metadata": {
		"feedsUrl": "https://...",
		"category": "Food",
		"categoryId": 2,
		"region": "us",
		"totalScraped": 50,
		"scrollsPerformed": 15,
		"videoPostsFound": 5,
		"detailedPostsScraped": 10
	}
}
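Once a run finishes, the dataset can be post-processed like any JSON document. One thing to note from the sample above: engagement counts arrive as strings, so cast them before doing arithmetic. A minimal sketch using a trimmed slice of the output:

```python
import json

# Trimmed slice of the Example Output above; values are illustrative.
raw = '''
{
  "posts": [
    {"id": "7412987407534162437",
     "title": "Post Title",
     "statistics": {"savedCount": "0", "likesCount": "6437", "commentsCount": "0"},
     "isVideo": false}
  ],
  "metadata": {"totalScraped": 50, "videoPostsFound": 5}
}
'''

data = json.loads(raw)

def engagement(post):
    """Sum likes, saves, and comments; counts are strings in the sample."""
    s = post["statistics"]
    return int(s["likesCount"]) + int(s["savedCount"]) + int(s["commentsCount"])

top = max(data["posts"], key=engagement)
print(top["id"], engagement(top))  # -> 7412987407534162437 6437
```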

Directory Structure Tree

Lemon8 Feeds Scraper/
├── src/
│   ├── main.py
│   ├── runner.py
│   ├── settings.py
│   ├── clients/
│   │   ├── __init__.py
│   │   ├── stealth_fetcher.py
│   │   └── session_manager.py
│   ├── scraping/
│   │   ├── __init__.py
│   │   ├── feed_scroller.py
│   │   ├── post_parser.py
│   │   ├── details_extractor.py
│   │   └── comments_extractor.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── post.py
│   │   ├── author.py
│   │   ├── comment.py
│   │   └── metadata.py
│   ├── storage/
│   │   ├── __init__.py
│   │   ├── kvs_media_store.py
│   │   └── dataset_writer.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── throttling.py
│   │   ├── retries.py
│   │   ├── validators.py
│   │   └── logging_config.py
│   └── constants/
│       ├── __init__.py
│       ├── categories.py
│       └── regions.py
├── tests/
│   ├── test_categories.py
│   ├── test_regions.py
│   ├── test_post_parser.py
│   └── test_comments_extractor.py
├── examples/
│   ├── input.sample.json
│   └── output.sample.json
├── scripts/
│   ├── run_local.sh
│   └── export_dataset.py
├── .env.example
├── .gitignore
├── pyproject.toml
├── requirements.txt
└── README.md

Use Cases

  • Content researchers use it to collect category-specific posts and comments, so they can analyze themes, sentiment, and creator patterns.
  • Growth teams use it to monitor engagement analytics across regions, so they can spot rising trends and optimize content strategy faster.
  • Data analysts use it to build structured datasets from infinite feeds, so they can run dashboards, scoring models, and weekly reporting.
  • Brand monitoring teams use it to track content mentions and comment discussions, so they can catch reputation risks early and respond with context.
  • Media archiving workflows use it to download images/videos and preserve post metadata, so they can maintain searchable archives for audits or review.

FAQs

How do I choose the right category and region?
Use `category` (0–21) to select the feed vertical you want and `region` (e.g., us, au, jp, th, sg, ca) to localize results. If you're validating coverage, start with a lower limit (e.g., 25–50) and increase once results match your expectations.

What's the difference between limit and detailsLimit?
`limit` controls how many posts you collect from the feed overall. `detailsLimit` controls how many of those posts are expanded into full detail mode when `getDetails=true`. This lets you keep a broad feed sample while only deep-extracting the top N posts.
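The split can be pictured as a simple selection step. This is an illustrative sketch of the relationship between the two parameters, not the scraper's actual internals:

```python
def plan_detail_fetches(post_ids, limit, get_details, details_limit):
    """Collect up to `limit` posts; if detail mode is on, only the
    first `details_limit` of them are marked for the deep fetch."""
    collected = post_ids[:limit]
    detailed = collected[:details_limit] if get_details else []
    return collected, detailed

ids = [f"post_{i}" for i in range(200)]
collected, detailed = plan_detail_fetches(ids, limit=100,
                                          get_details=True, details_limit=10)
print(len(collected), len(detailed))  # -> 100 10
```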

When should I enable media downloads?
Turn on `saveImages` and/or `saveVideos` when you need local persistence of media for audits, archives, or offline analysis. If your goal is purely analytics, keeping downloads off will reduce bandwidth usage and speed up runs.

Why might runs slow down or collect fewer posts than expected?
Feed loading behavior, rate limits, and dynamic content can reduce throughput. Using a proxy configuration and keeping `getDetails`/comment extraction limited (via `detailsLimit`) generally improves stability and keeps runs consistent.
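"Conservative throttling" usually means spacing requests out and backing off after failures. One common pattern is exponential backoff with jitter; this sketch is illustrative only, and the repository's `utils/throttling.py` may implement something different:

```python
import random

def backoff_delays(attempts=4, base=1.0, cap=30.0):
    """Yield one delay (in seconds) per retry attempt: exponential
    growth, capped at `cap`, with jitter so concurrent workers
    don't retry in lockstep."""
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        yield delay * random.uniform(0.5, 1.0)

# Delays grow roughly as 1s, 2s, 4s, 8s, each scaled by 0.5-1.0 jitter
print([round(d, 2) for d in backoff_delays()])
```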


Performance Benchmarks and Results

Primary Metric: A typical run collects ~120–220 feed posts per minute in list-only mode (details/comments disabled), depending on region latency and scroll load time.

Reliability Metric: With proxy enabled and conservative throttling, successful extraction completion commonly exceeds 95% across repeated runs on the same category/region.

Efficiency Metric: Detail mode increases per-post cost; limiting detailsLimit to 10–20 usually keeps overall runtime within 1.5–2.5× of list-only collection at the same limit.

Quality Metric: Post-level fields (id, url, author, category, basic statistics) are typically near-complete; comment depth completeness improves when fewer detailed posts are requested, reducing timeouts and partial thread loads.

Book a Call · Watch on YouTube

Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
★★★★★
