Community Scraper

A tool to scrape Skool.com community posts, process them with Gemini AI to extract problems and insights, and store results in Google Sheets via SheetDB for further analysis.

Project Overview

This project helps you gather community problems from Skool.com communities so you can later create solution-based content. It:

Logs into your Skool.com account
Scrapes posts from specified communities
Uses Gemini AI to identify problems discussed in each post
Saves the structured data to a Google Sheet using the SheetDB service
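
The four steps above can be pictured as a small pipeline. The following is a minimal sketch, not the project's actual code: the names `ScrapedPost`, `PostInsight`, `PipelineSteps`, and `runPipeline` are hypothetical, and each step is injected so the flow can be read without real Skool, Gemini, or SheetDB calls.

```typescript
// Hypothetical shapes; the real project defines its own types in src/types/.
interface ScrapedPost {
  id: string;
  content: string;
}

interface PostInsight {
  id: string;
  problem: string;
  originalContent: string;
}

// Each stage is passed in, so this sketch makes no real network calls.
interface PipelineSteps {
  login: () => Promise<void>;
  scrape: (community: string, limit: number) => Promise<ScrapedPost[]>;
  analyze: (post: ScrapedPost) => Promise<PostInsight>;
  save: (insights: PostInsight[]) => Promise<void>;
}

// Runs login, scrape, analyze, save in order and returns the number of rows saved.
async function runPipeline(
  steps: PipelineSteps,
  communities: string[],
  postLimit: number
): Promise<number> {
  await steps.login();
  let saved = 0;
  for (const community of communities) {
    const posts = await steps.scrape(community, postLimit);
    const insights: PostInsight[] = [];
    for (const post of posts) {
      insights.push(await steps.analyze(post)); // one post at a time
    }
    await steps.save(insights);
    saved += insights.length;
  }
  return saved;
}
```

Passing the steps in as functions is a design choice for this sketch only; it keeps the ordering visible and makes the flow easy to exercise with stubs.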

Setup Instructions

Prerequisites

Node.js (v16+)
npm or yarn
A Skool.com account with access to the community you want to scrape
A Google Sheet to store the data
A SheetDB account (SheetDB.io)
A Gemini AI API key

Installation

Clone this repository
Install dependencies:
```
npm install
```
Create a .env file based on .env.example:
```
cp .env.example .env
```
Fill in your credentials in the .env file

SheetDB Setup

Create a Google Sheet with the desired headers in the first row. Example headers: id, problem, originalContent, suggestedSolution, tags, category, status, timestamp
Go to SheetDB.io and sign up or log in.
Click "Create API" and paste the URL of your Google Sheet.
Follow the instructions to connect your sheet.
SheetDB will provide you with an API endpoint URL.
(Recommended for security) In your SheetDB API settings, go to the "Authentication" section.
- Select "Token" (or "Bearer Token") authentication.
- SheetDB will generate a token. Copy this token.
Copy the API endpoint URL and the Authentication Token (if generated).
Paste the URL into your .env file as the value for SHEETDB_API_ENDPOINT.
Paste the Token into your .env file as the value for SHEETDB_AUTH_TOKEN.
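
For orientation, the request that eventually reaches SheetDB can be sketched as follows. This assumes SheetDB's pattern of a POST with a JSON body of the form `{ "data": [...] }` and a Bearer token header when token authentication is enabled. The `buildCreateRowsRequest` helper and the `SheetRow` and `SheetDbRequest` types are hypothetical and not taken from `sheetDbService.ts`.

```typescript
// A row keyed by the sheet's header names (see the example headers above).
type SheetRow = Record<string, string>;

interface SheetDbRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a request that appends rows to the sheet.
function buildCreateRowsRequest(
  endpoint: string,
  rows: SheetRow[],
  authToken?: string
): SheetDbRequest {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  // Only attach the header if a token was configured in SheetDB.
  if (authToken) {
    headers["Authorization"] = `Bearer ${authToken}`;
  }
  return {
    url: endpoint,
    method: "POST",
    headers,
    body: JSON.stringify({ data: rows }),
  };
}
```

The resulting object could be handed to any HTTP client, with `endpoint` and `authToken` read from `SHEETDB_API_ENDPOINT` and `SHEETDB_AUTH_TOKEN`.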

Usage

Run the scraper:

npm start

For development with auto-restart:

npm run dev

Project Structure

src/
- config/ - Configuration handling
  - env.ts - Environment variable validation (Zod)
- services/ - Core functionality
  - skoolScraper.ts - Handles scraping from Skool.com
  - geminiService.ts - Processes posts with Gemini AI
  - sheetDbService.ts - Saves data to Google Sheets via SheetDB
- types/ - TypeScript type definitions
- index.ts - Main entry point
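
The tree lists `env.ts` as validating environment variables with Zod. As a dependency-free illustration of the same idea, the sketch below checks the two SheetDB variables by hand. It is not the project's Zod schema: `loadSheetDbEnv` is a hypothetical name, and treating the token as optional is an assumption based on the token step being described as recommended rather than required.

```typescript
interface SheetDbEnv {
  SHEETDB_API_ENDPOINT: string;
  SHEETDB_AUTH_TOKEN?: string;
}

// Validates raw environment values and fails fast with one combined message.
function loadSheetDbEnv(env: Record<string, string | undefined>): SheetDbEnv {
  const errors: string[] = [];

  const endpoint = (env["SHEETDB_API_ENDPOINT"] ?? "").trim();
  if (endpoint === "") {
    errors.push("SHEETDB_API_ENDPOINT is required");
  } else if (!/^https?:\/\//.test(endpoint)) {
    errors.push("SHEETDB_API_ENDPOINT must be an http(s) URL");
  }

  // Assumption: the token is optional, and an empty string counts as "not set".
  const token = (env["SHEETDB_AUTH_TOKEN"] ?? "").trim();

  if (errors.length > 0) {
    throw new Error("Invalid environment: " + errors.join("; "));
  }
  return token === ""
    ? { SHEETDB_API_ENDPOINT: endpoint }
    : { SHEETDB_API_ENDPOINT: endpoint, SHEETDB_AUTH_TOKEN: token };
}
```

In a Node.js process this would typically be called once at startup with `process.env`, so a missing value stops the run before any scraping begins.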

Customization

Adjust POST_LIMIT_PER_COMMUNITY in src/index.ts to control how many posts are scraped per community.
Modify the Gemini AI prompt in src/services/geminiService.ts to extract different information.
Ensure the headers in your Google Sheet match the keys used in src/services/sheetDbService.ts (mapPostToSheetRow function).
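
To illustrate the header requirement in the last point, here is a minimal sketch of a row mapper and a check that its keys line up with the sheet's headers. It uses the example headers from the SheetDB Setup section. The `ProcessedPost` shape, this version of `mapPostToSheetRow`, and the `findHeaderMismatches` helper are hypothetical and may differ from what `sheetDbService.ts` actually does.

```typescript
// Example headers from the SheetDB Setup section.
const SHEET_HEADERS: string[] = [
  "id", "problem", "originalContent", "suggestedSolution",
  "tags", "category", "status", "timestamp",
];

// Hypothetical shape of a post after Gemini processing.
interface ProcessedPost {
  id: string;
  problem: string;
  originalContent: string;
  suggestedSolution: string;
  tags: string[];
  category: string;
}

// Flattens a post into string cells, one per sheet column.
function mapPostToSheetRow(post: ProcessedPost, now: Date): Record<string, string> {
  return {
    id: post.id,
    problem: post.problem,
    originalContent: post.originalContent,
    suggestedSolution: post.suggestedSolution,
    tags: post.tags.join(", "), // arrays become one comma-separated cell
    category: post.category,
    status: "new", // assumed default, not taken from the project
    timestamp: now.toISOString(),
  };
}

// Reports row keys that have no column and columns that have no row key.
function findHeaderMismatches(row: Record<string, string>, headers: string[]) {
  const keys = Object.keys(row);
  return {
    unknownKeys: keys.filter((k) => headers.indexOf(k) === -1),
    missingKeys: headers.filter((h) => keys.indexOf(h) === -1),
  };
}
```

Running a check like this once before the first write would surface a renamed or misspelled column early, instead of leaving the mismatch to be discovered in the sheet afterwards.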

Notes

The selectors in the scraper may need adjusting whenever Skool.com updates its site.
Be mindful of Skool.com's terms of service when scraping.
Consider adding a longer delay between requests to avoid being rate limited by Skool or Gemini.
Be aware of SheetDB's API limits, especially on the free tier.
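
One way to add the delay mentioned above is to pause between items and back off after failures. The sketch below is an illustration only: `backoffDelayMs`, `sleep`, and `forEachWithDelay` are hypothetical helpers, and the 60-second cap is a placeholder, not a limit published by Skool, Gemini, or SheetDB.

```typescript
// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => {
    setTimeout(() => resolve(), ms);
  });
}

// Processes items one at a time, pausing between them and retrying failures.
async function forEachWithDelay<T>(
  items: T[],
  handler: (item: T) => Promise<void>,
  pauseMs: number,
  maxRetries: number
): Promise<void> {
  for (const item of items) {
    for (let attempt = 0; ; attempt++) {
      try {
        await handler(item);
        break; // success, move on to the next item
      } catch (err) {
        if (attempt >= maxRetries) throw err; // give up after the last retry
        await sleep(backoffDelayMs(attempt, pauseMs, 60000));
      }
    }
    await sleep(pauseMs); // fixed pause before the next item
  }
}
```

With `pauseMs` set to 2000 and `maxRetries` set to 3, a failing item would be retried after 2, 4, and 8 seconds before the error is rethrown.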
