A tool to scrape Skool.com community posts, process them with Gemini AI to extract problems and insights, and store results in Google Sheets via SheetDB for further analysis.
This project helps you collect the problems that members raise in Skool.com communities so you can later create solution-based content. It:
- Logs into your Skool.com account
- Scrapes posts from specified communities
- Uses Gemini AI to identify problems discussed in each post
- Saves the structured data to a Google Sheet using the SheetDB service
Prerequisites:
- Node.js (v16+)
- npm or yarn
- A Skool.com account with access to the community you want to scrape
- A Google Sheet to store the data
- A SheetDB account (SheetDB.io)
- Gemini AI API key
Installation:
- Clone this repository
- Install dependencies:
  npm install
- Create a .env file based on .env.example:
  cp .env.example .env
- Fill in your credentials in the .env file
SheetDB setup:
- Create a Google Sheet with the desired headers in the first row. Example headers:
  id,problem,originalContent,suggestedSolution,tags,category,status,timestamp
- Go to SheetDB.io and sign up or log in.
- Click "Create API" and paste the URL of your Google Sheet.
- Follow the instructions to connect your sheet.
- SheetDB will provide you with an API endpoint URL.
- (Recommended for security) In your SheetDB API settings, go to the "Authentication" section.
- Select "Token" (or "Bearer Token") authentication.
- SheetDB will generate a token. Copy this token.
- Copy the API endpoint URL and the Authentication Token (if generated).
- Paste the URL into your .env file as the value for SHEETDB_API_ENDPOINT.
- Paste the token into your .env file as the value for SHEETDB_AUTH_TOKEN.
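As a rough illustration of how these two values might be used together, the sketch below builds the arguments for a request that appends rows through SheetDB. The helper name buildSheetDbRequest and the row type are hypothetical and not taken from this project. The body shape ({ data: [...] }) and the Bearer header reflect SheetDB's create-rows format as commonly documented, so verify both against the SheetDB docs for your API.

```typescript
// Hypothetical helper: builds fetch() arguments for appending rows via SheetDB.
// Assumption: SheetDB accepts POST <endpoint> with a JSON body of { data: [...] }.
type SheetRow = Record<string, string>;

interface SheetDbRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildSheetDbRequest(
  endpoint: string,
  rows: SheetRow[],
  token?: string,
): SheetDbRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  // Send the Authorization header only when token authentication is enabled.
  if (token) {
    headers["Authorization"] = `Bearer ${token}`;
  }
  return {
    url: endpoint,
    init: { method: "POST", headers, body: JSON.stringify({ data: rows }) },
  };
}

// Possible usage (the built-in fetch requires Node 18+):
// const { url, init } = buildSheetDbRequest(
//   process.env.SHEETDB_API_ENDPOINT!, rows, process.env.SHEETDB_AUTH_TOKEN);
// const response = await fetch(url, init);
```

Keeping request construction separate from the network call makes the header and body logic easy to check without touching a real sheet.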
Run the scraper:
npm start
For development with auto-restart:
npm run dev
Project structure:
- src/
  - config/ - Configuration handling
    - env.ts - Environment variable validation (Zod)
  - services/ - Core functionality
    - skoolScraper.ts - Handles scraping from Skool.com
    - geminiService.ts - Processes posts with Gemini AI
    - sheetDbService.ts - Saves data to Google Sheets via SheetDB
  - types/ - TypeScript type definitions
  - index.ts - Main entry point
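To give a feel for what startup validation of environment variables involves, here is a dependency-free sketch. It is a stand-in for illustration only: the project performs this step with Zod in env.ts, and the function name validateEnv and its exact rules are assumptions. Only the two SheetDB variables named in this README are checked.

```typescript
// Dependency-free stand-in for environment validation (the project itself uses Zod).
// Only the two variable names documented in this README are covered.
interface SheetDbEnv {
  SHEETDB_API_ENDPOINT: string;
  SHEETDB_AUTH_TOKEN?: string;
}

function validateEnv(raw: Record<string, string | undefined>): SheetDbEnv {
  const endpoint = raw["SHEETDB_API_ENDPOINT"];
  if (!endpoint) {
    throw new Error("SHEETDB_API_ENDPOINT is required");
  }
  // Assumed rule: the endpoint must look like an http(s) URL.
  if (!/^https?:\/\//.test(endpoint)) {
    throw new Error("SHEETDB_API_ENDPOINT must start with http:// or https://");
  }
  // The token stays optional because token authentication is only recommended.
  const token = raw["SHEETDB_AUTH_TOKEN"] || undefined;
  return { SHEETDB_API_ENDPOINT: endpoint, SHEETDB_AUTH_TOKEN: token };
}

// Possible usage at startup: const env = validateEnv(process.env);
```

Failing fast with a descriptive error at startup is usually easier to debug than a rejected request halfway through a scrape.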
Customization:
- Adjust POST_LIMIT_PER_COMMUNITY in src/index.ts to control how many posts are scraped per community.
- Modify the Gemini AI prompt in src/services/geminiService.ts to extract different information.
- Ensure the headers in your Google Sheet match the keys used in src/services/sheetDbService.ts (mapPostToSheetRow function).
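The header-matching requirement can be pictured with a minimal sketch of a row mapper. This is not the project's actual mapPostToSheetRow: the ProcessedPost shape, the "new" default for status, and the comma-joined tags are all assumptions chosen to line up with the example headers listed in the SheetDB setup steps.

```typescript
// Hypothetical post shape; the real type lives in src/types/ and may differ.
interface ProcessedPost {
  id: string;
  problem: string;
  originalContent: string;
  suggestedSolution: string;
  tags: string[];
  category: string;
}

// Keys mirror the example headers: id, problem, originalContent,
// suggestedSolution, tags, category, status, timestamp.
function mapPostToSheetRow(
  post: ProcessedPost,
  now: Date = new Date(),
): Record<string, string> {
  return {
    id: post.id,
    problem: post.problem,
    originalContent: post.originalContent,
    suggestedSolution: post.suggestedSolution,
    tags: post.tags.join(", "), // a sheet cell holds one string, so flatten the array
    category: post.category,
    status: "new", // assumed default; the README does not specify status values
    timestamp: now.toISOString(),
  };
}
```

If you rename or add a column in the sheet, the matching key in the returned object has to change with it, because a key with no matching header has no column to land in.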
Notes:
- The selectors in the scraper may need adjusting whenever Skool.com updates its site.
- Be mindful of Skool.com's terms of service when scraping.
- Consider adding a longer delay between requests to avoid rate limiting on Skool or Gemini.
- Be aware of SheetDB's API limits, especially on the free tier.
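One simple way to space out requests is to process items sequentially with a fixed pause between them. The sketch below assumes nothing about the project's internals: sleep and processWithDelay are hypothetical helpers, and a suitable delay value depends on the limits of the service you are calling.

```typescript
// Resolves after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Runs `task` for each item one at a time, pausing between consecutive calls.
async function processWithDelay<T, R>(
  items: T[],
  task: (item: T) => Promise<R>,
  delayMs: number,
): Promise<R[]> {
  const results: R[] = [];
  let first = true;
  for (const item of items) {
    if (!first) {
      await sleep(delayMs); // no pause before the first item
    }
    first = false;
    results.push(await task(item));
  }
  return results;
}

// Possible usage: await processWithDelay(posts, analyzePost, 2000);
// where analyzePost is whatever function sends a single post to Gemini.
```

Sequential processing trades throughput for predictability, which also helps keep the number of SheetDB writes per minute under control.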