n8n community node for HeadlessX v2 - Anti-detection web scraping with Camoufox
HeadlessX v2 is a stealth web scraping API powered by Camoufox, a Firefox-based browser built to evade anti-bot detection.
| Feature | Description | Use Cases |
|---|---|---|
| Camoufox Engine | Undetectable Firefox-based browser | Bot detection bypass |
| Google SERP | Extract search results with anti-detection | SEO monitoring, search analysis |
| HTML Extraction | Fast raw HTML or JS-rendered content | Web scraping, data mining |
| Content Extraction | Clean Markdown from any page | Content analysis, text processing |
| Screenshots | High-quality page captures | Visual testing, documentation |
⚠️ Important: HeadlessX runs as a separate API server. This n8n node is a client that connects to your HeadlessX instance.

Get HeadlessX: github.com/SaifyXPRO/HeadlessX
| Change | Before (v1.x) | After (v2.0) |
|---|---|---|
| API Paths | /api/html | /api/website/html |
| Operations | 8 operations | 5 streamlined operations |
| Methods | GET + POST duplicates | POST only (simplified) |
| New Features | - | Google SERP, HTML-JS rendering |
| Removed | PDF, Batch, Render | Not in v2 API |
| Operation | Endpoint | Description |
|---|---|---|
| Extract HTML | POST /api/website/html | Fast raw HTML extraction |
| Extract HTML (JS) | POST /api/website/html-js | HTML with JavaScript rendering |
| Extract Content | POST /api/website/content | Clean Markdown content |
| Screenshot | POST /api/website/screenshot | High-quality page captures |
| Google SERP | POST /api/google-serp/search | Google search results extraction |
| Requirement | Version | Installation |
|---|---|---|
| HeadlessX Server | v2.0+ | Install Guide |
| n8n | 1.0.0+ | n8n Documentation |
| Node.js | 18+ | nodejs.org |
- Install HeadlessX Server:

      git clone https://github.com/SaifyXPRO/HeadlessX.git
      cd HeadlessX && pnpm install && pnpm dev

- Install n8n Community Node:
  - Go to Settings → Community Nodes in n8n
  - Enter: n8n-nodes-headlessx
  - Click Install
- Configure Credentials:
  - Create new HeadlessX API credential
  - Base URL: http://localhost:3000
  - API Token: Your token
- Test Connection:
  - Add HeadlessX node to workflow
  - Select any operation and test
Option 1: n8n Community Nodes (Recommended)
- Navigate to Settings → Community Nodes in your n8n instance
- Click Install a community node
- Enter package name: n8n-nodes-headlessx
- Click Install and wait for completion
- Restart n8n if required
Option 2: npm Installation

# Global installation
npm install -g n8n-nodes-headlessx
# Local installation (for self-hosted n8n)
npm install n8n-nodes-headlessx

Option 3: Docker Setup

FROM n8nio/n8n:latest
USER root
RUN npm install -g n8n-nodes-headlessx
USER node

Docker Compose Example:
version: '3.8'
services:
  headlessx:
    build: ./HeadlessX
    ports: ["3000:3000"]
    environment:
      - DATABASE_URL=postgresql://...
    restart: unless-stopped
  n8n:
    image: n8nio/n8n:latest
    ports: ["5678:5678"]
    volumes: ["n8n_data:/home/node/.n8n"]
    depends_on: [headlessx]
    restart: unless-stopped
volumes:
  n8n_data:

| Field | Description | Example | Required |
|---|---|---|---|
| Base URL | HeadlessX server endpoint | http://localhost:3000 | Yes |
| API Token | Authentication token | your-secret-token | Yes |
| Method | Format | Auto-Applied |
|---|---|---|
| Query Parameter | ?token=your-token | Yes |
| Header Authentication | X-Token: your-token | Yes |
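
Both formats from the table can be reproduced outside n8n, for example when debugging a credential with a script. The following is a minimal Python sketch, assuming the server accepts either form on any endpoint; the helper names `with_query_token` and `token_header` are hypothetical and not part of HeadlessX or this node.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_token(url: str, token: str) -> str:
    """Append token=<token> to a URL, keeping any existing query parameters."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", token))
    return urlunsplit(parts._replace(query=urlencode(query)))


def token_header(token: str) -> dict:
    """Return the header form of the same credential."""
    return {"X-Token": token}


url = with_query_token("http://localhost:3000/api/website/html", "your-token")
headers = token_header("your-token")
```

Inside n8n neither step is needed, since the node applies the credential automatically.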
Extract HTML
Endpoint: POST /api/website/html
Extract raw HTML content from any webpage quickly without JavaScript rendering.
Parameters:
| Option | Description | Default |
|---|---|---|
| URL | Target webpage URL | Required |
| Timeout | Request timeout (ms) | 30000 |
| Wait Until | Page load condition | load |
| Headers | Custom HTTP headers | - |
| User Agent | Custom user agent | - |
Use Cases:
- Simple page scraping
- Static content extraction
- Fast bulk operations
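
To call this endpoint directly rather than through the node, the sketch below builds (but does not send) the POST request with Python's standard library. The JSON field names `url`, `timeout`, and `waitUntil` mirror the node's option names and are an assumption about the API body, as is the hypothetical `build_html_request` helper; check your HeadlessX server's API documentation for the exact schema.

```python
import json
from urllib.request import Request


def build_html_request(base_url: str, token: str, target_url: str,
                       timeout_ms: int = 30000, wait_until: str = "load") -> Request:
    """Build a POST request for /api/website/html (body schema is assumed)."""
    body = {"url": target_url, "timeout": timeout_ms, "waitUntil": wait_until}
    return Request(
        url=base_url.rstrip("/") + "/api/website/html",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Token": token},
        method="POST",
    )


req = build_html_request("http://localhost:3000", "your-token", "https://example.com")
# Send with urllib.request.urlopen(req, timeout=60) once the server is running.
```

Building the request separately from sending it keeps the payload easy to inspect when a call fails.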
Extract HTML (JS Rendered)
Endpoint: POST /api/website/html-js
Extract HTML with full JavaScript rendering for SPAs and dynamic content.
Parameters:
| Option | Description | Default |
|---|---|---|
| URL | Target webpage URL | Required |
| Timeout | Request timeout (ms) | 30000 |
| Extra Wait | Additional wait time after load | 0 |
| Wait Until | Page load condition | networkidle0 |
Use Cases:
- Single Page Applications (SPAs)
- React/Vue/Angular sites
- Dynamic content extraction
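
One way to decide between the fast HTML operation and this one is to fetch raw HTML first and fall back to JS rendering only when the page looks like an empty application shell. The sketch below is a heuristic, not part of HeadlessX: the `looks_like_spa_shell` helper and its 200-character threshold are assumptions chosen for illustration.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters, ignoring <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.visible_chars = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.visible_chars += len(data.strip())


def looks_like_spa_shell(html: str, min_chars: int = 200) -> bool:
    """Return True when the raw HTML carries almost no visible text."""
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.visible_chars < min_chars
```

When the check returns True for a raw HTML result, rerunning the URL through Extract HTML (JS) is likely to return the rendered content instead.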
Extract Content
Endpoint: POST /api/website/content
Extract clean, readable Markdown content from any webpage.
Parameters:
| Option | Description | Default |
|---|---|---|
| URL | Target webpage URL | Required |
| Timeout | Request timeout (ms) | 30000 |
| Wait Until | Page load condition | load |
Use Cases:
- Article extraction
- Content analysis
- Text processing
- AI/LLM data preparation
Take Screenshot
Endpoint: POST /api/website/screenshot
Capture high-quality screenshots of webpages.
Parameters:
| Option | Description | Default |
|---|---|---|
| URL | Target webpage URL | Required |
| Full Page | Capture entire page | true |
| Format | PNG, JPEG, WebP | png |
| Quality | Image quality (1-100) | 80 |
| Wait for Selector | CSS selector to wait for | - |
Use Cases:
- Visual regression testing
- Website monitoring
- Documentation
- Social media content
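
The parameter table above implies a few constraints that are cheap to check before a request is sent. The sketch below validates them in Python; `fullPage` and `format` match the keys used in this README's workflow configuration, while `quality`, `waitForSelector`, and the `screenshot_options` helper itself are assumed names.

```python
from typing import Optional

ALLOWED_FORMATS = {"png", "jpeg", "webp"}


def screenshot_options(full_page: bool = True, fmt: str = "png", quality: int = 80,
                       wait_for_selector: Optional[str] = None) -> dict:
    """Validate screenshot settings and return them as an options dict."""
    fmt = fmt.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}, got {fmt!r}")
    if not 1 <= quality <= 100:
        raise ValueError(f"quality must be between 1 and 100, got {quality}")
    options = {"fullPage": full_page, "format": fmt, "quality": quality}
    if wait_for_selector:
        # Key name is an assumption; only include it when a selector is given.
        options["waitForSelector"] = wait_for_selector
    return options


defaults = screenshot_options()
custom = screenshot_options(fmt="JPEG", quality=60, wait_for_selector="#main")
```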
Google SERP Search
Endpoint: POST /api/google-serp/search
Extract Google search results with advanced anti-detection.
Parameters:
| Option | Description | Default |
|---|---|---|
| Query | Search query | Required |
| Number of Results | Results to return | 10 |
| Language | Search language | en |
| Country | Result localization | us |
| Safe Search | Safety filter level | off |
Use Cases:
- SEO monitoring
- Competitor analysis
- Keyword research
- Search result tracking
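
The node's workflow configuration uses the short keys `num`, `hl`, and `gl` for Number of Results, Language, and Country. The sketch below maps the readable names to those keys; the `serp_options` helper is hypothetical, and Safe Search is left out because its key name is not shown in this README.

```python
def serp_options(num_results: int = 10, language: str = "en", country: str = "us") -> dict:
    """Map readable option names to the short keys used in serpOptions."""
    if num_results < 1:
        raise ValueError("num_results must be at least 1")
    return {"num": num_results, "hl": language.lower(), "gl": country.lower()}


config = {
    "operation": "googleSerp",
    "query": "your keyword",
    "serpOptions": serp_options(num_results=20),
}
```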
1. Simple Web Scraping
graph LR
A[Manual Trigger] --> B[HeadlessX: Extract HTML]
B --> C[Code Node: Process HTML]
C --> D[Output Results]
Configuration:
{
  "operation": "html",
  "url": "https://example.com",
  "htmlOptions": {
    "timeout": 30000,
    "waitUntil": "networkidle2"
  }
}

2. Google SERP Monitoring
graph LR
A[Schedule Trigger] --> B[HeadlessX: Google SERP]
B --> C[Store Results]
C --> D[Alert on Changes]
Configuration:
{
  "operation": "googleSerp",
  "query": "your keyword",
  "serpOptions": {
    "num": 20,
    "hl": "en",
    "gl": "us"
  }
}

3. Website Monitoring
graph LR
A[Schedule Trigger] --> B[HeadlessX: Screenshot]
B --> C[Compare Images]
C --> D[Send Alert]
Configuration:
{
  "operation": "screenshot",
  "url": "https://your-website.com",
  "screenshotOptions": {
    "fullPage": true,
    "format": "png"
  }
}

Connection Issues
"Couldn't connect with these settings"
| Check | Solution |
|---|---|
| Server Running | curl http://localhost:3000/api/health |
| URL Format | Use http://localhost:3000 (no /api) |
| Network Access | Check firewall/Docker networking |
| Token Validity | Verify API token is correct |
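
The first two checks in the table can be scripted. The sketch below normalizes a Base URL by dropping a trailing /api and then probes the health endpoint from the curl command above. It assumes a healthy server answers with HTTP 200; both helper names are hypothetical.

```python
import urllib.error
import urllib.request


def normalize_base_url(raw: str) -> str:
    """Strip whitespace, trailing slashes, and a trailing /api segment."""
    url = raw.strip().rstrip("/")
    if url.endswith("/api"):
        url = url[: -len("/api")]
    return url


def is_server_healthy(base_url: str, timeout_s: float = 5.0) -> bool:
    """Return True if GET /api/health answers with HTTP 200 (assumed success code)."""
    health_url = normalize_base_url(base_url) + "/api/health"
    try:
        with urllib.request.urlopen(health_url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False
```

Run the check from the same machine or container as n8n, because `localhost` inside a Docker container refers to that container rather than the host.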
Timeout Issues
"Request timeout" errors
| Cause | Solution |
|---|---|
| Slow Page Load | Increase timeout to 60000ms+ |
| Dynamic Content | Use htmlJs operation with extraWait |
| Heavy Resources | Use domcontentloaded wait condition |
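
The three remedies can be combined into an escalation ladder that is tried in order. The sketch below is one possible arrangement, not behavior built into the node: the option keys follow the names used in this README, the 2000 `extraWait` value and its unit are assumptions, and `run` stands in for whatever function actually performs the request.

```python
def timeout_attempts(base_timeout_ms: int = 30000) -> list:
    """Return progressively more tolerant settings, cheapest first."""
    return [
        {"operation": "html", "timeout": base_timeout_ms, "waitUntil": "load"},
        {"operation": "html", "timeout": base_timeout_ms * 2, "waitUntil": "domcontentloaded"},
        # extraWait value is an illustrative guess; tune it for the target site.
        {"operation": "htmlJs", "timeout": base_timeout_ms * 2, "extraWait": 2000},
    ]


def first_success(attempts, run):
    """Call run(options) for each attempt; return the first result that does not time out."""
    if not attempts:
        raise ValueError("attempts must not be empty")
    last_error = None
    for options in attempts:
        try:
            return run(options)
        except TimeoutError as exc:
            last_error = exc
    raise last_error
```

Trying the cheap settings first keeps bulk jobs fast while still recovering on the slow pages.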
We welcome contributions! See CONTRIBUTING.md for guidelines.
MIT License - see LICENSE for details.
Made with ❤️ by SaifyXPRO