Subito Automotive Details Scraper collects rich vehicle listing details from individual Subito.it car pages and turns them into structured, analysis-ready data. It helps automotive teams and analysts replace manual copy-paste with repeatable extraction for pricing, inventory, and market research. Use Subito Automotive Details Scraper to standardize Italian car listing data at scale.
Created by Bitbash, built to showcase our approach to Scraping and Automation!
This project extracts comprehensive vehicle listing details from Subito.it automotive pages and outputs consistent, structured records you can store, analyze, or feed into apps. It solves the problem of messy, manual data collection across many listings by batching URLs and returning normalized fields. It’s built for dealerships, market researchers, price intelligence teams, automotive data providers, and anyone tracking the Italian used-car market.
- Processes direct vehicle detail URLs in batches for efficient extraction workflows
- Captures both structured specs (make/model/year/mileage) and marketplace signals (favorites, trust info)
- Supports resilient execution with retries and optional “continue on failure” behavior
- Produces consistent JSON records suitable for databases, dashboards, and ML pipelines
- Works well for repeated monitoring to detect pricing shifts and listing changes over time
| Feature | Description |
|---|---|
| Batch URL processing | Extract details from many vehicle listing pages in a single run for faster market coverage. |
| Comprehensive vehicle specs | Collects structured specs like make, model, trim, body type, fuel, gearbox, engine, dimensions, and emissions where available. |
| Seller & trust signals | Extracts advertiser profile details and trust indicators to support reputation scoring and fraud checks. |
| Engagement metrics | Captures favorite counters and related signals to estimate buyer interest and listing momentum. |
| Robust retry handling | Retries transient failures per URL to improve reliability on unstable connections. |
| Optional failure tolerance | Can continue processing the remaining URLs even if some pages fail, reducing wasted runs. |
| Proxy-ready configuration | Supports routed requests to reduce blocking risk and improve consistency during larger runs. |
| Clean structured output | Returns normalized JSON records designed for analytics, comparisons, and downstream enrichment. |
| Field Name | Field Description |
|---|---|
| category_slug | Category identifier for the listing (e.g., "auto") used for filtering and grouping. |
| url | Full listing URL for reference, deduplication, and change tracking. |
| page_title | Page title string useful as a compact human-readable summary and labeling. |
| category | Hierarchical category metadata (id, label, friendly name) for navigation and taxonomy mapping. |
| category_specific_data | Structured sections of vehicle specs grouped by topic (e.g., engine, dimensions, comfort, safety). |
| category_specific_data.title | Section title such as "Caratteristiche", "Motore e consumi", "Comfort". |
| category_specific_data.features | Array of label/value pairs for each spec, often with a semantic URI key. |
| ad | Full advertisement object including seller-written description and core listing identifiers. |
| ad.subject | Listing headline/title as written by the seller. |
| ad.body | Seller description text (useful for NLP, condition notes, and feature mentions). |
| ad.date | Listing timestamp/date string for recency analysis and time-series tracking. |
| ad.images | Listing image references (CDN base URLs) for media ingestion or preview use. |
| ad.features | Key structured listing attributes such as price, mileage, year, fuel, doors, color, registration month. |
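| price | Price value (when present), typically in EUR; in the sample output it appears under ad.features as a string such as "3700 €". |
| mileage_scalar | Mileage as a digits-only value for filtering and valuation models; the sample output returns it under ad.features as a string ("200000"). |
| year | Vehicle year of registration/manufacture when present; returned under ad.features as a string ("2013") in the sample output. |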
| fuel | Fuel type (e.g., petrol, diesel, metano, hybrid, electric) for segmentation analysis. |
| gearbox | Transmission type (manual/automatic) for comparisons and market breakdowns. |
| geo | Region/city/town location metadata for regional pricing and supply analysis. |
| internal_links.header | Related navigation links that can help discover similar vehicles or categories. |
| internal_links.footer | Additional related links for expansion, clustering, or crawling strategies. |
| favorite_counter | Count of users who favorited the listing, indicating popularity and demand signals. |
| advertiser_profile | Seller profile basics such as username, phone visibility, and account characteristics. |
| trust_info | Reputation and trust metadata including feedback scores and presence/response indicators (when available). |
| shipping_costs | Delivery/shipping cost information if offered by the seller. |
| promo | Promotional tier or visibility boosts if applied (e.g., featured placement). |
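
The `category_specific_data` sections described in the table can be awkward to query because each spec sits inside a titled group. As a minimal sketch, the grouped label/value pairs could be flattened into a single lookup keyed by the `uri` field. The helper name `flatten_specs` is hypothetical and not part of this project, and falling back to `label` when `uri` is missing is an assumption.

```python
from typing import Any, Optional


def flatten_specs(sections: Optional[list[dict[str, Any]]]) -> dict[str, Any]:
    """Collapse grouped spec sections into one {key: value} mapping.

    The key is the feature's "uri" when present, otherwise its "label"
    (an assumed fallback). Later duplicates overwrite earlier ones.
    """
    specs: dict[str, Any] = {}
    for section in sections or []:
        for feature in section.get("features") or []:
            key = feature.get("uri") or feature.get("label")
            if key is not None:
                specs[key] = feature.get("value")
    return specs


if __name__ == "__main__":
    example = [
        {"title": "Caratteristiche",
         "features": [{"label": "Marca", "value": "FIAT", "uri": "/car/brand"}]},
        {"title": "Motore e consumi",
         "features": [{"label": "Potenza (CV)", "value": "77", "uri": "/horsepower"}]},
    ]
    print(flatten_specs(example))
```

Keying by `uri` rather than by the Italian label keeps lookups stable if label wording changes, although that stability is an assumption about the site rather than something this project guarantees.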
[
{
"category_slug": "auto",
"url": "https://www.subito.it/auto/fiat-punto-natural-power-x-neopata-auto-perfetta-salerno-620662384.htm",
"page_title": "Fiat Punto Natural Power x neopata auto perfetta - Auto In vendita a Salerno",
"category_specific_data": [
{
"title": "Caratteristiche",
"features": [
{ "label": "Marca", "value": "FIAT", "uri": "/car/brand" },
{ "label": "Modello", "value": "Punto 4ª serie", "uri": "/car/model" },
{ "label": "Allestimento", "value": "Punto 1.4 8V 5 porte Natural Power Easy", "uri": "/car/version" }
]
},
{
"title": "Motore e consumi",
"features": [
{ "label": "Alimentazione", "value": "metano", "uri": "/fuel" },
{ "label": "Cambio", "value": "Manuale", "uri": "/gearbox" },
{ "label": "Cilindrata (cc)", "value": "1368", "uri": "/cubic_capacity" },
{ "label": "Potenza (CV)", "value": "77", "uri": "/horsepower" }
]
}
],
"favorite_counter": { "value": 0 },
"advertiser_profile": { "username": "Rino Fernicola", "show_phone": false },
"geo": {
"region": { "value": "Campania" },
"city": { "value": "Salerno" },
"town": { "value": "Mercato San Severino" }
},
"ad": {
"subject": "Fiat Punto Natural Power x neopata auto perfetta",
"date": "2025-10-17 05:19:59",
"features": {
"price": "3700 €",
"mileage_scalar": "200000",
"year": "2013",
"fuel": "Metano",
"gearbox": "Manuale"
}
}
}
]
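
In the sample record above, price, mileage, and year arrive as strings (`"3700 €"`, `"200000"`, `"2013"`). The following is a rough sketch of how a consumer might turn one record into a flat row with numeric fields. The function names `parse_int` and `flatten_record` and the output column names are hypothetical, and the parser assumes whole-number values: it simply strips every non-digit character, so a price with decimals would be misread.

```python
import re
from typing import Any, Optional


def parse_int(text: Optional[str]) -> Optional[int]:
    """Return all digits in `text` as one int, or None if there are none.

    Assumes whole-number values such as "3700 €" or "200000".
    """
    if not text:
        return None
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def flatten_record(record: dict[str, Any]) -> dict[str, Any]:
    """Build a flat row from one scraper record, tolerating missing fields."""
    ad = record.get("ad") or {}
    features = ad.get("features") or {}
    geo = record.get("geo") or {}
    return {
        "url": record.get("url"),
        "subject": ad.get("subject"),
        "price_eur": parse_int(features.get("price")),
        "mileage": parse_int(features.get("mileage_scalar")),
        "year": parse_int(features.get("year")),
        "fuel": features.get("fuel"),
        "gearbox": features.get("gearbox"),
        "region": (geo.get("region") or {}).get("value"),
        "city": (geo.get("city") or {}).get("value"),
        "favorites": (record.get("favorite_counter") or {}).get("value"),
    }
```

Because optional fields may be absent, every lookup falls back to `None` instead of raising, which keeps the flat rows uniform across listings.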
Subito Automotive Details Scraper/
├── src/
│ ├── main.py
│ ├── runner.py
│ ├── config/
│ │ ├── schema.json
│ │ ├── settings.py
│ │ └── input.example.json
│ ├── clients/
│ │ ├── http_client.py
│ │ └── session_manager.py
│ ├── extractors/
│ │ ├── listing_details.py
│ │ ├── seller_profile.py
│ │ ├── trust_signals.py
│ │ └── internal_links.py
│ ├── parsers/
│ │ ├── spec_groups_parser.py
│ │ ├── ad_features_parser.py
│ │ └── text_normalizer.py
│ ├── utils/
│ │ ├── retry.py
│ │ ├── validators.py
│ │ ├── logger.py
│ │ └── time_utils.py
│ └── outputs/
│ ├── record_builder.py
│ └── exporters.py
├── data/
│ ├── inputs.sample.txt
│ └── output.sample.json
├── scripts/
│ ├── run_local.sh
│ └── validate_output.py
├── tests/
│ ├── test_parsers.py
│ ├── test_extractors.py
│ └── fixtures/
│ └── listing_page_sample.html
├── .gitignore
├── LICENSE
├── requirements.txt
└── README.md
- Used car dealerships use it to track competitor listings, so they can adjust pricing and inventory strategy with real market evidence.
- Price intelligence teams use it to monitor make/model price ranges, so they can trigger alerts for undervalued vehicles and market shifts.
- Automotive marketplaces use it to aggregate listings into unified catalogs, so they can improve search relevance and buyer experience.
- Business analysts use it to measure regional demand signals (favorites, listing velocity), so they can identify hotspots and seasonal trends.
- Data teams use it to build training datasets from structured specs and ad text, so they can power valuation models and classification pipelines.
Q1: What kind of URLs does this project support? It expects direct vehicle detail page URLs (single listing pages). Search pages, category pages, or filtered result pages typically won’t provide the same stable structure and may produce incomplete results.
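
As an illustration of the point above, inputs could be pre-filtered before a run. This sketch assumes that detail pages end in `-<numeric id>.htm`, a pattern inferred only from the single sample URL in this README and not confirmed by the project. The name `listing_id` is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed pattern: detail paths end with "-<digits>.htm" (inferred from one sample URL).
DETAIL_PATH = re.compile(r"-(\d+)\.htm$")


def listing_id(url: str) -> Optional[str]:
    """Return the trailing numeric id if `url` looks like a Subito.it detail page."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return None
    host = parsed.hostname or ""
    if host != "subito.it" and not host.endswith(".subito.it"):
        return None
    match = DETAIL_PATH.search(parsed.path)
    return match.group(1) if match else None


def filter_detail_urls(urls: list[str]) -> list[str]:
    """Keep likely detail URLs, dropping duplicates that share the same id."""
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        ad_id = listing_id(url)
        if ad_id is not None and ad_id not in seen:
            seen.add(ad_id)
            kept.append(url)
    return kept
```

URLs rejected by this check are not necessarily invalid; the filter only flags inputs that do not match the assumed shape so they can be reviewed before a batch is started.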
Q2: Some fields are missing in the output — is that a bug? Not necessarily. Many listing fields are optional (shipping costs, shop reviews, certain spec groups, promos). The extractor outputs what exists on the page while keeping the overall record structure consistent.
Q3: How should I choose retry and failure settings for large batches? For routine monitoring, a small retry count (e.g., 2) usually balances speed and reliability. If you’re processing a critical batch, increasing retries can improve success rates but will slow down runs for problematic URLs. Enabling “ignore failures” helps complete partial batches rather than stopping everything on one error.
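
The trade-off described in Q3 can be sketched as a small batch loop. This is not the project's actual runner: `run_batch`, `max_retries`, `ignore_failures`, and `backoff_seconds` are hypothetical names, and `fetch` stands in for whatever function downloads and parses one listing.

```python
import time
from typing import Any, Callable, Iterable


def run_batch(
    urls: Iterable[str],
    fetch: Callable[[str], Any],
    max_retries: int = 2,
    ignore_failures: bool = True,
    backoff_seconds: float = 1.0,
) -> tuple[dict[str, Any], dict[str, str]]:
    """Fetch each URL with per-URL retries; return (results, failures)."""
    results: dict[str, Any] = {}
    failures: dict[str, str] = {}
    for url in urls:
        last_error = None
        # One initial attempt plus up to `max_retries` retries.
        for attempt in range(max_retries + 1):
            try:
                results[url] = fetch(url)
                last_error = None
                break
            except Exception as exc:  # broad on purpose for this sketch
                last_error = exc
                if attempt < max_retries:
                    # Exponential backoff: 1x, 2x, 4x ... the base delay.
                    time.sleep(backoff_seconds * (2 ** attempt))
        if last_error is not None:
            if not ignore_failures:
                raise last_error  # stop the whole batch on the first hard failure
            failures[url] = repr(last_error)
    return results, failures
```

With `ignore_failures=True`, a URL that keeps failing costs `max_retries + 1` attempts plus the backoff delays and is then recorded in `failures`, so raising the retry count mostly slows down the problematic URLs rather than the whole batch.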
Q4: How do I use the output for price tracking over time? Use the url as the listing key and store one snapshot per run, keyed by url plus a run timestamp. Then compare price, favorite_counter, and key specs across snapshots to detect price drops, rising demand, or listing edits.
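
A minimal sketch of the snapshot comparison described in Q4 follows. The helper names (`snapshot_row`, `detect_changes`) and the row layout are hypothetical, and prices are assumed to be whole-number strings like the `"3700 €"` in the sample output.

```python
import re
from typing import Any, Optional


def _to_int(value: Any) -> Optional[int]:
    """Extract digits from a value such as "3700 €" or 0; None if no digits."""
    digits = re.sub(r"\D", "", str(value)) if value is not None else ""
    return int(digits) if digits else None


def snapshot_row(record: dict[str, Any], run_timestamp: str) -> dict[str, Any]:
    """Reduce one scraper record to the fields tracked across runs."""
    features = (record.get("ad") or {}).get("features") or {}
    favorites = (record.get("favorite_counter") or {}).get("value")
    return {
        "url": record["url"],
        "run_timestamp": run_timestamp,
        "price": _to_int(features.get("price")),
        "favorites": _to_int(favorites),
    }


def detect_changes(previous_rows: list[dict], current_rows: list[dict]) -> list[dict]:
    """Compare two runs by url and report changed price/favorites values."""
    previous_by_url = {row["url"]: row for row in previous_rows}
    changes = []
    for row in current_rows:
        old = previous_by_url.get(row["url"])
        if old is None:
            continue  # listing not seen in the previous run
        for field in ("price", "favorites"):
            before, after = old[field], row[field]
            if before is not None and after is not None and before != after:
                changes.append({"url": row["url"], "field": field,
                                "old": before, "new": after, "delta": after - before})
    return changes
```

A negative `delta` on `price` marks a price drop, and a positive `delta` on `favorites` suggests rising interest. Listings that appear in only one run are skipped here and would need separate handling.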
Primary Metric: Processes 50–100 vehicle detail URLs per run with a typical per-URL extraction time of ~2–5 seconds under stable connectivity.
Reliability Metric: Achieves ~92–98% successful extractions in mixed batches when retries are enabled and request routing is configured for consistency.
Efficiency Metric: Maintains steady throughput by reusing sessions and limiting repeated fetches, reducing wasted requests on stable pages and keeping memory usage modest for batch jobs.
Quality Metric: Captures core listing attributes (price, year, mileage, fuel, gearbox, location) consistently, with high completeness on pages that provide structured spec groups and seller trust sections.
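
For planning purposes, the figures above can be turned into a rough run-time estimate. This sketch assumes URLs are processed one after another with no retries; the README does not state whether requests run sequentially or concurrently, so treat the result as an upper-level guide only. The function name is hypothetical.

```python
def sequential_runtime_seconds(url_count: int, seconds_per_url: float) -> float:
    """Estimated wall-clock time if URLs are fetched one at a time, without retries."""
    return url_count * seconds_per_url


best_case = sequential_runtime_seconds(50, 2.0)    # 100.0 seconds
worst_case = sequential_runtime_seconds(100, 5.0)  # 500.0 seconds
print(f"Estimated run time: {best_case:.0f}-{worst_case:.0f} s")
```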
