Commit ae5eb26

Author: Adriano Sanges
Update request headers in scraper.py for improved compatibility
- Modify User-Agent to reflect a Windows environment for better request handling
- Simplify headers by removing unnecessary fields while retaining essential ones
- Enhance scraping reliability by ensuring headers mimic a typical browser request
1 parent f084e02 commit ae5eb26

1 file changed, 2 additions and 10 deletions

real-estate-etl/scraper.py

Lines changed: 2 additions & 10 deletions
@@ -15,16 +15,8 @@ def parse_price(price_str):
 
 def parse_page(url: str) -> Dict[str, Optional[any]]:
     headers = {
-        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
-        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
-        'Accept-Language': 'en-US,en;q=0.9',
-        'Accept-Encoding': 'gzip, deflate, br',
-        'Connection': 'keep-alive',
-        'Upgrade-Insecure-Requests': '1',
-        'Sec-Fetch-Dest': 'document',
-        'Sec-Fetch-Mode': 'navigate',
-        'Sec-Fetch-Site': 'none',
-        'Sec-Fetch-User': '?1',
+        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
+        "Accept-Language": "en-US,en;q=0.9"
     }
     logging.debug("Parsing page: %s", url)
     response = requests.get(url, headers=headers)
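
To see what the trimmed header dict actually puts on the wire, the sketch below builds the request without sending it and returns the merged headers. It assumes the `requests` library already used by scraper.py; the helper name `preview_headers` and the example.com URL are hypothetical and not part of the commit.

```python
import requests

# Headers as they appear on the "+" side of this commit.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}


def preview_headers(url: str) -> dict:
    """Return the headers requests would send for `url`, without sending anything.

    Hypothetical helper for illustration; it is not part of scraper.py.
    """
    with requests.Session() as session:
        # prepare_request merges the session's default headers with the
        # per-request headers; per-request values take precedence.
        prepared = session.prepare_request(
            requests.Request("GET", url, headers=HEADERS)
        )
    return dict(prepared.headers)


if __name__ == "__main__":
    # Placeholder URL, used only to build the request; no network call is made.
    for name, value in preview_headers("https://example.com/listing").items():
        print(f"{name}: {value}")
```

Because `requests.get` goes through the same merge, `Accept`, `Accept-Encoding`, and `Connection` are still sent after this commit, but with the library defaults (for example `Accept: */*`) instead of the browser-like values that were removed. The `Upgrade-Insecure-Requests` and `Sec-Fetch-*` headers are no longer sent at all.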
