@aHishamm

This pull request updates the Bayt.com job scraper to address issues that caused it to stop functioning. The previous implementation used outdated HTML selectors, resulting in no job listings being scraped. This update resolves the issue by:

  1. Fixing incorrect class names for job listing elements.
  2. Improving job information extraction by using more reliable attribute-based selectors.
  3. Enhancing error handling to prevent silent failures.

Key Fixes & Improvements

Updated Job Listing Selector

  • Replaced li.has-pointer-d with li[data-js-job], which correctly targets job listings.
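The difference between the two selectors can be sketched with the standard library alone. This is only an illustration under assumptions: the markup below is a simplified, hypothetical fragment (not the real Bayt.com page), and ElementTree's XPath subset stands in for the scraper's own parsing library, which this description does not show. The predicate `[@data-js-job]` is the XPath analogue of the CSS selector `li[data-js-job]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real Bayt.com page is not reproduced here.
SAMPLE = """
<ul>
  <li data-js-job="" class="has-pointer-d"><h2><a href="/job/1">Data Engineer</a></h2></li>
  <li data-js-job=""><h2><a href="/job/2">Backend Developer</a></h2></li>
  <li class="ad-banner">Sponsored</li>
</ul>
""".strip()

root = ET.fromstring(SAMPLE)

# Attribute-presence match: selects every <li> that carries data-js-job,
# regardless of which styling classes it has.
by_attribute = root.findall(".//li[@data-js-job]")

# Class-based match: only finds listings whose class attribute is exactly
# "has-pointer-d", so it misses listings once that class is renamed or dropped.
by_class = root.findall(".//li[@class='has-pointer-d']")

# Titles are read from the <h2> link inside each matched listing.
titles = [li.findtext("./h2/a") for li in by_attribute]
```

In this fragment the attribute-based query matches both job listings and skips the banner item, while the class-based query matches only the one listing that still carries the old class.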

Improved Job Information Extraction

  • Job Title & URL: Updated to extract information from <h2> elements instead of a missing jb-title class.
  • Company Name: Now fetched from div.t-nowrap.p10l instead of b.jb-company, which is no longer present in the HTML.
  • Job Location: Updated extraction from div.t-mute.t-small, replacing the outdated jb-loc class.

Better Error Handling & Debugging

  • Wrapped __extract_job_info() in a try-except block to prevent failures on individual job listings.
  • Added meaningful error messages to help diagnose potential scraping issues.
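The per-listing error handling described above might look roughly like the following. This is a minimal sketch, not the code in this PR: `extract_job_info`, `extract_all`, the logger name, and the sample markup are all hypothetical, and the standard library's ElementTree is used in place of the scraper's actual parser.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("bayt_scraper_sketch")  # hypothetical logger name


def extract_job_info(listing):
    """Hypothetical stand-in for __extract_job_info(): parse one listing element."""
    link = listing.find("./h2/a")
    if link is None:
        # Raise with a descriptive message instead of returning partial data.
        raise ValueError("no <h2> link found in listing")
    return {"title": (link.text or "").strip(), "url": link.get("href")}


def extract_all(listings):
    """Extract every listing, logging and skipping the ones that fail."""
    jobs = []
    for index, listing in enumerate(listings):
        try:
            jobs.append(extract_job_info(listing))
        except (ValueError, AttributeError) as exc:
            # One bad listing is reported and skipped; the loop continues.
            logger.warning("Skipping listing %d: %s", index, exc)
    return jobs


# Hypothetical markup: the second listing is deliberately malformed.
SAMPLE = """
<ul>
  <li data-js-job=""><h2><a href="/job/1">Data Engineer</a></h2></li>
  <li data-js-job=""><span>markup changed</span></li>
  <li data-js-job=""><h2><a href="/job/3">QA Analyst</a></h2></li>
</ul>
""".strip()

jobs = extract_all(ET.fromstring(SAMPLE).findall(".//li[@data-js-job]"))
```

Here the malformed listing produces a logged warning rather than a silent gap or a crash, and the two well-formed listings are still returned.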

Screenshots of the current Bayt scraper implementation and the new, fixed implementation:

  • Current Bayt scraper: [Screenshot 2025-02-21 at 12 13 04 AM]
  • Updated Bayt scraper: [Screenshot 2025-02-21 at 12 11 41 AM]

@aHishamm
Author

@nikhil25803 Kindly review this PR, thanks!
