katacek/python-events-scraper-demo


Readme overview

  1. Find upcoming Python events all around the world!
  2. Creating Actor
  3. Publishing Actor to Store
  4. Monetizing Actor
  5. Creating Actor using CLI (command line interface)
  6. Creating Actor using GitHub repository
  7. What next (useful links)

Prerequisites

  • Account on Apify Console: for creating the Actor through the web interface
  • Node.js version 18 or higher with NPM installed: for using Apify CLI
  • Billing details and payment method set up: for Actor monetization

Find upcoming Python events all around the world!

We will try to find upcoming Python events all around the world, and the best website to find those is Python's official website.

Visit Python's official website events section: https://www.python.org/events/


As you can see there are a lot of upcoming events there. We will try to scrape all the upcoming events with their dates and locations and make an Actor out of it, and, in the end, publish it to Apify Store so that anybody from the community can use it.

Creating Actor

  1. Visit the page to be scraped and inspect it using the browser developer tools (aka DevTools)
  • page: https://www.python.org/events/
  • devTools: press F12 or Right-click a page and select Inspect
  • in the Elements tab, look for the selector for the content we want to scrape
    • (In Firefox it's called the Inspector). You can use this tab to inspect the page's HTML on the left hand side, and its CSS on the right. The items in the HTML view are called elements.
    • All elements are wrapped in HTML tags, such as `<p>` for a paragraph, `<a>` for a link, …
    • using the selector tool, find the selector: `.list-recent-events.menu li` for our case
  • you can test the selector in the devtools directly: just put `document.querySelector('.list-recent-events.menu li');` into the Console tab and see the result (it prints the first match)
  • if you use `document.querySelectorAll()`, it returns all matching elements
  • to filter only the upcoming events, use `document.querySelectorAll('.list-recent-events.menu li:not(.most-recent-events)');`
    • good selectors are: simple, human-readable, unique, and semantically connected to the data
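If you prefer to check selectors outside the browser, the same queries can be run locally with BeautifulSoup, whose `.select()` accepts CSS selectors, including `:not()`. This is a sketch against a hand-written HTML sample that merely mimics the events list, not the real page markup:

```python
from bs4 import BeautifulSoup

# Hand-written sample that mimics the events list structure (an assumption,
# not the real markup from python.org).
html = """
<ul class="list-recent-events menu">
  <li>
    <h3 class="event-title"><a href="/events/python-events/1/">PyCon Demo</a></h3>
    <time datetime="2025-03-01">01 March 2025</time>
    <span class="event-location">Windhoek, Namibia</span>
  </li>
  <li class="most-recent-events">
    <h3 class="event-title"><a href="/events/python-events/2/">Past Event</a></h3>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Equivalent of document.querySelectorAll('.list-recent-events.menu li')
all_items = soup.select(".list-recent-events.menu li")

# Equivalent of the :not(.most-recent-events) filter from the Console
upcoming = soup.select(".list-recent-events.menu li:not(.most-recent-events)")

print(len(all_items), len(upcoming))  # 2 1
```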
  2. Create Actor from Apify templates
  • Visit https://console.apify.com/actors/development/my-actors and click Develop new on the top right corner
  • Under Python section, select Start with Python template
  • Check the basic structure, information about the template, … and click Use this template
    • there are also links to various resources / tutorial videos
  • name the actor 😁
  3. Source code adjustments
  • in input_schema.json, update the prefill and add a default value for the start URL:
```json
{
    "title": "Scrape data from a web page",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "url": {
            "title": "URL of the page",
            "type": "string",
            "description": "The URL of website you want to get the data from.",
            "editor": "textfield",
            "prefill": "https://www.python.org/events/",
            "default": "https://www.python.org/events/"
        }
    },
    "required": ["url"]
}
```
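To sanity-check the edited schema, you can parse it with Python's `json` module and assert that the fields you just changed are present. The schema is inlined here to keep the sketch self-contained; in a real project you would read the Actor's input_schema.json from disk instead:

```python
import json

# Inlined copy of the schema above (normally you would read input_schema.json
# from the Actor's source directory).
schema_text = '''
{
    "title": "Scrape data from a web page",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "url": {
            "title": "URL of the page",
            "type": "string",
            "description": "The URL of website you want to get the data from.",
            "editor": "textfield",
            "prefill": "https://www.python.org/events/",
            "default": "https://www.python.org/events/"
        }
    },
    "required": ["url"]
}
'''

schema = json.loads(schema_text)
url_prop = schema["properties"]["url"]
print(url_prop["prefill"], url_prop["default"])
```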
  • in main.py, we are going to replace this part of the code using the selectors we found earlier
  • first, change the input fallback on line 30 as well:

```python
actor_input = await Actor.get_input() or {'url': 'https://www.python.org/events/'}
```

and replace the original code

```python
# Parse the HTML content using Beautiful Soup and lxml parser.
soup = BeautifulSoup(response.content, 'lxml')

# Extract all headings from the page (tag name and text).
headings = []
for heading in soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6']):
    heading_object = {'level': heading.name, 'text': heading.text}
    Actor.log.info(f'Extracted heading: {heading_object}')
    headings.append(heading_object)

# Save the extracted headings to the dataset, which is a table-like storage.
await Actor.push_data(headings)
```

with the following:

```python
# Defines a function to extract event details from the HTML response.
def extract_event_data(html):
    # Parses the HTML using BeautifulSoup.
    soup = BeautifulSoup(html, 'html.parser')
    # Initializes an empty events list and sets a baseUrl for constructing full URLs.
    events = []
    baseUrl = 'https://www.python.org'

    # Finds all <li> elements inside .list-recent-events.menu
    for event in soup.select('.list-recent-events.menu li'):
        # Extract the event title <a> element.
        title_tag = event.select_one('.event-title a')
        # Extract the event date inside a <time> tag.
        date_tag = event.select_one('time')
        # Extract the event location.
        location_tag = event.select_one('.event-location')

        # Extracts text values and ensures they have default values ('N/A' if missing).
        title = title_tag.get_text(strip=True) if title_tag else 'N/A'
        url = title_tag['href'] if title_tag and 'href' in title_tag.attrs else 'N/A'
        date = date_tag.get_text(separator=' ', strip=True) if date_tag else 'N/A'
        location = location_tag.get_text(strip=True) if location_tag else 'N/A'
        # Constructs the full event URL by appending the relative href to baseUrl
        # (only when an href was actually found).
        fullUrl = f"{baseUrl}{url}" if url != 'N/A' else 'N/A'

        # Adds the extracted data into the events list.
        events.append({
            'title': title,
            'url': fullUrl,
            'date': date,
            'location': location
        })

    return events

# Calls the extract_event_data() function with the page's HTML content.
events = extract_event_data(response.content)

# Saves the extracted event data to Apify's dataset storage (like a database for structured data).
await Actor.push_data(events)
```
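Before building on the platform, you can exercise the extraction logic locally against a small HTML fixture. The sketch below inlines a trimmed copy of extract_event_data (same selectors and fallbacks as above) together with an invented fixture, and checks the 'N/A' handling for an event that has no <time> tag or location:

```python
from bs4 import BeautifulSoup

def extract_event_data(html):
    # Same selectors and fallbacks as in the Actor's main.py above.
    soup = BeautifulSoup(html, 'html.parser')
    events = []
    baseUrl = 'https://www.python.org'
    for event in soup.select('.list-recent-events.menu li'):
        title_tag = event.select_one('.event-title a')
        date_tag = event.select_one('time')
        location_tag = event.select_one('.event-location')
        title = title_tag.get_text(strip=True) if title_tag else 'N/A'
        url = title_tag['href'] if title_tag and 'href' in title_tag.attrs else 'N/A'
        date = date_tag.get_text(separator=' ', strip=True) if date_tag else 'N/A'
        location = location_tag.get_text(strip=True) if location_tag else 'N/A'
        fullUrl = f"{baseUrl}{url}" if url != 'N/A' else 'N/A'
        events.append({'title': title, 'url': fullUrl, 'date': date, 'location': location})
    return events

# Invented fixture for local testing; the second event deliberately lacks
# a date and a location to exercise the 'N/A' fallbacks.
fixture = '''
<ul class="list-recent-events menu">
  <li>
    <h3 class="event-title"><a href="/events/python-events/100/">PyCon Demo</a></h3>
    <time datetime="2025-04-01">01 April 2025</time>
    <span class="event-location">Windhoek, Namibia</span>
  </li>
  <li>
    <h3 class="event-title"><a href="/events/python-events/101/">Mystery Meetup</a></h3>
  </li>
</ul>
'''

events = extract_event_data(fixture)
print(events)
```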
  • now, just hit the button Save, Build & Start
  • the Actor starts and takes you to the Log tab
  • results are in the Output tab
    • can be exported in various formats
    • can be also seen in Storages (main left menu) -> Datasets
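Beyond the Console UI, dataset items can also be downloaded over Apify's REST API (the /v2/datasets/{datasetId}/items endpoint, which accepts a format query parameter such as json or csv). The helper below only builds that URL; authentication and the actual HTTP call are left out, so treat it as a sketch of the endpoint shape rather than a full client:

```python
# Sketch: build the REST URL for downloading a dataset's items.
# "YOUR_DATASET_ID" below is a placeholder, not a real ID.
API_BASE = "https://api.apify.com/v2"

def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Return the endpoint for fetching a dataset's items in the given format."""
    return f"{API_BASE}/datasets/{dataset_id}/items?format={fmt}"

print(dataset_items_url("YOUR_DATASET_ID", "csv"))
# https://api.apify.com/v2/datasets/YOUR_DATASET_ID/items?format=csv
```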

Publishing Actor to Store

  • Go to the Actor detail page in the Apify Console
    • go to the Publication tab
    • fill in all the details
    • press Publish to store
    • check it out by clicking on Store (main menu on the left) -> search for the name of your Actor
  • docs here

Monetizing Actor

  • on the Actor detail page -> Publication tab, open the Monetization card and follow the setup guide
  • basic info here
  • detailed info about pricing models here

Creating Actor through CLI

```shell
brew install apify-cli   # macOS (Homebrew)
# or
npm -g install apify-cli
```

```shell
apify create
```
  • select a name, Python, and the Start with Python template
```shell
cd your-actor-name
```
  • in input_schema.json, update the prefill and add a default value for the start URL https://www.python.org/events/ as we did before
  • navigate to main.py and replace the same part of the code as before
  • run `apify run` and see the results in the storage/datasets/default folder 🚀
  • push to the Apify platform:
```shell
apify login
apify push
```

Go to your browser and see, it is there!


Creating Actor through GitHub repository

You can easily create a new Actor from your GitHub repository: just fork this repo to your workspace and follow this online guide.


What next (useful links)

Did you enjoy scraping and want to learn more? Just check out one of the following links

About

Scrapes upcoming events from https://www.python.org - demo Actor for PyCon Namibia 25
