This is a Python script that collects posts from the Bluesky firehose and saves them to a JSONL file. This tool is designed to be easy to set up and use, making it accessible for anyone interested in archiving Bluesky posts.
- Collects posts from the Bluesky firehose.
- Saves posts to a JSONL file with details such as text, creation time, author, URI, image presence, and reply information.
- Uses a cache for efficient author handle resolution.
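The author-handle cache mentioned above can be sketched as a memoized lookup. This is a minimal sketch, not the script's actual implementation: the real scraper resolves a DID to a handle over the network, and the static table below is a hypothetical stand-in for that call.

```python
from functools import lru_cache

# Hypothetical stand-in for a network lookup: maps a DID to a handle.
_FAKE_DIRECTORY = {"did:plc:abc123": "alice.bsky.social"}

@lru_cache(maxsize=10_000)
def resolve_handle(did: str) -> str:
    # Only the first call per DID pays the (simulated) lookup cost;
    # repeated calls are served from the cache.
    return _FAKE_DIRECTORY.get(did, "handle.invalid")

resolve_handle("did:plc:abc123")
resolve_handle("did:plc:abc123")
print(resolve_handle.cache_info().hits)  # second call was a cache hit: 1
```

Because the firehose emits many posts from the same authors in a short window, memoizing the resolver avoids repeating the same lookup for every post.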
- Python >=3.11,<3.14
- Poetry for dependency management
- Clone the repository:

  ```bash
  git clone https://github.com/deepfates/bsky-scraper.git
  cd bsky-scraper
  ```

- Install dependencies:

  Use Poetry to install the required packages:

  ```bash
  poetry install
  ```
- Run the script:

  You can start collecting posts by running the script with Poetry:

  ```bash
  poetry run python scrape.py
  ```

  By default, the script collects posts for 30 seconds. You can adjust the duration by modifying the `duration_seconds` parameter in the `start_collection` method.

- Output:

  The collected posts are saved to `bluesky_posts.jsonl` in the project directory. Each line in the file is a JSON object representing a post.
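Since each line of the output file is an independent JSON object, it can be read back one line at a time. The field names in this sketch (`text`, `author`) are assumptions based on the description above, not the script's exact schema.

```python
import json

def load_posts(path: str) -> list[dict]:
    """Read a JSONL file of posts, one JSON object per line."""
    posts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip any blank lines
                posts.append(json.loads(line))
    return posts

# Demo with a small file in the same line-per-record shape.
with open("demo_posts.jsonl", "w", encoding="utf-8") as f:
    f.write('{"text": "hello", "author": "alice.bsky.social"}\n')
    f.write('{"text": "world", "author": "bob.bsky.social"}\n')

posts = load_posts("demo_posts.jsonl")
print(len(posts))  # 2
```

Reading line by line keeps memory use flat even for large archives, which is the usual reason to prefer JSONL over a single JSON array.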
- Output File: You can change the output file by passing a different filename to the `FirehoseScraper` constructor.
- Collection Duration: Modify the `duration_seconds` parameter in the `start_collection` method to change how long the script collects posts.
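Putting both options together, a configured run might look like the sketch below. The import path and exact signatures are assumptions based on the descriptions above; check `scrape.py` for the actual interface.

```python
# Assumed import path: FirehoseScraper defined in scrape.py at the project root.
from scrape import FirehoseScraper

# Write to a custom file instead of the default bluesky_posts.jsonl.
scraper = FirehoseScraper("my_posts.jsonl")

# Collect for 60 seconds instead of the default 30.
scraper.start_collection(duration_seconds=60)
```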
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
For questions or feedback, please contact deepfates on Bluesky.