Commit 791d39d

newsroomdev and stucka authored
docs: add download_agency notes (#141)
* docs: add download_agency notes
* Update contributing.md
* Update usage.md

---------

Co-authored-by: Gerald Rich <newsroomdev@users.noreply.github.com>
Co-authored-by: Mike Stucka <stucka@whitedoggies.com>
1 parent 87fabd6 commit 791d39d

2 files changed

Lines changed: 9 additions & 7 deletions

docs/contributing.md

Lines changed: 4 additions & 3 deletions
@@ -116,8 +116,8 @@ When coding a new scraper, there are a few important conventions to follow:
 - If it's a new state folder, add an empty `__init__.py` to the folder
 - Create a `Site` class inside the agency's scraper module with the following attributes/methods:
 - `name` - Official name of the agency
-- `scrape_meta` - generates a CSV with metadata about videos and other available files (file name, URL, and size at minimum)
-- `scrape` - uses the CSV generated by `scrape_meta` to download videos and other files
+- `scrape_meta` - generates a JSON with metadata about videos and other available files (file name, URL at a minimum)
+- `download_agency` - uses the JSON generated by `scrape_meta` to download videos and other files
 
 Below is a pared down version of San Diego's [Site](https://github.com/biglocalnews/clean-scraper/blob/main/clean/ca/san_diego_pd.py) class to illustrate these conventions.
 
@@ -285,6 +285,7 @@ Options:
 Commands:
 list List all available agencies and their slugs.
 scrape-meta Command-line interface for generating metadata CSV about...
+download_agency Downloads assets retrieved in scrape-meta
 ```
 
 Running a state is as simple as passing arguments to the appropriate subcommand.
@@ -299,7 +300,7 @@ pipenv run python -m clean.cli list
 pipenv run python -m clean.cli scrape-meta ca_san_diego_pd
 
 # Trigger file downloads using agency slug
-pipenv run python -m clean.cli scrape ca_san_diego_pd
+pipenv run python -m clean.cli download_agency ca_san_diego_pd
 ```
 
 For more verbose logging, you can ask the system to show debugging information.

docs/usage.md

Lines changed: 5 additions & 4 deletions
@@ -31,14 +31,14 @@ You can then run a scraper for an agency using its slug:
 clean-scraper scrape-meta ca_san_diego_pd
 ```
 
-> **NOTE**: Always run `scrape-meta` at least once initially. It generates output required by the `scrape` subcommand.
+> **NOTE**: Always run `scrape-meta` at least once initially. It generates output required by the `download_agency` subcommand.
 
 To use the `clean` library in Python, import an agency's scraper and run it directly.
 
 ```python
 from clean.ca import san_diego_pd
 
-san_diego_pd.scrape()
+san_diego_pd.download_agency()
 ```
 
 ## Configuration
@@ -56,6 +56,7 @@ Options:
 --help Show this message and exit.
 
 Commands:
-list List all available agencies and their slugs.
-scrape-meta Command-line interface for downloading CLEAN files.
+list List all available agencies and their slugs.
+scrape-meta Command-line interface for generating metadata JSON about...
+download_agency Downloads assets retrieved in scrape-meta
 ```
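
Note on the docs/contributing.md change: the updated convention has `scrape_meta` write a JSON file that `download_agency` later reads. A minimal Python sketch of that handoff, under stated assumptions, might look like the following. The class body, the `http_get` helper, the `data_dir` and `fetch` parameters, the `example_pd.json` file name, the `name` and `asset_url` keys, and the example.com URLs are all hypothetical. None of them are taken from the clean-scraper codebase, whose real `Site` classes may be structured differently.

```python
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def http_get(url):
    """Return the raw bytes served at url (hypothetical helper for this sketch)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


class Site:
    """Illustrative scraper: scrape_meta writes JSON, download_agency consumes it."""

    name = "Example Police Department"  # hypothetical agency name

    def __init__(self, data_dir="data"):
        self.data_dir = Path(data_dir)
        # Hypothetical location of the metadata file shared by both methods.
        self.meta_path = self.data_dir / "example_pd.json"

    def scrape_meta(self):
        """Write one JSON record per asset (file name and URL at a minimum)."""
        # A real scraper would discover these links on the agency's site;
        # they are hard-coded here so the sketch stays self-contained.
        urls = [
            "https://example.com/files/report.pdf",
            "https://example.com/files/video.mp4",
        ]
        metadata = [
            {"name": Path(urlparse(url).path).name, "asset_url": url}
            for url in urls
        ]
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.meta_path.write_text(json.dumps(metadata, indent=2))
        return self.meta_path

    def download_agency(self, fetch=http_get):
        """Read the JSON from scrape_meta and save each asset via fetch(url)."""
        metadata = json.loads(self.meta_path.read_text())
        saved = []
        for entry in metadata:
            target = self.data_dir / entry["name"]
            target.write_bytes(fetch(entry["asset_url"]))
            saved.append(target)
        return saved
```

Passing the downloader in as `fetch` is a choice made for this sketch only; it lets the second step be exercised without network access. Calling `download_agency` before `scrape_meta` fails here because the JSON file does not exist yet, which mirrors the NOTE in docs/usage.md about running `scrape-meta` first.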
