@@ -97,9 +97,25 @@ PLANET = {
```python
PLANET = {
    # ... (earlier settings not shown in this hunk)
    "USER_AGENT": "MyPlanet/1.0",  # Customize the User-Agent for feed requests
    "RECENT_POSTS_LIMIT": 10,  # Number of recent posts to show
    "RECENT_BLOGS_LIMIT": 10,  # Number of recent blogs to show
    "FETCH_ORIGINAL_CONTENT": False,  # Fetch and archive full post content from the original URL
    "FETCH_CONTENT_DELAY": 0,  # Seconds to wait between content fetches (int or float)
}
```

#### Original Content Archiving

When `FETCH_ORIGINAL_CONTENT` is `True`, django-planet fetches the full HTML from each post's original URL and uses `readability-lxml` to extract the article body. The result is stored in `Post.original_content` and shown on the post detail page instead of the feed summary.

```python
PLANET = {
    "FETCH_ORIGINAL_CONTENT": True,
    "FETCH_CONTENT_DELAY": 1,  # 1 second between fetches to be polite to servers
}
```

- If fetching fails for a post, a WARNING is logged and `original_content` remains `None` (the feed summary is shown as a fallback).
- Use the `planet_fetch_post_content` management command to backfill existing posts.
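The failure and fallback behaviour described above can be illustrated with a minimal sketch. It is not django-planet's actual code: the `Post` dataclass, the `archive_post` and `display_content` helpers, the `summary` and `url` field names, and the logger name are all hypothetical stand-ins, and the fetcher is injected so the example runs without network access.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical logger name, chosen only to sit under the "planet" namespace.
logger = logging.getLogger("planet.example")


@dataclass
class Post:
    """Hypothetical stand-in for the real model, with only the fields used here."""
    url: str
    summary: str
    original_content: Optional[str] = None


def archive_post(post: Post, fetch: Callable[[str], str]) -> None:
    """Try to archive the full content; on failure log a WARNING and leave None."""
    try:
        post.original_content = fetch(post.url)
    except Exception as exc:  # any fetch error leaves original_content untouched
        logger.warning("Could not fetch %s: %s", post.url, exc)


def display_content(post: Post) -> str:
    """Prefer the archived content; fall back to the feed summary when it is None."""
    if post.original_content is not None:
        return post.original_content
    return post.summary


def failing_fetch(url: str) -> str:
    """Simulates an origin server that never answers."""
    raise TimeoutError("origin server did not respond")


ok = Post(url="https://example.com/a", summary="short summary")
archive_post(ok, lambda url: "<p>full article</p>")

broken = Post(url="https://example.com/b", summary="short summary")
archive_post(broken, failing_fetch)
```

Here `display_content(ok)` yields the archived HTML, while `display_content(broken)` yields the summary because the failed fetch left `original_content` as `None`.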

4. Run migrations:

```bash
# ... (command not shown in this hunk)
```
@@ -323,6 +339,90 @@ All admin interfaces include search and filtering capabilities.
- Updates feed metadata (etag, last_checked)
- Creates new Post and Author entries as needed

**`planet_fetch_post_content`**

- Backfills `original_content` for posts where it is missing
- Optional `--feed <id>` argument to limit to a specific feed
- Optional `--limit <n>` argument to cap the number of posts processed
- Respects `FETCH_CONTENT_DELAY` between requests
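How these options could interact is sketched below. This is an illustration under stated assumptions, not the command's real implementation: `backfill` is a hypothetical function, posts are plain dictionaries rather than model instances, and the delay is assumed to apply only between consecutive fetches, not before the first one.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def backfill(
    posts: Iterable[Dict],
    fetch: Callable[[str], str],
    feed_id: Optional[int] = None,
    limit: Optional[int] = None,
    delay: float = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Fill missing original_content and return the number of posts processed."""
    processed = 0
    for post in posts:
        if post["original_content"] is not None:
            continue  # only posts with missing content are backfilled
        if feed_id is not None and post["feed_id"] != feed_id:
            continue  # mirrors the --feed <id> restriction
        if limit is not None and processed >= limit:
            break  # mirrors the --limit <n> cap
        if processed > 0 and delay > 0:
            sleep(delay)  # wait between requests, as FETCH_CONTENT_DELAY asks
        post["original_content"] = fetch(post["url"])
        processed += 1
    return processed


posts = [
    {"feed_id": 1, "url": "u1", "original_content": None},
    {"feed_id": 1, "url": "u2", "original_content": "<p>done</p>"},
    {"feed_id": 2, "url": "u3", "original_content": None},
    {"feed_id": 1, "url": "u4", "original_content": None},
    {"feed_id": 1, "url": "u5", "original_content": None},
]
sleeps = []  # records requested delays instead of actually sleeping
count = backfill(
    posts,
    fetch=lambda url: f"<p>{url}</p>",
    feed_id=1,
    limit=2,
    delay=1,
    sleep=sleeps.append,
)
```

Injecting `sleep` keeps the sketch fast to run; in this run two posts from feed 1 are filled, one delay is requested between them, and the remaining post is left for a later invocation.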

## 🔍 Post Filtering

By default, all feed entries are saved. You can configure a **post filter backend** to accept only relevant posts before they are stored.

`KeywordFilterBackend` accepts entries whose title or summary contains at least one of the configured keywords (case-insensitive). Rejected entries are logged at `INFO` level.

django-planet uses Python's standard `logging` module. All loggers use names under the `planet.*` namespace (e.g., `planet.utils`, `planet.management.commands.planet_update_all_feeds`).

Following Python library best practices, **no handlers are attached by default**; the host project controls all logging output. Add a `LOGGING` configuration in your Django settings to see log output:

```python
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
        },
    },
    "loggers": {
        "planet": {
            "handlers": ["console"],
            "level": "INFO",  # Use "DEBUG" for more verbosity
        },
    },
}
```

At `INFO` level you'll see feed add/update summaries and 304 skips. At `DEBUG` level you'll also see individual fetch details, per-entry creation, and `to_datetime()` edge cases.

All django-planet settings are defined in a single `PLANET` dictionary in your Django settings module. Any key you don't specify falls back to its default.
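The fallback-to-default rule can be pictured as a simple dictionary merge. The sketch below is an assumption about the general idea, not the library's internals: `DEFAULTS` lists only the three defaults documented on this page, and `resolve_settings` is a hypothetical helper.

```python
# Only the defaults documented on this page; the real set of keys is larger.
DEFAULTS = {
    "TOPIC_KEYWORDS": [],
    "FETCH_ORIGINAL_CONTENT": False,
    "FETCH_CONTENT_DELAY": 0,
}


def resolve_settings(user_settings: dict) -> dict:
    """Return the defaults overridden by whatever the project specified."""
    # Later keys win, so user-supplied values replace the defaults.
    return {**DEFAULTS, **user_settings}


# A project that sets a single key still gets a value for every other key.
resolved = resolve_settings({"FETCH_ORIGINAL_CONTENT": True})
```

In this example `resolved` has `FETCH_ORIGINAL_CONTENT` set to `True`, while `FETCH_CONTENT_DELAY` and `TOPIC_KEYWORDS` keep their defaults.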

The backend class used to filter incoming feed entries before they are saved. See [Post Filter Backends](#post-filter-backends) below.

`TOPIC_KEYWORDS`
:   **Type:** `list[str]`
    **Default:** `[]`

    Keywords used by `KeywordFilterBackend` to filter posts. Only entries whose title or summary contains at least one keyword (case-insensitive) are accepted.

`FETCH_ORIGINAL_CONTENT`
:   **Type:** `bool`
    **Default:** `False`

    When `True`, django-planet fetches the full HTML of each post's original URL using `readability-lxml` and stores it in `Post.original_content`. See [Usage > Content Archiving](usage.md#original-content-archiving).

`FETCH_CONTENT_DELAY`
:   **Type:** `int | float`
    **Default:** `0`

    Seconds to wait between content fetches. Set this to a positive value (e.g., `1`) to be polite to origin servers.

## Post Filter Backends

By default, all feed entries are saved. You can configure a post filter backend to accept only relevant posts before they are stored.

`KeywordFilterBackend` accepts entries whose title or summary contains at least one of the configured keywords (case-insensitive). Rejected entries are logged at `INFO` level.

When `TOPIC_KEYWORDS` is empty, the backend accepts all entries (fail-open).
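The matching rule can be sketched in a few lines. This is an illustration only: `matches_keywords` is a hypothetical helper, not the backend's real method, and it assumes plain substring matching, since the text above says "contains" without specifying word boundaries.

```python
from typing import Sequence


def matches_keywords(title: str, summary: str, keywords: Sequence[str]) -> bool:
    """Return True when the entry should be accepted."""
    if not keywords:
        return True  # fail-open: no keywords configured means accept everything
    title_cf = title.casefold()
    summary_cf = summary.casefold()
    for keyword in keywords:
        needle = keyword.casefold()  # case-insensitive comparison
        if needle in title_cf or needle in summary_cf:
            return True
    return False
```

For example, `matches_keywords("Django 5 released", "", ["django"])` accepts the entry, while an entry about cooking is rejected when the keywords are `["django", "python"]`.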

### Writing a Custom Backend

Subclass `BasePostFilterBackend` and implement `filter_entries`:

```python
from planet.backends.base import BasePostFilterBackend
# ... (rest of the example not shown in this hunk)
```

django-planet uses Python's standard `logging` module. All loggers use names under the `planet.*` namespace (e.g., `planet.utils`, `planet.management.commands.planet_update_all_feeds`).

Following Python library best practices, **no handlers are attached by default**; the host project controls all logging output. Add a `LOGGING` configuration to see log output:

```python
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
        },
    },
    "loggers": {
        "planet": {
            "handlers": ["console"],
            "level": "INFO",  # Use "DEBUG" for more verbosity
        },
    },
}
```
136
+
137
+
At `INFO` level you'll see feed add/update summaries and 304 skips. At `DEBUG` level you'll also see individual fetch details, per-entry creation, and `to_datetime()` edge cases.
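To see why configuring the single `planet` logger is enough, the following standalone sketch applies an equivalent configuration outside Django with `logging.config.dictConfig`. It makes a few assumptions for the sake of a runnable example: output goes to an in-memory buffer instead of the console, `configure_planet_logging` is a hypothetical helper, and the log messages are invented.

```python
import io
import logging
import logging.config


def configure_planet_logging(stream, level="INFO"):
    """Attach one stream handler to the "planet" logger, as in the config above."""
    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "stream": stream,  # in-memory buffer here; the console in a real project
            },
        },
        "loggers": {
            "planet": {"handlers": ["console"], "level": level},
        },
    })


buffer = io.StringIO()
configure_planet_logging(buffer, level="INFO")

# A child logger has no handler of its own; records propagate up to "planet".
child = logging.getLogger("planet.utils")
child.info("feed updated")    # emitted: INFO meets the configured level
child.debug("fetch details")  # dropped: DEBUG is below INFO
```

Because `planet.utils` inherits its effective level and handlers from `planet`, only the `INFO` message reaches the buffer. Switching `level` to `"DEBUG"` would let both through.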