Commit 8adbfe7

Merge pull request #138 from matagus/last-fixes-before-1.0.0-version
Several fixes before version 1.0.0
2 parents aa1753c + 6241c13 commit 8adbfe7

49 files changed

Lines changed: 2439 additions & 126 deletions

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -34,8 +34,9 @@ settings_local.py
 ##PyCharm
 .idea
 
-# Sphinx
+# Sphinx / mkdocs / readthedocs
 docs/_build
+site/
 
 # thumbnails, etc
 screenshots/.DS_Store

.readthedocs.yaml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ build:
 
 mkdocs:
   configuration: mkdocs.yml
-  fail_on_warning: false
+  fail_on_warning: true
 
 # Optional but recommended, declare the Python requirements required
 # to build your documentation

README.md

Lines changed: 100 additions & 0 deletions
@@ -97,9 +97,25 @@ PLANET = {
     "USER_AGENT": "MyPlanet/1.0",  # Customize the User-Agent for feed requests
     "RECENT_POSTS_LIMIT": 10,  # Number of recent posts to show
     "RECENT_BLOGS_LIMIT": 10,  # Number of recent blogs to show
+    "FETCH_ORIGINAL_CONTENT": False,  # Fetch and archive full post content from the original URL
+    "FETCH_CONTENT_DELAY": 0,  # Seconds to wait between content fetches (int or float)
 }
 ```
 
+#### Original Content Archiving
+
+When `FETCH_ORIGINAL_CONTENT` is `True`, django-planet will fetch the full HTML of each post's original URL using `readability-lxml` to extract the article body. The result is stored in `Post.original_content` and shown on the post detail page instead of the feed summary.
+
+```python
+PLANET = {
+    "FETCH_ORIGINAL_CONTENT": True,
+    "FETCH_CONTENT_DELAY": 1,  # 1 second between fetches to be polite to servers
+}
+```
+
+- If fetching fails for a post, a WARNING is logged and `original_content` remains `None` (the feed summary is shown as fallback).
+- Use the `planet_fetch_post_content` management command to backfill existing posts.
+
 4. Run migrations:
 
 ```bash
@@ -323,6 +339,90 @@ All admin interfaces include search and filtering capabilities.
 - Updates feed metadata (etag, last_checked)
 - Creates new Post and Author entries as needed
 
+**`planet_fetch_post_content`**
+- Backfills `original_content` for posts where it is missing
+- Optional `--feed <id>` argument to limit to a specific feed
+- Optional `--limit <n>` argument to cap the number of posts processed
+- Respects `FETCH_CONTENT_DELAY` between requests
+
+## 🔍 Post Filtering
+
+By default, all feed entries are saved. You can configure a **post filter backend** to accept only relevant posts before they are stored.
+
+### Configuration
+
+```python
+PLANET = {
+    "POST_FILTER_BACKEND": "planet.backends.accept_all.AcceptAllBackend",  # default
+    "TOPIC_KEYWORDS": [],
+}
+```
+
+### Built-in Backends
+
+**`planet.backends.accept_all.AcceptAllBackend`** *(default)*
+Accepts every entry unchanged. No configuration required.
+
+**`planet.backends.keyword.KeywordFilterBackend`**
+Accepts entries whose title or summary contains at least one of the configured keywords (case-insensitive). Rejected entries are logged at `INFO` level.
+
+```python
+PLANET = {
+    "POST_FILTER_BACKEND": "planet.backends.keyword.KeywordFilterBackend",
+    "TOPIC_KEYWORDS": ["python", "django", "open source"],
+}
+```
+
+When `TOPIC_KEYWORDS` is empty the backend accepts all entries (fail-open).
+
+### Writing a Custom Backend
+
+Subclass `BasePostFilterBackend` and implement `filter_entries`:
+
+```python
+from planet.backends.base import BasePostFilterBackend
+
+class MyBackend(BasePostFilterBackend):
+    def filter_entries(self, entries, feed):
+        # entries: list of feedparser entry objects
+        # feed: planet.models.Feed instance
+        return [e for e in entries if passes_my_check(e)]
+```
+
+Then point to it in your settings:
+
+```python
+PLANET = {
+    "POST_FILTER_BACKEND": "myapp.backends.MyBackend",
+}
+```
+
+## 📋 Logging
+
+django-planet uses Python's standard `logging` module. All loggers use names under the `planet.*` namespace (e.g. `planet.utils`, `planet.management.commands.planet_update_all_feeds`).
+
+Following Python library best practices, **no handlers are attached by default** — the host project controls all logging output. Add a `LOGGING` configuration in your Django settings to see log output:
+
+```python
+LOGGING = {
+    "version": 1,
+    "disable_existing_loggers": False,
+    "handlers": {
+        "console": {
+            "class": "logging.StreamHandler",
+        },
+    },
+    "loggers": {
+        "planet": {
+            "handlers": ["console"],
+            "level": "INFO",  # Use "DEBUG" for more verbosity
+        },
+    },
+}
+```
+
+At `INFO` level you'll see feed add/update summaries and 304 skips. At `DEBUG` level you'll also see individual fetch details, per-entry creation, and `to_datetime()` edge cases.
+
 ## 📸 Screenshots
 
 ### Post List
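
The filtering behaviour documented in the README hunk above (case-insensitive match on title or summary, fail-open on an empty keyword list, rejections logged at `INFO`) can be illustrated with a small self-contained sketch. It is not code from this commit: `BasePostFilterBackend` below is a stand-in for `planet.backends.base.BasePostFilterBackend`, `TitleKeywordBackend` and its `keywords` constructor argument are hypothetical, the logger name is made up, and plain dicts stand in for feedparser entries. How django-planet instantiates a backend and hands it `TOPIC_KEYWORDS` is not shown in the diff, so that wiring is an assumption.

```python
import logging

# Hypothetical logger name; the docs only state that loggers live under "planet.*".
logger = logging.getLogger("planet.backends.example")


class BasePostFilterBackend:
    """Stand-in for planet.backends.base.BasePostFilterBackend (assumed interface)."""

    def filter_entries(self, entries, feed):
        raise NotImplementedError


class TitleKeywordBackend(BasePostFilterBackend):
    """Hypothetical backend: keep entries whose title or summary mentions a keyword."""

    def __init__(self, keywords=()):
        # Assumption: keywords are passed in directly instead of read from settings.
        self.keywords = [k.lower() for k in keywords]

    def filter_entries(self, entries, feed):
        if not self.keywords:
            return list(entries)  # fail-open: no keywords means accept everything
        accepted = []
        for entry in entries:
            text = f"{entry.get('title', '')} {entry.get('summary', '')}".lower()
            if any(k in text for k in self.keywords):
                accepted.append(entry)
            else:
                logger.info("Rejected entry: %s", entry.get("title", ""))
        return accepted


# Plain dicts used in place of feedparser entries, which also support .get().
entries = [
    {"title": "Django 5 released", "summary": "What is new"},
    {"title": "Gardening tips", "summary": "Tomatoes and basil"},
    {"title": "Weekly notes", "summary": "Open Source roundup"},
]
backend = TitleKeywordBackend(["python", "django", "open source"])
kept = backend.filter_entries(entries, feed=None)
print([e["title"] for e in kept])
```

Matching on a lowercased concatenation of title and summary keeps the check to a single substring test per keyword; a real backend might prefer word-boundary matching to avoid hits inside longer words.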

docs/admin.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+# Admin Interface
+
+Django-planet registers all models in the Django admin with search, filtering, and cross-linking between related objects.
+
+## "Add Feed by URL" Workflow
+
+The Feed admin has a special workflow for adding new feeds. When you click "Add Feed", you only need to provide the feed URL:
+
+1. Enter the feed URL and click Save
+2. django-planet automatically creates a **Blog** entry (using the feed's domain as a placeholder title) if one doesn't already exist
+3. A **Feed** stub is created, ready to be populated on the next `planet_update_all_feeds` run
+
+This is the same operation as `python manage.py planet_add_feed <url>`, but accessible from the admin.
+
+!!! note
+
+    The feed's title and metadata will be populated automatically the next time feeds are updated.
+
+## BlogAdmin
+
+- **List display:** title, URL, date created
+- **Search:** by title, URL
+- **Inline feeds:** Read-only tabular inline showing all feeds for the blog, with links to each feed's admin page
+
+## FeedAdmin
+
+- **List display:** title, URL, blog, language, etag, last modified, last checked, active status
+- **List filter:** by language
+- **Search:** by title, URL, blog title
+- **Fieldsets:**
+    - *General:* title, URL, blog, language
+    - *Feed Status:* etag, last modified, last checked, is_active
+    - *Authors:* read-only list of all authors who have posts in this feed, with links to each author's admin page
+
+## PostAdmin
+
+- **List display:** title, feed, guid, date published, date created
+- **List filter:** by feed title, language
+- **Search:** by title, blog title
+- **Optimized queries:** uses `select_related` for feed and blog to minimize database queries
+
+## AuthorAdmin
+
+- **List display:** name, email
+- **Search:** by name
+- **Fieldsets:**
+    - *General:* name, email, profile URL
+    - *Feeds:* read-only list of all feeds this author has contributed to, with links to each feed's admin page
+
+## PostAuthorDataAdmin
+
+- **List display:** author name, is_contributor flag, post
+- **List filter:** by is_contributor, author
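
The placeholder title mentioned in step 2 of the "Add Feed by URL" workflow could be derived roughly as follows. This is a sketch only: `placeholder_blog_title` is a hypothetical helper, the example URL is made up, and the commit does not show whether django-planet uses the bare hostname or some other form of the feed's domain.

```python
from urllib.parse import urlparse


def placeholder_blog_title(feed_url: str) -> str:
    """Return the host part of a feed URL, for use as a temporary blog title."""
    # .hostname is lowercased and excludes any port or credentials.
    host = urlparse(feed_url).hostname
    if not host:
        raise ValueError(f"Not an absolute URL: {feed_url!r}")
    return host


print(placeholder_blog_title("https://blog.example.com/feeds/all.atom.xml"))
```

Rejecting URLs without a host up front avoids creating a Blog row with an empty title that a later feed update would have to repair.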

docs/configuration.md

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
+# Configuration Reference
+
+All django-planet settings are defined in a single `PLANET` dictionary in your Django settings module. Any key you don't specify falls back to its default.
+
+## PLANET_CONFIG
+
+```python
+PLANET = {
+    "USER_AGENT": "MyPlanet/1.0",
+    "RECENT_POSTS_LIMIT": 10,
+    "RECENT_BLOGS_LIMIT": 10,
+    "POST_FILTER_BACKEND": "planet.backends.accept_all.AcceptAllBackend",
+    "TOPIC_KEYWORDS": [],
+    "FETCH_ORIGINAL_CONTENT": False,
+    "FETCH_CONTENT_DELAY": 0,
+}
+```
+
+### Settings Reference
+
+`USER_AGENT`
+: **Type:** `str`
+    **Default:** `"Django Planet/<version>"`
+
+    The User-Agent header sent when fetching feeds and post content.
+
+`RECENT_POSTS_LIMIT`
+: **Type:** `int`
+    **Default:** `10`
+
+    Number of posts shown by the `{% recent_posts %}` template tag.
+
+`RECENT_BLOGS_LIMIT`
+: **Type:** `int`
+    **Default:** `10`
+
+    Number of blogs shown by the `{% recent_blogs %}` template tag.
+
+`POST_FILTER_BACKEND`
+: **Type:** `str` (dotted Python path)
+    **Default:** `"planet.backends.accept_all.AcceptAllBackend"`
+
+    The backend class used to filter incoming feed entries before they are saved. See [Post Filter Backends](#post-filter-backends) below.
+
+`TOPIC_KEYWORDS`
+: **Type:** `list[str]`
+    **Default:** `[]`
+
+    Keywords used by `KeywordFilterBackend` to filter posts. Only entries whose title or summary contains at least one keyword (case-insensitive) are accepted.
+
+`FETCH_ORIGINAL_CONTENT`
+: **Type:** `bool`
+    **Default:** `False`
+
+    When `True`, django-planet fetches the full HTML of each post's original URL using `readability-lxml` and stores it in `Post.original_content`. See [Usage > Content Archiving](usage.md#original-content-archiving).
+
+`FETCH_CONTENT_DELAY`
+: **Type:** `int | float`
+    **Default:** `0`
+
+    Seconds to wait between content fetches. Set this to a positive value (e.g., `1`) to be polite to origin servers.
+
+## Post Filter Backends
+
+By default, all feed entries are saved. You can configure a post filter backend to accept only relevant posts before they are stored.
+
+### AcceptAllBackend (default)
+
+```python
+PLANET = {
+    "POST_FILTER_BACKEND": "planet.backends.accept_all.AcceptAllBackend",
+}
+```
+
+Accepts every entry unchanged. No configuration required.
+
+### KeywordFilterBackend
+
+```python
+PLANET = {
+    "POST_FILTER_BACKEND": "planet.backends.keyword.KeywordFilterBackend",
+    "TOPIC_KEYWORDS": ["python", "django", "open source"],
+}
+```
+
+Accepts entries whose title or summary contains at least one of the configured keywords (case-insensitive). Rejected entries are logged at `INFO` level.
+
+When `TOPIC_KEYWORDS` is empty, the backend accepts all entries (fail-open).
+
+### Writing a Custom Backend
+
+Subclass `BasePostFilterBackend` and implement `filter_entries`:
+
+```python
+from planet.backends.base import BasePostFilterBackend
+
+
+class MyBackend(BasePostFilterBackend):
+    def filter_entries(self, entries, feed):
+        # entries: list of feedparser entry objects
+        # feed: planet.models.Feed instance
+        return [e for e in entries if passes_my_check(e)]
+```
+
+Then point to it in your settings:
+
+```python
+PLANET = {
+    "POST_FILTER_BACKEND": "myapp.backends.MyBackend",
+}
+```
+
+## Logging
+
+django-planet uses Python's standard `logging` module. All loggers use names under the `planet.*` namespace (e.g., `planet.utils`, `planet.management.commands.planet_update_all_feeds`).
+
+Following Python library best practices, **no handlers are attached by default** — the host project controls all logging output. Add a `LOGGING` configuration to see log output:
+
+```python
+LOGGING = {
+    "version": 1,
+    "disable_existing_loggers": False,
+    "handlers": {
+        "console": {
+            "class": "logging.StreamHandler",
+        },
+    },
+    "loggers": {
+        "planet": {
+            "handlers": ["console"],
+            "level": "INFO",  # Use "DEBUG" for more verbosity
+        },
+    },
+}
+```
+
+At `INFO` level you'll see feed add/update summaries and 304 skips. At `DEBUG` level you'll also see individual fetch details, per-entry creation, and `to_datetime()` edge cases.
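
The logger hierarchy described in the Logging section can be checked with the standard library alone. The sketch below assumes nothing about django-planet beyond the `planet.*` logger names quoted in the docs: the log messages are made up, and an in-memory stream with a simple formatter replaces the console handler so the result can be inspected.

```python
import io
import logging
import logging.config

stream = io.StringIO()  # in-memory target instead of the console

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"simple": {"format": "%(name)s %(levelname)s %(message)s"}},
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "simple",
            "stream": stream,
        },
    },
    # Only the parent "planet" logger is configured, as in the docs.
    "loggers": {"planet": {"handlers": ["console"], "level": "INFO"}},
})

# A child logger inherits the effective level and propagates to the parent's handler.
log = logging.getLogger("planet.utils")
log.info("feed updated")     # emitted: INFO meets the "planet" threshold
log.debug("fetch details")   # dropped: DEBUG is below INFO

print(stream.getvalue())
```

Because child loggers propagate to `planet`, configuring that single logger is enough to capture output from every module in the package; switching its level to `DEBUG` would let the second message through as well.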
