Merged
5 changes: 3 additions & 2 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -7,15 +7,16 @@ on:
paths-ignore:
- ".gitignore"
- "README.md"
pull_request:
- "pre-commit-config.yaml"
- "LICENSE"

jobs:
test:
strategy:
max-parallel: 6
matrix:
os: [ "ubuntu-latest", "windows-latest", "macos-latest" ]
python-version: [ "3.9", "3.10", "3.11", "3.12", "3.13" ]
python-version: [ "3.9", "3.10", "3.11", "3.12", "3.13", "3.14" ]

runs-on: ${{ matrix.os }}

1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -31,6 +31,7 @@ repos:
- run
- mypy
- rss_parser
pass_filenames: false
language: system
stages: [ push ]

32 changes: 28 additions & 4 deletions README.md
@@ -37,6 +37,31 @@ pip install dist/*.whl
- Models for RSS-specific schemas have been moved from `rss_parser.models` to `rss_parser.models.rss`. Generic types remain unchanged
- Date parsing has been improved and now uses pydantic's `validator` instead of `email.utils`, producing better datetime objects where it previously defaulted to `str`

## V2 -> V3 Migration

`rss-parser` 3.x upgrades the runtime models to [Pydantic v2](https://docs.pydantic.dev/latest/migration/). Highlights:

- **New default models** now inherit from `pydantic.BaseModel` v2 and use `model_validate`/`model_dump`. If you extend our classes, switch from `dict()`/`json()` to `model_dump()`/`model_dump_json()`.
- **Legacy compatibility** lives under `rss_parser.models.legacy`. Point your custom parser at the legacy schema if you must stay on the v1 API surface.
- **Collections**: list-like XML fields now use `OnlyList[...]` directly with an automatic `default_factory` so that attributes are always lists (no more `Optional[OnlyList[T]] = Field(..., default=[])`). Update custom schemas accordingly.
- **Custom hooks**: if you relied on `rss_parser.pydantic_proxy`, import it from `rss_parser.models.legacy.pydantic_proxy`. The top-level module only re-exports it for backwards compatibility.

See the “Legacy Models” section below for sample snippets showing how to stay on the older types. Tests in this repo cover both tracks to guarantee matching output.
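
The **Collections** note above depends on a per-instance default factory. As a rough illustration only, the same idea can be shown with the standard library's `dataclasses` rather than Pydantic; the `Channel` class and its fields are hypothetical stand-ins and are not part of `rss-parser`:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Channel:
    # Hypothetical stand-in for a schema with a list-like XML field.
    title: str
    # default_factory builds a new list for every instance, so the attribute
    # is always a list and is never shared between instances.
    items: List[str] = field(default_factory=list)


first = Channel(title="a")
second = Channel(title="b")
first.items.append("entry")

print(first.items)   # ['entry']
print(second.items)  # []
```

A factory gives every instance its own empty container, so consumers can iterate over the attribute without checking for `None` first.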

## Legacy Models

Pydantic v1-based models are still available under `rss_parser.models.legacy`. They retain the previous behaviour and re-export the `import_v1_pydantic` helper as `rss_parser.models.legacy.pydantic_proxy.import_v1_pydantic`. You can continue to use them by pointing your parser at the legacy schema:

```python
from rss_parser import RSSParser
from rss_parser.models.legacy.rss import RSS as LegacyRSS

class LegacyRSSParser(RSSParser):
schema = LegacyRSS
```

Tests in this repository run against both the v2 and legacy models to ensure parity.

## Usage

### Quickstart
@@ -163,18 +188,17 @@ If you don't want to deal with these conditions and want to parse something **al
```python
from typing import Optional

from pydantic import Field

from rss_parser.models.rss.item import Item
from rss_parser.models.types.only_list import OnlyList
from rss_parser.models.types.tag import Tag
from rss_parser.pydantic_proxy import import_v1_pydantic

pydantic = import_v1_pydantic()
...


class OptionalChannelElementsMixin(...):
...
items: Optional[OnlyList[Tag[Item]]] = pydantic.Field(alias="item", default=[])
items: Optional[OnlyList[Tag[Item]]] = Field(alias="item", default_factory=list)
```

### Tag Field
720 changes: 398 additions & 322 deletions poetry.lock

Large diffs are not rendered by default.

10 changes: 7 additions & 3 deletions pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "rss-parser"
version = "2.1.1"
version = "3.0.0a2"
description = "Typed pythonic RSS/Atom parser"
authors = ["dhvcc <[email protected]>"]
license = "GPL-3.0"
@@ -28,6 +28,8 @@ classifiers = [
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Programming Language :: Python :: 3.13",
"Programming Language :: Python :: 3.14",
]
packages = [{ include = "rss_parser" }, { include = "rss_parser/py.typed" }]

@@ -39,7 +41,7 @@ packages = [{ include = "rss_parser" }, { include = "rss_parser/py.typed" }]

[tool.poetry.dependencies]
python = "^3.9"
pydantic = ">1.9"
pydantic = "<3.0"
xmltodict = "^0.13.0"
types-xmltodict = "^0.14.0.20241009"

@@ -85,6 +87,9 @@ select = [
"RUF", # ruff
]

[tool.ruff.lint.pep8-naming]
ignore-names = ["LegacyRSS"]

[tool.ruff.per-file-ignores]
"tests/**.py" = [
"S101", # Use of assert detected
@@ -95,7 +100,6 @@ select = [
"**/__init__.py" = ["F401"]
"rss_parser/models/atom/**" = ["A003"]


[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
2 changes: 1 addition & 1 deletion rss_parser/__init__.py
@@ -1,3 +1,3 @@
from ._parser import AtomParser, BaseParser, RSSParser

__all__ = ("BaseParser", "AtomParser", "RSSParser")
__all__ = ("AtomParser", "BaseParser", "RSSParser")
4 changes: 4 additions & 0 deletions rss_parser/_parser.py
@@ -48,6 +48,10 @@ def parse(
if root_key:
root = root.get(root_key, root)

if hasattr(schema, "model_validate"):
return schema.model_validate(root)

# Pydantic v1 only
return schema.parse_obj(root)
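
The version dispatch added in this hunk can be tried in isolation. The sketch below is stdlib-only; `V2Style`, `V1Style`, and `validate` are hypothetical names that stand in for real schemas and for the tail of `parse`:

```python
from typing import Any


class V2Style:
    """Hypothetical stand-in exposing the Pydantic v2 entry point."""

    @classmethod
    def model_validate(cls, data: Any) -> str:
        return f"v2:{data!r}"


class V1Style:
    """Hypothetical stand-in exposing only the Pydantic v1 entry point."""

    @classmethod
    def parse_obj(cls, data: Any) -> str:
        return f"v1:{data!r}"


def validate(schema: Any, root: Any) -> Any:
    # Prefer the v2 method when the schema provides it.
    if hasattr(schema, "model_validate"):
        return schema.model_validate(root)
    # Otherwise fall back to the v1-only method.
    return schema.parse_obj(root)


print(validate(V2Style, {"a": 1}))  # v2:{'a': 1}
print(validate(V1Style, {"a": 1}))  # v1:{'a': 1}
```

Checking for `model_validate` first matters because Pydantic v2 models still carry a deprecated `parse_obj`, so testing for `parse_obj` alone would not tell the two generations apart.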


35 changes: 14 additions & 21 deletions rss_parser/models/__init__.py
@@ -1,32 +1,25 @@
"""
Models created according to https://www.rssboard.org/rss-specification.
from __future__ import annotations

Some types and validation may be a bit custom to account for broken standards in some RSS feeds.
"""

from json import loads
from typing import TYPE_CHECKING
from pydantic import BaseModel, ConfigDict

from rss_parser.models.utils import camel_case
from rss_parser.pydantic_proxy import import_v1_pydantic

if TYPE_CHECKING:
from pydantic import v1 as pydantic
else:
pydantic = import_v1_pydantic()


class XMLBaseModel(pydantic.BaseModel):
class Config:
alias_generator = camel_case
class XMLBaseModel(BaseModel):
model_config = ConfigDict(alias_generator=camel_case)

def json_plain(self, **kw):
def json_plain(self, **kwargs) -> str:
"""
Run pydantic's json with custom encoder to encode Tags as only content.
Serialize the model while flattening Tag instances into their content.
"""
from rss_parser.models.types.tag import Tag # noqa: PLC0415

return self.json(models_as_dict=False, encoder=Tag.flatten_tag_encoder, **kw)
return self.model_dump_json(fallback=Tag.flatten_tag_encoder, **kwargs)

def dict_plain(self, **kwargs):
from rss_parser.models.types.tag import Tag # noqa: PLC0415

return self.model_dump(mode="json", fallback=Tag.flatten_tag_encoder, **kwargs)


def dict_plain(self, **kw):
return loads(self.json_plain(**kw))
__all__ = ("XMLBaseModel",)
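
The `fallback=` hook used above plays a role similar to the `default=` argument of the standard library's `json.dumps`: it is consulted only for values the serializer cannot handle on its own. The sketch below is an analogy under that assumption. The `Tag` class and `flatten_tag` function are hypothetical stand-ins, not the library's `Tag.flatten_tag_encoder`:

```python
import json
from typing import Any


class Tag:
    """Hypothetical stand-in: wraps a value plus XML attributes."""

    def __init__(self, content: Any, attributes: dict) -> None:
        self.content = content
        self.attributes = attributes


def flatten_tag(value: Any) -> Any:
    # Called by json.dumps only for objects it cannot serialize itself.
    if isinstance(value, Tag):
        return value.content
    raise TypeError(f"Cannot serialize {type(value).__name__}")


payload = {"title": Tag("Hello", {"lang": "en"}), "ttl": 60}
plain = json.dumps(payload, default=flatten_tag)
print(plain)  # {"title": "Hello", "ttl": 60}
```

Here the attributes are dropped and only the content survives, which is the behaviour the `json_plain` docstring describes for Tag instances.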
7 changes: 3 additions & 4 deletions rss_parser/models/atom/atom.py
Original file line number Diff line number Diff line change
@@ -1,15 +1,14 @@
from typing import Optional

from pydantic import Field

from rss_parser.models import XMLBaseModel
from rss_parser.models.atom.feed import Feed
from rss_parser.models.types.tag import Tag
from rss_parser.pydantic_proxy import import_v1_pydantic

pydantic = import_v1_pydantic()


class Atom(XMLBaseModel):
"""Atom 1.0"""

version: Optional[Tag[str]] = pydantic.Field(alias="@version")
version: Optional[Tag[str]] = Field(alias="@version", default=None)
feed: Tag[Feed]
13 changes: 6 additions & 7 deletions rss_parser/models/atom/entry.py
Original file line number Diff line number Diff line change
@@ -1,13 +1,12 @@
from typing import Optional

from pydantic import Field

from rss_parser.models import XMLBaseModel
from rss_parser.models.atom.person import Person
from rss_parser.models.types.date import DateTimeOrStr
from rss_parser.models.types.only_list import OnlyList
from rss_parser.models.types.tag import Tag
from rss_parser.pydantic_proxy import import_v1_pydantic

pydantic = import_v1_pydantic()


class RequiredAtomEntryMixin(XMLBaseModel):
@@ -22,10 +21,10 @@ class RequiredAtomEntryMixin(XMLBaseModel):


class RecommendedAtomEntryMixin(XMLBaseModel):
authors: Optional[OnlyList[Tag[Person]]] = pydantic.Field(alias="author", default=[])
authors: OnlyList[Tag[Person]] = Field(alias="author", default_factory=OnlyList)
"Entry authors."

links: Optional[OnlyList[Tag[str]]] = pydantic.Field(alias="link", default=[])
links: OnlyList[Tag[str]] = Field(alias="link", default_factory=OnlyList)
"The URL of the entry."

content: Optional[Tag[str]] = None
@@ -36,10 +35,10 @@ class RecommendedAtomEntryMixin(XMLBaseModel):


class OptionalAtomEntryMixin(XMLBaseModel):
categories: Optional[OnlyList[Tag[dict]]] = pydantic.Field(alias="category", default=[])
categories: OnlyList[Tag[dict]] = Field(alias="category", default_factory=OnlyList)
"Specifies a categories that the entry belongs to."

contributors: Optional[OnlyList[Tag[Person]]] = pydantic.Field(alias="contributor", default=[])
contributors: OnlyList[Tag[Person]] = Field(alias="contributor", default_factory=OnlyList)
"Entry contributors."

rights: Optional[Tag[str]] = None
15 changes: 7 additions & 8 deletions rss_parser/models/atom/feed.py
Original file line number Diff line number Diff line change
@@ -1,14 +1,13 @@
from typing import Optional

from pydantic import Field

from rss_parser.models import XMLBaseModel
from rss_parser.models.atom.entry import Entry
from rss_parser.models.atom.person import Person
from rss_parser.models.types.date import DateTimeOrStr
from rss_parser.models.types.only_list import OnlyList
from rss_parser.models.types.tag import Tag
from rss_parser.pydantic_proxy import import_v1_pydantic

pydantic = import_v1_pydantic()


class RequiredAtomFeedMixin(XMLBaseModel):
@@ -23,21 +22,21 @@ class RequiredAtomFeedMixin(XMLBaseModel):


class RecommendedAtomFeedMixin(XMLBaseModel):
authors: Optional[OnlyList[Tag[Person]]] = pydantic.Field(alias="author", default=[])
authors: OnlyList[Tag[Person]] = Field(alias="author", default_factory=OnlyList)
"Names one author of the feed. A feed may have multiple author elements."

links: Optional[OnlyList[Tag[str]]] = pydantic.Field(alias="link", default=[])
links: OnlyList[Tag[str]] = Field(alias="link", default_factory=OnlyList)
"The URL to the feed. A feed may have multiple link elements."


class OptionalAtomFeedMixin(XMLBaseModel):
entries: Optional[OnlyList[Tag[Entry]]] = pydantic.Field(alias="entry", default=[])
entries: OnlyList[Tag[Entry]] = Field(alias="entry", default_factory=OnlyList)
"The entries in the feed. A feed may have multiple entry elements."

categories: Optional[OnlyList[Tag[dict]]] = pydantic.Field(alias="category", default=[])
categories: OnlyList[Tag[dict]] = Field(alias="category", default_factory=OnlyList)
"Specifies a categories that the feed belongs to. The feed may have multiple categories elements."

contributors: Optional[OnlyList[Tag[Person]]] = pydantic.Field(alias="contributor", default=[])
contributors: OnlyList[Tag[Person]] = Field(alias="contributor", default_factory=OnlyList)
"Feed contributors."

generator: Optional[Tag[str]] = None
3 changes: 0 additions & 3 deletions rss_parser/models/atom/person.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,6 @@

from rss_parser.models import XMLBaseModel
from rss_parser.models.types.tag import Tag
from rss_parser.pydantic_proxy import import_v1_pydantic

pydantic = import_v1_pydantic()


class Person(XMLBaseModel):
3 changes: 0 additions & 3 deletions rss_parser/models/atom/source.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,9 +3,6 @@
from rss_parser.models import XMLBaseModel
from rss_parser.models.types.date import DateTimeOrStr
from rss_parser.models.types.tag import Tag
from rss_parser.pydantic_proxy import import_v1_pydantic

pydantic = import_v1_pydantic()


class Source(XMLBaseModel):
36 changes: 36 additions & 0 deletions rss_parser/models/legacy/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
"""
Models created according to https://www.rssboard.org/rss-specification.

Some types and validation may be a bit custom to account for broken standards in some RSS feeds.
"""

import sys
from json import loads
from typing import TYPE_CHECKING

if sys.version_info >= (3, 14):
raise ImportError("Legacy models are not supported in Python 3.14 and above")

from rss_parser.models.legacy.pydantic_proxy import import_v1_pydantic
from rss_parser.models.legacy.utils import camel_case

if TYPE_CHECKING:
from pydantic import v1 as pydantic
else:
pydantic = import_v1_pydantic()


class XMLBaseModel(pydantic.BaseModel):
class Config:
alias_generator = camel_case

def json_plain(self, **kw):
"""
Run pydantic's json with custom encoder to encode Tags as only content.
"""
from rss_parser.models.legacy.types.tag import Tag # noqa: PLC0415

return self.json(models_as_dict=False, encoder=Tag.flatten_tag_encoder, **kw)

def dict_plain(self, **kw):
return loads(self.json_plain(**kw))
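
Because this module raises `ImportError` at import time on Python 3.14 and above, callers that treat the legacy models as optional may want to guard the import. A minimal sketch, assuming a hypothetical helper named `import_or_none` that is not part of the package:

```python
import importlib
from types import ModuleType
from typing import Optional


def import_or_none(module_name: str) -> Optional[ModuleType]:
    """Return the imported module, or None when importing it raises ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# On Python 3.14+ the version guard above would make this return None.
# The same happens when the package is not installed at all.
legacy = import_or_none("rss_parser.models.legacy")
print("legacy models available:", legacy is not None)
```

`ModuleNotFoundError` is a subclass of `ImportError`, so the same branch covers both a missing package and the explicit version guard.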
3 changes: 3 additions & 0 deletions rss_parser/models/legacy/atom/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
from .atom import Atom

__all__ = ("Atom",)
15 changes: 15 additions & 0 deletions rss_parser/models/legacy/atom/atom.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
from typing import Optional

from rss_parser.models.legacy import XMLBaseModel
from rss_parser.models.legacy.atom.feed import Feed
from rss_parser.models.legacy.pydantic_proxy import import_v1_pydantic
from rss_parser.models.legacy.types.tag import Tag

pydantic = import_v1_pydantic()


class Atom(XMLBaseModel):
"""Atom 1.0"""

version: Optional[Tag[str]] = pydantic.Field(alias="@version")
feed: Tag[Feed]