The preview of the new Python 3 port has broken HTML escaping in the XML feeds

I am using:  
**OS**:  Fedora 40
**Browser**:  Firefox 131.0.2
**Platform**: desktop

## Problem
The preview of the new Python 3 port has broken HTML escaping in the XML feeds

For example, try to view this feed in the browser:

  https://planetpython.org/3/rss10.xml

The browser will complain about undefined entities, because the XML document contains raw, unescaped HTML.

By comparison, the original Python 2 code escaped the HTML in the feed:

```
$ wget https://planetpython.org/rss10.xml
$ grep "content:encoded" rss10.xml | head -1
	<content:encoded>&lt;p&gt;As is probably apparent from the sequence of blog posts about the topic in the
$ wget https://planetpython.org/3/rss10.xml
$ grep "content:encoded" rss10.xml.1 | head -1
	<content:encoded><p>As is probably apparent from the sequence of blog posts about the topic in the
```
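The parse failure can be reproduced without a browser. The snippet below is a minimal sketch using only the standard library. It wraps a made-up entry body containing `&nbsp;` in a cut-down `<content:encoded>` element, once as-is and once escaped, and checks which result is well-formed XML. The `TEMPLATE` string and `html_body` value are illustrative assumptions, not the actual planet template.

```python
import xml.etree.ElementTree as ET
from html import escape

# RSS 1.0 content module namespace, declared so the "content:" prefix is valid XML.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
TEMPLATE = '<item xmlns:content="' + CONTENT_NS + '"><content:encoded>{}</content:encoded></item>'

# Hypothetical entry body: &nbsp; is a named HTML entity that XML does not define.
html_body = "<p>Hello&nbsp;world</p>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False on a parse error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


raw_doc = TEMPLATE.format(html_body)              # HTML inserted as-is
escaped_doc = TEMPLATE.format(escape(html_body))  # HTML escaped first

print(is_well_formed(raw_doc))      # False: "undefined entity" for &nbsp;
print(is_well_formed(escaped_doc))  # True

# After parsing, the escaped element's text is the original HTML again.
encoded = ET.fromstring(escaped_doc).find(f"{{{CONTENT_NS}}}encoded")
print(encoded.text == html_body)    # True
```

Raw HTML that happens to be balanced and free of named entities can still parse as nested XML elements. That is why only some entries trigger an error, and why the error shows up as an undefined entity rather than as a generic failure.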

## Details
![Screenshot from 2024-10-24 14-08-18](https://github.com/user-attachments/assets/a4ac1a59-629f-4bf1-a38a-8cec413f10df)

This problem is caused by a mistake in the Python 3 conversion done in #577. Specifically, commit https://github.com/python/planet/pull/577/commits/86e31f90403c4659471396beeba922584e08d12e replaced code patterns like:

```
feed[key] = sanitize.HTML(feed[key])
```

with

```
feed[key] = Markup(feed[key])
```

which is not functionally equivalent.

The `sanitize.HTML` method parsed the HTML and stripped out various undesirable elements and attributes; escaping was then performed later by the template processor.

The `Markup` constructor does not parse anything: it just wraps the `str` in a `Markup` object, marking it as safe to use as-is without further escaping. As a result, when the template later tries to escape the variable in Jinja using `... | e`, nothing happens, and the raw HTML is written into the XML document, causing the parsing errors described above.
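The difference is easy to see with MarkupSafe, the library that provides the `Markup` class and the `escape` function Jinja uses for its `e` filter. This is a minimal sketch that assumes the port's `Markup` is MarkupSafe's class; the entry body is made up for illustration.

```python
from markupsafe import Markup, escape

entry_html = "<p>Hello&nbsp;world</p>"

# A plain str is escaped: <, > and & become entities, so it is safe inside XML.
print(escape(entry_html))          # &lt;p&gt;Hello&amp;nbsp;world&lt;/p&gt;

# A Markup object is treated as already safe, so escape() returns it unchanged.
print(escape(Markup(entry_html)))  # <p>Hello&nbsp;world</p>

# Markup() itself does no parsing or sanitizing; the content is kept verbatim.
print(str(Markup(entry_html)) == entry_html)  # True
```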

I think either the original sanitizer code needs to be reinstated and made to work with Python 3, or perhaps an external library such as https://github.com/matthiask/html-sanitizer/ could be used?
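To illustrate the shape of the fix, here is a dependency-free stand-in built on the standard library's `html.parser`. It is a minimal allowlist sketch, not a port of the original `sanitize.HTML` and not the API of html-sanitizer. The `sanitize_html` name, the tag and attribute allowlists, and the http(s)-only rule for `href` are all hypothetical choices. The key point is that it returns a plain `str` rather than `Markup`, so the template's `| e` still escapes it for the XML feed.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration; not the rules of planet's original sanitize.HTML.
ALLOWED_TAGS = {"p", "a", "em", "strong", "code", "pre", "ul", "ol", "li", "blockquote", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
DROP_WITH_CONTENT = {"script", "style"}  # removed together with their text
VOID_TAGS = {"br"}                       # never emit a closing tag for these


class _AllowlistParser(HTMLParser):
    """Re-serializes allowed tags/attributes and drops everything else."""

    def __init__(self) -> None:
        # convert_charrefs=True turns &nbsp; etc. into plain characters.
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # drop the tag itself but keep any text inside it
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not value.startswith(("http://", "https://")):
                continue  # e.g. drop javascript: URLs
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize_html(fragment: str) -> str:
    """Hypothetical helper: returns a plain str, NOT Markup, so `| e` still escapes it."""
    parser = _AllowlistParser()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


dirty = '<p onclick="x()">Hi&nbsp;<a href="javascript:alert(1)">there</a><script>bad()</script></p>'
clean = sanitize_html(dirty)
print(repr(clean))    # '<p>Hi\xa0<a>there</a></p>'
print(escape(clean))  # what the template's `| e` step would put in the XML
```

This keeps the same two-step split as the Python 2 code: sanitize when the feed is processed, escape in the template. The sketch does not rebalance unclosed tags or handle many other edge cases, which a maintained library such as the one linked above is more likely to cover.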
