nh3 clean doesn't include html, head or body tags even when included in ALLOWED_TAGS #32

@barkhabol

While using the nh3 library, we came across a use case where a field is expected to contain HTML, but we need to remove any content that could enable an XSS attack. Calling nh3.clean() directly on the input text doesn't give the expected result: a lot of useful markup gets trimmed, which ends up modifying the HTML template we were given.

import nh3
text = '''
<!DOCTYPE html>
<html>
<head>
  <title>HTML Tutorial</title>
</head>
<body>
  <h1>This is a heading</h1>
  <p>This is a paragraph.</p>
</body>
</html>
'''

# Extend the defaults in a local set instead of mutating the module-level nh3.ALLOWED_TAGS.
tags = nh3.ALLOWED_TAGS | {'title', 'head', 'html', 'div', 'body'}

print(nh3.clean(text, tags=tags, strip_comments=False))

Output: 
<title>HTML Tutorial</title>
 <h1>This is a heading</h1>
 <p>This is a paragraph.</p> 

We don't want the html, head, or body tags to be trimmed. Is there a limitation in the nh3 library that prevents these tags from being allowed?
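
In case it helps the discussion, here is a rough sketch of one possible workaround, not something nh3 provides: sanitize only the inner content of head and body, then re-add a fixed document wrapper ourselves. The `clean_document` helper and its `sanitize` parameter are hypothetical names made up for this example, and the regex-based extraction is an assumption that only holds for simple, well-formed documents.

```python
import re
from typing import Callable

# Match <head ...> / <body ...> but not longer tag names such as <header>.
_HEAD_RE = re.compile(r"<head(?:\s[^>]*)?>(.*?)</head\s*>", re.IGNORECASE | re.DOTALL)
_BODY_RE = re.compile(r"<body(?:\s[^>]*)?>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def clean_document(html: str, sanitize: Callable[[str], str]) -> str:
    """Sanitize head/body content as fragments and rebuild a fixed wrapper.

    `sanitize` is any fragment sanitizer (hypothetical parameter); every piece
    of untrusted markup passes through it, and the wrapper tags are constants.
    """
    head = _HEAD_RE.search(html)
    body = _BODY_RE.search(html)
    head_inner = sanitize(head.group(1)) if head else ""
    # With no <body> element, treat the whole input as body content.
    body_inner = sanitize(body.group(1)) if body else sanitize(html)
    return (
        "<!DOCTYPE html>\n<html>\n<head>" + head_inner + "</head>\n"
        "<body>" + body_inner + "</body>\n</html>"
    )
```

With nh3 as the sanitizer, the call would look something like `clean_document(text, lambda s: nh3.clean(s, tags=nh3.ALLOWED_TAGS | {"title"}))`. Attributes on the original html, head, and body tags are dropped by this approach, and anything outside head and body is discarded.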

Labels: upstream (upstream ammonia issue)
