Strip HTML from event descriptions before insertion #276

Open
benthamite wants to merge 7 commits into kidd:master from benthamite:fix/strip-html-descriptions

Conversation

@benthamite

Summary

Google Calendar API returns event descriptions as HTML, containing tags like <a>, <br>, <html-blob>, <u>, <ul>/<li>, and HTML entities like &amp;, &nbsp;, etc. These are currently inserted raw into the :org-gcal: drawer, producing malformed content in Org files.

This causes problems for tools that parse Org files (e.g. org-roam interprets https://meet.google.com/abc</a> as a valid link, inserting a broken path into its database).

Changes

  • Add org-gcal--strip-html function that converts HTML descriptions to plain text:
    • <br> → newlines
    • <li> → \n- (Org list items)
    • All other tags → removed
    • Common HTML entities decoded (&amp;, &lt;, &gt;, &nbsp;, &quot;, &#39;)
    • Consecutive blank lines collapsed
  • Apply it when binding desc in org-gcal--update-entry
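
The conversion steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `my/strip-html-sketch` is hypothetical, and the exact regular expressions and ordering are assumptions chosen to match the bullet points.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'subr-x) ; for `string-trim' on older Emacs versions

;; Hypothetical sketch; NOT the implementation proposed in this PR.
(defun my/strip-html-sketch (html)
  "Return HTML converted to plain text, following the steps listed above."
  (let ((text html)
        (case-fold-search t))
    ;; <br>, <br/>, <br /> become newlines.
    (setq text (replace-regexp-in-string "<br[ \t]*/?>" "\n" text))
    ;; <li> becomes a newline followed by an Org list bullet.
    (setq text (replace-regexp-in-string
                "<li\\(?:[ \t][^>]*\\)?>" "\n- " text))
    ;; Remove every remaining tag.  Requiring a letter after "<" or "</"
    ;; leaves Org timestamps such as <2019-10-06 Sun> untouched.
    (setq text (replace-regexp-in-string "</?[a-zA-Z][^>]*>" "" text))
    ;; Decode common entities.  "&amp;" goes last so that "&amp;lt;"
    ;; decodes to the literal text "&lt;" rather than to "<".
    (dolist (pair '(("&lt;" . "<") ("&gt;" . ">") ("&nbsp;" . " ")
                    ("&quot;" . "\"") ("&#39;" . "'") ("&amp;" . "&")))
      (setq text (replace-regexp-in-string
                  (regexp-quote (car pair)) (cdr pair) text t t)))
    ;; Collapse runs of blank lines into a single blank line.
    (setq text (replace-regexp-in-string "\n\\{3,\\}" "\n\n" text))
    (string-trim text)))
```

Calling `(my/strip-html-sketch "<ul><li>a &amp; b</li><li>c</li></ul>")` returns two Org list items, `- a & b` and `- c`, on separate lines.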

Round-trip safety

Descriptions posted back to Google Calendar via org-gcal-post-at-point are read back from the Org file as plain text, which Google Calendar accepts. No formatting is lost that wasn't already lost by storing raw HTML in an Org file.

Example

Before:

<html-blob>Click the link:<br><a href="https://meet.jit.si/Weekly">https://meet.jit.si/Weekly</a><br></html-blob>

After:

Click the link:
https://meet.jit.si/Weekly

Fixes #258.

Google Calendar API returns event descriptions as HTML, containing
tags like <a>, <br>, <html-blob>, <u>, <ul>/<li>, and HTML entities
like &amp;, &nbsp;, etc.  These were inserted raw into the :org-gcal:
drawer, producing malformed content in Org files.

Add `org-gcal--strip-html' to convert HTML descriptions to plain
text and apply it when binding the `desc' variable in
`org-gcal--update-entry'.

Fixes kidd#258.

@telotortium
Collaborator

@benthamite I'm not necessarily opposed to something like this, but I want to allow people to edit their event descriptions locally, including the HTML. Perhaps a per-headline property could be set in the Org file on the events that you want to strip HTML from.

@telotortium self-requested a review February 25, 2026 02:11

@telotortium left a comment
Collaborator

See my comments - also merge the latest master into this branch.

@benthamite
Author

> @benthamite I'm not necessarily opposed to something like this, but I want to allow people to edit their event descriptions locally, including the HTML. Perhaps a per-headline property could be set in the Org file on the events that you want to strip HTML from.

Fair. How about a user option, org-gcal-strip-html-descriptions, which strips html globally when set to non-nil (defaulting to nil)? I think the typical workflow for people who are bothered by the html in the description is to always disable it, rather than on a per-heading basis.

@telotortium
Collaborator

> > @benthamite I'm not necessarily opposed to something like this, but I want to allow people to edit their event descriptions locally, including the HTML. Perhaps a per-headline property could be set in the Org file on the events that you want to strip HTML from.
>
> Fair. How about a user option, org-gcal-strip-html-descriptions, which strips html globally when set to non-nil (defaulting to nil)? I think the typical workflow for people who are bothered by the html in the description is to always disable it, rather than on a per-heading basis.

It should probably at least be a per-calendar option, with a global user-customizable default, so that shared calendars that you import are not corrupted.

@telotortium
Collaborator

@benthamite Also merge latest master - it has some fixes to the CI

Add `org-gcal-strip-html-descriptions' (boolean, default nil) as
the global default, and `org-gcal-strip-html-descriptions-overrides'
(alist of calendar-id to boolean) for per-calendar overrides.

This allows users to strip HTML globally while preserving it for
specific shared calendars, as requested in PR review.

@benthamite
Author

benthamite commented Feb 25, 2026

Done. I've added two user options:

  • org-gcal-strip-html-descriptions (boolean, default nil) — the global default.
  • org-gcal-strip-html-descriptions-overrides — an alist of (calendar-id . boolean) for per-calendar overrides.

This way users can e.g. strip HTML globally while preserving it for specific shared calendars:

(setopt org-gcal-strip-html-descriptions t)
(setopt org-gcal-strip-html-descriptions-overrides
      '(("shared-calendar@group.calendar.google.com" . nil)))
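
One way the two options could be combined is sketched below. This is an assumption about the lookup logic, not the PR's actual code: the names `my/strip-html-default`, `my/strip-html-overrides`, and `my/strip-html-p` are hypothetical stand-ins. The point it illustrates is that the lookup has to distinguish "no override for this calendar" from "override explicitly set to nil", so it checks for the alist entry itself rather than only its value.

```elisp
;;; -*- lexical-binding: t; -*-

;; Hypothetical stand-ins for the two user options described above.
(defvar my/strip-html-default nil
  "Global default: non-nil means strip HTML from descriptions.")

(defvar my/strip-html-overrides nil
  "Alist of (CALENDAR-ID . BOOLEAN) overriding the global default.")

(defun my/strip-html-p (calendar-id)
  "Return non-nil if HTML should be stripped for CALENDAR-ID."
  ;; `assoc' returns the whole (ID . VALUE) cell, so an explicit
  ;; (ID . nil) override is still found and respected.
  (let ((override (assoc calendar-id my/strip-html-overrides)))
    (if override
        (cdr override)
      my/strip-html-default)))
```

With the global default set to t and an override of `("shared-calendar@group.calendar.google.com" . nil)`, this sketch returns nil for the shared calendar and t for every other calendar ID.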

Also confirmed the branch is up to date with latest master (merged in 09fb52a).

@telotortium
Collaborator

@benthamite Could you add tests please?

- org-gcal-test--strip-html: unit tests for the HTML-to-text conversion
  (tags, entities, list items, blank line collapsing)
- org-gcal-test--strip-html-p: predicate tests covering global default,
  per-calendar enable override, and per-calendar disable override
- org-gcal-test--update-entry-strip-html: integration test verifying
  HTML is converted when stripping is enabled
- org-gcal-test--update-entry-preserve-html: integration test verifying
  HTML is preserved when stripping is disabled (default)
- org-gcal-test--update-entry-strip-html-per-calendar: integration test
  verifying per-calendar override takes effect

The regex <[^>]+> also matches Org timestamps like
<2019-10-06 Sun 17:00-21:00>.  Use </?[a-zA-Z][^>]*> instead,
which only matches actual HTML tags.
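
The difference between the two patterns quoted in this commit message can be checked directly. The constant names below are hypothetical, introduced only for this illustration; the two regular expressions are the ones from the commit message.

```elisp
;;; -*- lexical-binding: t; -*-

;; Hypothetical names; the patterns are the two from the commit message.
(defconst my/broad-tag-regexp "<[^>]+>"
  "Matches anything between angle brackets, including Org timestamps.")

(defconst my/narrow-tag-regexp "</?[a-zA-Z][^>]*>"
  "Matches only when a letter follows \"<\" or \"</\".")

(defun my/remove-matches (regexp text)
  "Return TEXT with every match of REGEXP deleted."
  (replace-regexp-in-string regexp "" text))

;; The broad pattern deletes the timestamp along with the tags:
;; (my/remove-matches my/broad-tag-regexp
;;                    "Meet <2019-10-06 Sun 17:00-21:00> <b>here</b>")
;; The narrow pattern deletes only <b> and </b>:
;; (my/remove-matches my/narrow-tag-regexp
;;                    "Meet <2019-10-06 Sun 17:00-21:00> <b>here</b>")
```

Note that the narrower pattern still removes any angle-bracketed text that begins with a letter, such as `<user@example.com>`, so it reduces false positives rather than eliminating them.
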
@benthamite
Author

Added tests:

  • org-gcal-test--strip-html: unit tests for HTML-to-text conversion (tags, entities, list items, blank line collapsing)
  • org-gcal-test--strip-html-p: predicate tests covering global default nil/t and per-calendar overrides in both directions
  • org-gcal-test--update-entry-strip-html: integration test — HTML stripped when enabled
  • org-gcal-test--update-entry-preserve-html: integration test — HTML preserved when disabled (default)
  • org-gcal-test--update-entry-strip-html-per-calendar: integration test — per-calendar override works

Also merged latest master (picks up PR #279).

@benthamite
Author

@telotortium, please let me know if there is anything else you’d like me to do.

Development

Successfully merging this pull request may close these issues.

render event description html