fix: Properly render code snippets from Medium/RSS posts (#53) #103

Open
aditya-pandey-dev wants to merge 1 commit into OpenAstronomy:main from aditya-pandey-dev:fix-markdownify-medium-codeblocks

Conversation

@aditya-pandey-dev

This pull request addresses issue #53 by improving how Medium and RSS feed HTML content is processed and rendered on the Universe_OA site.

Key changes:

The grab.py script now converts Medium/RSS HTML content into markdown format using the markdownify library.

Code blocks, which are enclosed in triple backticks in markdown, are rendered properly on the site. On inspection, these appear correctly wrapped in the expected tags.

Markdown elements such as headings, lists, and bold text are now accurately converted from HTML, eliminating unwanted or leftover raw HTML in the output.

The conversion and rendering behavior have been thoroughly tested on local servers to ensure no UI bugs or internal errors.

The changes ensure that code snippets maintain their formatting integrity, enabling users to easily copy and paste clean code blocks from blog posts without distortion.
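
To make the mapping described above concrete, here is a minimal stdlib-only sketch of an HTML-to-markdown conversion covering headings, list items, bold text, and code blocks. It is not the PR's implementation, which relies on the markdownify library: `SimpleMarkdownConverter` is a hypothetical name, and `convert_html_to_markdown` only borrows the function name visible in the review diff, with a body assumed for illustration.

```python
from html.parser import HTMLParser


class SimpleMarkdownConverter(HTMLParser):
    """Hypothetical stand-in for an HTML-to-markdown converter (illustration only)."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        # convert_charrefs=True turns entities such as &lt; back into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []      # markdown fragments collected so far
        self.in_pre = False  # True while inside <pre>, where whitespace matters

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.parts.append("\n" + self.HEADINGS[tag])
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag == "pre":
            self.in_pre = True
            self.parts.append("\n\n```\n")  # open a fenced code block

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self.parts.append("\n")
        elif tag == "ul":
            self.parts.append("\n\n")
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag == "pre":
            self.in_pre = False
            self.parts.append("\n```\n")  # close the fenced code block

    def handle_data(self, data):
        # Keep code verbatim; skip whitespace-only text between other tags.
        if self.in_pre or data.strip():
            self.parts.append(data)


def convert_html_to_markdown(html):
    """Return a markdown rendering of a small subset of HTML."""
    parser = SimpleMarkdownConverter()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    sample = '<h1>Test Heading</h1><b>Bold Demo</b><pre><code>print("Hello world")</code></pre>'
    print(convert_html_to_markdown(sample))
```

The sketch handles only a handful of tags and ignores links, nested lists, and inline code; a dedicated library covers those cases, which is presumably why the PR uses one.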

Screenshots included with this PR demonstrate:

Terminal output showing the successful content conversion and Nikola static site build.

Rendered blog view with correctly formatted code blocks.

Browser inspection of the code block HTML structure confirming the proper use of tags.

Testing:

Local environment verified with clean rendering.
No regressions or errors encountered.
Code blocks appear consistent with expectations across different browsers and devices.

The main purpose of this PR is to enhance the readability and usability of blog entries sourced from external feeds, improving the overall user experience on the platform.

Please review and consider merging.

https://drive.google.com/file/d/1PvHdPKl-1h3yzgFwauqnbegeMTy9a6D6/view?usp=sharing

https://drive.google.com/file/d/1AWJzjwuHb_qA5QT1K1--XwJVcyeM9BnI/view?usp=sharing

https://drive.google.com/file/d/1EaDhSC3LP3uR83jE5Zj2Y20ZclHdaOYW/view?usp=sharing

https://drive.google.com/file/d/1D2Ls4uLF-lE3GnwfEMn2CHhr3tCxfi75/view?usp=sharing

https://drive.google.com/file/d/1ArEyuZZYVKKwhrk8UpY4D-MH-KpLxl8M/view?usp=sharing

@aditya-pandey-dev
Author

I have created this PR for issue #53. Kindly review and merge if everything looks fine.
PR link: #103

Let me know if any changes are needed. Thank you.

@nabobalis
Member

So the fix is to use BS4 to correctly render the incoming text?

@aditya-pandey-dev
Author

Yes, I used BeautifulSoup4 to parse and clean up the HTML from the incoming Medium/RSS content, so that code snippets and formatting are rendered properly. Is there anything specific you'd like me to explain or adjust?

Comment on lines +117 to +121
if html:
    content = convert_html_to_markdown(content)
    print("\n------------MARKDOWN OUTPUT---------------\n")
    print(content)
    print("\n------------------END---------------------\n")
Member

Do we need these?

Author

@aditya-pandey-dev Nov 27, 2025

Those prints were only for local debugging to inspect the converted markdown output. I’ll remove them in the next commit so the script stays clean.
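
One common alternative to deleting debug prints outright is to route them through the standard `logging` module, so the output appears only when debugging is switched on. The sketch below assumes a hypothetical logger name (`"grab"`) and a hypothetical helper `log_markdown_output`; neither appears in the PR.

```python
import logging

# Hypothetical logger name, chosen here to match the script's filename.
logger = logging.getLogger("grab")


def log_markdown_output(content):
    """Log the converted markdown at DEBUG level and return it unchanged."""
    # Nothing is emitted unless the logger is enabled for DEBUG.
    logger.debug("Markdown output:\n%s", content)
    return content


if __name__ == "__main__":
    # Opt in to debug output for a local run.
    logging.basicConfig(level=logging.DEBUG)
    log_markdown_output("# Test Heading")
```

With the default configuration the call stays silent, so the script's normal output is unaffected.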

Comment on lines +181 to +193
sample_html = '''
<h1>Test Heading</h1>
<ul>
<li>Point 1</li>
<li>Point 2</li>
</ul>
<b>Bold Demo</b>
<pre><code>print("Hello world")</code></pre>
'''

print("\n------MARKDOWN OUTPUT------\n")
print(convert_html_to_markdown(sample_html))
print("\n------END------\n")
Member

What is the purpose of this?

Author

@aditya-pandey-dev Nov 27, 2025

Those prints were only for local debugging to inspect the converted markdown output. I’ll also remove them in the next commit so the script stays clean.

@nabobalis
Member

No, I was just surprised at how simple the fix is.

@aditya-pandey-dev
Author

I had one small question: right now the HTML → markdown conversion happens inside grab.py before the content is passed to Nikola. Do you think this logic is in the right place, or would you prefer moving it into a separate helper or module so it's easier to test and reuse for other feeds in the future?
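
As a rough illustration of the "separate helper" option raised here, the sketch below isolates the conversion step behind a small function that accepts the converter as an argument, so it can be unit tested with a stub instead of real feed data. The names `prepare_entry_content` and `PrepareEntryContentTest` are hypothetical and do not come from the PR.

```python
import unittest


def prepare_entry_content(content, is_html, converter):
    """Return markdown for a feed entry.

    `converter` is any callable mapping an HTML string to markdown,
    for example the conversion function used in grab.py.
    """
    return converter(content) if is_html else content


class PrepareEntryContentTest(unittest.TestCase):
    def test_html_goes_through_converter(self):
        calls = []

        def fake_converter(html):
            calls.append(html)
            return "converted"

        result = prepare_entry_content("<p>x</p>", True, fake_converter)
        self.assertEqual(result, "converted")
        self.assertEqual(calls, ["<p>x</p>"])

    def test_plain_text_is_untouched(self):
        def failing_converter(html):
            raise AssertionError("converter should not be called")

        result = prepare_entry_content("plain text", False, failing_converter)
        self.assertEqual(result, "plain text")


if __name__ == "__main__":
    unittest.main()
```

Passing the converter in keeps the helper independent of any particular library, at the cost of one extra argument at each call site.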

@nabobalis
Copy link
Member

Seems fine to me, will have to let David provide the final word.

2 participants