v0.2.1

@gyedongjeon gyedongjeon released this 02 Jan 02:07
· 15 commits to main since this release
bd829fc

Release v0.2.1: Robust Content Extraction

Summary

This release fixes critical content extraction issues on two major news sites, the New York Times and BBC News, and improves the reading experience for multimedia content.

Key Features & Fixes

1. BBC News Video Support

  • Interactive Video Placeholders: Replaces blank video areas with a visible poster image and a "Play Video" button.
  • Deep Content Extraction: Implemented Shadow DOM traversal to locate high-quality poster images hidden in BBC's custom players.
  • Click-to-Restore: Users can now click the video placeholder to exit Reader Mode and watch the video on the original page.
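
The Shadow DOM traversal described above could look roughly like the sketch below. It is not the extension's actual code: the function name findPosterUrl and the rule "first video poster or img src wins" are assumptions made for illustration. It relies only on the standard shadowRoot, children, tagName, and getAttribute members, and it can only reach open shadow roots, since closed ones are not exposed through element.shadowRoot.

```javascript
// Hypothetical helper (not from the release): depth-first search for a poster
// URL that also descends into open shadow roots.
function findPosterUrl(node) {
  if (!node) return null;

  // Elements expose tagName and getAttribute; ShadowRoot nodes do not.
  const tag = (node.tagName || '').toUpperCase();
  if (typeof node.getAttribute === 'function') {
    if (tag === 'VIDEO' && node.getAttribute('poster')) {
      return node.getAttribute('poster');
    }
    if (tag === 'IMG' && node.getAttribute('src')) {
      return node.getAttribute('src');
    }
  }

  // element.shadowRoot is null for closed roots and for elements without one.
  const fromShadow = findPosterUrl(node.shadowRoot);
  if (fromShadow) return fromShadow;

  // Fall back to the regular (light DOM) children.
  for (const child of Array.from(node.children || [])) {
    const found = findPosterUrl(child);
    if (found) return found;
  }
  return null;
}
```

In a content script this would be called with the article's root element, for example findPosterUrl(document.body), and the result used as the placeholder's poster image.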

2. New York Times Lead Images

  • Improved Heuristics: Fixed a bug where minor body images blocked the extraction of high-quality lead images from the article header.
  • Noise Reduction: Enhanced filtering to ignore ad-related elements while preserving media content.
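
One way to keep a small body image from blocking the header's lead image is sketched below. The release notes do not describe the real heuristic, so everything here is an assumption: the function name pickLeadImage, the candidate shape { src, width, height, inHeader, isAd }, and the 200 x 200 minimum area. The idea is simply to filter out ads and tiny images first, then prefer the largest header image over any body image.

```javascript
// Hypothetical sketch (not from the release). Each candidate is assumed to be
// a plain object: { src, width, height, inHeader, isAd }.
function pickLeadImage(candidates, minArea = 200 * 200) {
  const area = (c) => c.width * c.height;

  // Drop ad-related elements and images too small to be a lead image.
  const usable = candidates.filter((c) => !c.isAd && area(c) >= minArea);
  const byAreaDesc = (a, b) => area(b) - area(a);

  // Header images win regardless of where body images sit in document order.
  const header = usable.filter((c) => c.inHeader).sort(byAreaDesc);
  if (header.length > 0) return header[0];

  // Otherwise fall back to the largest remaining body image, if any.
  const body = usable.filter((c) => !c.inHeader).sort(byAreaDesc);
  return body.length > 0 ? body[0] : null;
}
```

Because selection no longer depends on which image appears first, a minor inline image early in the body cannot preempt the header image.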

3. Architecture & Refactoring

  • Modularization: Split the monolithic content.js into focused modules:
    • src/content/utils.js: Core extraction logic.
    • src/content/ui.js: DOM generation.
    • src/content/main.js: Orchestration and events.
  • CI/CD Pipeline: Added GitHub Actions workflow to run unit tests automatically on PRs.
  • Test Structure: Split unit tests to match the new module structure, improving maintainability.

4. System Improvements

  • Robustness: Updated attribute sanitization to preserve the data-action attribute used by interactive elements, along with the attributes needed for proper styling.
  • Testing: Added comprehensive unit tests for video transformation and complex page structures.
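
As an illustration of the sanitization change, an allowlist-based approach might look like the sketch below. The allowlist contents and the function name sanitizeAttributes are assumptions; only data-action is named in these notes. Attributes are modeled as a plain name-to-value object so the example runs without a DOM.

```javascript
// Assumed allowlist for illustration; the extension's real list is not given
// in the release notes. Only data-action is named there.
const ALLOWED_ATTRIBUTES = new Set([
  'src', 'alt', 'href', 'class', 'style', 'data-action',
]);

// Returns a new object containing only allowlisted attributes.
function sanitizeAttributes(attrs) {
  const clean = {};
  for (const [name, value] of Object.entries(attrs)) {
    const key = name.toLowerCase();
    // Anything not allowlisted is dropped, including inline handlers
    // such as onclick.
    if (!ALLOWED_ATTRIBUTES.has(key)) continue;
    // Reject javascript: URLs in link and source attributes.
    if ((key === 'href' || key === 'src') &&
        /^\s*javascript:/i.test(String(value))) {
      continue;
    }
    clean[key] = value;
  }
  return clean;
}
```

For example, sanitizeAttributes({ 'data-action': 'restore-video', onclick: 'x()' }) keeps data-action and drops onclick; the value restore-video is a made-up placeholder.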

Verification

  • Verified on live BBC and NYT articles.
  • Automated tests passing.

Checklist

  • Version bumped to 0.2.1 in manifest.json and package.json.
  • Changelog updated (this PR).
  • All tests passing.