[✨ FEATURE]: Add Keyword Extraction & Action Item Detection to Meeting Summarizer

### Feature Description 📝

Extend the meeting-summarizer pipeline with structured, NLP-based output 
that automatically provides:

  - 📌 Keywords/Key Topics: the most important terms from the meeting
  - ✅ Action Items: sentences where someone commits to doing something
  - 📝 Structured JSON output: returned instead of only a plain paragraph summary

This requires no model training. The implementation uses lightweight, 
fully local libraries:
  - KeyBERT: keyword ranking using BERT embeddings (a single pip install, 
    package name keybert)
  - spaCy: sentence parsing and named entity recognition
  - regex patterns: rule-based action item detection 
    (flags sentences with "will", "should", "must", "need to", etc.)
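
The rule-based detection described above could be sketched roughly as follows. This is only a minimal illustration using the standard library: the cue list, the naive sentence splitter, and the name `find_action_items` are assumptions for this sketch, not existing code in the repository.

```python
import re
from typing import List

# Hypothetical cue list based on the modal phrases named in this issue.
# "needs? to" also covers "needs to", as in the example output.
ACTION_CUES = re.compile(
    r"\b(?:will|should|must|needs?\s+to|has\s+to|have\s+to)\b",
    re.IGNORECASE,
)

# Naive sentence splitter: break on whitespace that follows ., ! or ?
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def find_action_items(transcript: str) -> List[str]:
    """Return the sentences that contain at least one action cue."""
    sentences = [s.strip() for s in SENTENCE_BOUNDARY.split(transcript) if s.strip()]
    return [s for s in sentences if ACTION_CUES.search(s)]


if __name__ == "__main__":
    sample = (
        "The login page has low contrast. "
        "John will fix the contrast issue on the login page. "
        "Maria needs to update the heuristics checklist."
    )
    for item in find_action_items(sample):
        print(item)
```

A pure keyword match like this will also flag questions and hypotheticals ("Should we retest?"), which is where the spaCy dependency parse could help filter candidates.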

### Motivation 🌟

Currently the meeting-summarizer returns a single plain-text paragraph. 
For UX research sessions, RUXAILAB's primary use case, researchers need 
structured takeaways fast. Reading a full paragraph summary after every 
session is not practical, especially when:

  - Multiple sessions are run in the same day
  - Stakeholders need a quick written record of decisions made
  - Follow-up tasks need to be tracked across team members

Tools like Otter.ai and Fireflies charge for this. RUXAILAB can offer 
it for free and fully local: no API key, and no internet connection 
once the models have been downloaded.
This directly supports the platform's open-science mission.

### Expected Behavior 🤔

When the summarizer runs on a meeting transcript, it should return 
a structured output like:

  {
    "summary": "The team discussed mobile layout issues and contrast problems on the login page...",

    "keywords": ["mobile layout", "contrast", "login page", 
                 "usability test", "navigation"],

    "action_items": [
      "John will fix the contrast issue on the login page",
      "Team should retest the mobile flow next sprint",
      "Maria needs to update the heuristics checklist"
    ]
  }

A CLI flag --structured should toggle this format on.
Default behavior (plain summary) stays unchanged for compatibility.
Unit tests should cover a sample transcript with known action items 
and verify they are correctly extracted.
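
One possible shape for the flag and the output switch is sketched below. The positional `transcript` argument and the names `build_parser` and `render_output` are hypothetical, since the summarizer's current CLI is not shown in this issue; only the `--structured` flag and the three JSON keys come from the description above.

```python
import argparse
import json
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser; the positional argument is an assumption."""
    parser = argparse.ArgumentParser(description="Summarize a meeting transcript.")
    parser.add_argument("transcript", help="path to the transcript text file")
    parser.add_argument(
        "--structured",
        action="store_true",  # off by default, so the plain summary stays the default
        help="emit JSON with summary, keywords and action_items",
    )
    return parser


def render_output(
    summary: str, keywords: List[str], action_items: List[str], structured: bool
) -> str:
    """Return the plain summary, or a JSON document when structured is set."""
    if not structured:
        return summary
    payload = {
        "summary": summary,
        "keywords": keywords,
        "action_items": action_items,
    }
    return json.dumps(payload, indent=2)
```

Because `store_true` defaults to `False`, existing invocations without the flag would keep printing the plain summary.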

### Additional Information ℹ️

Proposed implementation steps:

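  1. pip install keybert spacy, download a spaCy English model 
     (for example: python -m spacy download en_core_web_sm), 
     and add both packages to requirements.txt
  2. Add keyword extraction module using KeyBERT on the transcript text
  3. Add action item detector using regex + spaCy dependency parsing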
  4. Wrap output in structured JSON when --structured flag is passed
  5. Write unit tests with a sample UX research transcript
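
As a rough sketch of step 2, the keyword module might wrap KeyBERT's `extract_keywords` call as below. The wrapper name `extract_keywords`, the `model` parameter, and the chosen n-gram range are assumptions for illustration; passing the model in lets unit tests substitute a stub so they run without downloading an embedding model.

```python
from typing import Any, List, Optional


def extract_keywords(text: str, model: Optional[Any] = None, top_n: int = 5) -> List[str]:
    """Return the top_n key phrases for text, dropping the scores."""
    if model is None:
        # Imported lazily so the module loads even where keybert is not installed.
        from keybert import KeyBERT

        model = KeyBERT()
    # KeyBERT returns a list of (phrase, score) tuples.
    scored = model.extract_keywords(
        text,
        keyphrase_ngram_range=(1, 2),  # assumption: allow one- and two-word phrases
        stop_words="english",
        top_n=top_n,
    )
    return [phrase for phrase, _score in scored]
```

In a test, any object exposing an `extract_keywords(text, **kwargs)` method that returns `(phrase, score)` pairs can stand in for the real model.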

Estimated effort: 1–2 weeks. No GPU or paid API required.
Fully offline and open-source compatible.

I am interested in implementing this as part of my GSoC 2026 
contribution. Happy to discuss the approach on Discord or here 
before starting. I can also share a working prototype/notebook 
as a proof of concept first if that helps evaluation.
