Feature Description
Add structured NLP-based output to the meeting-summarizer pipeline that
automatically extracts:
- Keywords/Key Topics: the most important terms from the meeting
- Action Items: sentences where someone commits to doing something
- Structured JSON output: instead of just a plain paragraph summary
This requires no model training. Implementation uses lightweight,
fully local libraries:
- KeyBERT: keyword ranking using BERT embeddings (one pip install)
- spaCy: sentence parsing and named entity recognition
- regex patterns: rule-based action item detection
(flags sentences with "will", "should", "must", "need to", etc.)
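As a rough illustration of the rule-based part, the sketch below flags sentences containing the cue words listed above. The names `ACTION_CUES`, `SENTENCE_BOUNDARY`, and `find_action_items` are hypothetical, the cue set is extended with "needs to" and "needed to" as an assumption, and the naive regex sentence splitter stands in for the spaCy parsing step.

```python
import re

# Cue words from the proposal ("will", "should", "must", "need to").
# The "needs to" / "needed to" variants are an assumption added here.
ACTION_CUES = re.compile(
    r"\b(will|should|must|need(?:s|ed)? to)\b",
    re.IGNORECASE,
)

# Naive sentence splitter: break on whitespace that follows ., ! or ?
# (a stand-in for spaCy sentence segmentation).
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def find_action_items(transcript: str) -> list[str]:
    """Return sentences that contain at least one commitment cue."""
    items = []
    for sentence in SENTENCE_BOUNDARY.split(transcript.strip()):
        cleaned = sentence.strip().rstrip(".!?")
        if cleaned and ACTION_CUES.search(cleaned):
            items.append(cleaned)
    return items


if __name__ == "__main__":
    sample = (
        "The login page has low contrast. "
        "John will fix the contrast issue on the login page. "
        "Maria needs to update the heuristics checklist."
    )
    for item in find_action_items(sample):
        print(item)
```

A pure keyword match like this also flags questions ("Will John fix it?") and hypotheticals, which is where the spaCy dependency parse could help filter false positives.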
Motivation
Currently the meeting-summarizer returns a single plain-text paragraph.
For UX research sessions, RUXAILAB's primary use case,
researchers need structured takeaways fast. Reading a full paragraph
summary after every session is not practical, especially when:
- Multiple sessions are run in the same day
- Stakeholders need a quick written record of decisions made
- Follow-up tasks need to be tracked across team members
Tools like Otter.ai and Fireflies charge for this. RUXAILAB can offer
it for free, fully local, with no API key or internet connection needed.
This directly supports the platform's open-science mission.
Expected Behavior
When the summarizer runs on a meeting transcript, it should return
a structured output like:
{
  "summary": "The team discussed mobile layout issues and contrast problems on the login page...",
  "keywords": ["mobile layout", "contrast", "login page", "usability test", "navigation"],
  "action_items": [
    "John will fix the contrast issue on the login page",
    "Team should retest the mobile flow next sprint",
    "Maria needs to update the heuristics checklist"
  ]
}
A CLI flag --structured should toggle this format on.
Default behavior (plain summary) stays unchanged for compatibility.
Unit tests should cover a sample transcript with known action items
and verify they are correctly extracted.
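One possible shape for the `--structured` toggle is sketched below with `argparse`. Everything except the flag name and the three JSON keys is an assumption: `summarize` is a placeholder for the existing summarizer, `transcript_file` is a hypothetical positional argument, and the keyword and action-item lists are left empty where the new modules would plug in.

```python
import argparse
import json


def summarize(transcript: str) -> str:
    """Placeholder for the existing summarizer (hypothetical stand-in)."""
    return transcript.split(".")[0].strip() + "."


def build_output(transcript: str, structured: bool) -> str:
    """Return the plain summary, or a JSON document when structured is True."""
    summary = summarize(transcript)
    if not structured:
        # Default path: plain summary, unchanged for compatibility.
        return summary
    payload = {
        "summary": summary,
        "keywords": [],      # would come from the keyword extraction module
        "action_items": [],  # would come from the action item detector
    }
    return json.dumps(payload, indent=2, ensure_ascii=False)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Summarize a meeting transcript.")
    parser.add_argument("transcript_file", help="path to a plain-text transcript")
    parser.add_argument(
        "--structured",
        action="store_true",
        help="emit JSON with summary, keywords and action_items",
    )
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    with open(args.transcript_file, encoding="utf-8") as handle:
        transcript = handle.read()
    print(build_output(transcript, args.structured))
```

Because `--structured` uses `store_true`, omitting it leaves the flag `False` and the plain-summary path untouched. Keeping `build_output` separate from argument parsing also lets the unit tests call it directly on a sample transcript.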
Additional Information
Proposed implementation steps:
- pip install keybert spacy and add both to requirements.txt
- Add keyword extraction module using KeyBERT on the transcript text
- Add action item detector using regex + spaCy dependency parsing
- Wrap output in structured JSON when --structured flag is passed
- Write unit tests with a sample UX research transcript
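The keyword extraction step in the list above could look roughly like this. The wrapper name `extract_keywords` and its `model` parameter are hypothetical; the only library call used is `KeyBERT.extract_keywords`, which returns `(phrase, score)` pairs. Note that KeyBERT loads a sentence-embedding model on first use, so the weights would need to be cached locally for a fully offline run.

```python
def extract_keywords(transcript: str, top_n: int = 5, model=None) -> list[str]:
    """Return up to top_n keyphrases in the order the model ranks them.

    `model` is any object exposing KeyBERT's extract_keywords method;
    passing one in makes the function testable without loading BERT.
    """
    if model is None:
        # Imported lazily so this module still loads where keybert
        # is not installed (for example in lightweight test runs).
        from keybert import KeyBERT

        model = KeyBERT()  # loads the default embedding model
    scored = model.extract_keywords(
        transcript,
        keyphrase_ngram_range=(1, 2),  # allow two-word phrases like "login page"
        stop_words="english",
        top_n=top_n,
    )
    # KeyBERT returns (phrase, score) tuples; keep only the phrases.
    return [phrase for phrase, _score in scored]
```

In unit tests, a small stub object with an `extract_keywords` method can be passed as `model`, so the tests stay fast and do not depend on downloaded weights.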
Estimated effort: 1-2 weeks. No GPU or paid API required.
Fully offline and open-source compatible.
I am interested in implementing this as part of my GSoC 2026
contribution. Happy to discuss the approach on Discord or here
before starting. I can also share a working prototype/notebook
as a proof of concept first if that helps evaluation.