feat: add Microsoft Graph provider#352

Draft
JanPokorny wants to merge 4 commits into main from feat/onedrive-transcript#locki-rdyya8sn

Conversation

@JanPokorny JanPokorny commented Apr 28, 2026

Needs: kagenti/onecli#23
Resolves: #159

xjacka commented Apr 28, 2026

🛡️ Humr — Code Review

PR #352: feat: add Microsoft Graph provider

Author: JanPokorny | Branch: feat/onedrive-transcript#locki-rdyya8sn → main | Changes: +641 −3 (12 files)

Summary

Introduces a new onedrive-transcript Humr agent that polls OneDrive via Microsoft Graph for Teams meeting transcripts, parses VTT files using a bundled Python script, generates structured meeting notes, and posts results to Slack. Ships with a Dockerfile, three Python helper scripts, a CLAUDE.md-based agent prompt, Helm templates, and deploy task updates.

Findings

  • 🟡 Warning: .tmp/onecli is committed as a git submodule under a .tmp/ directory — almost certainly an accidental development artifact. Anyone cloning the repo without matching submodule credentials will get a broken workspace. This directory should be in .gitignore, not the tree. (/.tmp/onecli)
  • 🟡 Warning: fetch-new-transcripts.py has no per-transcript error handling — if a Graph API call or VTT download raises an exception, the whole script fails and all remaining transcripts are lost. Each download should be wrapped in try/except to isolate failures. (packages/agents/onedrive-transcript/workspace/work/scripts/fetch-new-transcripts.py)
  • 🟡 Warning: urllib.request.urlopen is called without a timeout throughout fetch-new-transcripts.py — a stalled Graph API response will block the agent indefinitely. Add timeout=30 (or similar) to all calls. (packages/agents/onedrive-transcript/workspace/work/scripts/fetch-new-transcripts.py)
  • 🟢 Suggestion: mark-processed.py caps the list at 20 entries but README.md says "capped at 5 entries" — inconsistency that will confuse operators. (packages/agents/onedrive-transcript/README.md)
  • 🟢 Suggestion: The Dockerfile installs uv via curl https://astral.sh/uv/install.sh | sh without pinning a version — builds at different times may pull different uv versions. Pin via a release URL for reproducibility. (packages/agents/onedrive-transcript/Dockerfile)
  • 🟢 Suggestion: No tests exist for any of the three Python helper scripts. Even minimal tests for VTT parsing edge cases would prevent regressions. (packages/agents/onedrive-transcript/workspace/work/scripts/)
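
A minimal sketch of the two `fetch-new-transcripts.py` warnings above (per-item error isolation and an explicit timeout). This is not the script's actual code: `fetch_all` and its `fetch` parameter are hypothetical names, the `graph_get_bytes` signature is assumed, and authentication through the OneCLI proxy is left out.

```python
import urllib.request
from typing import Callable, Dict, Iterable, Tuple

TIMEOUT = 30  # seconds; the value suggested in the review


def graph_get_bytes(url: str, timeout: float = TIMEOUT) -> bytes:
    """Fetch a URL, giving up after `timeout` seconds instead of blocking forever."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = graph_get_bytes,  # hypothetical injection point
) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Download every URL, isolating failures so one bad item does not abort the run."""
    downloaded: Dict[str, bytes] = {}
    failed: Dict[str, str] = {}
    for url in urls:
        try:
            downloaded[url] = fetch(url)
        except Exception as exc:  # broad on purpose: any single failure is recorded, not raised
            failed[url] = repr(exc)
    return downloaded, failed
```

Returning the failures alongside the successes lets the caller log or retry them on the next poll rather than silently dropping them.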

Documentation Check (doc-drift)

⚠️ doc-drift unavailable: clone of branch feat/onedrive-transcript#locki-rdyya8sn failed — remote authentication error (no repository credentials in this environment).

Verdict

COMMENT — Solid new agent pattern, but three issues need attention before merge: the accidental .tmp/onecli submodule, no per-transcript error isolation in fetch-new-transcripts.py, and no timeouts on Graph API calls.


Review by Humr · automated code guardian

@JanPokorny JanPokorny marked this pull request as draft April 28, 2026 10:24
New agent template that polls OneDrive for Teams meeting recordings,
downloads VTT transcripts, processes them into structured meeting notes,
and posts results to Slack.

- Agent package at packages/agents/onedrive-transcript/
- Bundled parse-vtt.py script for VTT parsing (speaker extraction,
  segment merging, metadata)
- process-transcript Claude Code skill for structured notes generation
- Helm template gated by onedriveTranscriptTemplate.enabled
- Build config updated in deploy/tasks.toml

Closes #159

Assisted-By: Claude (Anthropic AI) <noreply@anthropic.com>
Signed-off-by: Jan Pokorný <JenomPokorny@gmail.com>
…nscripts

Signed-off-by: Jan Pokorný <JenomPokorny@gmail.com>
Old README described a hypothetical OneDrive-recordings-folder polling
flow with a Microsoft Graph MCP server. The actual implementation uses
direct Graph REST calls (no MCP) authenticated via OneCLI's MITM proxy,
and discovers transcripts through the calendar/onlineMeetings API.

Updates:
- Reflect actual auth setup (Calendars.Read, OnlineMeetings.Read,
  OnlineMeetingTranscript.Read.All scopes; admin consent one-click)
- Document the 5-step workflow the agent actually performs
- Document scope limit: scheduled meetings only (no MeetNow/channel)
- Schedule setup: cron */30, sessionMode: continuous, task prompt,
  Slack MCP server config
- Workspace contents reference (CLAUDE.md, scripts/, state/)
- Drop stale references to /process-transcript skill (hoisted into
  CLAUDE.md) and Microsoft Graph MCP server (replaced by direct REST)

Also drops the stale .tmp/onecli git submodule reference. The OneCLI
fork lives at .locki/include/onecli now.

Signed-off-by: Jan Pokorný <JenomPokorny@gmail.com>
- Wrap each meeting/transcript Graph call in try/except so a single
  failure (rate limit, transient 5xx, missing meeting) only loses that
  one item instead of aborting the whole run.
- Add timeout=30 to all urlopen calls so a stalled Graph response can't
  hang the agent indefinitely.
- Add .tmp/ to .gitignore — local-development scratch dir for temp
  clones, build outputs, etc., should never be committed.

Signed-off-by: Jan Pokorný <JenomPokorny@gmail.com>
@JanPokorny JanPokorny force-pushed the feat/onedrive-transcript#locki-rdyya8sn branch from 2e7f4eb to 841b64a Compare April 28, 2026 10:27

xjacka commented Apr 28, 2026

🛡️ Humr — Code Review

PR #352: feat: add Microsoft Graph provider

Author: JanPokorny | Branch: feat/onedrive-transcript#locki-rdyya8sn → main | Changes: +707 −3 (12 files)

Summary

Re-review of the onedrive-transcript agent. Two new commits since last review: a README rewrite and a fix commit that hardens fetch-new-transcripts.py with per-transcript error isolation and a global timeout constant, and adds .tmp/ to .gitignore.

Changes since last review

Previous HEAD: 2ce8fbc (2026-04-28T00:00:00Z) — verdict COMMENT

  • Fixed: .tmp/onecli accidental submodule — .tmp/ now in .gitignore and submodule removed from tree
  • Fixed: Per-transcript error handling in fetch-new-transcripts.py: try/except wraps each meeting resolution, transcript listing, and VTT download
  • Fixed: Missing timeouts in fetch-new-transcripts.py: TIMEOUT = 30 constant applied to all urlopen calls via graph_get and graph_get_bytes
  • Fixed: README/mark-processed.py cap inconsistency — both now say 20 entries
  • 🔁 Still present: uv install in Dockerfile is not version-pinned (curl https://astral.sh/uv/install.sh | sh)
  • 🔁 Still present: No tests for Python scripts
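
For reference, the 20-entry cap mentioned above could look roughly like the sketch below. The function name, the in-memory list, and the choice to drop the oldest entries first are all assumptions; the review does not say how `mark-processed.py` stores its state or which end it trims.

```python
from typing import List

MAX_ENTRIES = 20  # cap stated in the review


def mark_processed(processed: List[str], transcript_id: str,
                   limit: int = MAX_ENTRIES) -> List[str]:
    """Return a new list with `transcript_id` as the newest entry, trimmed to `limit`."""
    # Drop any earlier occurrence so an ID never appears twice.
    kept = [t for t in processed if t != transcript_id]
    kept.append(transcript_id)
    # Assumption: the oldest entries are discarded first.
    return kept[-limit:]
```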

Findings

  • 🟡 Warning: uv is installed via curl https://astral.sh/uv/install.sh | sh with no version pin; builds run at different times can silently pull different uv versions, so the image is not reproducible and its behavior may change between builds. Pin by using the --version X.Y.Z flag or a versioned release URL. (packages/agents/onedrive-transcript/Dockerfile:6)
  • 🟢 Suggestion: In parse-vtt.py, multi-line cue text is joined before speaker extraction — if a single VTT cue contains two separate <v Speaker> lines (uncommon but valid), both speakers' speech is attributed to the first speaker. Splitting on <v boundaries before extraction would handle this edge case. (packages/agents/onedrive-transcript/workspace/work/scripts/parse-vtt.py:123)
  • 🟢 Suggestion: No tests for any of the three Python helper scripts. Even minimal tests for parse-vtt.py's BOM stripping, timestamp parsing, and speaker merging would prevent regressions. (packages/agents/onedrive-transcript/workspace/work/scripts/)
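
The `<v` boundary split suggested for `parse-vtt.py` could be sketched as follows. This is an illustration only, not the script's code: `split_by_speaker` and `VOICE_SPAN` are hypothetical names, the regex covers only plain `<v Name>` and `<v.class Name>` tags, and any text that precedes the first voice tag in a cue is ignored.

```python
import re
from typing import List, Optional, Tuple

# A voice span starts at "<v Name>" (optionally "<v.class Name>") and runs
# until the next voice tag or the end of the cue text.
VOICE_SPAN = re.compile(
    r"<v(?:\.[^\s>]*)?\s+([^>]+)>(.*?)(?=<v[\s.]|\Z)",
    re.DOTALL,
)


def split_by_speaker(cue_text: str) -> List[Tuple[Optional[str], str]]:
    """Split one cue into (speaker, text) pairs, one per voice span."""
    spans: List[Tuple[Optional[str], str]] = []
    for match in VOICE_SPAN.finditer(cue_text):
        speaker = match.group(1).strip()
        # Remove closing tags and collapse line breaks into single spaces.
        text = " ".join(match.group(2).replace("</v>", "").split())
        if text:
            spans.append((speaker, text))
    if not spans:
        # No voice tags at all: keep the text with an unknown speaker.
        plain = " ".join(cue_text.split())
        return [(None, plain)] if plain else []
    return spans
```

Called on a cue such as `"<v Alice>Hello there.</v>\n<v Bob>Hi Alice.</v>"`, this yields one pair per speaker instead of attributing both lines to the first one. A function of this shape is also easy to cover with the minimal tests the review asks for.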

Documentation Check (doc-drift)

⚠️ doc-drift unavailable: clone of branch feat/onedrive-transcript#locki-rdyya8sn failed — remote authentication error (no repository credentials in this environment).

Verdict

COMMENT — Four of six prior findings resolved; uv version-pinning and missing tests remain. New parse-vtt.py is well-structured with good encoding fallback, but multi-speaker cue merging has a minor edge case.


Review by Humr · automated code guardian

Development

Successfully merging this pull request may close these issues.

OneDrive Teams transcript processing agent

2 participants