An LLM Skill that enables systematic analysis of qualitative text data (interview transcripts, open-ended survey responses, field notes) to produce traceable code tables, themes, and curated quotes. Designed for rigorous, transparent analysis with participant labeling and reproducible outputs.
The skill produces Markdown files under `<transcripts-root>/outputs/`. Treat this
folder as your audit trail: every theme and summary is expected to remain
traceable to specific quotes.
The output files are:
- `participants.md`: a mapping table of participant IDs (P1, P2, ...) to transcript filenames and per-participant output filenames.
- `<participant-id>.md` (for example, `P1.md`): a per-participant code table with original quotes and participant identifiers.
- `final/codes.md`: a merged code table across participants (generated by `scripts/merge_codes.py`).
- `final/themes.md`: a themes table that summarizes patterns across codes.
- `final/findings.md`: a narrative findings report with representative quotes.
- `final/quote-check.md`: a quote validation report (generated by `scripts/validate_quotes.py`).
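The skill does not prescribe an exact column layout here, but a per-participant code table (for example, `outputs/P1.md`) might look like the following. The column names and example rows are purely illustrative:

```markdown
| Code                | Quote                                                        | Participant |
| ------------------- | ------------------------------------------------------------ | ----------- |
| onboarding-friction | "I spent the first week just finding the right documents."   | P1          |
| peer-support        | "My teammates answered questions faster than any tool."      | P1          |
```

Keeping quotes verbatim in this table is what makes the later quote-validation step meaningful.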
To run a typical workflow:

1. Create a transcript folder and put your transcript files inside it. This folder is `<transcripts-root>`.

2. Start the analysis by providing `<transcripts-root>` and (optionally) a participant list. If you do not provide labels, the skill assigns P1, P2, P3, ... in transcript order and writes `outputs/participants.md`.

3. Analyze transcripts one at a time. After each transcript, the skill writes a per-participant Markdown file (for example, `outputs/P1.md`) that contains a code table.

4. After all transcripts are coded, merge the per-participant code tables with `scripts/merge_codes.py`:

   ```
   python3 <skill-root>/scripts/merge_codes.py \
     --outputs <transcripts-root>/outputs
   ```

5. Validate that quoted text appears in the original transcripts with `scripts/validate_quotes.py`:

   ```
   python3 <skill-root>/scripts/validate_quotes.py \
     --outputs <transcripts-root>/outputs \
     --transcripts-root <transcripts-root>
   ```

6. Create `outputs/final/themes.md` and `outputs/final/findings.md` as the final synthesis across participants.
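The core of the quote-validation step can be approximated in a few lines. The sketch below normalizes whitespace and case before testing substring containment, which is one plausible matching strategy; the actual rules in `scripts/validate_quotes.py` may differ:

```python
import re

def normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping doesn't break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()

def quote_in_transcript(quote: str, transcript: str) -> bool:
    """Return True if the quote appears verbatim (modulo whitespace and case)."""
    return normalize(quote) in normalize(transcript)

# Illustrative transcript text, wrapped across lines as real files often are.
transcript = "I spent the first week\njust finding   the right documents."

assert quote_in_transcript(
    "I spent the first week just finding the right documents.", transcript
)
assert not quote_in_transcript("I loved the onboarding process.", transcript)
```

A check this strict will flag paraphrased or lightly edited quotes, which is usually the desired behavior for an audit trail: anything flagged should be traced back to the source transcript by a human.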
This skill is designed to support rigorous qualitative analysis, but AI-assisted thematic analysis has important limitations. In most settings, an LLM is more reliable as an assistant analyst than as the sole analyst, especially for publishable qualitative work.
Key limitations to account for:
- Methodological fit varies by TA approach. Thematic analysis is a family of approaches. Reflexive thematic analysis emphasizes the researcher’s interpretive work and reflexivity, which does not map cleanly onto automated “coding as classification.” This can make fully automated, publication-grade reflexive TA difficult to justify without substantial human analytic work.
- Traceability can break without strict constraints. LLMs can produce plausible-sounding themes that are not sufficiently supported by the dataset, or introduce statements that are not present in the source text. Requiring quote-linked outputs and auditing “theme → quote → transcript” reduces (but does not eliminate) this risk.
- Context and nuance can be flattened. LLM summaries can over-generalize, merge distinct meanings, or miss contextual cues, especially for sensitive topics and small samples where misinterpretation costs are high.
- Empirical performance is setting-dependent. Research evaluating LLMs for coding and theme discovery suggests they can reach moderate to high agreement with humans in constrained settings (for example, codebook-guided or classification-framed tasks), but results vary by data, prompts, and evaluation design. Treat outputs as hypotheses to verify, not conclusions to accept uncritically.
- Reporting expectations still apply. If you are writing for academic audiences, you must disclose the model’s role and keep an audit trail that supports methodological transparency. Use qualitative reporting guidelines (for example, SRQR, COREQ) to ensure you describe data collection and analysis decisions clearly.
Practical safeguards that improve rigor:
- Use AI to propose initial codes, candidate themes, and retrieval support, then make final analytic decisions yourself.
- Enforce quote-grounding: every code/theme must include supporting quotes with participant IDs.
- Sample-review outputs, search for counterexamples, and document revisions.
- Re-run the same prompts/settings to check stability, and reconcile drift with explicit human judgment.
- Confirm data governance constraints before uploading sensitive transcripts to third-party systems.
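One lightweight way to quantify the drift mentioned above is to compare the code sets produced by two runs, per participant. This sketch uses Jaccard similarity; the metric and threshold are illustrative assumptions, not part of the skill:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two code sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical code sets from two runs with the same prompts and settings.
run1 = {"P1": {"onboarding-friction", "peer-support"}, "P2": {"tool-overload"}}
run2 = {"P1": {"onboarding-friction", "peer-support"},
        "P2": {"tool-overload", "training-gaps"}}

for pid in run1:
    score = jaccard(run1[pid], run2[pid])
    # Flag low-stability participants for explicit human reconciliation.
    flag = "REVIEW" if score < 0.8 else "ok"
    print(f"{pid}: {score:.2f} {flag}")
```

Low scores are not themselves errors; they mark where human judgment is needed to decide which coding to keep.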
The following resources discuss thematic analysis quality and AI-assisted qualitative research evaluation:
- Braun & Clarke: Resources for thematic analysis
- O’Brien et al. (2014): Standards for Reporting Qualitative Research (SRQR)
- COREQ (EQUATOR Network)
- Empirical evaluations of LLMs for coding/theme work (examples)