
# Thematic Analysis Skill

An LLM Skill that enables systematic analysis of qualitative text data (interview transcripts, open-ended survey responses, field notes) to produce traceable code tables, themes, and curated quotes. Designed for rigorous, transparent analysis with participant labeling and reproducible outputs.

## Outputs

The skill produces Markdown files under `<transcripts-root>/outputs/`. Treat this folder as your audit trail: every theme and summary is expected to remain traceable to specific quotes.

The output files are:

- `participants.md`: a table mapping participant IDs (P1, P2, ...) to transcript filenames and per-participant output filenames.
- `<participant-id>.md` (for example, `P1.md`): a per-participant code table with original quotes and participant identifiers.
- `final/codes.md`: a merged code table across participants (generated by `scripts/merge_codes.py`).
- `final/themes.md`: a themes table summarizing patterns across codes.
- `final/findings.md`: a narrative findings report with representative quotes.
- `final/quote-check.md`: a quote validation report (generated by `scripts/validate_quotes.py`).
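The skill does not prescribe an exact column layout here, but a per-participant code table along these lines keeps every code tied to its evidence (the column names and example row are illustrative, not part of the skill's specification):

```markdown
| Code | Quote | Participant | Source file |
| --- | --- | --- | --- |
| time-pressure | "I never have time to debrief after a shift." | P1 | interview-01.txt |
```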

## How to use

To run a typical workflow:

  1. Create a transcript folder and put your transcript files inside it. This folder is `<transcripts-root>`.

  2. Start the analysis by providing `<transcripts-root>` and, optionally, a participant list. If you do not provide labels, the skill assigns P1, P2, P3, ... in transcript order and writes `outputs/participants.md`.

  3. Analyze transcripts one at a time. After each transcript, the skill writes a per-participant Markdown file (for example, `outputs/P1.md`) containing a code table.

  4. After all transcripts are coded, merge the per-participant code tables with `scripts/merge_codes.py`:

     ```sh
     python3 <skill-root>/scripts/merge_codes.py \
       --outputs <transcripts-root>/outputs
     ```
  5. Validate that every quoted passage appears in the original transcripts with `scripts/validate_quotes.py`:

     ```sh
     python3 <skill-root>/scripts/validate_quotes.py \
       --outputs <transcripts-root>/outputs \
       --transcripts-root <transcripts-root>
     ```
  6. Create `outputs/final/themes.md` and `outputs/final/findings.md` as the final synthesis across participants.
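Under the hood, the merge step amounts to concatenating the per-participant tables while keeping a single header. The following is an illustrative sketch, not the actual `scripts/merge_codes.py`; it assumes each `P*.md` file contains one Markdown table:

```python
from pathlib import Path

def merge_code_tables(outputs_dir: str) -> str:
    """Concatenate the Markdown table rows from each P*.md file
    into a single table, keeping one header row."""
    outputs = Path(outputs_dir)
    header, rows = None, []
    for md_file in sorted(outputs.glob("P*.md")):
        # Keep only table lines (those starting with a pipe).
        table_lines = [ln for ln in md_file.read_text().splitlines()
                       if ln.lstrip().startswith("|")]
        if not table_lines:
            continue
        if header is None:
            header = table_lines[:2]   # header + separator row, once
        rows.extend(table_lines[2:])   # data rows from every file
    merged = "\n".join((header or []) + rows) + "\n"
    (outputs / "final").mkdir(exist_ok=True)
    (outputs / "final" / "codes.md").write_text(merged)
    return merged
```

Whatever the real script does, the property worth checking afterward is the same: `final/codes.md` should contain every data row from the per-participant files exactly once, under a single header.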

## Limitations of AI-assisted thematic analysis

This skill is designed to support rigorous qualitative analysis, but AI-assisted thematic analysis has important limitations. In most settings, an LLM is more reliable as an assistant analyst than as the sole analyst, especially for publishable qualitative work.

Key limitations to account for:

- Methodological fit varies by TA approach. Thematic analysis is a family of approaches. Reflexive thematic analysis emphasizes the researcher's interpretive work and reflexivity, which does not map cleanly onto automated "coding as classification." This can make fully automated, publication-grade reflexive TA difficult to justify without substantial human analytic work.
- Traceability can break without strict constraints. LLMs can produce plausible-sounding themes that are not sufficiently supported by the dataset, or introduce statements that are not present in the source text. Requiring quote-linked outputs and auditing "theme → quote → transcript" reduces (but does not eliminate) this risk.
- Context and nuance can be flattened. LLM summaries can over-generalize, merge distinct meanings, or miss contextual cues, especially for sensitive topics and small samples where misinterpretation costs are high.
- Empirical performance is setting-dependent. Research evaluating LLMs for coding and theme discovery suggests they can reach moderate to high agreement with humans in constrained settings (for example, codebook-guided or classification-framed tasks), but results vary by data, prompts, and evaluation design. Treat outputs as hypotheses to verify, not conclusions to accept uncritically.
- Reporting expectations still apply. If you are writing for academic audiences, you must disclose the model's role and keep an audit trail that supports methodological transparency. Use qualitative reporting guidelines (for example, SRQR, COREQ) to describe data collection and analysis decisions clearly.

Practical safeguards that improve rigor:

- Use AI to propose initial codes, candidate themes, and retrieval support, then make final analytic decisions yourself.
- Enforce quote-grounding: every code/theme must include supporting quotes with participant IDs.
- Sample-review outputs, search for counterexamples, and document revisions.
- Re-run the same prompts/settings to check stability, and reconcile drift with explicit human judgment.
- Confirm data governance constraints before uploading sensitive transcripts to third-party systems.
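The quote-grounding safeguard can be automated in a few lines. This sketch is an assumption about how such a check might work (it is not `scripts/validate_quotes.py` itself): it normalizes whitespace and letter case before searching, so line-wrapping differences between the code table and the transcript do not cause false negatives:

```python
import re

def quote_in_transcript(quote: str, transcript: str) -> bool:
    """Check that a quoted passage appears verbatim in the transcript,
    ignoring differences in whitespace, line breaks, and letter case."""
    norm = lambda s: re.sub(r"\s+", " ", s).strip().lower()
    return norm(quote) in norm(transcript)
```

Because the match is a strict substring test after normalization, paraphrased or lightly edited quotes will fail the check, which is the desired behavior for an audit trail.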

