Add data model mapping converter: edit data models in a spreadsheet by arielkr256 · Pull Request #2071 · panther-labs/panther-analysis

arielkr256 · 2026-05-22T15:50:11Z

Summary

Adds .scripts/mapping_converter/, a small tool that round-trips Panther Data Model YAMLs through a flat CSV. The CSV has one row per unified field and one column pair, (Value) and (Type), per log type. That is the shape a spreadsheet handles well: side-by-side comparison across log types, find/replace across many fields at once, and pasting a new vendor's mappings as a new column.
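As a rough illustration of that layout, the sketch below pivots per-log-type Mappings lists into one row per unified field. It is not the PR's code: the in-memory dict shape, the `Field` header for column 1, and the exact `<LogType> (Value)` / `<LogType> (Type)` header text are assumptions. The Path/Method values in the Type column mirror the rows visible in edr_mappings.csv.

```python
import csv
import io

# Illustrative input (an assumption, not loaded from real files): each log type
# maps to a Panther-style Mappings list, where every entry has a Name plus
# either a Path or a Method key.
MODELS = {
    "Crowdstrike.FDREvent": [
        {"Name": "User", "Path": "$.event.UserName"},
    ],
    "SentinelOne.DeepVisibility": [
        {"Name": "User", "Path": "$.tgt_process_user"},
        {"Name": "_os", "Method": "get_os"},
    ],
}


def to_csv(models: dict) -> str:
    """Pivot {log_type: mappings} into one CSV row per unified field."""
    log_types = sorted(models)
    # Index each log type's mappings by unified field name.
    by_type = {lt: {m["Name"]: m for m in models[lt]} for lt in log_types}
    fields = sorted({name for index in by_type.values() for name in index})

    header = ["Field"]  # column 1: unified field name (hypothetical header text)
    for lt in log_types:
        header += [f"{lt} (Value)", f"{lt} (Type)"]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for field in fields:
        row = [field]
        for lt in log_types:
            entry = by_type[lt].get(field)
            if entry is None:
                row += ["", ""]  # empty pair: no mapping for this log type
            elif "Method" in entry:
                row += [entry["Method"], "Method"]
            else:
                row += [entry["Path"], "Path"]
        writer.writerow(row)
    return out.getvalue()


print(to_csv(MODELS))
```

Missing mappings become empty cell pairs rather than absent columns, which is what keeps every log type aligned on the same row.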

The tool is framework-agnostic. Column 1 of the CSV holds the unified field name, and the script reads it by position — so the names can come from Sigma, OCSF, ECS, CIM, or any internal taxonomy. Nothing in the tool is Sigma-specific.
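Reading column 1 by position is what makes its header text irrelevant. A minimal sketch of the reverse direction, under the same assumed `<LogType> (Value)` / `<LogType> (Type)` header convention; `from_csv`, the `ocsf_name` header, and the field names in the sample are illustrative only.

```python
import csv
import io


def from_csv(text: str) -> dict:
    """Rebuild {log_type: mappings} from the flat CSV.

    Column 0 is read purely by position, so its header (Sigma, OCSF, ECS,
    CIM, an internal name, ...) is never interpreted. The remaining columns
    are assumed to come in "<LogType> (Value)" / "<LogType> (Type)" pairs.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # skip blank lines
    header, body = rows[0], rows[1:]
    models = {}
    for col in range(1, len(header), 2):
        log_type = header[col].removesuffix(" (Value)")
        mappings = []
        for row in body:
            value, kind = row[col], row[col + 1]
            if not value:
                continue  # empty cell: this log type has no mapping for the field
            if kind not in ("Path", "Method"):
                raise ValueError(f"{row[0]}/{log_type}: unexpected type {kind!r}")
            # The Type column names the YAML key that holds the value.
            mappings.append({"Name": row[0], kind: value})
        models[log_type] = mappings
    return models


sample = (
    "ocsf_name,Example.Log (Value),Example.Log (Type)\n"
    "actor.user.name,$.user,Path\n"
    "device.os,get_os,Method\n"
    "src_endpoint.ip,,\n"
)
print(from_csv(sample))
```

Swapping the column-1 header for a Sigma or ECS name changes nothing in the output; only the cell values in that column become the Name keys.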

The net-new workflow this unlocks

Bootstrapping a new family of data models from a spreadsheet. Today, adding (for example) a set of IdP data models — Okta, OneLogin, JumpCloud — means hand-writing each *_data_model.yml plus its .py sibling, keeping the Filename: field in sync, and remembering the dual-file convention. With this tool the flow is:

1. Author one CSV: column 1 is unified field names (in whatever framework you've picked), then (Value) / (Type) columns per IdP log type.
2. Run csv2yml --output-dir data_models/idp_data_models/.
3. The tool creates each *_data_model.yml and a matching stub .py with placeholder functions for every Method: mapping, so the dual-file rule is satisfied out of the gate and the model loads without ImportError.
4. Fill in the TODO bodies, run make data-models-unit-test, and commit.
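The stub-generation part of this flow can be sketched as follows. This is a hypothetical sketch, not the PR's implementation: `py_sibling` and `render_stub` are illustrative names, and the `def name(event):` shape assumes Panther data model methods take the event as their single argument.

```python
def py_sibling(yml_filename: str) -> str:
    """Derive the .py filename that the YAML's Filename: field should name."""
    if not yml_filename.endswith("_data_model.yml"):
        raise ValueError(f"unexpected data model filename: {yml_filename}")
    return yml_filename.removesuffix(".yml") + ".py"


def render_stub(method_names: list) -> str:
    """Render a placeholder module with one function per Method: mapping."""
    blocks = []
    for name in sorted(set(method_names)):  # dedupe, stable order
        blocks.append(
            f"def {name}(event):\n"
            f"    # TODO: implement the {name} mapping\n"
            f"    return None\n"
        )
    # Two blank lines between top-level functions, as PEP 8 expects.
    return "\n\n".join(blocks)


print(py_sibling("okta_data_model.yml"))
print(render_stub(["get_os", "get_event_category", "get_os"]))
```

Returning None from each placeholder keeps the module importable while leaving the mapping visibly unimplemented until the TODO is filled in.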

The same CSV then becomes the source of truth for ongoing edits. Re-running csv2yml against the same --output-dir updates the existing YAMLs in place via ruamel.yaml round-trip mode: top-level comments, key ordering, quoting, and comments attached to retained mapping entries all survive. Round-tripping the existing EDR data models through yml→csv→yml reproduces them byte for byte.
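Comment survival depends on mutating the loaded structures rather than rebuilding them. A minimal sketch of a merge-by-Name that works that way; the name `merge_mappings` and the remove-absent-entries behaviour are modelled on the Bugbot review's description of `_merge_mappings`, not copied from the PR. It is shown on plain lists and dicts; the assumption is that under a ruamel.yaml round-trip load the same code would operate on CommentedSeq/CommentedMap objects whose attached comments stay with any object that is kept.

```python
def merge_mappings(existing: list, incoming: list) -> None:
    """Merge CSV-derived mappings into a loaded Mappings list, in place.

    Entries are matched by Name. Retained entries are updated key by key
    instead of being replaced, new Names are appended, and Names absent
    from the CSV are removed.
    """
    wanted = {m["Name"]: m for m in incoming}

    # Remove entries the CSV no longer mentions; walk backwards so that
    # deleting by index does not skip elements.
    for i in range(len(existing) - 1, -1, -1):
        if existing[i]["Name"] not in wanted:
            del existing[i]

    # Update surviving entries in place so the same objects are kept.
    for entry in existing:
        new = wanted[entry["Name"]]
        for key in ("Path", "Method"):
            if key in new:
                entry[key] = new[key]
            elif key in entry:
                del entry[key]  # e.g. a Path mapping switched to a Method

    # Append Names that are new in the CSV.
    present = {entry["Name"] for entry in existing}
    for name, new in wanted.items():
        if name not in present:
            existing.append(dict(new))
```

Because retained entries are the same objects after the merge, anything attached to them (such as comments in a round-trip loader) is not discarded by the merge itself.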

Beyond bootstrapping, the spreadsheet view is useful for:

Auditing coverage gaps across vendors at a glance (empty cells = no mapping).
Bulk-renaming a unified field across every vendor at once.
Pasting a new framework's field names as column 1 to retarget existing mappings.
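For the coverage audit, the empty-cell convention can also be checked mechanically. A small sketch using the four edr_mappings.csv rows quoted in the Bugbot review below, with a hypothetical header added; the vendor column prefixes (the columns appear to be CarbonBlack, CrowdStrike, SentinelOne in that order) and `coverage_gaps` are illustrative.

```python
import csv
import io

# Four rows quoted from edr_mappings.csv in the review, plus an assumed header.
SAMPLE = (
    "Field,CarbonBlack (Value),CarbonBlack (Type),"
    "CrowdStrike (Value),CrowdStrike (Type),"
    "SentinelOne (Value),SentinelOne (Type)\n"
    "TerminalSessionId,,,,,$.tgt_process_sessionId,Path\n"
    "User,$.process_username,Path,$.event.UserName,Path,$.tgt_process_user,Path\n"
    "_event_category,get_event_category,Method,,,get_event_category,Method\n"
    "_os,get_os,Method,,,get_os,Method\n"
)


def coverage_gaps(text: str) -> dict:
    """Return {log_type: [unified fields whose (Value) cell is empty]}."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    header, body = rows[0], rows[1:]
    gaps = {}
    for col in range(1, len(header), 2):  # every (Value) column
        log_type = header[col].removesuffix(" (Value)")
        gaps[log_type] = [row[0] for row in body if not row[col]]
    return gaps


for log_type, missing in coverage_gaps(SAMPLE).items():
    print(f"{log_type}: {', '.join(missing) or 'no gaps'}")
```

On these rows it flags _event_category and _os for CrowdStrike, the same fields the review reports as present in crowdstrike_fdr_data_model.yml but absent from the CSV.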

What's in the PR

mapping_converter.py — yml2csv and csv2yml subcommands. Uses panther_analysis_tool.analysis_utils.load_analysis_specs for discovery and ruamel.yaml for lossless write-back.
README.md — usage, CSV format, roundtrip guarantees, and a step-by-step "Bootstrapping a brand-new set of data models (e.g. IdPs)" section.
edr_mappings.csv — generated reference CSV covering the three existing EDR data models (CarbonBlack, CrowdStrike, SentinelOne), included as a worked example.

The tool lives under .scripts/ so PAT (panther_analysis_tool) doesn't try to load it as a detection.

Adds .scripts/mapping_converter/, a framework-agnostic tool for round-tripping Panther Data Model YAMLs through a flat CSV. Enables editing field mappings in a spreadsheet and bootstrapping new data model families (EDR, IdP, etc.) from a CSV of unified-field → vendor-field mappings authored against any framework (Sigma, OCSF, ECS, internal taxonomy, …).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor · 2026-05-22T15:50:19Z

PR Summary

Low Risk
Adds a standalone developer script that can generate/update data model YAMLs from CSV; risk is mainly accidental overwrites or incorrect mapping merges if used improperly, but it doesn’t affect runtime behavior unless invoked.

Overview
Introduces a new .scripts/mapping_converter utility that round-trips Panther data model Mappings between *_data_model.yml files and a spreadsheet-friendly CSV (yml2csv export and csv2yml import).

On import, the tool can update existing data model YAMLs in place (merging Mappings by Name while preserving formatting/comments via ruamel.yaml) or create new data model YAMLs for previously-unknown log types, emitting a matching stub *_data_model.py with placeholder functions for any Method mappings. A reference edr_mappings.csv is included as a worked example.

Reviewed by Cursor Bugbot for commit fdb6705. Bugbot is set up for automated code reviews on this repo.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.



cursor · 2026-05-22T15:58:15Z

+                    template_path, yml_data = template
+                    print(f"Updating existing file for {log_type}: {template_path}")
+                    self._merge_mappings(yml_data, mappings)
+                    self._write_yml_file(template_path, yml_data)


Updated files written to template-dir instead of output-dir

Medium Severity

When --template-dir differs from --output-dir, csv_to_yml writes updated files back to template_path (which lives in --template-dir) rather than into --output-dir. The README example shows --output-dir ./new_models --template-dir data_models/edr_data_models, implying output lands in ./new_models, but existing files are instead modified in the template directory. The --template-dir help text describes it as a source of templates, not a write target, making this destructive in-place modification unexpected.

Additional Locations (1)

.scripts/mapping_converter/README.md#L56-L61
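One way to address this, sketched under the assumption that template files are discovered somewhere below --template-dir: derive the write target from --output-dir so the template directory is only ever read. `resolve_write_path` is a hypothetical helper, not part of the PR.

```python
from pathlib import Path


def resolve_write_path(template_path: Path, template_dir: Path, output_dir: Path) -> Path:
    """Map a file found under --template-dir to its destination under --output-dir.

    The layout relative to template_dir is preserved; raises ValueError if
    template_path is not inside template_dir.
    """
    return output_dir / template_path.relative_to(template_dir)


src = Path("data_models/edr_data_models/crowdstrike_fdr_data_model.yml")
print(resolve_write_path(src, Path("data_models/edr_data_models"), Path("new_models")))
```

When both flags point at the same directory the helper returns the original path, so the documented in-place update still works; when they differ, the template file is left untouched.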


cursor · 2026-05-22T15:58:15Z

+TerminalSessionId,,,,,$.tgt_process_sessionId,Path
+User,$.process_username,Path,$.event.UserName,Path,$.tgt_process_user,Path
+_event_category,get_event_category,Method,,,get_event_category,Method
+_os,get_os,Method,,,get_os,Method


Included CSV missing four CrowdStrike FDR mappings

Medium Severity

The edr_mappings.csv is missing four Crowdstrike.FDREvent mappings that exist in crowdstrike_fdr_data_model.yml: ProcessName, TargetFilename, _event_category, and _os. Running csv2yml with this CSV against the CrowdStrike model would silently drop those mappings, since _merge_mappings removes entries absent from the CSV. This contradicts the PR's claim of byte-identical roundtrip fidelity.
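A pre-merge guard along these lines would turn the silent drop into an explicit error. `check_no_drops` and its `allow_drops` switch are hypothetical, sketched on plain Mappings lists using the four names from this finding.

```python
def check_no_drops(existing: list, incoming: list, allow_drops: bool = False) -> list:
    """Return Names a merge would delete; raise unless drops are explicitly allowed."""
    incoming_names = {m["Name"] for m in incoming}
    dropped = [m["Name"] for m in existing if m["Name"] not in incoming_names]
    if dropped and not allow_drops:
        raise ValueError(f"CSV would remove existing mappings: {', '.join(dropped)}")
    return dropped


# Illustrative data: Names present in the YAML versus the CSV's CrowdStrike column.
yaml_side = [{"Name": n} for n in ("User", "ProcessName", "TargetFilename", "_event_category", "_os")]
csv_side = [{"Name": "User", "Path": "$.event.UserName"}]

try:
    check_no_drops(yaml_side, csv_side)
except ValueError as err:
    print(err)
```

Deliberate removals remain possible by passing allow_drops=True, which returns the list of Names that will be deleted so it can be logged.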


arielkr256 requested a review from a team as a code owner May 22, 2026 15:50

cursor Bot reviewed May 22, 2026


arielkr256 added the data_models Updates or additions to Data Models label Jun 11, 2026

arielkr256 wants to merge 1 commit into develop from data-model-mapping-converter
