Add data model mapping converter: edit data models in a spreadsheet #2071

Open
arielkr256 wants to merge 1 commit into develop from data-model-mapping-converter

arielkr256 (Contributor) commented May 22, 2026

Summary

Adds .scripts/mapping_converter/, a small tool that round-trips Panther Data Model YAMLs through a flat CSV. The CSV layout puts one row per unified field and one column pair ((Value), (Type)) per log type, which is the shape a spreadsheet is good at — side-by-side comparison across log types, find/replace across many fields at once, and pasting a fresh vendor's mappings as a new column.

The tool is framework-agnostic. Column 1 of the CSV holds the unified field name, and the script reads it by position — so the names can come from Sigma, OCSF, ECS, CIM, or any internal taxonomy. Nothing in the tool is Sigma-specific.
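To make the layout concrete, here is a minimal sketch of reading such a CSV by position. It is not the code in mapping_converter.py. The header spelling `<LogType> (Value)` / `<LogType> (Type)`, the log type names `VendorA.Events` and `VendorB.Events`, the function name `parse_wide_csv`, and the choice to treat an empty Type cell as `Path` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample: column 1 is the unified field name, then one
# (Value), (Type) column pair per log type. Header spelling is an assumption.
SAMPLE = """\
Field,VendorA.Events (Value),VendorA.Events (Type),VendorB.Events (Value),VendorB.Events (Type)
User,$.process_username,Path,$.event.UserName,Path
TerminalSessionId,,,$.session_id,Path
_os,get_os,Method,,
"""


def parse_wide_csv(text):
    """Return {log_type: [mapping, ...]} from the wide CSV layout."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # Value columns sit at positions 1, 3, 5, ...; the Type column follows each.
    log_types = [header[i].removesuffix(" (Value)") for i in range(1, len(header), 2)]
    result = {log_type: [] for log_type in log_types}
    for row in body:
        if not row or not row[0].strip():
            continue  # skip blank lines and rows without a field name
        name = row[0].strip()  # read by position, so any taxonomy works
        row = row + [""] * (len(header) - len(row))  # pad short rows
        for idx, log_type in enumerate(log_types):
            value = row[1 + 2 * idx].strip()
            kind = row[2 + 2 * idx].strip() or "Path"  # assumed default
            if value:  # an empty cell means "no mapping for this log type"
                result[log_type].append({"Name": name, kind: value})
    return result


if __name__ == "__main__":
    for log_type, mappings in parse_wide_csv(SAMPLE).items():
        print(log_type, mappings)
```

Because the field name is taken from `row[0]` rather than looked up by header text, the first column's header can be anything.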

The net-new workflow this unlocks

Bootstrapping a new family of data models from a spreadsheet. Today, adding (for example) a set of IdP data models — Okta, OneLogin, JumpCloud — means hand-writing each *_data_model.yml plus its .py sibling, keeping the Filename: field in sync, and remembering the dual-file convention. With this tool the flow is:

  1. Author one CSV: column 1 is unified field names (in whatever framework you've picked), then (Value) / (Type) columns per IdP log type.
  2. Run csv2yml --output-dir data_models/idp_data_models/.
  3. The tool creates each *_data_model.yml and a matching stub .py with placeholder functions for every Method: mapping — so the dual-file rule is satisfied out of the gate and the model loads without ImportError. Fill in the TODO bodies, run make data-models-unit-test, commit.
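As a rough illustration of step 3, the stub generation could look something like the sketch below. This is an assumption about the shape of the output, not the tool's actual template: the helper name `render_stub`, the `event` parameter, and the `return None` placeholder body are hypothetical.

```python
def render_stub(mappings):
    """Render Python source with one placeholder function per Method mapping."""
    names = []
    for mapping in mappings:
        fn = mapping.get("Method")
        if fn is None or fn in names:
            continue  # Path mappings need no function; emit each name once
        if not fn.isidentifier():
            raise ValueError(f"not a valid function name: {fn!r}")
        names.append(fn)

    blocks = []
    for fn in names:
        blocks.append(
            f"def {fn}(event):\n"
            f"    # TODO: implement {fn}\n"
            f"    return None\n"
        )
    # Two blank lines between top-level functions.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(render_stub([
        {"Name": "_event_category", "Method": "get_event_category"},
        {"Name": "User", "Path": "$.process_username"},
        {"Name": "_os", "Method": "get_os"},
    ]))
```

Generating a function for every Method value means the module always defines whatever the YAML references, which is what lets a new model load before the bodies are written.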

The same CSV then becomes the source of truth for ongoing edits. Re-running csv2yml against the same --output-dir updates the existing YAMLs in place via ruamel.yaml round-trip mode: top-level comments, key ordering, quoting, and comments attached to retained mapping entries all survive. yml→csv→yml on the existing EDR data models is byte-identical.

Beyond bootstrapping, the spreadsheet view is useful for:

  • Auditing coverage gaps across vendors at a glance (empty cells = no mapping).
  • Bulk-renaming a unified field across every vendor at once.
  • Pasting a new framework's field names as column 1 to retarget existing mappings.
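The coverage audit in the first bullet can also be scripted. The sketch below lists, for each unified field, the log types whose (Value) cell is empty. The header spelling, the sample log type names, and the function name `coverage_gaps` are assumptions for illustration and are not part of the PR.

```python
import csv
import io

# Hypothetical sample in the wide layout; header spelling is an assumption.
SAMPLE = """\
Field,VendorA.Events (Value),VendorA.Events (Type),VendorB.Events (Value),VendorB.Events (Type)
User,$.process_username,Path,$.event.UserName,Path
TerminalSessionId,,,$.session_id,Path
_os,get_os,Method,,
"""


def coverage_gaps(text):
    """Return {field: [log types with no mapping]} for fields that have gaps."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # (column index, log type) for every Value column at positions 1, 3, 5, ...
    value_cols = [
        (i, header[i].removesuffix(" (Value)")) for i in range(1, len(header), 2)
    ]
    gaps = {}
    for row in reader:
        if not row or not row[0].strip():
            continue
        missing = [
            log_type
            for i, log_type in value_cols
            if i >= len(row) or not row[i].strip()
        ]
        if missing:
            gaps[row[0].strip()] = missing
    return gaps


if __name__ == "__main__":
    for field, missing in coverage_gaps(SAMPLE).items():
        print(f"{field}: no mapping for {', '.join(missing)}")
```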

What's in the PR

  • mapping_converter.py: yml2csv and csv2yml subcommands. Uses panther_analysis_tool.analysis_utils.load_analysis_specs for discovery and ruamel.yaml for lossless write-back.
  • README.md: usage, CSV format, roundtrip guarantees, and a step-by-step "Bootstrapping a brand-new set of data models (e.g. IdPs)" section.
  • edr_mappings.csv: generated reference CSV covering the three existing EDR data models (CarbonBlack, CrowdStrike, SentinelOne), included as a worked example.

Lives under .scripts/ so PAT doesn't try to load it as a detection.

Adds .scripts/mapping_converter/, a framework-agnostic tool for round-tripping
Panther Data Model YAMLs through a flat CSV. Enables editing field mappings in
a spreadsheet and bootstrapping new data model families (EDR, IdP, etc.) from
a CSV of unified-field → vendor-field mappings authored against any framework
(Sigma, OCSF, ECS, internal taxonomy, …).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

arielkr256 requested a review from a team as a code owner May 22, 2026 15:50
cursor Bot commented May 22, 2026

PR Summary

Low Risk
Adds a standalone developer script that can generate/update data model YAMLs from CSV; risk is mainly accidental overwrites or incorrect mapping merges if used improperly, but it doesn’t affect runtime behavior unless invoked.

Overview
Introduces a new .scripts/mapping_converter utility that round-trips Panther data model Mappings between *_data_model.yml files and a spreadsheet-friendly CSV (yml2csv export and csv2yml import).

On import, the tool can update existing data model YAMLs in place (merging Mappings by Name while preserving formatting/comments via ruamel.yaml) or create new data model YAMLs for previously-unknown log types, emitting a matching stub *_data_model.py with placeholder functions for any Method mappings. A reference edr_mappings.csv is included as a worked example.
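The merge-by-Name behaviour described above can be pictured with plain lists and dicts. This is only a sketch of the semantics as stated in this conversation (update retained entries, add new ones, drop entries absent from the CSV). It is not the PR's `_merge_mappings`, and the ordering choice (existing order first, new entries appended) is an assumption.

```python
def merge_mappings(existing, incoming):
    """Merge incoming CSV mappings into existing YAML mappings, keyed by Name."""
    incoming_by_name = {m["Name"]: m for m in incoming}
    merged = []
    for entry in existing:
        new = incoming_by_name.pop(entry["Name"], None)
        if new is None:
            continue  # absent from the CSV, so it is dropped
        # Mutate the retained entry in place. With a round-trip YAML loader,
        # keeping the original object is what lets attached comments survive.
        for key in ("Path", "Method"):
            entry.pop(key, None)
        entry.update({k: v for k, v in new.items() if k != "Name"})
        merged.append(entry)
    # Whatever is left was not in the existing file: append as new entries.
    merged.extend(incoming_by_name.values())
    return merged


if __name__ == "__main__":
    existing = [
        {"Name": "User", "Path": "$.old_user"},
        {"Name": "_os", "Method": "get_os"},
    ]
    incoming = [
        {"Name": "User", "Path": "$.new_user"},
        {"Name": "Host", "Path": "$.hostname"},
    ]
    print(merge_mappings(existing, incoming))
```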

Reviewed by Cursor Bugbot for commit fdb6705. Bugbot is set up for automated code reviews on this repo.

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


template_path, yml_data = template
print(f"Updating existing file for {log_type}: {template_path}")
self._merge_mappings(yml_data, mappings)
self._write_yml_file(template_path, yml_data)


Updated files written to template-dir instead of output-dir

Medium Severity

When --template-dir differs from --output-dir, csv_to_yml writes updated files back to template_path (which lives in --template-dir) rather than into --output-dir. The README example shows --output-dir ./new_models --template-dir data_models/edr_data_models, implying output lands in ./new_models, but existing files are instead modified in the template directory. The --template-dir help text describes it as a source of templates, not a write target, making this destructive in-place modification unexpected.
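One possible way to address this, sketched under assumptions rather than taken from the PR, is to derive the write target from --output-dir and use the template only as the source of the loaded YAML. The helper name `resolve_write_path` is hypothetical.

```python
from pathlib import Path


def resolve_write_path(template_path, output_dir):
    """Return the path an updated model should be written to.

    The template's file name is kept, but the directory always comes from
    output_dir, so files under the template directory are never modified
    unless the two directories are the same.
    """
    return Path(output_dir) / Path(template_path).name


if __name__ == "__main__":
    target = resolve_write_path(
        "data_models/edr_data_models/example_data_model.yml",  # hypothetical file
        "./new_models",
    )
    print(target)
```

When --template-dir and --output-dir point at the same directory, the result equals the template path, so in-place updates would still behave as before.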


TerminalSessionId,,,,,$.tgt_process_sessionId,Path
User,$.process_username,Path,$.event.UserName,Path,$.tgt_process_user,Path
_event_category,get_event_category,Method,,,get_event_category,Method
_os,get_os,Method,,,get_os,Method


Included CSV missing four CrowdStrike FDR mappings

Medium Severity

The edr_mappings.csv is missing four Crowdstrike.FDREvent mappings that exist in crowdstrike_fdr_data_model.yml: ProcessName, TargetFilename, _event_category, and _os. Running csv2yml with this CSV against the CrowdStrike model would silently drop those mappings, since _merge_mappings removes entries absent from the CSV. This contradicts the PR's claim of byte-identical roundtrip fidelity.
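A guard against this kind of silent drop could compare mapping names before merging. The sketch below is a hypothetical pre-flight check, not part of the PR. The function name and the sample paths are placeholders, and only the four field names come from the finding above.

```python
def names_dropped_by_merge(existing_mappings, csv_mappings):
    """Return names present in the YAML but absent from the CSV, in YAML order."""
    incoming = {m["Name"] for m in csv_mappings}
    return [m["Name"] for m in existing_mappings if m["Name"] not in incoming]


if __name__ == "__main__":
    # Placeholder values; only the Name fields mirror the reported gap.
    existing = [
        {"Name": "ProcessName", "Path": "$.placeholder_a"},
        {"Name": "TargetFilename", "Path": "$.placeholder_b"},
        {"Name": "User", "Path": "$.event.UserName"},
        {"Name": "_event_category", "Method": "get_event_category"},
        {"Name": "_os", "Method": "get_os"},
    ]
    from_csv = [{"Name": "User", "Path": "$.event.UserName"}]
    dropped = names_dropped_by_merge(existing, from_csv)
    if dropped:
        print("csv2yml would remove:", ", ".join(dropped))
```

Printing or failing on a non-empty result before writing would surface the mismatch instead of leaving it to a later diff.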


arielkr256 added the data_models label (Updates or additions to Data Models) Jun 11, 2026