Add data model mapping converter: edit data models in a spreadsheet #2071
arielkr256 wants to merge 1 commit into
Adds .scripts/mapping_converter/, a framework-agnostic tool for round-tripping Panther Data Model YAMLs through a flat CSV. Enables editing field mappings in a spreadsheet and bootstrapping new data model families (EDR, IdP, etc.) from a CSV of unified-field → vendor-field mappings authored against any framework (Sigma, OCSF, ECS, internal taxonomy, …).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR Summary
Low Risk
Overview: On import, the tool can update existing data model YAMLs in place (merging
Reviewed by Cursor Bugbot for commit fdb6705. Bugbot is set up for automated code reviews on this repo.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
    template_path, yml_data = template
    print(f"Updating existing file for {log_type}: {template_path}")
    self._merge_mappings(yml_data, mappings)
    self._write_yml_file(template_path, yml_data)
Updated files written to template-dir instead of output-dir
Medium Severity
When --template-dir differs from --output-dir, csv_to_yml writes updated files back to template_path (which lives in --template-dir) rather than into --output-dir. The README example shows --output-dir ./new_models --template-dir data_models/edr_data_models, implying output lands in ./new_models, but existing files are instead modified in the template directory. The --template-dir help text describes it as a source of templates, not a write target, making this destructive in-place modification unexpected.
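One possible shape for a fix, sketched under assumptions: compute the write target by re-rooting the template's path under --output-dir instead of writing back to template_path. The helper name resolve_write_path is hypothetical and does not appear in the PR; only the directory names come from the README example quoted above.

```python
from pathlib import Path


def resolve_write_path(template_path: Path, template_dir: Path, output_dir: Path) -> Path:
    """Hypothetical helper: map a file found under template_dir to the
    same relative location under output_dir."""
    # relative_to raises ValueError if template_path is not inside template_dir,
    # which surfaces a misconfiguration instead of writing somewhere unexpected.
    relative = template_path.relative_to(template_dir)
    return output_dir / relative


target = resolve_write_path(
    Path("data_models/edr_data_models/crowdstrike_fdr_data_model.yml"),
    Path("data_models/edr_data_models"),
    Path("new_models"),
)
print(target)
```

When both directories are the same, the helper returns the original path, so the in-place update workflow would keep working unchanged.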
    TerminalSessionId,,,,,$.tgt_process_sessionId,Path
    User,$.process_username,Path,$.event.UserName,Path,$.tgt_process_user,Path
    _event_category,get_event_category,Method,,,get_event_category,Method
    _os,get_os,Method,,,get_os,Method
Included CSV missing four CrowdStrike FDR mappings
Medium Severity
The edr_mappings.csv is missing four Crowdstrike.FDREvent mappings that exist in crowdstrike_fdr_data_model.yml: ProcessName, TargetFilename, _event_category, and _os. Running csv2yml with this CSV against the CrowdStrike model would silently drop those mappings, since _merge_mappings removes entries absent from the CSV. This contradicts the PR's claim of byte-identical roundtrip fidelity.
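A guard along these lines could make the drop visible before the merge runs. This is a minimal sketch, not code from the PR: dropped_fields is a hypothetical helper, and the field lists below are illustrative, built from the four names cited in this comment plus User from the CSV excerpt.

```python
def dropped_fields(existing_names, csv_names):
    """Hypothetical check: names mapped in the existing YAML that the CSV
    no longer carries for that log type, and so would be removed on merge."""
    return sorted(set(existing_names) - set(csv_names))


# Illustrative inputs only; the real lists would come from the YAML and the CSV.
existing = ["ProcessName", "TargetFilename", "User", "_event_category", "_os"]
from_csv = ["User"]

missing = dropped_fields(existing, from_csv)
if missing:
    print("csv2yml would drop:", ", ".join(missing))
```

Whether such a check should warn or abort is a design choice for the author; the point is only that the removal would no longer be silent.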


Summary
Adds .scripts/mapping_converter/, a small tool that round-trips Panther Data Model YAMLs through a flat CSV. The CSV layout puts one row per unified field and one column pair, (Value) and (Type), per log type. That is the shape a spreadsheet is good at: side-by-side comparison across log types, find/replace across many fields at once, and pasting a fresh vendor's mappings as a new column.
The tool is framework-agnostic. Column 1 of the CSV holds the unified field name, and the script reads it by position, so the names can come from Sigma, OCSF, ECS, CIM, or any internal taxonomy. Nothing in the tool is Sigma-specific.
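To make the layout concrete, here is a minimal sketch of reading such a CSV by position. It is not the PR's implementation: the header text ("Field", "LogA (Value)", ...) and the log type names LogA/LogB/LogC are assumptions for illustration. The two data rows follow the shape of the reference CSV quoted in the review.

```python
import csv
import io

# Assumed header wording and hypothetical log type names (LogA, LogB, LogC).
SAMPLE = """\
Field,LogA (Value),LogA (Type),LogB (Value),LogB (Type),LogC (Value),LogC (Type)
User,$.process_username,Path,$.event.UserName,Path,$.tgt_process_user,Path
_os,get_os,Method,,,get_os,Method
"""


def parse_mappings(text):
    """Return {log_type: [mapping, ...]} from a one-row-per-field CSV."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # Column 0 is the unified field name, read by position and never by label.
    # Each log type then owns two adjacent columns: value first, type second.
    log_types = [header[i].replace(" (Value)", "") for i in range(1, len(header), 2)]
    result = {log_type: [] for log_type in log_types}
    for row in body:
        name = row[0]
        for idx, log_type in enumerate(log_types):
            value, kind = row[1 + 2 * idx], row[2 + 2 * idx]
            if value:  # an empty cell means this log type has no mapping
                result[log_type].append({"Name": name, kind: value})
    return result


mappings = parse_mappings(SAMPLE)
```

Because the field name is taken from column 0 regardless of its header label, the same parser would accept Sigma, OCSF, ECS, or internal names without change.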
The net-new workflow this unlocks
Bootstrapping a new family of data models from a spreadsheet. Today, adding (for example) a set of IdP data models (Okta, OneLogin, JumpCloud) means hand-writing each *_data_model.yml plus its .py sibling, keeping the Filename: field in sync, and remembering the dual-file convention. With this tool the flow is:
1. Author a CSV with (Value)/(Type) columns per IdP log type.
2. Run csv2yml --output-dir data_models/idp_data_models/.
3. The tool writes each *_data_model.yml and a matching stub .py with placeholder functions for every Method: mapping, so the dual-file rule is satisfied out of the gate and the model loads without ImportError. Fill in the TODO bodies, run make data-models-unit-test, commit.
The same CSV then becomes the source of truth for ongoing edits. Re-running csv2yml against the same --output-dir updates the existing YAMLs in place via ruamel.yaml round-trip mode: top-level comments, key ordering, quoting, and comments attached to retained mapping entries all survive. yml→csv→yml on the existing EDR data models is byte-identical.
Beyond bootstrapping, the spreadsheet view is useful for:
What's in the PR
- mapping_converter.py: yml2csv and csv2yml subcommands. Uses panther_analysis_tool.analysis_utils.load_analysis_specs for discovery and ruamel.yaml for lossless write-back.
- README.md: usage, CSV format, roundtrip guarantees, and a step-by-step "Bootstrapping a brand-new set of data models (e.g. IdPs)" section.
- edr_mappings.csv: generated reference CSV covering the three existing EDR data models (CarbonBlack, CrowdStrike, SentinelOne), included as a worked example.
Lives under .scripts/ so PAT doesn't try to load it as a detection.
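For readers wondering what the stub .py from the bootstrapping flow might look like, here is a rough sketch of generating one placeholder function per Method mapping. The function render_stub, the single event parameter, and the TODO wording are assumptions for illustration, not the converter's actual output.

```python
def render_stub(method_names):
    """Hypothetical generator: one placeholder function per Method mapping."""
    lines = []
    # De-duplicate and sort so the generated file is stable across runs.
    for name in sorted(set(method_names)):
        lines.append(f"def {name}(event):")
        lines.append(f"    # TODO: implement {name}")
        lines.append("    return None")
        lines.append("")
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


stub_source = render_stub(["get_os", "get_event_category", "get_os"])
print(stub_source)
```

Emitting a function for every Method name is what lets a freshly generated model import cleanly before any real logic has been written.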