

@haggit-eliyahu
Contributor

@haggit-eliyahu haggit-eliyahu commented Jan 14, 2026


Description

  • excluded all existing regex-violating strings in commercial integrations from validation
  • added a validate_param_description validator that exempts existing commercial integration parameters whose description fields exceed the length limit
  • added a default value for the "ShouldInstalledInSystem" field, because many integrations omit it
  • increased the maximum allowed number of words in parameter names
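A minimal, stdlib-only sketch of two of these changes, assuming the metadata arrives as a raw dict: a missing "ShouldInstalledInSystem" key falls back to False, and parameter names are checked against a 15-word limit (the new PARAM_NAME_MAX_WORDS value from the summary below). IntegrationMetadataSketch, from_dict, check_param_name, and the "Identifier" key are hypothetical; in mp these rules presumably live on Pydantic models instead.

```python
from dataclasses import dataclass
from typing import Any

# Value from the PR summary (raised from 7 to 15).
PARAM_NAME_MAX_WORDS = 15


@dataclass
class IntegrationMetadataSketch:
    """Hypothetical stand-in for IntegrationMetadata; not the mp model."""

    identifier: str
    # Falls back to False when the source data omits the field.
    should_installed_in_system: bool = False

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "IntegrationMetadataSketch":
        return cls(
            identifier=raw["Identifier"],  # hypothetical key name
            should_installed_in_system=raw.get("ShouldInstalledInSystem", False),
        )


def check_param_name(name: str) -> str:
    """Reject parameter names with more than PARAM_NAME_MAX_WORDS words."""
    word_count = len(name.split())
    if word_count > PARAM_NAME_MAX_WORDS:
        raise ValueError(
            f"parameter name has {word_count} words; the maximum is {PARAM_NAME_MAX_WORDS}"
        )
    return name


meta = IntegrationMetadataSketch.from_dict({"Identifier": "ExampleIntegration"})
print(meta.should_installed_in_system)  # False
```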

Checklist:

Please ensure you have completed the following items before submitting your PR.
This helps us review your contribution faster and more efficiently.

General Checks:

  • I have read and followed the project's contributing.md guide.
  • My code follows the project's coding style guidelines.
  • I have performed a self-review of my own code.
  • My changes do not introduce any new warnings.
  • My changes pass all existing tests.
  • I have added new tests where appropriate to cover my changes. (If applicable)
  • I have updated the documentation where necessary (e.g., README, API docs). (If applicable)

Open-Source Specific Checks:

  • My changes do not introduce any Personally Identifiable Information (PII) or sensitive customer data.
  • My changes do not expose any internal-only code examples, configurations, or URLs.
  • All code examples, comments, and messages are generic and suitable for a public repository.
  • I understand that any internal context or sensitive details related to this work are handled separately in internal systems (Buganizer for Google team members).

For Google Team Members and Reviewers Only:

  • I have included the Buganizer ID in the PR title or description (e.g., "Internal Buganizer ID: 123456789" or "Related Buganizer: go/buganizer/123456789").
  • I have ensured that all internal discussions and PII related to this work remain in Buganizer.
  • I have tagged the PR with one or more labels that reflect the pull request purpose.

Screenshots (If Applicable)

If your changes involve UI or visual elements, please include screenshots or GIFs here.
Ensure any sensitive data is redacted or generalized.


Further Comments / Questions

Any additional comments, questions, or areas where you'd like specific feedback.

@haggit-eliyahu haggit-eliyahu requested a review from a team as a code owner January 14, 2026 12:27
@gemini-code-assist
Contributor

Summary of Changes

Hello @haggit-eliyahu, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the data validation mechanisms within the mp package, specifically targeting commercial integration metadata. The changes introduce a dedicated validator for parameter descriptions, provide a default for a previously optional field, and broaden the scope of regular expression exclusions. These updates collectively aim to resolve Pydantic validation issues for existing data, improve data integrity, and offer greater flexibility in defining integration parameters without compromising system stability.

Highlights

  • Enhanced Parameter Description Validation: A new validation function, validate_param_description, has been introduced to enforce length constraints on parameter descriptions. This validator is now applied across various data models for actions, connectors, integrations, and jobs, ensuring descriptions adhere to defined maximum lengths while allowing specific existing long descriptions to be excluded from this check.
  • Improved Handling of Missing Integration Fields: The ShouldInstalledInSystem field in IntegrationMetadata now defaults to False if it is not explicitly provided, preventing potential errors when processing integration metadata that might omit this field.
  • Increased Flexibility for Parameter Names: The maximum allowed number of words for parameter names (PARAM_NAME_MAX_WORDS) has been increased from 7 to 15, providing more flexibility for descriptive parameter naming.
  • Expanded Regex Exclusions for Commercial Integrations: The regular expressions for SCRIPT_DISPLAY_NAME_REGEX and SCRIPT_IDENTIFIER_REGEX have been significantly expanded to include numerous existing commercial integration names and parameter identifiers that previously violated Pydantic validation rules, thus preventing validation failures for these established entries.
  • Package Version Update: The mp package version has been updated from 1.25.0 to 1.25.4.
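The regex-exclusion approach can be illustrated in isolation: exact-match alternatives of the form |^...$ appended to a base pattern let specific legacy names through, while every other name must still satisfy the base rule. This sketch assumes search semantics (an unanchored match), which is why each alternative carries its own ^ and $. BASE_PATTERN, EXCLUSIONS, and is_valid_display_name are hypothetical placeholders, not the actual SCRIPT_DISPLAY_NAME_REGEX.

```python
import re

# Hypothetical base rule: letters, digits and spaces only.
BASE_PATTERN = r"^[A-Za-z0-9 ]+$"

# Exact-match exclusions for legacy names that break the base rule.
# Alternation has the lowest precedence, so each |^...$ branch is anchored on its own.
EXCLUSIONS = (
    r"|^Tenable\.io - List Endpoint Vulnerabilities$",
)

DISPLAY_NAME_REGEX = re.compile(BASE_PATTERN + "".join(EXCLUSIONS))


def is_valid_display_name(name: str) -> bool:
    # re.search can match anywhere, so correctness depends on the per-branch anchors.
    return DISPLAY_NAME_REGEX.search(name) is not None
```

With this pattern, "Tenable.io - List Endpoint Vulnerabilities" is accepted through its exclusion branch, while a similar name such as "Other.io - List Endpoint Vulnerabilities" is still rejected.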


Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@haggit-eliyahu haggit-eliyahu added the mp mp related change label Jan 14, 2026
@haggit-eliyahu haggit-eliyahu force-pushed the exclude_existing_pydantic_violations branch from 508960e to 02aa5f2 January 14, 2026 12:29
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several fixes to handle validation issues with existing commercial integrations. This includes excluding certain parameter descriptions from length validation, adding numerous exceptions to regex patterns for script and parameter names, increasing the maximum word count for parameter names, and providing a default value for a missing field. The approach of adding exceptions is pragmatic for dealing with legacy data. The implementation of the new validator and its application across various data models is well-executed. I have identified a potential bug in one of the new regex patterns and some redundant entries that should be cleaned up for better maintainability.

I am having trouble creating individual review comments.

packages/mp/src/mp/core/constants.py (476)

high

This regex pattern appears to contain a typo and might be the result of a copy-paste error.

  1. The part Google Rapid Response \(GR$ seems incomplete and is likely intended to be Google Rapid Response \(GRR\). However, an exclusion for Google Rapid Response (GRR) already exists on line 468.
  2. The second part of the alternation, Tenable\.io - List Endpoint Vulnerabilities$, is missing a ^ at the beginning, which means it would match any string ending with that text, not the exact string. This is inconsistent with other patterns in this list.

This looks like it should just be an exclusion for the Tenable script name.

    r"|^Tenable\.io - List Endpoint Vulnerabilities$"
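To see why the missing ^ matters, compare the anchored and unanchored forms under re.search, which can match anywhere in the string. The prefixed name below is an invented example.

```python
import re

anchored = re.compile(r"^Tenable\.io - List Endpoint Vulnerabilities$")
unanchored = re.compile(r"Tenable\.io - List Endpoint Vulnerabilities$")

# An invented name that only *ends* with the excluded text.
name = "Copy of Tenable.io - List Endpoint Vulnerabilities"

print(bool(anchored.search(name)))    # False: ^ forces the match to start at index 0
print(bool(unanchored.search(name)))  # True: without ^ any suffix match is accepted
```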

packages/mp/src/mp/core/constants.py (560-606)

medium

There are several duplicated patterns in this list of exceptions for PARAM_DISPLAY_NAME_REGEX. While this doesn't break the functionality of the regular expression, it adds unnecessary clutter and makes the list harder to read and maintain.

For example:

  • r"|^Fetch\ Backwards\ Time\ Interval\ \(minutes\)$" appears 5 times.
  • r"|^Extract\ urls\ from\ HTML\ email\ part\?$" appears 3 times.
  • r"|^Create\ a\ Separate\ Siemplify\ Alert\ per\ Attached\ Mail\ File\?$" appears 3 times.

Please remove the duplicate entries to improve code clarity.
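If the alternatives were kept in a list before being joined into the final pattern (an assumption; the thread only shows the concatenated string literals), order-preserving deduplication with dict.fromkeys would be a low-effort fix. PARAM_NAME_EXCLUSIONS and unique_exclusions are hypothetical names; the entries are the ones cited above.

```python
# Hypothetical list of exclusion alternatives, including the repeats cited above.
PARAM_NAME_EXCLUSIONS = [
    r"|^Fetch\ Backwards\ Time\ Interval\ \(minutes\)$",
    r"|^Extract\ urls\ from\ HTML\ email\ part\?$",
    r"|^Fetch\ Backwards\ Time\ Interval\ \(minutes\)$",
    r"|^Create\ a\ Separate\ Siemplify\ Alert\ per\ Attached\ Mail\ File\?$",
    r"|^Extract\ urls\ from\ HTML\ email\ part\?$",
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this keeps the first occurrence of each entry and drops the repeats.
unique_exclusions = list(dict.fromkeys(PARAM_NAME_EXCLUSIONS))

# A unit test could assert this is 0 to stop duplicates from creeping back in.
duplicates = len(PARAM_NAME_EXCLUSIONS) - len(unique_exclusions)
print(duplicates)  # 2
```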

@haggit-eliyahu haggit-eliyahu force-pushed the exclude_existing_pydantic_violations branch from 02aa5f2 to 51d4d16 January 14, 2026 12:41
@haggit-eliyahu
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix validation issues for commercial integrations by adding exclusion lists and a new validator for parameter descriptions. The changes correctly apply the new validator for various parameter types. However, there's a critical issue where this new parameter-specific validator is incorrectly applied to action descriptions, which have a different length constraint, potentially breaking validation for valid data. Additionally, the growing exclusion lists added to the constants file are a maintainability concern; I've suggested moving them to separate data files to keep the codebase cleaner.
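The concern is that one validator cannot enforce two different limits. A minimal sketch of one way to avoid this binds each check to its own limit, so action descriptions are never measured against the parameter limit. MAX_PARAM_DESCRIPTION_LENGTH, MAX_ACTION_DESCRIPTION_LENGTH, make_length_check, check_param, and check_action are all hypothetical; the real limits in mp are not shown in this thread.

```python
from typing import Callable

# Hypothetical limits; the real values in mp are not shown in this thread.
MAX_PARAM_DESCRIPTION_LENGTH = 150
MAX_ACTION_DESCRIPTION_LENGTH = 500


def make_length_check(max_length: int) -> Callable[[str], str]:
    """Return a check function bound to one maximum description length."""

    def check(description: str) -> str:
        if len(description) > max_length:
            raise ValueError(
                f"description is {len(description)} characters; the maximum is {max_length}"
            )
        return description

    return check


# Each model type gets a check bound to its own limit.
check_param = make_length_check(MAX_PARAM_DESCRIPTION_LENGTH)
check_action = make_length_check(MAX_ACTION_DESCRIPTION_LENGTH)

action_description = "x" * 300
check_action(action_description)  # accepted: within the action limit
# check_param(action_description) would raise ValueError: the wrong limit for actions
```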

Comment on lines +416 to +467
EXCLUDED_LONG_PARAM_DESCRIPTION_PREFIXES: set[str] = {
    "\t\n\nIf provided, the connector will use this value for Siemplify Rule Generator. Please re",
    "A comma separated list CSV encoding types used for decoding your CSV files, e.g. utf-8, lati",
    "A comma-separated string of email headers to add to Google SecOps events, such as “DKIM-Siga",
    "A custom alert name. You can provide placeholders in the following format: [name of the fiel",
    "A custom case name. When you configure this parameter, the connector adds a new key called c",
    "A custom rule generator.\nYou can use placeholders in the format [field_name], for example: ",
    "A custom rule generator. You can provide placeholders in the following format: [name of the ",
    "A filter condition that specifies the email labels to search for. This parameter accepts mul",
    "A regular expression pattern to run on the value found in the Environment Field Name field. ",
    'A regular expression pattern to run on the value found in the "Environment Field Name" field',
    "By default, the search will be executed in the default mailbox specified in the integration ",
    "Client email of your service account. You can configure either this parameter or the User Se",
    "Comma separated. e.g. customer.combo_name,category.sym,status.sym,priority.sym,active,log_ag",
    "End date of the search. Search will return only records equal or before this point in time.",
    "Grouping mechanism that will be used to create Siemplify Alerts. Possible values: Host, ",
    "If defined - connector will extract the environment from the specified event field. You can ",
    "If provided, connector will use this value for Alert Name. Please refer to the documentation",
    "If provided, connector will use this value for Siemplify Alert Name. Please refer to the doc",
    "If provided, connector will use this value for Google Secops Alert Name. Please refer to the",
    "If provided, the connector uses this value for Chronicle SOAR",
    "If specified, connector will use this value from the Microsoft Azure Sentinel API response f",
    "Number of days before the first connector iteration to retrieve vulnerabilities from. This p",
    "Optional. Specify custom query parameter you want to add to the list users search call. For ",
    "Provide a delimiter character, with which the action will split the input it gets into a num",
    "Search field for free text queries (When query doesn't specify a field name).",
    "Search pattern for a elastic index.\r\nIn elastic, index is like a DatabaseName, and data is",
    "Specify a comma separated list of alert attributes that should be used as a fallback for the",
    "Specify a comma separated list of incident or alert attributes that should be used as a fall",
    "Specify a comma-separated list of engines that should be used to retrieve information, wheth",
    "Specify a comma-separated list of fields to return. Example of values:assetType,project,fold",
    "Specify a comma-separated list of the event types that need to be returned. If nothing is pr",
    "Specify a limit for how many events for a single offense connector should query from Qradar ",
    'Specify a time frame for the results. If "Alert Time Till Now" is selected, action will use ',
    'Specify a time frame for the results. If "Custom" is selected, you also need to provide "Sta',
    "Specify a time frame for the results. If “Alert Time Till Now” is selected, action will use ",
    "Specify the amount of time in minutes to pass before the connector will try to fetch events ",
    "Specify the filter to fetch the recommendations for. Parameter expects a string of a format ",
    "Specify the query that needs to be executed. Note: the query should follow a strict pattern ",
    "Specify the query that needs to be executed. Note: this query should follow a strict pattern",
    "Specify the time frame for the search. Only hours and days are supported. Note: end time wil",
    'Specify the wait mode for the action. If "Until Timeout" is selected, action will wait until',
    "Specify what attributes need to be used, when the action is to search for similar alerts. If",
    'Specify what selection should be used for users. If "From Entities & User Identifiers" is se',
    "Start date of the search. Search will return only records equal or after this point in time.",
    "The client email address of your workload identity. You can configure either this parameter ",
    "The conditions that are required for the custom fields for the action to resume running a pl",
    "The content of the service account key JSON file. You can configure either this parameter or",
    "The number of days for the action to wait before refreshing the entity summary. The action g",
    'The search query to perform. It is in Lucene syntax.\r\nIE1: "*" (this is a wildcard that wi',
    'When provided, connector will add a new key called "custom_case_name" to the',
}
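The thread does not show how validate_param_description consults this set. The entries look like truncated leading fragments of descriptions, so a minimal sketch assumes prefix matching with str.startswith and a hypothetical length limit: a listed legacy description passes, and any other over-long description is still rejected. MAX_PARAM_DESCRIPTION_LENGTH, EXCLUDED_PREFIXES, and check_param_description are illustrative names, not the mp implementation.

```python
# Hypothetical limit; the real value used by mp is not shown in this thread.
MAX_PARAM_DESCRIPTION_LENGTH = 150

# Two entries copied from EXCLUDED_LONG_PARAM_DESCRIPTION_PREFIXES above.
EXCLUDED_PREFIXES: frozenset[str] = frozenset({
    "Search field for free text queries (When query doesn't specify a field name).",
    "If provided, the connector uses this value for Chronicle SOAR",
})


def check_param_description(description: str) -> str:
    """Reject long descriptions unless they start with an excluded prefix."""
    if len(description) <= MAX_PARAM_DESCRIPTION_LENGTH:
        return description
    # str.startswith accepts a tuple and returns True if any prefix matches.
    if description.startswith(tuple(EXCLUDED_PREFIXES)):
        return description
    raise ValueError(
        f"description exceeds {MAX_PARAM_DESCRIPTION_LENGTH} characters"
    )
```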
Contributor


medium

These large exclusion lists (EXCLUDED_LONG_PARAM_DESCRIPTION_PREFIXES, and the additions to SCRIPT_DISPLAY_NAME_REGEX and PARAM_DISPLAY_NAME_REGEX) make the constants.py file difficult to read and maintain.

To improve this, I recommend moving these lists to separate data files (e.g., YAML or JSON) and loading them at runtime. This will keep the code cleaner and make the exclusion lists easier to manage.

For example, you could have a file long_description_exclusions.json:

[
    "\t\n\nIf provided, the connector will use this value for Siemplify Rule Generator. Please re",
    "A comma separated list CSV encoding types used for decoding your CSV files, e.g. utf-8, lati",
    "..."
]

Contributor Author


@TalShafir1, what do you think? Is it necessary?

Member


It will help with maintaining the constants module. Loading all of them as constants all the time is also wasteful. We can definitely store these strings in text or YAML files instead and load them only when needed, or at least move them to a different module, such as exclusions.py.
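A minimal sketch of the load-only-when-needed idea, assuming a JSON array file like the long_description_exclusions.json suggested earlier in this thread. load_exclusions and is_excluded are hypothetical names. The file is read on first use and then cached with functools.lru_cache, so importing the constants module no longer builds the set.

```python
import json
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_exclusions(path: str) -> frozenset[str]:
    """Read a JSON array of strings once; later calls reuse the cached result."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list) or not all(isinstance(item, str) for item in data):
        raise ValueError(f"{path} must contain a JSON array of strings")
    return frozenset(data)


def is_excluded(description: str, path: str) -> bool:
    # The file is only read the first time an exclusion check actually runs.
    return description.startswith(tuple(load_exclusions(path)))
```

A plain-text file with one prefix per line would work the same way. JSON escapes such as \t and \n decode to real tab and newline characters, so they match the Python literals in the current set.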


Labels

mp mp related change


2 participants