WIP feat(profiler): introduce custom SQLAlchemy-based profiler #15723
…ation with existing systems
- Added a feature flag in `ge_profiling_config.py` to toggle the use of the custom SQLAlchemy profiler.
- Updated `snowflake_profiler.py`, `sql_common.py`, and `sql_generic_profiler.py` to support the new profiler.
- Implemented the custom SQLAlchemy profiler in `datahub_sql_profiler.py` with database-specific optimizations.
- Created database handler utilities for SQL generation in `database_handlers.py`.
- Added statistical calculation methods in `stats_calculator.py`.
- Introduced temporary table handling in `temp_table_handler.py`.
- Developed unit and integration tests for the new profiler and its components.
- Documented migration plan for transitioning from Great Expectations to the custom profiler.
    if custom_sql:
        bq_sql = custom_sql
    else:
        bq_sql = f"SELECT * FROM `{table}`"
Potential SQL injection via string-based query concatenation - critical severity
SQL injection might be possible in these locations, especially if the strings being concatenated are controlled via user input.
Remediation: If possible, rebuild the query to use prepared statements or an ORM. If that is not possible, make sure the user input is verified or sanitized. As an added layer of protection, we also recommend installing a WAF that blocks SQL injection attacks.
View details in Aikido Security
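Table names cannot be passed as bound parameters, so the usual remediation for a query like this is to validate the identifier before interpolating it. The following is a minimal sketch, not the PR's implementation: `quote_bq_table` and `build_select` are hypothetical helpers, and the allowed character set is a deliberately conservative assumption that is stricter than what BigQuery itself accepts. `custom_sql` is passed through unchanged, as in the diff, since it is user-authored SQL by design.

```python
import re
from typing import Optional

# Assumption: each dotted part may contain only letters, digits, "_" and "-".
_SAFE_PART = re.compile(r"[A-Za-z0-9_-]+")


def quote_bq_table(table: str) -> str:
    """Return a backtick-quoted table reference, or raise if it looks unsafe."""
    parts = table.split(".")
    # Accept table, dataset.table, or project.dataset.table.
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsafe table identifier: {table!r}")
    if not all(_SAFE_PART.fullmatch(part) for part in parts):
        raise ValueError(f"unsafe table identifier: {table!r}")
    return f"`{table}`"


def build_select(table: str, custom_sql: Optional[str] = None) -> str:
    """Mirror the branch in the diff, but validate the interpolated name."""
    if custom_sql:
        return custom_sql
    return f"SELECT * FROM {quote_bq_table(table)}"
```

A name containing a backtick, whitespace, or a comment marker is rejected before any SQL is built, which is the property the scanner is asking for.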
    sample_pc = 100 * self.config.sample_size / row_count
    table_ref = f"{schema}.{table}" if schema else table
    sql = (
        f"SELECT * FROM `{table_ref}` "
Potential SQL injection via string-based query concatenation - critical severity
SQL injection might be possible in these locations, especially if the strings being concatenated are controlled via user input.
Remediation: If possible, rebuild the query to use prepared statements or an ORM. If that is not possible, make sure the user input is verified or sanitized. As an added layer of protection, we also recommend installing a WAF that blocks SQL injection attacks.
View details in Aikido Security
    schema = table.schema or "public"
    query = sa.text(
        f"SELECT reltuples::bigint FROM pg_class "
        f"WHERE oid = '{schema}.{table.name}'::regclass"
Potential SQL injection via string-based query concatenation - critical severity
SQL injection might be possible in these locations, especially if the strings being concatenated are controlled via user input.
Remediation: If possible, rebuild the query to use prepared statements or an ORM. If that is not possible, make sure the user input is verified or sanitized. As an added layer of protection, we also recommend installing a WAF that blocks SQL injection attacks.
View details in Aikido Security
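One possible way to address this finding is to keep the schema and table name out of the SQL text entirely and pass them as named bind parameters. The sketch below is an assumption, not the PR's code: `pg_estimated_rows_query` is a hypothetical helper, and it swaps the `::regclass` lookup for a join on `pg_namespace`, which matches names exactly instead of applying `regclass` parsing and `search_path` rules.

```python
from typing import Dict, Tuple


def pg_estimated_rows_query(schema: str, table_name: str) -> Tuple[str, Dict[str, str]]:
    """Build a constant SQL string plus a parameter dict for the row estimate."""
    # The SQL text never contains caller-supplied values, only placeholders.
    sql = (
        "SELECT CAST(c.reltuples AS bigint) "
        "FROM pg_class AS c "
        "JOIN pg_namespace AS n ON n.oid = c.relnamespace "
        "WHERE n.nspname = :schema AND c.relname = :table_name"
    )
    params = {"schema": schema, "table_name": table_name}
    return sql, params
```

With SQLAlchemy the pair would be used as `conn.execute(sa.text(sql), params)`, so the driver sends the names as data, not as part of the statement.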
    schema = table.schema or "information_schema"
    query = sa.text(
        f"SELECT table_rows FROM information_schema.tables "
        f"WHERE table_schema = '{schema}' AND table_name = '{table.name}'"
Potential SQL injection via string-based query concatenation - critical severity
SQL injection might be possible in these locations, especially if the strings being concatenated are controlled via user input.
Remediation: If possible, rebuild the query to use prepared statements or an ORM. If that is not possible, make sure the user input is verified or sanitized. As an added layer of protection, we also recommend installing a WAF that blocks SQL injection attacks.
View details in Aikido Security
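Here both interpolated values are string literals in a `WHERE` clause, so they can be bound as parameters. The sketch below shows the effect using the standard-library `sqlite3` module as a stand-in for MySQL; the `tables` table, its sample row, and the `estimated_rows` helper are made up for illustration. With SQLAlchemy the same idea would use `:schema` and `:table_name` placeholders in `sa.text(...)` and a parameter dict passed to `execute`.

```python
import sqlite3
from typing import Optional


def estimated_rows(conn: sqlite3.Connection, schema: str, table_name: str) -> Optional[int]:
    """Look up a row estimate with placeholders instead of string formatting."""
    # The "?" placeholders keep schema and table_name as data, never as SQL text.
    row = conn.execute(
        "SELECT table_rows FROM tables "
        "WHERE table_schema = ? AND table_name = ?",
        (schema, table_name),
    ).fetchone()
    return row[0] if row else None


# Stand-in for information_schema.tables with one illustrative row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tables (table_schema TEXT, table_name TEXT, table_rows INTEGER)"
)
conn.execute("INSERT INTO tables VALUES ('shop', 'orders', 1200)")

print(estimated_rows(conn, "shop", "orders"))
print(estimated_rows(conn, "shop", "x' OR '1'='1"))
```

The second call passes a classic injection string; because it is bound as a value, it simply matches no row instead of altering the query.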
✅ Meticulous spotted 0 visual differences across 984 screens tested. Meticulous evaluated ~8 hours of user flows against your PR. Last updated for commit 9fbe153.
Introduce custom SQLAlchemy-based profiler and integration with existing systems
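- Added a feature flag in `ge_profiling_config.py` to toggle the use of the custom SQLAlchemy profiler.
- Updated `snowflake_profiler.py`, `sql_common.py`, and `sql_generic_profiler.py` to support the new profiler.
- Implemented the custom SQLAlchemy profiler in `datahub_sql_profiler.py` with database-specific optimizations.
- Created database handler utilities for SQL generation in `database_handlers.py`.
- Added statistical calculation methods in `stats_calculator.py`.
- Introduced temporary table handling in `temp_table_handler.py`.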