Skip to content

Conversation

@sgomezvillamor
Copy link
Contributor

Introduce custom SQLAlchemy-based profiler and integration with existing systems

  • Added a feature flag in ge_profiling_config.py to toggle the use of the custom SQLAlchemy profiler.
  • Updated snowflake_profiler.py, sql_common.py, and sql_generic_profiler.py to support the new profiler.
  • Implemented the custom SQLAlchemy profiler in datahub_sql_profiler.py with database-specific optimizations.
  • Created database handler utilities for SQL generation in database_handlers.py.
  • Added statistical calculation methods in stats_calculator.py.
  • Introduced temporary table handling in temp_table_handler.py.
  • Developed unit and integration tests for the new profiler and its components.
  • Documented migration plan for transitioning from Great Expectations to the custom profiler.

…ation with existing systems

- Added a feature flag in `ge_profiling_config.py` to toggle the use of the custom SQLAlchemy profiler.
- Updated `snowflake_profiler.py`, `sql_common.py`, and `sql_generic_profiler.py` to support the new profiler.
- Implemented the custom SQLAlchemy profiler in `datahub_sql_profiler.py` with database-specific optimizations.
- Created database handler utilities for SQL generation in `database_handlers.py`.
- Added statistical calculation methods in `stats_calculator.py`.
- Introduced temporary table handling in `temp_table_handler.py`.
- Developed unit and integration tests for the new profiler and its components.
- Documented migration plan for transitioning from Great Expectations to the custom profiler.
@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Dec 18, 2025
if custom_sql:
bq_sql = custom_sql
else:
bq_sql = f"SELECT * FROM `{table}`"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential SQL injection via string-based query concatenation - critical severity
SQL injection might be possible in these locations, especially if the strings being concatenated are controlled via user input.

Remediation: If possible, rebuild the query to use prepared statements or an ORM. If that is not possible, make sure the user input is verified or sanitized. As an added layer of protection, we also recommend installing a WAF that blocks SQL injection attacks.
View details in Aikido Security

sample_pc = 100 * self.config.sample_size / row_count
table_ref = f"{schema}.{table}" if schema else table
sql = (
f"SELECT * FROM `{table_ref}` "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential SQL injection via string-based query concatenation - critical severity
SQL injection might be possible in these locations, especially if the strings being concatenated are controlled via user input.

Remediation: If possible, rebuild the query to use prepared statements or an ORM. If that is not possible, make sure the user input is verified or sanitized. As an added layer of protection, we also recommend installing a WAF that blocks SQL injection attacks.
View details in Aikido Security

schema = table.schema or "public"
query = sa.text(
f"SELECT reltuples::bigint FROM pg_class "
f"WHERE oid = '{schema}.{table.name}'::regclass"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential SQL injection via string-based query concatenation - critical severity
SQL injection might be possible in these locations, especially if the strings being concatenated are controlled via user input.

Remediation: If possible, rebuild the query to use prepared statements or an ORM. If that is not possible, make sure the user input is verified or sanitized. As an added layer of protection, we also recommend installing a WAF that blocks SQL injection attacks.
View details in Aikido Security

schema = table.schema or "information_schema"
query = sa.text(
f"SELECT table_rows FROM information_schema.tables "
f"WHERE table_schema = '{schema}' AND table_name = '{table.name}'"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential SQL injection via string-based query concatenation - critical severity
SQL injection might be possible in these locations, especially if the strings being concatenated are controlled via user input.

Remediation: If possible, rebuild the query to use prepared statements or an ORM. If that is not possible, make sure the user input is verified or sanitized. As an added layer of protection, we also recommend installing a WAF that blocks SQL injection attacks.
View details in Aikido Security

@codecov
Copy link

codecov bot commented Dec 18, 2025

Codecov Report

❌ Patch coverage is 70.16807% with 213 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...estion/source/sql_profiler/datahub_sql_profiler.py 65.86% 113 Missing ⚠️
...ngestion/source/sql_profiler/temp_table_handler.py 12.28% 50 Missing ⚠️
.../ingestion/source/sql_profiler/stats_calculator.py 91.03% 13 Missing ⚠️
...ingestion/source/sql_profiler/database_handlers.py 86.04% 12 Missing ⚠️
...src/datahub/ingestion/source/sql_profiler/utils.py 52.00% 12 Missing ⚠️
...ahub/ingestion/source/sql_profiler/type_mapping.py 90.19% 5 Missing ⚠️
...b/ingestion/source/snowflake/snowflake_profiler.py 20.00% 4 Missing ⚠️
...tahub/ingestion/source/sql/sql_generic_profiler.py 20.00% 4 Missing ⚠️

❌ Your patch check has failed because the patch coverage (70.16%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

📢 Thoughts on this report? Let us know!

@alwaysmeticulous
Copy link

alwaysmeticulous bot commented Dec 18, 2025

✅ Meticulous spotted 0 visual differences across 984 screens tested: view results.

Meticulous evaluated ~8 hours of user flows against your PR.

Expected differences? Click here. Last updated for commit 9fbe153. This comment will update as new commits are pushed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ingestion PR or Issue related to the ingestion of metadata

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants