[FEATURE]: Improve summary stats report for string datatype columns #670

### Is there an existing issue for this?

- [x] I have searched the existing issues

### Problem statement

The "profile" method uses the Databricks built-in "summary()" function to generate the summary statistics report. This function returns a predefined set of metrics, some of which are not suitable for text columns, especially those that only make sense for numeric types (mean, stddev, etc.). For some of these metrics the summary already returns null for string columns, but for others, such as the minimum and maximum, it still returns a value, as in the example below.

( 'work_order_id': {'count': 6,
  'mean': None,
  'stddev': None,
  'min': 'INVALID',
  '25%': None,
  '50%': None,
  '75%': None,
  'max': 'WO-005',
  'count_non_null': 6,
  'count_null': 0})
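
The "min" and "max" values in this report are presumably the result of comparing the strings lexicographically rather than numerically. The sketch below reproduces that ordering in plain Python. The list of column values is an assumption made up to match the report (six non-null values, with 'INVALID' and 'WO-005' among them); the real column contents are not shown in this issue.

```python
# Hypothetical column values, chosen only to be consistent with the report above.
work_order_ids = ["WO-001", "WO-002", "INVALID", "WO-003", "WO-004", "WO-005"]

# Python compares strings character by character using code points,
# so "I" (73) sorts before "W" (87) and "INVALID" becomes the minimum.
lexicographic_min = min(work_order_ids)
lexicographic_max = max(work_order_ids)

print(lexicographic_min, lexicographic_max)
```

A value like 'INVALID' being reported as the minimum says little about the data, which is why these two metrics read as noise for this datatype.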

### Proposed Solution

One approach is to return null values for "min" and "max" as well, since that seems a more consistent output for this datatype. Another approach would be to add a set of metrics specific to low-cardinality string columns, such as "count_distinct", "max_length", "mean_length", "median_length", "min_length", "histogram", etc.
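
To make the second approach concrete, here is a minimal sketch of what the string-specific metrics could look like, written in plain Python over a list of values rather than against a Spark DataFrame. The function name `string_column_stats` and the `top_n` parameter are hypothetical and not part of the existing profiler; the metric names follow the list above.

```python
from collections import Counter
from statistics import mean, median
from typing import Any, Dict, List, Optional


def string_column_stats(values: List[Optional[str]], top_n: int = 10) -> Dict[str, Any]:
    """Compute string-oriented summary metrics for one column (illustrative only)."""
    non_null = [v for v in values if v is not None]
    lengths = [len(v) for v in non_null]
    has_data = bool(lengths)
    return {
        "count_non_null": len(non_null),
        "count_null": len(values) - len(non_null),
        "count_distinct": len(set(non_null)),
        # Length metrics are None when the column has no non-null values.
        "min_length": min(lengths) if has_data else None,
        "max_length": max(lengths) if has_data else None,
        "mean_length": mean(lengths) if has_data else None,
        "median_length": median(lengths) if has_data else None,
        # "histogram" here is a frequency count of the top_n most common values.
        "histogram": dict(Counter(non_null).most_common(top_n)),
    }


# Hypothetical sample values, not taken from a real data source.
sample = ["WO-001", "WO-002", "INVALID", "WO-003", "WO-004", "WO-005", None]
stats = string_column_stats(sample)
print(stats)
```

A frequency-count histogram is only practical for low-cardinality columns, which matches the scope suggested above; for high-cardinality columns it would probably need to be capped or skipped.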

### Additional Context

It would be useful to keep track of changes in the data source profile over time, and this data is available only in the summary_stats variable. I think the metrics reported for each column could be datatype-driven.
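
One way to picture datatype-driven metrics is a small registry that maps each datatype to the metrics that make sense for it. The sketch below is plain Python and purely illustrative: `METRICS_BY_DTYPE`, `profile_column`, and the "numeric"/"string" type labels are hypothetical names, not existing profiler APIs.

```python
from statistics import mean
from typing import Any, Callable, Dict, List

Metric = Callable[[List[Any]], Any]

# Hypothetical registry: which metrics apply to which logical datatype.
METRICS_BY_DTYPE: Dict[str, Dict[str, Metric]] = {
    "numeric": {"min": min, "max": max, "mean": mean},
    "string": {
        "count_distinct": lambda xs: len(set(xs)),
        "min_length": lambda xs: min(len(x) for x in xs),
        "max_length": lambda xs: max(len(x) for x in xs),
    },
}


def profile_column(values: List[Any], dtype: str) -> Dict[str, Any]:
    """Return only the metrics registered for the column's datatype."""
    non_null = [v for v in values if v is not None]
    stats: Dict[str, Any] = {
        "count_non_null": len(non_null),
        "count_null": len(values) - len(non_null),
    }
    for name, metric in METRICS_BY_DTYPE.get(dtype, {}).items():
        # Metrics are None when there is nothing to aggregate.
        stats[name] = metric(non_null) if non_null else None
    return stats


numeric_stats = profile_column([1, 2, 3, None], "numeric")
string_stats = profile_column(["WO-001", "INVALID"], "string")
print(numeric_stats)
print(string_stats)
```

With this shape, string columns simply never report "min", "max", or "mean", so the output stays consistent per datatype and is easier to compare across profiling runs.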
  
