[RFC] Field Statistics #10614

@paulstn

Field Statistics in Explore

Background

The Field Statistics feature is a new addition to the OpenSearch Dashboards Explore plugin that gives users summary statistics about the fields in their indices. It lets users quickly understand the composition and distribution of their data without writing complex queries.

The feature will be implemented as a new tab in the Explore interface, similar to the existing Logs, Patterns, and Visualization tabs. It will display a table showing field names, document counts, distinct value counts, and additional type-specific statistics in an expandable row format.

Overview

Component Structure

src/plugins/explore/public/components/
├── tabs/
│   └── field_stats_tab.tsx              # Main tab component
└── field_stats/
    ├── field_stats_container.tsx        # Data fetching and state management
    ├── field_stats_table.tsx            # Table component with column definitions
    ├── field_stats_row_details.tsx      # Expandable row details
    ├── field_stats_queries.ts           # PPL query generation functions
    ├── field_stats_types.ts             # TypeScript interfaces and types
    └── detail_sections/
        ├── top_values_section.tsx       # Top values detail component
        ├── numeric_summary_section.tsx  # Numeric statistics component
        └── date_range_section.tsx       # Date range detail component

Tab Registration

The Field Stats tab will be registered in register_tabs.ts:

tabRegistry.registerTab({
  id: EXPLORE_FIELD_STATS_TAB_ID,
  label: 'Field Stats',
  flavor: [ExploreFlavor.Logs, ExploreFlavor.Metrics],
  order: 25, // After Visualization tab
  supportedLanguages: [EXPLORE_DEFAULT_LANGUAGE],
  component: FieldStatsTab,
});

PPL Query Design

Field Statistics Query Strategy

When the tab is opened, we don't need a query to find all of the fields. Instead, we'll use the existing Redux selector selectDataset to get the current index pattern and then use getIndexPatternFieldList to get all fields.

Getting Fields Without Query

const dataset = useSelector(selectDataset);
const fields = useMemo(() => {
  return getIndexPatternFieldList(dataset, {});
}, [dataset]);

Basic Field Statistics Query

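For each field, we'll run the following query to fetch the number of documents that contain the field, the number of distinct values, and the percentage of all documents that contain the field: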

source = <index>
| stats count(`<field>`) as count,
        dc(`<field>`) as dc,
        count() as total_count
| eval percentage_total = (count * 100.0) / total_count
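The later snippets call getFieldStatsQuery without showing its body. A minimal sketch of how it could assemble the query above is shown here; the quoteField helper and the exact string layout are assumptions for illustration, and the sketch does no escaping beyond wrapping the field name in backticks.

```typescript
// Hypothetical helper: wraps a field name in backticks, as in the query above.
// Assumption: field names do not themselves contain backticks.
const quoteField = (field: string): string => '`' + field + '`';

// Sketch of the query builder referenced later as getFieldStatsQuery.
const getFieldStatsQuery = (index: string, field: string): string => {
  const f = quoteField(field);
  return [
    `source = ${index}`,
    `| stats count(${f}) as count, dc(${f}) as dc, count() as total_count`,
    `| eval percentage_total = (count * 100.0) / total_count`,
  ].join('\n');
};
```

Keeping query generation in pure string-returning functions like this would make field_stats_queries.ts easy to unit test without a running cluster.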

Expanded Row Statistics

When a row is expanded, we display additional information based on field type. Different PPL queries are used for:

  • String/IP/Boolean Fields: Top values query
  • Number Fields: Top values + summary statistics (min, median, avg, max)
  • Date/Timestamp Fields: Earliest and latest values
  • Other Field Types: Example values
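The mapping in the list above could be sketched as a small lookup function. The DetailKind names and the field type strings ('date', 'timestamp', and so on) are assumptions for illustration; the real type strings come from the dataset's field definitions.

```typescript
// Illustrative names for the four kinds of detail described in the list above.
type DetailKind = 'topValues' | 'numericSummary' | 'dateRange' | 'examples';

// Returns which detail queries to run for a given field type.
// Assumption: field types arrive as lowercase strings such as 'string' or 'number'.
const detailKindsForType = (fieldType: string): DetailKind[] => {
  switch (fieldType) {
    case 'string':
    case 'ip':
    case 'boolean':
      return ['topValues'];
    case 'number':
      return ['topValues', 'numericSummary'];
    case 'date':
    case 'timestamp':
      return ['dateRange'];
    default:
      // Any other type falls back to showing example values.
      return ['examples'];
  }
};
```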

Query Implementation

Top Values:

source = <index>
| top 10 <field>

Numeric Summary:

# For top values:
source = <index>
| top 10 <field>

# For summary statistics:
source = <index>
| stats min(<field>) as min,
        percentile(<field>, 50) as median,
        avg(<field>) as avg,
        max(<field>) as max

Date Summary:

source = <index>
| stats min(<field>) as earliest,
        max(<field>) as latest

Example Values:

source = <index>
| where isnotnull(<field>)
| fields <field>
| head 10
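A possible shape for the detail query builders in field_stats_queries.ts is sketched below. The function names, the limit parameter, and the backtick quoting of field names are assumptions, not a settled API. In the example values builder, the null filter runs before head so that the limit applies to non-null rows.

```typescript
// Hypothetical quoting helper; assumes field names contain no backticks.
const backtick = (field: string): string => '`' + field + '`';

// Top values for string, IP, boolean, and number fields.
const getTopValuesQuery = (index: string, field: string, limit = 10): string =>
  `source = ${index} | top ${limit} ${backtick(field)}`;

// Summary statistics for number fields.
const getNumericSummaryQuery = (index: string, field: string): string => {
  const f = backtick(field);
  return (
    `source = ${index} | stats min(${f}) as min, percentile(${f}, 50) as median, ` +
    `avg(${f}) as avg, max(${f}) as max`
  );
};

// Earliest and latest values for date and timestamp fields.
const getDateRangeQuery = (index: string, field: string): string => {
  const f = backtick(field);
  return `source = ${index} | stats min(${f}) as earliest, max(${f}) as latest`;
};

// Example values for all other field types.
const getExampleValuesQuery = (index: string, field: string, limit = 10): string => {
  const f = backtick(field);
  return `source = ${index} | where isnotnull(${f}) | fields ${f} | head ${limit}`;
};
```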

Tab Component Details

FieldStatsTab Component

The main tab component that renders the action bar and container:

export const FieldStatsTab = () => {
  return (
    <div className="explore-field-stats-tab tab-container">
      <ActionBar data-test-subj="fieldStatsTabActionBar" />
      <FieldStatsContainer data-test-subj="fieldStatsTabContainer" />
    </div>
  );
};

FieldStatsContainer Component

Manages data fetching and state:

export const FieldStatsContainer = () => {
  const dataset = useSelector(selectDataset);
  const [fieldStats, setFieldStats] = useState<Record<string, FieldStatsItem>>({});
  const [loadingFields, setLoadingFields] = useState<Set<string>>(new Set());
  const [expandedRows, setExpandedRows] = useState<Set<string>>(new Set());

  // Get fields from dataset
  const fields = useMemo(() => {
    if (!dataset) return [];
    // Pass an empty fieldCounts map, matching the earlier snippet
    return getIndexPatternFieldList(dataset, {});
  }, [dataset]);

  // Fetch field statistics whenever the field list or dataset changes
  useEffect(() => {
    if (!fields.length || !dataset) return;
    fetchAllFieldStats(fields, dataset, setFieldStats, setLoadingFields);
  }, [fields, dataset]);

  // Handle row expansion
  const handleRowExpand = async (fieldName: string) => {
    // Logic to expand/collapse rows and fetch details if needed
  };

  return (
    <FieldStatsTable
      items={Object.values(fieldStats)}
      expandedRows={expandedRows}
      ...
    />
  );
};
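The expand/collapse logic left as a comment in handleRowExpand could be factored into a pure function, which keeps the state transition testable outside React. This is only a sketch: toggleExpandedRow, ToggleResult, and the loadedDetails set are hypothetical names, not part of the proposed component API.

```typescript
interface ToggleResult {
  expanded: Set<string>; // next value for the expandedRows state
  shouldFetch: boolean; // true when details for this field still need loading
}

// Toggles a row and reports whether its details should be fetched.
// Returns a new Set rather than mutating the input, so React sees a state change.
const toggleExpandedRow = (
  expanded: Set<string>,
  loadedDetails: Set<string>,
  fieldName: string
): ToggleResult => {
  const next = new Set(expanded);
  if (next.has(fieldName)) {
    // Collapsing never triggers a fetch.
    next.delete(fieldName);
    return { expanded: next, shouldFetch: false };
  }
  next.add(fieldName);
  // Only fetch the first time a row is expanded.
  return { expanded: next, shouldFetch: !loadedDetails.has(fieldName) };
};
```

Inside handleRowExpand, the result could be applied with setExpandedRows(result.expanded), followed by a call to fetchFieldDetails when result.shouldFetch is true.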

Fetch All Field Stats Function

Handles fetching statistics for all fields in parallel:

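const fetchAllFieldStats = async (
  fields: DataViewField[],
  dataset: DataView,
  setFieldStats: React.Dispatch<React.SetStateAction<Record<string, FieldStatsItem>>>,
  setLoadingFields: React.Dispatch<React.SetStateAction<Set<string>>>
) => {
  // Mark all fields as loading
  setLoadingFields(new Set(fields.map((field) => field.name)));

  // Fetch stats for each field in parallel
  const promises = fields.map(async (field) => {
    try {
      const query = getFieldStatsQuery(dataset.title, field.name);
      const result = await executeQuery(query);
      const row = result.hits[0];

      // Process result and update state
      const stats: FieldStatsItem = {
        name: field.name,
        type: field.type,
        docCount: row?.count ?? 0,
        distinctCount: row?.dc ?? 0,
        docPercentage: row?.percentage_total ?? 0,
      };

      setFieldStats((prev) => ({ ...prev, [field.name]: stats }));
    } catch (error) {
      // Fall back to default values so the row still renders
      setFieldStats((prev) => ({
        ...prev,
        [field.name]: {
          name: field.name,
          type: field.type,
          docCount: 0,
          distinctCount: 0,
          docPercentage: 0,
        },
      }));
    } finally {
      // Remove field from loading set
      setLoadingFields((prev) => {
        const next = new Set(prev);
        next.delete(field.name);
        return next;
      });
    }
  });

  await Promise.all(promises);
};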
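The result-processing step above could be isolated into a small pure helper. In this sketch the FieldStatsItem shape is inferred from the fields used in fetchAllFieldStats; the toNumber and toFieldStatsItem helpers are hypothetical, and the assumption that the backend may return numeric values as strings is ours rather than something the RFC states.

```typescript
// Shape inferred from the properties assigned in fetchAllFieldStats.
interface FieldStatsItem {
  name: string;
  type: string;
  docCount: number;
  distinctCount: number;
  docPercentage: number;
}

// Coerces a missing, null, or string-typed value to a finite number (0 as fallback).
const toNumber = (value: unknown): number => {
  const n = typeof value === 'number' ? value : Number(value ?? 0);
  return Number.isFinite(n) ? n : 0;
};

// Builds a table item from the first row of a stats query result.
const toFieldStatsItem = (
  name: string,
  type: string,
  row?: Record<string, unknown>
): FieldStatsItem => ({
  name,
  type,
  docCount: toNumber(row?.count),
  distinctCount: toNumber(row?.dc),
  docPercentage: toNumber(row?.percentage_total),
});
```

Calling toFieldStatsItem(field.name, field.type) with no row yields the all-zero default, so the same helper could serve both the success and the error path.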

Field Detail Sections Configuration

Configuration-driven approach for field details:

interface FieldDetailSection {
  id: string;
  label: string;
  isApplicable: (fieldType: string) => boolean;
  fetchData: (field: FieldStatsItem, index: string) => Promise<any>;
  component: React.ComponentType<{ data: any; field: FieldStatsItem }>;
}

const fieldDetailSections: FieldDetailSection[] = [
  {
    id: 'topValues',
    label: 'Top Values',
    // Number fields also show top values, alongside their summary statistics
    isApplicable: (type) => ['string', 'ip', 'boolean', 'number'].includes(type),
    fetchData: async (field, index) => {
      // Fetch top values using PPL query
    },
    component: TopValuesSection,
  },
  {
    id: 'numericSummary',
    label: 'Summary Statistics',
    isApplicable: (type) => ['number'].includes(type),
    fetchData: async (field, index) => {
      // Fetch numeric summary stats
    },
    component: NumericSummarySection,
  },
  // Additional sections for date ranges, etc.
];

Fetch Field Details Function

Handles fetching details for a specific field using the configuration:

const fetchFieldDetails = async (
  field: FieldStatsItem,
  dataset: DataView,
  setFieldDetails,
  setDetailsLoading
) => {
  setDetailsLoading((prev) => new Set(prev).add(field.name));

  try {
    // Get applicable sections for this field type
    const applicableSections = fieldDetailSections.filter((section) =>
      section.isApplicable(field.type)
    );

    // Fetch data for all applicable sections in parallel
    const sectionPromises = applicableSections.map(async (section) => ({
      id: section.id,
      data: await section.fetchData(field, dataset.title),
    }));

    const sectionData = await Promise.all(sectionPromises);

    // Convert to object format and update state
    const details = sectionData.reduce(
      (acc, { id, data }) => ({
        ...acc,
        [id]: data,
      }),
      {}
    );

    setFieldDetails((prev) => ({ ...prev, [field.name]: details }));
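  } catch (error) {
    // Record the failure so FieldStatsRowDetails can render its error callout
    setFieldDetails((prev) => ({ ...prev, [field.name]: { error: true } }));
  } finally {
    // Remove from loading set
    setDetailsLoading((prev) => {
      const next = new Set(prev);
      next.delete(field.name);
      return next;
    });
  }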
};

FieldStatsTable Component

Renders the EuiBasicTable with expandable rows:

export const FieldStatsTable = ({
  items,
  expandedRows,
  fieldDetails,
  onRowExpand,
  loadingFields,
}) => {
  const columns = getFieldStatsColumns(expandedRows, onRowExpand, loadingFields);

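  const itemIdToExpandedRowMap = useMemo(() => {
    const map: Record<string, ReactNode> = {};
    expandedRows.forEach((fieldName) => {
      // Look up the full item, since FieldStatsRowDetails reads field.type
      const item = items.find((candidate) => candidate.name === fieldName);
      if (item && fieldDetails[fieldName]) {
        map[fieldName] = (
          <FieldStatsRowDetails
            field={item}
            details={fieldDetails[fieldName]}
          />
        );
      }
    });
    return map;
  }, [items, expandedRows, fieldDetails]);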

  return (
    <EuiBasicTable
      items={items}
      columns={columns}
      itemId="name"
      itemIdToExpandedRowMap={itemIdToExpandedRowMap}
      isExpandable={true}
      sorting={{
        sort: {
          field: 'name',
          direction: 'asc',
        },
      }}
      ...
    />
  );
};

Field Stats Row Details Component

Renders the detail sections side by side based on configuration:

export const FieldStatsRowDetails = ({ field, details }) => {
  if (details.error) {
    return <EuiCallOut color="danger" title="Failed to load details" />;
  }

  // Get applicable sections for this field
  const applicableSections = fieldDetailSections.filter(
    (section) => section.isApplicable(field.type) && details[section.id]
  );

  return (
    <EuiFlexGroup direction="row" gutterSize="m">
      {applicableSections.map((section) => {
        const SectionComponent = section.component;
        return (
          <EuiFlexItem key={section.id}>
            <EuiPanel paddingSize="s">
              <EuiTitle size="xs">
                <h4>{section.label}</h4>
              </EuiTitle>
              <EuiSpacer size="s" />
              <SectionComponent data={details[section.id]} field={field} />
            </EuiPanel>
          </EuiFlexItem>
        );
      })}
    </EuiFlexGroup>
  );
};

Detail Section Components

Individual components for each type of detail section:

// Top Values Section Component
export const TopValuesSection = () => {
  return (
    ...
  );
};

// Numeric Summary Section Component
export const NumericSummarySection = () => {
  return (
    ...
  );
};

// Date Range Section Component
export const DateRangeSection = () => {
  return (
    ...
  );
};

Performance Considerations

  1. Lazy Loading: Only fetch detailed statistics when rows are expanded
  2. Pagination: Paginate the table, or use virtual scrolling, when an index has a large number of fields
  3. Parallel Queries: Execute field statistics queries in parallel for faster loading
  4. Debouncing: Debounce rapid expand/collapse actions
  5. Progressive Loading: Show fields immediately from the dataset, then load statistics progressively
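
Running one query per field fully in parallel may put heavy load on the cluster when an index has hundreds of fields. One option, offered as a sketch rather than a decided part of this design, is to cap the number of in-flight queries. The mapWithConcurrency helper below is hypothetical and uses only standard Promise APIs.

```typescript
// Runs worker over items with at most `limit` calls in flight at once.
// Results are returned in the same order as the input items.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all runners

  // Each runner claims items one at a time until none are left.
  const runWorker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  };

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, runWorker));
  return results;
}
```

With this helper, fetchAllFieldStats could call mapWithConcurrency(fields, 5, fetchOne) in place of fields.map followed by Promise.all; the limit of 5 is an arbitrary example value. If a worker rejects, the returned promise rejects, so per-field error handling should stay inside the worker as it does today.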
