Field Statistics in Explore
Background
The Field Statistics feature is a new addition to the OpenSearch Dashboards Explore plugin that gives users statistical information about the fields in their indices. It lets users quickly understand the composition and distribution of their data without writing complex queries.
The feature will be implemented as a new tab in the Explore interface, similar to the existing Logs, Patterns, and Visualization tabs. It will display a table showing field names, document counts, distinct value counts, and additional type-specific statistics in an expandable row format.
Overview
Component Structure
src/plugins/explore/public/components/
├── tabs/
│   └── field_stats_tab.tsx              # Main tab component
└── field_stats/
    ├── field_stats_container.tsx        # Data fetching and state management
    ├── field_stats_table.tsx            # Table component with column definitions
    ├── field_stats_row_details.tsx      # Expandable row details
    ├── field_stats_queries.ts           # PPL query generation functions
    ├── field_stats_types.ts             # TypeScript interfaces and types
    └── detail_sections/
        ├── top_values_section.tsx       # Top values detail component
        ├── numeric_summary_section.tsx  # Numeric statistics component
        └── date_range_section.tsx       # Date range detail component
Tab Registration
The Field Stats tab will be registered in register_tabs.ts:
tabRegistry.registerTab({
  id: EXPLORE_FIELD_STATS_TAB_ID,
  label: 'Field Stats',
  flavor: [ExploreFlavor.Logs, ExploreFlavor.Metrics],
  order: 25, // After Visualization tab
  supportedLanguages: [EXPLORE_DEFAULT_LANGUAGE],
  component: FieldStatsTab,
});
PPL Query Design
Field Statistics Query Strategy
When the tab is opened, no query is needed to discover the fields. Instead, we'll use the existing Redux selector selectDataset to get the current index pattern, then call getIndexPatternFieldList to get all of its fields.
Getting Fields Without Query
const dataset = useSelector(selectDataset);
const fields = useMemo(() => {
  return getIndexPatternFieldList(dataset, {});
}, [dataset]);
Basic Field Statistics Query
For each field, we'll fetch the following information:
source = <index>
| stats count(`<field>`) as count,
dc(`<field>`) as dc,
count() as total_count
| eval percentage_total = (count * 100.0) / total_count
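The query template above could be generated by a small helper in field_stats_queries.ts. The following is a minimal sketch, assuming the helper is named getFieldStatsQuery (the name that fetchAllFieldStats calls) and that field names contain no backticks that would need escaping. Neither detail is confirmed by this proposal.

```typescript
// Hypothetical helper: builds the basic field statistics PPL query.
// Wrapping the field name in backticks mirrors the template above.
function getFieldStatsQuery(index: string, field: string): string {
  const quoted = `\`${field}\``;
  return [
    `source = ${index}`,
    `| stats count(${quoted}) as count, dc(${quoted}) as dc, count() as total_count`,
    `| eval percentage_total = (count * 100.0) / total_count`,
  ].join(' ');
}

// Example call with a made-up index pattern and field name
console.log(getFieldStatsQuery('logs-*', 'response.code'));
```

Keeping query construction in a pure function like this makes it easy to unit test without a running cluster.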
Expanded Row Statistics
When a row is expanded, we display additional information based on field type. Different PPL queries are used for:
- String/IP/Boolean Fields: Top values query
- Number Fields: Top values + summary statistics (min, median, avg, max)
- Date/Timestamp Fields: Earliest and latest values
- Other Field Types: Example values
Query Implementation
Top Values:
source = <index>
| top 10 <field>
Numeric Summary:
# For top values:
source = <index>
| top 10 <field>
# For summary statistics:
source = <index>
| stats min(<field>) as min,
percentile(<field>, 50) as median,
avg(<field>) as avg,
max(<field>) as max
Date Summary:
source = <index>
| stats min(<field>) as earliest,
max(<field>) as latest
Example value:
source = <index>
| where isnotnull(<field>)
| head 10
| fields <field>
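One way to tie the queries above to field types is a single dispatch function. This is only a sketch: getDetailQueries and DetailQueries are hypothetical names, the type strings 'string', 'ip', 'boolean', and 'number' follow the section configuration later in this document, and 'date' is an assumed type name. Field names are inserted unquoted, as in the templates. The example-values query here filters out nulls before applying head, on the assumption that the intent is to return up to 10 non-null values.

```typescript
// Hypothetical type: maps a section id to the PPL query that feeds it.
type DetailQueries = Record<string, string>;

// Hypothetical helper: picks the detail queries for a field based on its type.
function getDetailQueries(index: string, field: string, type: string): DetailQueries {
  const source = `source = ${index}`;
  const topValues = `${source} | top 10 ${field}`;
  switch (type) {
    case 'string':
    case 'ip':
    case 'boolean':
      return { topValues };
    case 'number':
      return {
        topValues,
        numericSummary:
          `${source} | stats min(${field}) as min, percentile(${field}, 50) as median, ` +
          `avg(${field}) as avg, max(${field}) as max`,
      };
    case 'date': // assumed type name for date/timestamp fields
      return {
        dateRange: `${source} | stats min(${field}) as earliest, max(${field}) as latest`,
      };
    default:
      // Filter nulls first so that head 10 limits real values only
      return {
        examples: `${source} | where isnotnull(${field}) | head 10 | fields ${field}`,
      };
  }
}
```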
Tab Component Details
FieldStatsTab Component
The main tab component that renders the action bar and container:
export const FieldStatsTab = () => {
  return (
    <div className="explore-field-stats-tab tab-container">
      <ActionBar data-test-subj="fieldStatsTabActionBar" />
      <FieldStatsContainer data-test-subj="fieldStatsTabContainer" />
    </div>
  );
};
FieldStatsContainer Component
Manages data fetching and state:
export const FieldStatsContainer = () => {
  const dataset = useSelector(selectDataset);
  const [fieldStats, setFieldStats] = useState<Record<string, FieldStatsItem>>({});
  const [loadingFields, setLoadingFields] = useState<Set<string>>(new Set());
  const [expandedRows, setExpandedRows] = useState<Set<string>>(new Set());

  // Get fields from dataset
  const fields = useMemo(() => {
    if (!dataset) return [];
    return getIndexPatternFieldList(dataset);
  }, [dataset]);

  // Fetch field statistics on mount and whenever the dataset (and so the field list) changes
  useEffect(() => {
    if (!fields.length || !dataset) return;
    fetchAllFieldStats(fields, dataset, setFieldStats, setLoadingFields);
  }, [fields, dataset]);

  // Handle row expansion
  const handleRowExpand = async (fieldName: string) => {
    // Logic to expand/collapse rows and fetch details if needed
  };

  // EuiBasicTable expects an array of items, so convert the stats map to a list
  return (
    <FieldStatsTable
      items={Object.values(fieldStats)}
      expandedRows={expandedRows}
      ...
    />
  );
};
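The expand/collapse logic left open in handleRowExpand can be reduced to two pure helpers. This is a sketch under assumptions: toggleExpandedRow and shouldFetchDetails are hypothetical names, and fetched details are assumed to be cached in a plain object keyed by field name.

```typescript
// Hypothetical helper: returns a new Set instead of mutating the previous one,
// so React sees a new state reference and re-renders.
function toggleExpandedRow(prev: ReadonlySet<string>, fieldName: string): Set<string> {
  const next = new Set(prev);
  if (next.has(fieldName)) {
    next.delete(fieldName);
  } else {
    next.add(fieldName);
  }
  return next;
}

// Hypothetical helper: details only need fetching when a row has just been
// opened and nothing is cached for it yet.
function shouldFetchDetails(
  expanded: ReadonlySet<string>,
  fieldName: string,
  cachedDetails: Record<string, unknown>
): boolean {
  return expanded.has(fieldName) && !(fieldName in cachedDetails);
}
```

Inside the component, handleRowExpand could compute the next set with toggleExpandedRow, pass it to setExpandedRows, and call fetchFieldDetails only when shouldFetchDetails returns true.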
Fetch All Field Stats Function
Handles fetching statistics for all fields in parallel:
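const fetchAllFieldStats = async (
  fields: DataViewField[],
  dataset: DataView,
  setFieldStats,
  setLoadingFields
) => {
  // Mark all fields as loading
  setLoadingFields(new Set(fields.map((field) => field.name)));

  // Fetch stats for each field in parallel
  const promises = fields.map(async (field) => {
    try {
      const query = getFieldStatsQuery(dataset.title, field.name);
      const result = await executeQuery(query);
      // Process result and update state
      const stats: FieldStatsItem = {
        name: field.name,
        type: field.type,
        docCount: result.hits[0]?.count || 0,
        distinctCount: result.hits[0]?.dc || 0,
        docPercentage: result.hits[0]?.percentage_total || 0,
      };
      setFieldStats((prev) => ({ ...prev, [field.name]: stats }));
    } catch (error) {
      // Handle errors with default values so the field still appears in the table
      setFieldStats((prev) => ({
        ...prev,
        [field.name]: {
          name: field.name,
          type: field.type,
          docCount: 0,
          distinctCount: 0,
          docPercentage: 0,
        },
      }));
    } finally {
      // Remove field from loading set
      setLoadingFields((prev) => {
        const next = new Set(prev);
        next.delete(field.name);
        return next;
      });
    }
  });

  // Each callback handles its own errors, so Promise.all does not reject early
  await Promise.all(promises);
};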
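The result-to-item mapping inside fetchAllFieldStats can be pulled into a pure function, which also pins down the shape of FieldStatsItem. The sketch below is illustrative only: the interface is inferred from the object literal above rather than taken from field_stats_types.ts, and FieldStatsRow and toFieldStatsItem are hypothetical names.

```typescript
// Inferred from the object built in fetchAllFieldStats; the real definition
// in field_stats_types.ts may differ.
interface FieldStatsItem {
  name: string;
  type: string;
  docCount: number;
  distinctCount: number;
  docPercentage: number;
}

// Hypothetical shape of one row returned by the basic statistics query
interface FieldStatsRow {
  count?: number | null;
  dc?: number | null;
  percentage_total?: number | null;
}

// Hypothetical helper: converts a query result row into a table item,
// falling back to 0 when a value is missing or null.
function toFieldStatsItem(
  field: { name: string; type: string },
  row: FieldStatsRow | undefined
): FieldStatsItem {
  return {
    name: field.name,
    type: field.type,
    docCount: row?.count ?? 0,
    distinctCount: row?.dc ?? 0,
    docPercentage: row?.percentage_total ?? 0,
  };
}
```

With this in place, the try block could call toFieldStatsItem(field, result.hits[0]) and the catch block could call toFieldStatsItem(field, undefined) to produce the default values.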
Field Detail Sections Configuration
Configuration-driven approach for field details:
interface FieldDetailSection {
  id: string;
  label: string;
  isApplicable: (fieldType: string) => boolean;
  fetchData: (field: FieldStatsItem, index: string) => Promise<any>;
  component: React.ComponentType<{ data: any; field: FieldStatsItem }>;
}

const fieldDetailSections: FieldDetailSection[] = [
  {
    id: 'topValues',
    label: 'Top Values',
    // Number fields also show top values, per the Expanded Row Statistics list
    isApplicable: (type) => ['string', 'ip', 'boolean', 'number'].includes(type),
    fetchData: async (field, index) => {
      // Fetch top values using PPL query
    },
    component: TopValuesSection,
  },
  {
    id: 'numericSummary',
    label: 'Summary Statistics',
    isApplicable: (type) => ['number'].includes(type),
    fetchData: async (field, index) => {
      // Fetch numeric summary stats
    },
    component: NumericSummarySection,
  },
  // Additional sections for date ranges, etc.
];
Fetch Field Details Function
Handles fetching details for a specific field using the configuration:
const fetchFieldDetails = async (
  field: FieldStatsItem,
  dataset: DataView,
  setFieldDetails,
  setDetailsLoading
) => {
  setDetailsLoading((prev) => new Set(prev).add(field.name));
  try {
    // Get applicable sections for this field type
    const applicableSections = fieldDetailSections.filter((section) =>
      section.isApplicable(field.type)
    );

    // Fetch data for all applicable sections in parallel
    const sectionPromises = applicableSections.map(async (section) => ({
      id: section.id,
      data: await section.fetchData(field, dataset.title),
    }));
    const sectionData = await Promise.all(sectionPromises);

    // Convert to object format and update state
    const details = sectionData.reduce(
      (acc, { id, data }) => ({
        ...acc,
        [id]: data,
      }),
      {}
    );
    setFieldDetails((prev) => ({ ...prev, [field.name]: details }));
  } catch (error) {
    // Handle errors: flag the failure so FieldStatsRowDetails can render an error callout
    setFieldDetails((prev) => ({ ...prev, [field.name]: { error: true } }));
  } finally {
    // Remove from loading set
    setDetailsLoading((prev) => {
      const next = new Set(prev);
      next.delete(field.name);
      return next;
    });
  }
};
FieldStatsTable Component
Renders the EuiBasicTable with expandable rows:
export const FieldStatsTable = ({
  items,
  expandedRows,
  fieldDetails,
  onRowExpand,
  loadingFields,
}) => {
  const columns = getFieldStatsColumns(expandedRows, onRowExpand, loadingFields);

  // Map each expanded field name to its rendered detail row
  const itemIdToExpandedRowMap = useMemo(() => {
    const map: Record<string, ReactNode> = {};
    expandedRows.forEach((fieldName) => {
      // expandedRows holds field names; FieldStatsRowDetails needs the full item (it reads field.type)
      const item = items.find((candidate) => candidate.name === fieldName);
      if (item && fieldDetails[fieldName]) {
        map[fieldName] = (
          <FieldStatsRowDetails
            field={item}
            details={fieldDetails[fieldName]}
          />
        );
      }
    });
    return map;
  }, [items, expandedRows, fieldDetails]);

  return (
    <EuiBasicTable
      items={items}
      columns={columns}
      itemId="name"
      itemIdToExpandedRowMap={itemIdToExpandedRowMap}
      isExpandable={true}
      sorting={{
        sort: {
          field: 'name',
          direction: 'asc',
        },
      }}
      ...
    />
  );
};
Field Stats Row Details Component
Renders one panel per applicable detail section, stacked in a column, based on configuration:
export const FieldStatsRowDetails = ({ field, details }) => {
  if (details.error) {
    return <EuiCallOut color="danger" title="Failed to load details" />;
  }

  // Get applicable sections for this field
  const applicableSections = fieldDetailSections.filter(
    (section) => section.isApplicable(field.type) && details[section.id]
  );

  return (
    <EuiFlexGroup direction="column" gutterSize="m">
      {applicableSections.map((section) => {
        const SectionComponent = section.component;
        return (
          <EuiFlexItem key={section.id}>
            <EuiPanel paddingSize="s">
              <EuiTitle size="xs">
                <h4>{section.label}</h4>
              </EuiTitle>
              <EuiSpacer size="s" />
              <SectionComponent data={details[section.id]} field={field} />
            </EuiPanel>
          </EuiFlexItem>
        );
      })}
    </EuiFlexGroup>
  );
};
Detail Section Components
Individual components for each type of detail section:
// Top Values Section Component
export const TopValuesSection = () => {
  return (
    ...
  );
};

// Numeric Summary Section Component
export const NumericSummarySection = () => {
  return (
    ...
  );
};

// Date Range Section Component
export const DateRangeSection = () => {
  return (
    ...
  );
};
Performance Considerations
- Lazy Loading: Only fetch detailed statistics when rows are expanded
- Pagination: Paginate the table (or use virtual scrolling) when the dataset has a large number of fields
- Parallel Queries: Execute field statistics queries in parallel for faster loading
- Debouncing: Debounce rapid expand/collapse actions
- Progressive Loading: Show fields immediately from the dataset, then load statistics progressively
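Lazy loading and protection against rapid expand/collapse can be combined by caching one in-flight promise per field. This is promise caching rather than a true debounce, and it is offered only as a sketch: createDetailLoader and its load and clear methods are hypothetical names, not part of the proposal.

```typescript
// Hypothetical helper: caches one promise per field name so that repeatedly
// expanding the same row triggers at most one detail fetch.
function createDetailLoader<T>(fetchDetails: (fieldName: string) => Promise<T>) {
  const cache = new Map<string, Promise<T>>();
  return {
    load(fieldName: string): Promise<T> {
      let pending = cache.get(fieldName);
      if (!pending) {
        pending = fetchDetails(fieldName).catch((error) => {
          // Drop failed requests so that a later expansion can retry
          cache.delete(fieldName);
          throw error;
        });
        cache.set(fieldName, pending);
      }
      return pending;
    },
    // Call when the dataset or query changes and cached details become stale
    clear(): void {
      cache.clear();
    },
  };
}
```

A loader like this would be created once per dataset, with load called from the row expansion handler. Because the cached value is the promise itself, a second expansion that happens while the first request is still running reuses that request instead of issuing a new query.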