Count number of documents with at least one ignored field #109146

Open: wants to merge 46 commits into main
Changes from 36 commits

Commits (46)
ea1d2c8
feature: count number of documents with at least one ignored field
salvatore-campagna May 29, 2024
5b0bf9c
Update docs/changelog/109146.yaml
salvatore-campagna May 29, 2024
0516fa7
Update docs/changelog/109146.yaml
salvatore-campagna May 29, 2024
b468362
fix: adding missing 'Logs' to changelog schema
salvatore-campagna May 29, 2024
46571dd
fix: constructor invokation
salvatore-campagna May 29, 2024
7419e0b
fix: constructor invokation
salvatore-campagna May 29, 2024
4fffc73
docs: update docs stats page
salvatore-campagna May 29, 2024
0d88613
fix: skip null check
salvatore-campagna May 29, 2024
50911a7
fix: add missing docs_with_ignored_fields
salvatore-campagna May 29, 2024
19e2bc2
fix: a few more tests
salvatore-campagna May 29, 2024
0952941
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna May 29, 2024
9ed9505
fix: update transport version id after main merge
salvatore-campagna May 29, 2024
aebeb19
fix: use -1 as init value
salvatore-campagna May 30, 2024
c9639d7
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna May 30, 2024
b0f61ae
note: souce only snapshot unsupported
salvatore-campagna May 30, 2024
8c92be8
Revert "fix: use -1 as init value"
salvatore-campagna May 30, 2024
bb74c2f
nit: remove this
salvatore-campagna May 30, 2024
4baab8a
fix: introduce sum_doc_freq_terms_ignored_field
salvatore-campagna May 30, 2024
5e2475e
fix: add missing sum_doc_freq_terms_ignored_field
salvatore-campagna May 30, 2024
9063532
fix: extract method
salvatore-campagna May 30, 2024
9d44e50
fix: adjust error message
salvatore-campagna May 30, 2024
789f27f
do, or do not, there is npo try
salvatore-campagna May 30, 2024
aa29fb7
fix: make ingored field stats optional
salvatore-campagna May 31, 2024
14fd47c
fix: missing ignored field stats
salvatore-campagna May 31, 2024
52d8628
fix: mssing boolean param
salvatore-campagna May 31, 2024
88023d1
fix: writeVLong requires positive long values
salvatore-campagna May 31, 2024
8e45c2a
fix: use positive and negative values
salvatore-campagna May 31, 2024
da184fa
fix: use object to collect ignored field stats
salvatore-campagna Jun 1, 2024
7081c93
fix: default constructor, equals and hash code
salvatore-campagna Jun 1, 2024
bd4fe4a
nit: some code cleanup
salvatore-campagna Jun 3, 2024
188ab53
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna Jun 3, 2024
01fc479
fix: use a separate object for ignored field stats
salvatore-campagna Jun 3, 2024
675a368
fix: boolean not required anymore
salvatore-campagna Jun 3, 2024
33b943e
fix: use ignored field source when acquiring searcher
salvatore-campagna Jun 3, 2024
b4f4748
fix: restore unwanted changes
salvatore-campagna Jun 3, 2024
e656e7c
fix: reuse an already open searcher for searchable snapshot
salvatore-campagna Jun 3, 2024
d842108
fix: remove docs from cluster stats
salvatore-campagna Jun 3, 2024
0ec7272
fix: make method names consistent
salvatore-campagna Jun 3, 2024
2ca4842
fix: explicitly initialize value
salvatore-campagna Jun 3, 2024
c663ff5
nit: rename capability
salvatore-campagna Jun 3, 2024
573377d
fix: remove unused code
salvatore-campagna Jun 3, 2024
bbb9cae
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna Jun 4, 2024
4a6e586
docs: improve docs
salvatore-campagna Jun 4, 2024
38da229
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna Jun 4, 2024
2b795ae
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna Jun 4, 2024
72e6935
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna Jun 5, 2024
6 changes: 6 additions & 0 deletions docs/changelog/109146.yaml
@@ -0,0 +1,6 @@
pr: 109146
summary: Count number of documents with at least one ignored field
area: Logs
type: feature
issues:
- 108092
14 changes: 14 additions & 0 deletions docs/reference/cluster/stats.asciidoc
@@ -227,6 +227,20 @@ space of deleted Lucene documents when a segment is merged.
`total_size_in_bytes`::
(integer)
Total size in bytes across all primary shards assigned to selected nodes.

`docs_with_ignored_fields`::
(integer)
Total number of documents containing at least one ignored field.
+
This number is based on documents in Lucene segments and does not take
into account deleted documents.

`sum_doc_freq_terms_ignored_fields`::
(integer)
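Sum of the document frequencies of all terms in the `_ignored` field. Each term is the name of an ignored field, so this is the total number of ignored fields summed over all documents.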
+
This number is based on documents in Lucene segments and does not take
into account deleted documents.
=====

`store`::
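A minimal Java sketch of what the two new counters measure, assuming each document's `_ignored` values are available as a set of field names. This is a simplification for illustration only: the documented stats are read from Lucene segments, and `IgnoredFieldCounters` and its methods are hypothetical, not code from this PR. A document counts once towards `docs_with_ignored_fields` if its set is non-empty, while every ignored field name in a document adds one to `sum_doc_freq_terms_ignored_fields`.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating the two documented counters; not the PR's implementation.
final class IgnoredFieldCounters {

    private IgnoredFieldCounters() {}

    // Number of documents whose _ignored field holds at least one field name.
    static long docsWithIgnoredFields(List<Set<String>> ignoredPerDoc) {
        return ignoredPerDoc.stream().filter(fields -> !fields.isEmpty()).count();
    }

    // Sum of the document frequency of every term in _ignored. docFreq counts a
    // document once per term, so the sum equals the number of distinct
    // (document, ignored field) pairs, i.e. the total size of the per-doc sets.
    static long sumDocFreqIgnoredFields(List<Set<String>> ignoredPerDoc) {
        return ignoredPerDoc.stream().mapToLong(Set::size).sum();
    }

    public static void main(String[] args) {
        // Hypothetical index: two clean documents, two with ignored fields.
        List<Set<String>> docs = List.of(
            Set.of(),
            Set.of(),
            Set.of("email"),
            Set.of("date_of_birth", "ip_address"));
        System.out.println(docsWithIgnoredFields(docs));   // 2
        System.out.println(sumDocFreqIgnoredFields(docs)); // 3
    }
}
```

With these four hypothetical documents, `total_docs` would be 4, `docs_with_ignored_fields` 2 and the sum 3.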
5 changes: 5 additions & 0 deletions docs/reference/rest-api/common-parms.asciidoc
@@ -556,6 +556,11 @@ Size of the index in <<byte-units, byte units>>.

`translog`::
<<index-modules-translog,Translog>> statistics.

`ignored_field`::
Statistics about ignored fields: the total number of documents, the number of documents
with at least one ignored field, and the sum of the document frequencies of the terms
in the `_ignored` field across all documents in the index.
--
end::index-metric[]
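The `ignored_field` metric name is the REST name of the new `Flag.IgnoredFieldStats` entry, and the YAML tests request it through `indices.stats` with `metric: [ ignored_field ]`. Assuming the indices stats path form `GET /<target>/_stats/<index-metric>` and an unsecured cluster at `http://localhost:9200` (an assumption for illustration), a request can be built with the JDK's `java.net.http` client; the `IgnoredFieldStatsRequest` helper is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper; assumes an unsecured cluster reachable at baseUrl.
final class IgnoredFieldStatsRequest {

    private IgnoredFieldStatsRequest() {}

    // Builds GET <baseUrl>/<indexPattern>/_stats/ignored_field
    static HttpRequest build(String baseUrl, String indexPattern) {
        URI uri = URI.create(baseUrl + "/" + indexPattern + "/_stats/ignored_field");
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = build("http://localhost:9200", "test*");
        HttpClient client = HttpClient.newHttpClient();
        // Sends the request; throws if no cluster is listening at the assumed address.
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```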

@@ -0,0 +1,176 @@
setup:
  - do:
      indices.create:
        index: test1
        body:
          settings:
            number_of_shards: 3
            number_of_replicas: 0
          mappings:
            properties:
              email:
                type: keyword
                ignore_above: 15
              date_of_birth:
                type: date
                format: "dd-MM-yyyy"
                ignore_malformed: true
              ip_address:
                type: ip
                ignore_malformed: true

  - do:
      indices.create:
        index: test2
        body:
          settings:
            number_of_shards: 3
            number_of_replicas: 0
          mappings:
            properties:
              email:
                type: keyword
                ignore_above: 15
              date_of_birth:
                type: date
                format: "dd-MM-yyyy"
                ignore_malformed: true
              ip_address:
                type: ip
                ignore_malformed: true

  - do:
      indices.create:
        index: test3
        body:
          settings:
            number_of_shards: 3
            number_of_replicas: 0
          mappings:
            properties:
              email:
                type: keyword
                ignore_above: 15
              date_of_birth:
                type: date
                format: "dd-MM-yyyy"
                ignore_malformed: true
              ip_address:
                type: ip
                ignore_malformed: true

  - do:
      bulk:
        index: test1
        refresh: true
        body:
          - { "index": { "_id": "001" } }
          - { "email": "[email protected]", "date_of_birth": "10-11-1992", "ip_address": "117.12.45.79" }
          - { "index": { "_id": "002" } }
          - { "email": "[email protected]", "date_of_birth": "12-04-1993", "ip_address": "178.22.231.24" }

  - do:
      bulk:
        index: test2
        refresh: true
        body:
          - { "index": { "_id": "003" } }
          - { "email": "[email protected]", "date_of_birth": "09-02-1990", "ip_address": "117.12.45.79" }
          - { "index": { "_id": "004" } }
          - { "email": "[email protected]", "date_of_birth": "12-24-1991", "ip_address": "133.45.123.812" }

  - do:
      bulk:
        index: test3
        refresh: false # not refreshed: the document is not yet in a searchable Lucene segment
        body:
          - { "index": { "_id": "005" } }
          - { "email": "[email protected]", "date_of_birth": "04-05-1992", "ip_address": "123.77.18.488" }

---
"ignored fields stats with docs metric":
  - requires:
      test_runner_features: [ capabilities ]
      capabilities:
        - method: GET
          path: /{index}/_stats
          capabilities: [ ignored_field_doc_stats_counter ]
      reason: "Counting docs with ignored fields required"

  - do:
      indices.stats:
        metric: [ ignored_field ]

  - match: { _all.primaries.ignored_field.total_docs: 4 }
  - match: { _all.primaries.ignored_field.docs_with_ignored_fields: 2 }
  - match: { _all.primaries.ignored_field.sum_doc_freq_terms_ignored_fields: 3 }
  - match: { _all.total.ignored_field.total_docs: 4 }
  - match: { _all.total.ignored_field.docs_with_ignored_fields: 2 }
  - match: { _all.total.ignored_field.sum_doc_freq_terms_ignored_fields: 3 }

  - match: { indices.test1.primaries.ignored_field.total_docs: 2 }
  - match: { indices.test1.primaries.ignored_field.docs_with_ignored_fields: 0 }
  - match: { indices.test1.primaries.ignored_field.sum_doc_freq_terms_ignored_fields: 0 }
  - match: { indices.test1.total.ignored_field.total_docs: 2 }
  - match: { indices.test1.total.ignored_field.docs_with_ignored_fields: 0 }
  - match: { indices.test1.total.ignored_field.sum_doc_freq_terms_ignored_fields: 0 }

  - match: { indices.test2.primaries.ignored_field.total_docs: 2 }
  - match: { indices.test2.primaries.ignored_field.docs_with_ignored_fields: 2 }
  - match: { indices.test2.primaries.ignored_field.sum_doc_freq_terms_ignored_fields: 3 }
  - match: { indices.test2.total.ignored_field.total_docs: 2 }
  - match: { indices.test2.total.ignored_field.docs_with_ignored_fields: 2 }
  - match: { indices.test2.total.ignored_field.sum_doc_freq_terms_ignored_fields: 3 }

  # Not refreshed
  - match: { indices.test3.primaries.ignored_field.total_docs: 0 }
  - match: { indices.test3.primaries.ignored_field.docs_with_ignored_fields: 0 }
  - match: { indices.test3.primaries.ignored_field.sum_doc_freq_terms_ignored_fields: 0 }
  - match: { indices.test3.total.ignored_field.total_docs: 0 }
  - match: { indices.test3.total.ignored_field.docs_with_ignored_fields: 0 }
  - match: { indices.test3.total.ignored_field.sum_doc_freq_terms_ignored_fields: 0 }

---
"ignored fields stats with docs metric with all metrics":
  - requires:
      test_runner_features: [ capabilities ]
      capabilities:
        - method: GET
          path: /{index}/_stats
          capabilities: [ ignored_field_doc_stats_counter ]
      reason: "Counting docs with ignored fields required"

  - do:
      indices.stats:
        index: test*
        metric: _all

  - match: { _all.primaries.ignored_field.total_docs: 4 }
  - match: { _all.primaries.ignored_field.docs_with_ignored_fields: 2 }
  - match: { _all.primaries.ignored_field.sum_doc_freq_terms_ignored_fields: 3 }
  - match: { _all.total.ignored_field.total_docs: 4 }
  - match: { _all.total.ignored_field.docs_with_ignored_fields: 2 }
  - match: { _all.total.ignored_field.sum_doc_freq_terms_ignored_fields: 3 }

  - match: { indices.test1.primaries.ignored_field.total_docs: 2 }
  - match: { indices.test1.primaries.ignored_field.docs_with_ignored_fields: 0 }
  - match: { indices.test1.primaries.ignored_field.sum_doc_freq_terms_ignored_fields: 0 }
  - match: { indices.test1.total.ignored_field.total_docs: 2 }
  - match: { indices.test1.total.ignored_field.docs_with_ignored_fields: 0 }
  - match: { indices.test1.total.ignored_field.sum_doc_freq_terms_ignored_fields: 0 }

  - match: { indices.test2.primaries.ignored_field.total_docs: 2 }
  - match: { indices.test2.primaries.ignored_field.docs_with_ignored_fields: 2 }
  - match: { indices.test2.primaries.ignored_field.sum_doc_freq_terms_ignored_fields: 3 }
  - match: { indices.test2.total.ignored_field.total_docs: 2 }
  - match: { indices.test2.total.ignored_field.docs_with_ignored_fields: 2 }
  - match: { indices.test2.total.ignored_field.sum_doc_freq_terms_ignored_fields: 3 }

  # Not refreshed
  - match: { indices.test3.primaries.ignored_field.total_docs: 0 }
  - match: { indices.test3.primaries.ignored_field.docs_with_ignored_fields: 0 }
  - match: { indices.test3.primaries.ignored_field.sum_doc_freq_terms_ignored_fields: 0 }
  - match: { indices.test3.total.ignored_field.total_docs: 0 }
  - match: { indices.test3.total.ignored_field.docs_with_ignored_fields: 0 }
  - match: { indices.test3.total.ignored_field.sum_doc_freq_terms_ignored_fields: 0 }

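The expected numbers in both tests follow from addition: each index contributes its own counts to `_all`, and with `number_of_replicas: 0` there are no replica copies, so `total` equals `primaries`. test3 contributes zeros because its document was indexed with `refresh: false`. A short Java check of that arithmetic, using a hypothetical `IgnoredStats` record rather than the real stats class:

```java
import java.util.List;

// Hypothetical value type holding the three fields asserted in the YAML tests.
record IgnoredStats(long totalDocs, long docsWithIgnoredFields, long sumDocFreqTermsIgnoredFields) {

    // Field-wise sum, the way per-index numbers roll up into the _all section.
    IgnoredStats plus(IgnoredStats other) {
        return new IgnoredStats(
            totalDocs + other.totalDocs,
            docsWithIgnoredFields + other.docsWithIgnoredFields,
            sumDocFreqTermsIgnoredFields + other.sumDocFreqTermsIgnoredFields);
    }

    static IgnoredStats sum(List<IgnoredStats> perIndex) {
        return perIndex.stream().reduce(new IgnoredStats(0, 0, 0), IgnoredStats::plus);
    }

    public static void main(String[] args) {
        IgnoredStats test1 = new IgnoredStats(2, 0, 0);
        IgnoredStats test2 = new IgnoredStats(2, 2, 3);
        // test3 was indexed with refresh: false, so nothing is searchable yet.
        IgnoredStats test3 = new IgnoredStats(0, 0, 0);
        System.out.println(sum(List.of(test1, test2, test3)));
        // IgnoredStats[totalDocs=4, docsWithIgnoredFields=2, sumDocFreqTermsIgnoredFields=3]
    }
}
```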
@@ -834,7 +834,8 @@ public void testFlagOrdinalOrder() {
Flag.Bulk,
Flag.Shards,
Flag.Mappings,
-Flag.DenseVector };
Flag.DenseVector,
Flag.IgnoredFieldStats };

assertThat(flags.length, equalTo(Flag.values().length));
for (int i = 0; i < flags.length; i++) {
@@ -1000,6 +1001,7 @@ private static void set(Flag flag, IndicesStatsRequestBuilder builder, boolean s
// We don't actually expose shards in IndexStats, but this test fails if it isn't handled
builder.request().flags().set(Flag.Shards, set);
case DenseVector -> builder.setDenseVector(set);
case IgnoredFieldStats -> builder.setIncludeIgnoredFieldsStats(set);
default -> fail("new flag? " + flag);
}
}
@@ -1046,6 +1048,8 @@ private static boolean isSet(Flag flag, CommonStats response) {
return response.getNodeMappings() != null;
case DenseVector:
return response.getDenseVectorStats() != null;
case IgnoredFieldStats:
return response.getIgnoredFieldStats() != null;
default:
fail("new flag? " + flag);
return false;
@@ -181,6 +181,7 @@ static TransportVersion def(int id) {
public static final TransportVersion ML_INFERENCE_GOOGLE_AI_STUDIO_COMPLETION_ADDED = def(8_672_00_0);
public static final TransportVersion WATCHER_REQUEST_TIMEOUTS = def(8_673_00_0);
public static final TransportVersion ML_INFERENCE_ENHANCE_DELETE_ENDPOINT = def(8_674_00_0);
public static final TransportVersion IGNORED_FIELDS_STATS = def(8_675_00_0);

/*
* STOP! READ THIS FIRST! No, really,
@@ -30,6 +30,7 @@
import org.elasticsearch.index.search.stats.SearchStats;
import org.elasticsearch.index.shard.DenseVectorStats;
import org.elasticsearch.index.shard.DocsStats;
import org.elasticsearch.index.shard.IgnoredFieldStats;
import org.elasticsearch.index.shard.IndexShard;
import org.elasticsearch.index.shard.IndexingStats;
import org.elasticsearch.index.shard.ShardCountStats;
@@ -109,6 +110,8 @@ public class CommonStats implements Writeable, ToXContentFragment {

@Nullable
public DenseVectorStats denseVectorStats;
@Nullable
public IgnoredFieldStats ignoredFieldStats;

public CommonStats() {
this(CommonStatsFlags.NONE);
@@ -139,6 +142,7 @@ public CommonStats(CommonStatsFlags flags) {
case Shards -> shards = new ShardCountStats();
case Mappings -> nodeMappings = new NodeMappingStats();
case DenseVector -> denseVectorStats = new DenseVectorStats();
case IgnoredFieldStats -> ignoredFieldStats = new IgnoredFieldStats();
default -> throw new IllegalStateException("Unknown Flag: " + flag);
}
}
@@ -182,6 +186,7 @@ public static CommonStats getShardLevelStats(IndicesQueryCache indicesQueryCache
// Setting to 1 because the single IndexShard passed to this method implies 1 shard
stats.shards = new ShardCountStats(1);
case DenseVector -> stats.denseVectorStats = indexShard.denseVectorStats();
case IgnoredFieldStats -> stats.ignoredFieldStats = indexShard.ignoredFieldStats();
default -> throw new IllegalStateException("Unknown or invalid flag for shard-level stats: " + flag);
}
} catch (AlreadyClosedException e) {
@@ -219,6 +224,9 @@ public CommonStats(StreamInput in) throws IOException {
if (in.getTransportVersion().onOrAfter(VERSION_SUPPORTING_DENSE_VECTOR_STATS)) {
denseVectorStats = in.readOptionalWriteable(DenseVectorStats::new);
}
if (in.getTransportVersion().onOrAfter(TransportVersions.IGNORED_FIELDS_STATS)) {
ignoredFieldStats = in.readOptionalWriteable(IgnoredFieldStats::new);
}
}

@Override
@@ -249,6 +257,9 @@ public void writeTo(StreamOutput out) throws IOException {
if (out.getTransportVersion().onOrAfter(VERSION_SUPPORTING_DENSE_VECTOR_STATS)) {
out.writeOptionalWriteable(denseVectorStats);
}
if (out.getTransportVersion().onOrAfter(TransportVersions.IGNORED_FIELDS_STATS)) {
out.writeOptionalWriteable(ignoredFieldStats);
}
}

@Override
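`CommonStats` only reads and writes `ignoredFieldStats` when the stream's transport version is on or after `TransportVersions.IGNORED_FIELDS_STATS`, so a node talking to an older node neither sends the field nor expects it. The JDK-only sketch below shows that version-gating pattern with `DataOutputStream`/`DataInputStream`. The class name, the plain `int` version and the wire layout (a presence flag followed by a `long`) are assumptions for illustration, not the encoding used by Elasticsearch's `StreamOutput.writeOptionalWriteable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration of version-gated optional serialization.
final class VersionGatedField {

    // Reuses the id passed to def() for IGNORED_FIELDS_STATS, as a plain int.
    static final int FIELD_ADDED_VERSION = 8_675_00_0;

    private VersionGatedField() {}

    // Writes the optional value only when the peer is new enough to read it.
    static void write(DataOutputStream out, int peerVersion, Long value) throws IOException {
        if (peerVersion >= FIELD_ADDED_VERSION) {
            out.writeBoolean(value != null); // presence flag
            if (value != null) {
                out.writeLong(value);
            }
        }
    }

    // Mirrors write(): an older peer never sent the field, so it reads back as null.
    static Long read(DataInputStream in, int peerVersion) throws IOException {
        if (peerVersion >= FIELD_ADDED_VERSION && in.readBoolean()) {
            return in.readLong();
        }
        return null;
    }

    // Serializes and deserializes one value against the same peer version.
    static Long roundTrip(int peerVersion, Long value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                write(out, peerVersion, value);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return read(in, peerVersion);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(8_675_00_0, 42L));  // 42
        System.out.println(roundTrip(8_674_00_0, 42L));  // null (older peer: field skipped)
        System.out.println(roundTrip(8_675_00_0, null)); // null (field absent)
    }
}
```

Both sides must apply the same version check; if only one side did, the reader would consume bytes meant for the next field.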
@@ -275,7 +286,8 @@ public boolean equals(Object o) {
&& Objects.equals(bulk, that.bulk)
&& Objects.equals(shards, that.shards)
&& Objects.equals(nodeMappings, that.nodeMappings)
-&& Objects.equals(denseVectorStats, that.denseVectorStats);
&& Objects.equals(denseVectorStats, that.denseVectorStats)
&& Objects.equals(ignoredFieldStats, that.ignoredFieldStats);
}

@Override
@@ -300,7 +312,8 @@ public int hashCode() {
bulk,
shards,
nodeMappings,
-denseVectorStats
denseVectorStats,
ignoredFieldStats
);
}

@@ -465,6 +478,14 @@ public void add(CommonStats stats) {
} else {
denseVectorStats.add(stats.getDenseVectorStats());
}
if (ignoredFieldStats == null) {
if (stats.getIgnoredFieldStats() != null) {
ignoredFieldStats = new IgnoredFieldStats();
ignoredFieldStats.add(stats.ignoredFieldStats);
}
} else {
ignoredFieldStats.add(stats.getIgnoredFieldStats());
}
}

@Nullable
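The new block in `CommonStats.add` creates its `ignoredFieldStats` lazily the first time the incoming stats carry one, and otherwise merges into the existing instance. A JDK-only sketch of that accumulation follows. `Counters` and `Aggregate` are hypothetical stand-ins (the fields of `IgnoredFieldStats` are not shown in this diff), and `Counters.add` is assumed to ignore `null` because the else branch can pass one.

```java
// Hypothetical stand-in for a mergeable stats section such as IgnoredFieldStats.
final class Counters {
    long docsWithIgnoredFields;
    long sumDocFreq;

    // Assumed to ignore null, since the else branch of the merge may pass one.
    void add(Counters other) {
        if (other == null) {
            return;
        }
        docsWithIgnoredFields += other.docsWithIgnoredFields;
        sumDocFreq += other.sumDocFreq;
    }
}

// Hypothetical holder mirroring the lazy-init merge in CommonStats.add.
final class Aggregate {
    Counters counters; // null until some incoming stats carry the section

    static Aggregate of(long docsWithIgnoredFields, long sumDocFreq) {
        Aggregate a = new Aggregate();
        a.counters = new Counters();
        a.counters.docsWithIgnoredFields = docsWithIgnoredFields;
        a.counters.sumDocFreq = sumDocFreq;
        return a;
    }

    void add(Aggregate incoming) {
        if (counters == null) {
            if (incoming.counters != null) {
                counters = new Counters();
                counters.add(incoming.counters);
            }
        } else {
            counters.add(incoming.counters);
        }
    }

    public static void main(String[] args) {
        Aggregate total = new Aggregate();
        total.add(new Aggregate());                 // section absent: nothing is created
        System.out.println(total.counters == null); // true
        total.add(Aggregate.of(2, 3));
        total.add(new Aggregate());                 // absent after init: add(null) is a no-op
        total.add(Aggregate.of(1, 1));
        System.out.println(total.counters.docsWithIgnoredFields + " " + total.counters.sumDocFreq); // 3 4
    }
}
```

Keeping the section `null` until real data arrives lets the response omit it entirely (as `addIfNonNull` does in `toXContent`) when no shard reported ignored field stats.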
@@ -567,6 +588,11 @@ public DenseVectorStats getDenseVectorStats() {
return denseVectorStats;
}

@Nullable
public IgnoredFieldStats getIgnoredFieldStats() {
return ignoredFieldStats;
}

/**
* Utility method which computes total memory by adding
* FieldData, PercolatorCache, Segments (index writer, version map)
@@ -609,6 +635,7 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
addIfNonNull(builder, params, bulk);
addIfNonNull(builder, params, nodeMappings);
addIfNonNull(builder, params, denseVectorStats);
addIfNonNull(builder, params, ignoredFieldStats);
return builder;
}

@@ -223,7 +223,8 @@ public enum Flag {
Bulk("bulk", 17),
Shards("shard_stats", 18),
Mappings("mappings", 19),
-DenseVector("dense_vector", 20);
DenseVector("dense_vector", 20),
IgnoredFieldStats("ignored_field", 21);

private final String restName;
private final int index;
@@ -275,10 +275,19 @@ public IndicesStatsRequest denseVector(boolean denseVector) {
return this;
}

public IndicesStatsRequest ignoredFieldsStats(boolean ignoredField) {
flags.set(Flag.IgnoredFieldStats, ignoredField);
return this;
}

public boolean denseVector() {
return flags.isSet(Flag.DenseVector);
}

public boolean ignoredField() {
return flags.isSet(Flag.IgnoredFieldStats);
}

@Override
public void writeTo(StreamOutput out) throws IOException {
super.writeTo(out);
@@ -159,4 +159,9 @@ public IndicesStatsRequestBuilder setDenseVector(boolean denseVector) {
request.denseVector(denseVector);
return this;
}

public IndicesStatsRequestBuilder setIncludeIgnoredFieldsStats(boolean includeIgnoredFieldsStats) {
request.ignoredFieldsStats(includeIgnoredFieldsStats);
return this;
}
}
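On the Java side, the metric is toggled through a new `Flag` constant whose REST name is `ignored_field`, set either directly on `IndicesStatsRequest` or through the builder's `setIncludeIgnoredFieldsStats`. The sketch below mimics that shape with an `EnumSet` and a lookup by REST name; the `Metric` enum, the `StatsRequest` class and `fromRestName` are hypothetical simplifications, not the classes in this diff.

```java
import java.util.EnumSet;

// Hypothetical, trimmed-down analogue of CommonStatsFlags.Flag.
enum Metric {
    DENSE_VECTOR("dense_vector"),
    IGNORED_FIELD("ignored_field");

    final String restName;

    Metric(String restName) {
        this.restName = restName;
    }

    // Resolves a metric from the name used in a REST path such as /_stats/ignored_field.
    static Metric fromRestName(String name) {
        for (Metric m : values()) {
            if (m.restName.equals(name)) {
                return m;
            }
        }
        throw new IllegalArgumentException("unknown metric [" + name + "]");
    }
}

// Hypothetical request with fluent setters, in the style of IndicesStatsRequestBuilder.
final class StatsRequest {
    private final EnumSet<Metric> flags = EnumSet.noneOf(Metric.class);

    StatsRequest set(Metric metric, boolean enabled) {
        if (enabled) {
            flags.add(metric);
        } else {
            flags.remove(metric);
        }
        return this;
    }

    StatsRequest includeIgnoredFieldStats(boolean enabled) {
        return set(Metric.IGNORED_FIELD, enabled);
    }

    boolean isSet(Metric metric) {
        return flags.contains(metric);
    }

    public static void main(String[] args) {
        StatsRequest request = new StatsRequest().includeIgnoredFieldStats(true);
        System.out.println(request.isSet(Metric.fromRestName("ignored_field"))); // true
        System.out.println(request.isSet(Metric.DENSE_VECTOR));                  // false
    }
}
```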