Count number of documents with at least one ignored field #109146

Open
wants to merge 46 commits into
base: main
Changes from 14 commits
ea1d2c8
feature: count number of documents with at least one ignored field
salvatore-campagna May 29, 2024
5b0bf9c
Update docs/changelog/109146.yaml
salvatore-campagna May 29, 2024
0516fa7
Update docs/changelog/109146.yaml
salvatore-campagna May 29, 2024
b468362
fix: adding missing 'Logs' to changelog schema
salvatore-campagna May 29, 2024
46571dd
fix: constructor invocation
salvatore-campagna May 29, 2024
7419e0b
fix: constructor invocation
salvatore-campagna May 29, 2024
4fffc73
docs: update docs stats page
salvatore-campagna May 29, 2024
0d88613
fix: skip null check
salvatore-campagna May 29, 2024
50911a7
fix: add missing docs_with_ignored_fields
salvatore-campagna May 29, 2024
19e2bc2
fix: a few more tests
salvatore-campagna May 29, 2024
0952941
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna May 29, 2024
9ed9505
fix: update transport version id after main merge
salvatore-campagna May 29, 2024
aebeb19
fix: use -1 as init value
salvatore-campagna May 30, 2024
c9639d7
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna May 30, 2024
b0f61ae
note: source only snapshot unsupported
salvatore-campagna May 30, 2024
8c92be8
Revert "fix: use -1 as init value"
salvatore-campagna May 30, 2024
bb74c2f
nit: remove this
salvatore-campagna May 30, 2024
4baab8a
fix: introduce sum_doc_freq_terms_ignored_field
salvatore-campagna May 30, 2024
5e2475e
fix: add missing sum_doc_freq_terms_ignored_field
salvatore-campagna May 30, 2024
9063532
fix: extract method
salvatore-campagna May 30, 2024
9d44e50
fix: adjust error message
salvatore-campagna May 30, 2024
789f27f
do, or do not, there is no try
salvatore-campagna May 30, 2024
aa29fb7
fix: make ignored field stats optional
salvatore-campagna May 31, 2024
14fd47c
fix: missing ignored field stats
salvatore-campagna May 31, 2024
52d8628
fix: missing boolean param
salvatore-campagna May 31, 2024
88023d1
fix: writeVLong requires positive long values
salvatore-campagna May 31, 2024
8e45c2a
fix: use positive and negative values
salvatore-campagna May 31, 2024
da184fa
fix: use object to collect ignored field stats
salvatore-campagna Jun 1, 2024
7081c93
fix: default constructor, equals and hash code
salvatore-campagna Jun 1, 2024
bd4fe4a
nit: some code cleanup
salvatore-campagna Jun 3, 2024
188ab53
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna Jun 3, 2024
01fc479
fix: use a separate object for ignored field stats
salvatore-campagna Jun 3, 2024
675a368
fix: boolean not required anymore
salvatore-campagna Jun 3, 2024
33b943e
fix: use ignored field source when acquiring searcher
salvatore-campagna Jun 3, 2024
b4f4748
fix: restore unwanted changes
salvatore-campagna Jun 3, 2024
e656e7c
fix: reuse an already open searcher for searchable snapshot
salvatore-campagna Jun 3, 2024
d842108
fix: remove docs from cluster stats
salvatore-campagna Jun 3, 2024
0ec7272
fix: make method names consistent
salvatore-campagna Jun 3, 2024
2ca4842
fix: explicitly initialize value
salvatore-campagna Jun 3, 2024
c663ff5
nit: rename capability
salvatore-campagna Jun 3, 2024
573377d
fix: remove unused code
salvatore-campagna Jun 3, 2024
bbb9cae
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna Jun 4, 2024
4a6e586
docs: improve docs
salvatore-campagna Jun 4, 2024
38da229
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna Jun 4, 2024
2b795ae
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna Jun 4, 2024
72e6935
Merge branch 'main' into feature/108092-docs-with-ignored-fields
salvatore-campagna Jun 5, 2024
@@ -68,6 +68,7 @@
"Java High Level REST Client",
"Java Low Level REST Client",
"License",
"Logs",
"Machine Learning",
"Mapping",
"Monitoring",
6 changes: 6 additions & 0 deletions docs/changelog/109146.yaml
@@ -0,0 +1,6 @@
pr: 109146
summary: Count number of documents with at least one ignored field
area: Logs
type: feature
issues:
- 108092
10 changes: 9 additions & 1 deletion docs/reference/cluster/stats.asciidoc
@@ -227,6 +227,13 @@ space of deleted Lucene documents when a segment is merged.
`total_size_in_bytes`::
(integer)
Total size in bytes across all primary shards assigned to selected nodes.

`docs_with_ignored_fields`::
(integer)
Total number of documents that contain at least one ignored field.
+
This number is based on documents in Lucene segments and does not take
into account deleted documents.
=====

`store`::
@@ -1599,7 +1606,8 @@ The API returns the following response:
"docs": {
"count": 10,
"deleted": 0,
"total_size_in_bytes": 8833
"total_size_in_bytes": 8833,
"docs_with_ignored_fields": 0
},
"store": {
"size": "16.2kb",
@@ -502,7 +502,7 @@ private static ShardStats getShardStats(IndexMetadata indexMeta, int shardIndex,
shardRouting = shardRouting.initialize(assignedShardNodeId, null, ShardRouting.UNAVAILABLE_EXPECTED_SHARD_SIZE);
shardRouting = shardRouting.moveToStarted(ShardRouting.UNAVAILABLE_EXPECTED_SHARD_SIZE);
CommonStats stats = new CommonStats();
stats.docs = new DocsStats(100, 0, randomByteSizeValue().getBytes());
stats.docs = new DocsStats(100, 0, randomByteSizeValue().getBytes(), 0);
stats.store = new StoreStats();
stats.indexing = new IndexingStats(new IndexingStats.Stats(1, 1, 1, 1, 1, 1, 1, 1, false, 1, targetWriteLoad, 1));
return new ShardStats(shardRouting, new ShardPath(false, path, path, shardId), stats, null, null, null, false, 0);
@@ -0,0 +1,123 @@
---
"indices stats including documents with ignored fields":
- requires:
test_runner_features: [ capabilities ]
capabilities:
- method: GET
path: /{index}/_stats
capabilities: [ count_docs_with_ignored_fields ]
reason: "Counting docs with ignored fields required"

- do:
indices.create:
index: test1
body:
settings:
number_of_shards: 3
number_of_replicas: 0
mappings:
properties:
email:
type: keyword
ignore_above: 15
date_of_birth:
type: date
format: "dd-MM-yyyy"
ignore_malformed: true
ip_address:
type: ip
ignore_malformed: true

- do:
indices.create:
index: test2
body:
settings:
number_of_shards: 3
number_of_replicas: 0
mappings:
properties:
email:
type: keyword
ignore_above: 15
date_of_birth:
type: date
format: "dd-MM-yyyy"
ignore_malformed: true
ip_address:
type: ip
ignore_malformed: true

- do:
indices.create:
index: test3
body:
settings:
number_of_shards: 3
number_of_replicas: 0
mappings:
properties:
email:
type: keyword
ignore_above: 15
date_of_birth:
type: date
format: "dd-MM-yyyy"
ignore_malformed: true
ip_address:
type: ip
ignore_malformed: true

- do:
bulk:
index: test1
refresh: true
body:
- { "index": { "_id": "001" } }
- { "email": "[email protected]", "date_of_birth": "10-11-1992", "ip_address": "117.12.45.79" }
- { "index": { "_id": "002" } }
- { "email": "[email protected]", "date_of_birth": "12-04-1993", "ip_address": "178.22.231.24" }

- do:
bulk:
index: test2
refresh: true
body:
- { "index": { "_id": "003" } }
- { "email": "[email protected]", "date_of_birth": "09-02-1990", "ip_address": "117.12.45.79" }
- { "index": { "_id": "004" } }
- { "email": "[email protected]", "date_of_birth": "12-24-1991", "ip_address": "133.45.123.12" }

- do:
bulk:
index: test3
refresh: false # document not flushed to Lucene segment
body:
- { "index": { "_id": "005" } }
- { "email": "[email protected]", "date_of_birth": "04-05-1992", "ip_address": "123.77.18.488" }

- do:
indices.stats:
index: test*
metric: docs

- match: { _all.primaries.docs.count: 4 }
- match: { _all.primaries.docs.docs_with_ignored_fields: 2 }
- match: { _all.total.docs.count: 4 }
- match: { _all.total.docs.docs_with_ignored_fields: 2 }

- match: { indices.test1.primaries.docs.count: 2 }
- match: { indices.test1.primaries.docs.docs_with_ignored_fields: 0 }
- match: { indices.test1.total.docs.count: 2 }
- match: { indices.test1.total.docs.docs_with_ignored_fields: 0 }

- match: { indices.test2.primaries.docs.count: 2 }
- match: { indices.test2.primaries.docs.docs_with_ignored_fields: 2 }
- match: { indices.test2.total.docs.count: 2 }
- match: { indices.test2.total.docs.docs_with_ignored_fields: 2 }

# Not refreshed
- match: { indices.test3.primaries.docs.count: 0 }
- match: { indices.test3.primaries.docs.docs_with_ignored_fields: 0 }
- match: { indices.test3.total.docs.count: 0 }
- match: { indices.test3.total.docs.docs_with_ignored_fields: 0 }
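
To make the expectations in this test easier to follow, here is a minimal Java sketch of the three mapping options it exercises (`ignore_above` on a keyword, `ignore_malformed` on a date and on an IP). It is not Elasticsearch's implementation: the class and method names are hypothetical, `java.time.format.DateTimeFormatter` stands in for Elasticsearch's date parsing, and the IPv4 check is a simplified octet-range test. The email values in the test are redacted on this page, so the sample string below is made up.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, for illustration only.
final class IgnoredFieldRulesSketch {
    // Same pattern as the date_of_birth mapping in the test.
    private static final DateTimeFormatter DATE_OF_BIRTH = DateTimeFormatter.ofPattern("dd-MM-yyyy");

    // ignore_above: values longer than the limit are ignored (length counted in chars here).
    static boolean exceedsIgnoreAbove(String value, int ignoreAbove) {
        return value.length() > ignoreAbove;
    }

    // ignore_malformed on a date: a value that does not parse is ignored.
    static boolean isMalformedDate(String value) {
        try {
            LocalDate.parse(value, DATE_OF_BIRTH);
            return false;
        } catch (DateTimeParseException e) {
            return true;
        }
    }

    // ignore_malformed on an ip: simplified to "four decimal octets in 0..255".
    static boolean isMalformedIpv4(String value) {
        String[] octets = value.split("\\.");
        if (octets.length != 4) {
            return true;
        }
        for (String octet : octets) {
            try {
                int n = Integer.parseInt(octet);
                if (n < 0 || n > 255) {
                    return true;
                }
            } catch (NumberFormatException e) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isMalformedDate("09-02-1990"));       // false
        System.out.println(isMalformedDate("12-24-1991"));       // true: there is no month 24
        System.out.println(isMalformedIpv4("117.12.45.79"));     // false
        System.out.println(isMalformedIpv4("123.77.18.488"));    // true: 488 is out of range
        // Made-up value: 27 characters, longer than ignore_above: 15.
        System.out.println(exceedsIgnoreAbove("someone.long@example.invalid", 15)); // true
    }
}
```

Under these assumptions, document `004` has a malformed `date_of_birth` and document `005` has a malformed `ip_address`. Document `005` is still expected to count as 0 because `test3` is indexed with `refresh: false`.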
@@ -180,6 +180,7 @@ static TransportVersion def(int id) {
public static final TransportVersion ADD_METADATA_FLATTENED_TO_ROLES = def(8_671_00_0);
public static final TransportVersion ML_INFERENCE_GOOGLE_AI_STUDIO_COMPLETION_ADDED = def(8_672_00_0);
public static final TransportVersion WATCHER_REQUEST_TIMEOUTS = def(8_673_00_0);
public static final TransportVersion COUNT_DOCS_WITH_IGNORED_FIELDS = def(8_674_00_0);

/*
* STOP! READ THIS FIRST! No, really,
16 changes: 15 additions & 1 deletion server/src/main/java/org/elasticsearch/index/engine/Engine.java
@@ -60,6 +60,7 @@
import org.elasticsearch.index.VersionType;
import org.elasticsearch.index.mapper.DocumentParser;
import org.elasticsearch.index.mapper.IdFieldMapper;
import org.elasticsearch.index.mapper.IgnoredFieldMapper;
import org.elasticsearch.index.mapper.LuceneDocument;
import org.elasticsearch.index.mapper.Mapping;
import org.elasticsearch.index.mapper.MappingLookup;
@@ -215,6 +216,7 @@ protected final DocsStats docsStats(IndexReader indexReader) {
long numDocs = 0;
long numDeletedDocs = 0;
long sizeInBytes = 0;
long docsWithIgnoredFields = 0;
// we don't wait for pending refreshes here since it's a stats call; instead we only mark it as accessed, which will cause
// the next scheduled refresh to go through and refresh the stats as well
for (LeafReaderContext readerContext : indexReader.leaves()) {
@@ -228,8 +230,20 @@ protected final DocsStats docsStats(IndexReader indexReader) {
} catch (IOException e) {
logger.trace(() -> "failed to get size for [" + info.info.name + "]", e);
}
docsWithIgnoredFields += tryGetNumberOfDocumentsWithIgnoredFields(readerContext);
}
return new DocsStats(numDocs, numDeletedDocs, sizeInBytes);
return new DocsStats(numDocs, numDeletedDocs, sizeInBytes, docsWithIgnoredFields);
}

private long tryGetNumberOfDocumentsWithIgnoredFields(final LeafReaderContext readerContext) {
try {
return readerContext.reader().getSumDocFreq(IgnoredFieldMapper.NAME);
Review comment from @martijnvg (Member), May 30, 2024:

Should this invoke getDocCount() instead? It returns the number of documents that have at least one term for this field, which more closely matches the method name and what we are trying to include in the doc stats.

} catch (IOException e) {
logger.trace(() -> "IO error while getting the number of documents with ignored fields", e);
} catch (UnsupportedOperationException e) {
Review comment from the PR author (Contributor):

This happens for source-only indices, which do not include an inverted index or doc values.

logger.trace(() -> "Getting number of documents with ignored fields is not supported", e);
}
return 0;
}
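
The distinction raised in the review comment above can be illustrated without Lucene. The sketch below models the postings of the `_ignored` field as a map from term (the name of an ignored field) to the IDs of the documents containing that term. The class name, method names, and document IDs are hypothetical; only the definitions of the two statistics mirror Lucene's `getSumDocFreq` (the sum of per-term document frequencies) and `getDocCount` (the number of documents with at least one term).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of per-field term statistics, for illustration only.
final class IgnoredTermStatsSketch {

    // Sum over all terms of the number of documents containing that term.
    static long sumDocFreq(Map<String, List<Integer>> postings) {
        long sum = 0;
        for (List<Integer> docs : postings.values()) {
            sum += docs.size();
        }
        return sum;
    }

    // Number of distinct documents that contain at least one term.
    static int docCount(Map<String, List<Integer>> postings) {
        Set<Integer> distinct = new HashSet<>();
        for (List<Integer> docs : postings.values()) {
            distinct.addAll(docs);
        }
        return distinct.size();
    }

    public static void main(String[] args) {
        // Assumed data: document 4 has two ignored fields, document 3 has one.
        Map<String, List<Integer>> ignored = Map.of(
            "email", List.of(3, 4),
            "date_of_birth", List.of(4)
        );
        System.out.println(sumDocFreq(ignored)); // 3
        System.out.println(docCount(ignored));   // 2
    }
}
```

In this toy data the two statistics disagree as soon as one document has more than one ignored field, which is the case where only the distinct-document count answers "how many documents have at least one ignored field".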

/**
31 changes: 28 additions & 3 deletions server/src/main/java/org/elasticsearch/index/shard/DocsStats.java
@@ -8,6 +8,7 @@

package org.elasticsearch.index.shard;

import org.elasticsearch.TransportVersions;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.io.stream.Writeable;
@@ -23,6 +24,7 @@ public class DocsStats implements Writeable, ToXContentFragment {
private long count = 0;
private long deleted = 0;
private long totalSizeInBytes = 0;
private long docsWithIgnoredFields = 0;

public DocsStats() {

@@ -32,12 +34,16 @@ public DocsStats(StreamInput in) throws IOException {
count = in.readVLong();
deleted = in.readVLong();
totalSizeInBytes = in.readVLong();
if (in.getTransportVersion().onOrAfter(TransportVersions.COUNT_DOCS_WITH_IGNORED_FIELDS)) {
docsWithIgnoredFields = in.readVLong();
}
}

public DocsStats(long count, long deleted, long totalSizeInBytes) {
public DocsStats(long count, long deleted, long totalSizeInBytes, long docsWithIgnoredFields) {
this.count = count;
this.deleted = deleted;
this.totalSizeInBytes = totalSizeInBytes;
this.docsWithIgnoredFields = docsWithIgnoredFields;
}

public void add(DocsStats other) {
@@ -51,6 +57,7 @@ public void add(DocsStats other) {
}
this.count += other.count;
this.deleted += other.deleted;
this.docsWithIgnoredFields += other.docsWithIgnoredFields;
}

public long getCount() {
@@ -69,11 +76,22 @@ public long getTotalSizeInBytes() {
return totalSizeInBytes;
}

/**
* Returns the total number of documents that contain at least one ignored field.
* This value only reflects documents already flushed to Lucene segments.
*/
public long getDocsWithIgnoredFields() {
return docsWithIgnoredFields;
}

@Override
public void writeTo(StreamOutput out) throws IOException {
out.writeVLong(count);
out.writeVLong(deleted);
out.writeVLong(totalSizeInBytes);
if (out.getTransportVersion().onOrAfter(TransportVersions.COUNT_DOCS_WITH_IGNORED_FIELDS)) {
out.writeVLong(docsWithIgnoredFields);
}
}

@Override
@@ -82,6 +100,9 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
builder.field(Fields.COUNT, count);
builder.field(Fields.DELETED, deleted);
builder.field(Fields.TOTAL_SIZE_IN_BYTES, totalSizeInBytes);
if (docsWithIgnoredFields >= 0) {
builder.field(Fields.DOCS_WITH_IGNORED_FIELDS, docsWithIgnoredFields);
}
builder.endObject();
return builder;
}
@@ -91,18 +112,22 @@ public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
DocsStats that = (DocsStats) o;
return count == that.count && deleted == that.deleted && totalSizeInBytes == that.totalSizeInBytes;
return count == that.count
&& deleted == that.deleted
&& totalSizeInBytes == that.totalSizeInBytes
&& this.docsWithIgnoredFields == that.docsWithIgnoredFields;
}

@Override
public int hashCode() {
return Objects.hash(count, deleted, totalSizeInBytes);
return Objects.hash(count, deleted, totalSizeInBytes, docsWithIgnoredFields);
}

static final class Fields {
static final String DOCS = "docs";
static final String COUNT = "count";
static final String DELETED = "deleted";
static final String TOTAL_SIZE_IN_BYTES = "total_size_in_bytes";
static final String DOCS_WITH_IGNORED_FIELDS = "docs_with_ignored_fields";
}
}
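
The serialization change in `DocsStats` follows a common pattern: a new field is written and read only when the peer's wire version is recent enough, and a sentinel stands in when the value was never sent. The sketch below shows that pattern with plain `java.io` streams. It is an illustration under stated assumptions, not the Elasticsearch code: the class name, the integer version numbers, and the `UNKNOWN` sentinel are hypothetical, and it uses fixed-width `writeLong` rather than a variable-length encoding so that a negative sentinel can be written safely.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a stats object with one version-gated field.
final class VersionGatedStatsSketch {
    // Assumed wire version at which the new field was introduced.
    static final int FIELD_ADDED_VERSION = 2;
    // Sentinel meaning "the sender did not report this value".
    static final long UNKNOWN = -1L;

    final long count;
    final long docsWithIgnoredFields;

    VersionGatedStatsSketch(long count, long docsWithIgnoredFields) {
        this.count = count;
        this.docsWithIgnoredFields = docsWithIgnoredFields;
    }

    // Writes the new field only if the receiver understands it.
    byte[] write(int wireVersion) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(count);
            if (wireVersion >= FIELD_ADDED_VERSION) {
                out.writeLong(docsWithIgnoredFields);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the new field only if the sender wrote it; otherwise uses the sentinel.
    static VersionGatedStatsSketch read(byte[] data, int wireVersion) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long count = in.readLong();
            long ignored = wireVersion >= FIELD_ADDED_VERSION ? in.readLong() : UNKNOWN;
            return new VersionGatedStatsSketch(count, ignored);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Merges two stats objects without letting the sentinel leak into the sum.
    VersionGatedStatsSketch add(VersionGatedStatsSketch other) {
        long ignored;
        if (docsWithIgnoredFields == UNKNOWN) {
            ignored = other.docsWithIgnoredFields;
        } else if (other.docsWithIgnoredFields == UNKNOWN) {
            ignored = docsWithIgnoredFields;
        } else {
            ignored = docsWithIgnoredFields + other.docsWithIgnoredFields;
        }
        return new VersionGatedStatsSketch(count + other.count, ignored);
    }

    public static void main(String[] args) {
        VersionGatedStatsSketch stats = new VersionGatedStatsSketch(5, 2);
        VersionGatedStatsSketch fromOldPeer = read(stats.write(1), 1);
        VersionGatedStatsSketch fromNewPeer = read(stats.write(2), 2);
        System.out.println(fromOldPeer.docsWithIgnoredFields);                  // -1
        System.out.println(fromNewPeer.docsWithIgnoredFields);                  // 2
        System.out.println(fromOldPeer.add(fromNewPeer).docsWithIgnoredFields); // 2
    }
}
```

The explicit sentinel handling in `add` is the design point worth noting: summing a `-1` placeholder as if it were a count would silently under-report the merged total.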
@@ -161,4 +161,8 @@ protected Set<String> responseParams() {
return RESPONSE_PARAMS;
}

@Override
public Set<String> supportedCapabilities() {
return RestIndicesStatsCapabilities.CAPABILITIES;
}
}
@@ -0,0 +1,18 @@
/*
* Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
* or more contributor license agreements. Licensed under the Elastic License
* 2.0 and the Server Side Public License, v 1; you may not use this file except
* in compliance with, at your election, the Elastic License 2.0 or the Server
* Side Public License, v 1.
*/

package org.elasticsearch.rest.action.admin.indices;

import java.util.Set;

public class RestIndicesStatsCapabilities {

private static final String COUNT_DOCS_WITH_IGNORED_FIELDS = "count_docs_with_ignored_fields";

public static final Set<String> CAPABILITIES = Set.of(COUNT_DOCS_WITH_IGNORED_FIELDS);
}
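
The YAML test earlier in this diff is gated on the `count_docs_with_ignored_fields` capability declared here. As a rough sketch of what such a gate amounts to, a caller can treat the advertised capabilities as a set and proceed only when every required name is present. The class and method below are hypothetical and are not the Elasticsearch test runner's implementation.

```java
import java.util.Set;

// Hypothetical capability gate, for illustration only.
final class CapabilityGateSketch {
    // Mirrors the single capability name declared in the diff above.
    static final Set<String> SUPPORTED = Set.of("count_docs_with_ignored_fields");

    // True only if every required capability is advertised.
    static boolean supportsAll(Set<String> supported, Set<String> required) {
        return supported.containsAll(required);
    }

    public static void main(String[] args) {
        System.out.println(supportsAll(SUPPORTED, Set.of("count_docs_with_ignored_fields"))); // true
        System.out.println(supportsAll(SUPPORTED, Set.of("some_other_capability")));          // false
    }
}
```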
@@ -568,7 +568,7 @@ private static CommonStats createShardLevelCommonStats() {
int iota = 0;

final CommonStats indicesCommonStats = new CommonStats(CommonStatsFlags.ALL);
indicesCommonStats.getDocs().add(new DocsStats(++iota, ++iota, ++iota));
indicesCommonStats.getDocs().add(new DocsStats(++iota, ++iota, ++iota, ++iota));
Map<String, FieldDataStats.GlobalOrdinalsStats.GlobalOrdinalFieldStats> fieldOrdinalStats = new HashMap<>();
fieldOrdinalStats.put(
randomAlphaOfLength(4),
@@ -580,10 +580,10 @@ public void testRolloverAliasToDataStreamFails() throws Exception {

private IndicesStatsResponse createIndicesStatResponse(String indexName, long totalDocs, long primariesDocs) {
final CommonStats primaryStats = mock(CommonStats.class);
when(primaryStats.getDocs()).thenReturn(new DocsStats(primariesDocs, 0, between(1, 10000)));
when(primaryStats.getDocs()).thenReturn(new DocsStats(primariesDocs, 0, between(1, 10000), randomLongBetween(0, primariesDocs)));

final CommonStats totalStats = mock(CommonStats.class);
when(totalStats.getDocs()).thenReturn(new DocsStats(totalDocs, 0, between(1, 10000)));
when(totalStats.getDocs()).thenReturn(new DocsStats(totalDocs, 0, between(1, 10000), randomLongBetween(0, totalDocs)));

final IndicesStatsResponse response = mock(IndicesStatsResponse.class);
when(response.getPrimaries()).thenReturn(primaryStats);
@@ -612,10 +612,10 @@ private IndicesStatsResponse createAliasToMultipleIndicesStatsResponse(Map<Strin

private IndexStats createIndexStats(long primaries, long total) {
final CommonStats primariesCommonStats = mock(CommonStats.class);
when(primariesCommonStats.getDocs()).thenReturn(new DocsStats(primaries, 0, between(1, 10000)));
when(primariesCommonStats.getDocs()).thenReturn(new DocsStats(primaries, 0, between(1, 10000), randomLongBetween(0, primaries)));

final CommonStats totalCommonStats = mock(CommonStats.class);
when(totalCommonStats.getDocs()).thenReturn(new DocsStats(total, 0, between(1, 10000)));
when(totalCommonStats.getDocs()).thenReturn(new DocsStats(total, 0, between(1, 10000), randomLongBetween(0, total)));

IndexStats indexStats = mock(IndexStats.class);
when(indexStats.getPrimaries()).thenReturn(primariesCommonStats);
@@ -27,7 +27,7 @@ public void testShrink() {
ResizeNumberOfShardsCalculator.ShrinkShardsCalculator shrinkShardsCalculator =
new ResizeNumberOfShardsCalculator.ShrinkShardsCalculator(
new StoreStats(between(1, 100), between(0, 100), between(1, 100)),
(i) -> new DocsStats(between(1, 1000), between(1, 1000), between(0, 10000))
(i) -> new DocsStats(between(1, 1000), between(1, 1000), between(0, 10000), between(0, 10000))
);
assertEquals(4, shrinkShardsCalculator.calculate(4, null, indexMetadata));
assertEquals(1, shrinkShardsCalculator.calculate(null, null, indexMetadata));
@@ -41,7 +41,7 @@ public void testShrink() {
IllegalStateException.class,
() -> new ResizeNumberOfShardsCalculator.ShrinkShardsCalculator(
new StoreStats(between(1, 100), between(0, 100), between(1, 100)),
(i) -> new DocsStats(Integer.MAX_VALUE, between(1, 1000), between(1, 100))
(i) -> new DocsStats(Integer.MAX_VALUE, between(1, 1000), between(1, 100), between(0, 10000))
).validate(1, indexMetadata)
).getMessage().startsWith("Can't merge index with more than [2147483519] docs - too many documents in shards ")
);