- Ingest V2 is now the default ingest path. The `/api/v1/{index}/ingest` endpoint transparently routes to V2.
- The `rest_listen_port` top-level config field is deprecated; use `rest.listen_port` under the new `rest` block. The old field still works but emits a warning.
- Stemming is now restricted to the `multilang` cargo feature (#6085); the previously bundled generic stemmer is no longer available by default.
- The unused multilang tokenizer feature was removed (#6154).
- Metastore format: new `maturity` field and compaction columns on splits; PostgreSQL metastore migrates automatically on first start.
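
The `rest_listen_port` deprecation above can be illustrated with a small sketch that rewrites a node config (already loaded as a dictionary) into the new layout. The helper `migrate_rest_config` is hypothetical and not part of Quickwit; letting an explicit `rest.listen_port` take precedence over the deprecated field is an assumption made for illustration.

```python
import warnings


def migrate_rest_config(config: dict) -> dict:
    """Return a copy of `config` with `rest_listen_port` moved to `rest.listen_port`.

    Hypothetical helper for illustration only; not a Quickwit API.
    """
    migrated = dict(config)
    if "rest_listen_port" in migrated:
        legacy_port = migrated.pop("rest_listen_port")
        rest = dict(migrated.get("rest") or {})
        # Assumption: an explicit `rest.listen_port` wins over the deprecated field.
        rest.setdefault("listen_port", legacy_port)
        migrated["rest"] = rest
        warnings.warn(
            "`rest_listen_port` is deprecated; use `rest.listen_port`",
            DeprecationWarning,
        )
    return migrated


if __name__ == "__main__":
    print(migrate_rest_config({"version": "0.8", "rest_listen_port": 7280}))
```

The input dictionary is left untouched, so the sketch can be run against a loaded config before writing the result back.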
- Add Ingest V2 — now the default (#5600, #5566, #5463, #5375, #5350, #5252, #5202, #6078, #6185, #6203, #6207, #6217, #6249)
- Offload leaf-search work to AWS Lambda functions — searchers can farm out part of their workload to Lambda (#6157, #5c1e60f)
- Add SQS source (#5374, #5335, #5148)
- Add Jaeger v2 support (#6023)
- Extract and propagate W3C `traceparent` header on incoming HTTP requests (#6224)
- Support configurable OTLP exporter protocol for traces and logs (#6254)
- Export internal logs via OTLP exporter (#6142)
- Add `QW_LOG_FORMAT=DDG` JSON log formatter (#6215)
- Add ES-compatible endpoints for Trino connector support (#6168)
- Elasticsearch DSL: prefix and wildcard queries (#6000), `case_insensitive` parameter on supported queries (#6005), regexp shorthand, concatenate-fields exposure, and `text`→`keyword` mapping in `_mapping` (#6208), `index_filter` on field capabilities API (#6102), `ignore_unavailable` query parameter (#5971)
- Make Elasticsearch `TermsQuery` use `TermSetQuery` internally for better performance (#6151)
- Add composite aggregation (#6214) and aggregations alias (#6314)
- Add `skip_aggregation_finalization` to `SearchRequest` (#6145)
- Add `list_index_stats` endpoint (#6035)
- Add `validate_docs` ingest setting (#5984); validate doc-mapping updates (#5988)
- Support updating the doc mapper through the API (#5253)
- Add CORS debug mode (#5955)
- Add config to fail search when it targets too many splits (#6009)
- Differentiate leaf- and root-level search timeouts (#6255)
- Predicate cache in leaf search (#6024); skip CPU work when it cannot improve the result (#6001); propagate cancellation within leaf search (#6002)
- Set `searcher.warmup_single_split_initial_allocation` default to 300 MB (#c34966c6)
- Rebalance shards when ingester status changes (#6185) and improved rebalance algorithm (#6018)
- Add object storage metrics for GCS (#5889)
- Expose per-shard load configuration and disable the Tokio LIFO slot by default (#5898, #5899)
- Redact sensitive information in developer API debug output (#6191)
- Disable control plane check for searcher (#5599, #5360)
- Partially implement `_elastic/_cluster/health` (#5595)
- Make Jaeger span attribute-to-tag conversion exhaustive (#5574)
- Use `content_length_limit` for ES bulk limit (#5573)
- Limit and monitor warmup memory usage (#5568)
- Add eviction metrics to caches (#5523)
- Record object storage request latencies (#5521)
- Throttle the janitor to prevent overloading the metastore (#5510)
- Prevent single split searches from different `leaf_search` from interleaving (#5509)
- Retry on S3 internal error (#5504); make more S3 errors retryable (#5384)
- Allow specifying OTEL index ID in header (#5503)
- Add a metric to count storage errors and their error code (#5497)
- Add support for concatenated fields (#4773, #5369, #5331)
- Add number of splits per root/leaf search histograms (#5472)
- Introduce a searcher config option to timeout get requests (#5467)
- Add fingerprint to task in cluster state (#5464)
- Enrich root/leaf search spans with number of docs and splits (#5450)
- Add some additional search metrics (#5447)
- Improve GC resilience and add metrics (#5420)
- Enable force shutdown with 2nd Ctrl+C (#5414)
- Add `request_timeout_secs` to searcher config (#5402)
- Memoize S3 client (#5377)
- Add more env var config for Postgres (#5365)
- Enable str fast field range queries (#5324)
- Allow querying non-existing fields (#5308)
- Add optional special handling for hex in code tokenizer (#5200)
- Added a circuit breaker layer (#5134)
- Follow AWS hints for Lambda retries (#6195)
- Improve cluster sizing documentation for control plane, metastore and janitor (#6202)
- Various performance optimizations in Tantivy (https://github.com/quickwit-oss/tantivy/blob/main/CHANGELOG.md)
- Parse datetimes and timestamps with leading and/or trailing whitespace (#5544)
- Restrict maturity period to retention (#5543)
- Wait for merge at end of local ingest (#5542)
- Log PostgreSQL metastore error (#5530)
- Update azure multipart policy (#5553)
- Stop relying on our own version of pulsar-rs (#5487)
- Handle nested OTLP values in attributes and log bodies (#5485)
- Improve merge pipeline finalization (#5475)
- Allow failed splits in root search (#5440)
- Batch delete from GC (#5404, #5380)
- Change default timestamps in OTEL logs (#5366)
- Only return root spans for Jaeger HTTP API (#5358)
- Share aggregation limit on node (#5357)
- Make regex lenient to start and end anchors (#6089) — later reverted, see Fixed
- Use correct precision level for fastfield-based term queries on datetime (#6027)
- Disable the Tokio LIFO slot and tune per-shard default load (#5898)
- Improve control-plane logging (#6003)
- Upgraded Tantivy to 9b61999 / 04beab3b29 with multiple search/aggregation perf improvements (see Tantivy CHANGELOG)
- Fix existence queries for nested fields (#5581)
- Fix lenient option with wildcard queries (#5575)
- Fix incompatible ES Java date format (#5462)
- Fix bulk api response order (#5434)
- Fix pulsar finalize (#5471)
- Fix pulsar URI scheme (#5470)
- Fix grafana searchers dashboard (#5455)
- Fix jaeger http endpoint (#5378)
- Fix file re-ingestion after EOF (#5330)
- Fix configuration interpolation (#5403)
- Fix jaeger duration parse error (#5518)
- Fix unit conversion in jaeger http search endpoint (#5519)
- Fix Kinesis source panic on resharding (#5912)
- Fix Azure multipart upload data corruption (#5919)
- Fix leaf list fields merging logic (#5908)
- Fix empty intermediate aggregation results merge (#5930)
- Fix error when running a scoring query with cache (#6025)
- Fix `f64` not working properly with concatenated fields (#6074)
- Fix wrong range result when datetime was inferred in JSON type (#6048)
- Fix infinite-loop OOM bug caused by rebalancing shards from unavailable ingesters (#6078)
- Fix index reincarnation routing bug (#6217)
- Fix bug when deploying a new routing table (#6249)
- Fix Chitchat gossip bug triggering excess gRPC traffic (#6082)
- Revert over-eager anchor lenience in regex queries (#6089)
- `skip_aggregation_finalization` fixes for composite aggregations
- Jaeger: query resource attributes when the Jaeger request carries tags
- Remove support for 2-digit years in Java datetime parser (#5596)
- Remove `DocMapper` trait (#5508)
- Remove standalone AWS Lambda deployment mode: the previous dedicated search/indexing Lambda binaries are gone (#5884). AWS Lambda is still supported for search offloading; see the Added section.
- Remove search stream endpoint (#5886)
- Remove mentions of the stream API from the docs (#5958)
- Remove the legacy `/api/v1/{index}/ingest-v2` REST endpoint (V2 is now served from `/ingest`; see Breaking / Migration)
- Remove the unused multilang tokenizer feature (#6154)
- Restrict stemming to the `multilang` feature (#6085)
- Bug in the chitchat digest message serialization (chitchat#144)
- Remove some noisy logs (#4447)
- Add `/{index}/_stats` and `/_stats` ES API (#4442)
- Use `search_after` in ES scroll API (#4280)
- Add support for wildcard exclusion in index patterns (#4458)
- Add `.` support in DSL identifiers (#3989)
- Add cat indices ES API (#4465)
- Limit concurrent merges (#4473)
- Add Index Template API and auto create index (#4456) (only available with ingest V2)
- Add support for compressed ES `_bulk` requests (#4506)
- Add support for slash `/` character in field names (#4510)
- Handle SIGTERM shutdown signal (#4539)
- Add `start_timestamp` and `end_timestamp` filter to ES `_field_caps` API (#4547)
- Limit the number of merge pipelines that can be spawned concurrently (#4574)
- Add support for `_source_excludes` and `_source_includes` query parameters in ES API (#4572)
- Add gRPC metrics layer to clients and servers (#4591)
- Add additional cluster metrics (#4597)
- Add index patterns query param on GET `/indexes` endpoint (#4600)
- Add support for GCS file backed metastore (#4604)
- Add default search fields for OTEL traces index (#4602)
- Add support for delete index in ES API (#4606)
- Add a handler to dynamically change the log level (#4662)
- Add REST endpoint to parse a query into a query AST (#4652)
- Add PostgreSQL index and use `IN` instead of many `OR` (#4670)
- Add support for `_source_excludes`, `_source_includes`, `extra_filters` in `_msearch` ES API (#4696)
- Handle `track_total_size` on request ES body (#4710)
- Add a metric for the number of indexes (#4711)
- Add various performance optimizations in Quickwit and Tantivy
More details in tantivy's changelog.
- Fix aggregation result on empty index (#4449)
- Fix Gzip file source (#4457)
- Rate limit noisy logs (#4483)
- Prevent the exponential backoff from overflowing after 64 attempts (#4501)
- Remove field presence in ES `_field_caps` API (#4492)
- Remove `source` in ES parameter, remove unsupported field `fields` in response (#4590)
- Fix aggregation `split_size` parameter, add docs and test (#4627)
- Various fixes in chitchat (gossip): more details in chitchat commit history
- Various fixes in mrecordlog (WAL): more details in mrecordlog commit history
- (Breaking) Add ZSTD compression to chitchat's Deltas
To deploy Quickwit 0.8.0, you must either:
- shut down your cluster entirely before deploying, or
- restart all the nodes of your cluster after deploying.
Because we made some breaking changes in the gossip protocol (chitchat), nodes running different versions of Quickwit cannot communicate with each other and crash upon receiving messages that do not match their release version. The new protocol is now versioned, and future updates of the gossip protocol will be backward compatible.
- Add es _count API (#4410)
- Add _elastic/_field_caps API (#4350)
- Make gRPC message size configurable (#4388)
- Add API endpoint to get some control-plane internal info (#4339)
- Add Google Cloud Storage implementation, available for storage paths starting with `gs://` (#4344)
- Return 404 on index not found in ES Bulk API (#4425)
- Allow $ and @ characters in field names (#4413)
- Assign all sources/shards, even if this requires exceeding the indexer #4363
- Fix traces doc mapping (service name set as fast) and update default otel logs index ID to `otel-logs-v0_7` (#4401)
- Fix parsing multi-line queries (#4409)
- Fix range query for optional fast field panics with Index out of bounds (#4362)
Quickwit 0.7.1 will create the new index otel-logs-v0_7 which is now used by default when ingesting data with the OTEL gRPC and HTTP API.
In the traces index otel-traces-v0_7, the service_name field is now fast. No migration is done if otel-traces-v0_7 already exists. If you want the service_name field to be fast, you must first delete the existing otel-traces-v0_7 index or create your own index.
- Elasticsearch-compatible API
  - Added scroll and search_after APIs and support for multi-index search queries
  - Added exists, multi-match, match phrase prefix, match bool prefix, bool queries
  - Added `_field_caps` API
- Added support for OTLP over HTTP API (Protobuf only) (#4335)
- Added Jaeger REST endpoints for Grafana tracing support (#4197)
- Added support for injecting custom HTTP headers and moved REST config parameters into REST config section (#4198)
- Added support for OTLP trace data in arbitrary sources
- Commit Kafka offsets on suggest truncate (#3638)
- Honor `auto.offset.reset` parameter in Kafka source (#4095)
- Added exact count optimization (#4019)
- Added stream splits gRPC (#4109)
- Added a split cache in Searchers (#3857)
- Added `coerce` and `output_format` options for numeric fields (#3704)
- Added `PhraseMatchQuery` and `MultiMatchQuery` (#3727)
- Added Elasticsearch's `TermsQuery` (#3747)
- Added GCP PubSub source (#3720)
- Parse timestamp strings (#3639)
- Added Digital Ocean storage flavor (#3632)
- Added new tokenizers: `source_code_default`, `source_code`, `multilang` (#3647, #3655, #3608)
- Fixed dates in UI (#4277)
- Fixed duplicate splits planned on pipeline crash-respawn (#3854)
- Fixed sorting (#3799)
More details in tantivy's changelog.
- Improve OTEL traces index config (#4311)
  - OTEL endpoints now use the indexes `otel-logs-v0_7` and `otel-traces-v0_7` by default instead of `otel-logs-v0_6` and `otel-traces-v0_6`
  - OTEL indexes have more fields stored as "fast" and have Trace and Span ID bytes field in hex format
- Increased the gRPC payload limits from 10MiB to 20MiB (#4227)
- Reject malformed Elasticsearch API requests (#4175)
- Better logging when doc processing fails (#4323)
- Search performance improvements
- Indexing performance improvements
The format of the index and internal objects stored in the metastore of 0.7 is backward compatible with 0.6.
If you are using the OTEL indexes and ingesting data into the indexes otel-logs-v0_6 and otel-traces-v0_6, you must stop indexing before upgrading.
Indeed, the first time you start Quickwit 0.7, it will update the doc mapping fields of Trace ID and Span ID of those two indexes by changing their input/output formats from base64 to hex. This is automatic: you don't have to perform any manual operation.
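
To make the base64-to-hex format change concrete, here is a minimal sketch that converts an ID from its base64 text form to its hex text form. The helper name `base64_id_to_hex` and the sample 16-byte ID are illustrative assumptions, not part of Quickwit; the upgrade itself performs the doc mapping change automatically.

```python
import base64


def base64_id_to_hex(encoded_id: str) -> str:
    """Decode a base64-encoded ID and return the same bytes as lowercase hex."""
    return base64.b64decode(encoded_id).hex()


if __name__ == "__main__":
    # Sample 16-byte trace ID (bytes 0x00..0x0f), encoded the way 0.6 displayed it.
    sample_b64 = base64.b64encode(bytes(range(16))).decode("ascii")
    print(sample_b64)
    print(base64_id_to_hex(sample_b64))
```

A helper like this can be useful when a saved query or dashboard still contains base64-encoded Trace or Span IDs that need to be rewritten by hand.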
Quickwit 0.7 will create new indexes otel-logs-v0_7 and otel-traces-v0_7, which are now used by default when ingesting data with the OTEL gRPC and HTTP API. The Jaeger gRPC and HTTP APIs will query both otel-traces-v0_6 and otel-traces-v0_7 by default.
It's possible to define the index ID you want to use for OTEL gRPC endpoints and Jaeger gRPC API by setting the request header qw-otel-logs-index or qw-otel-traces-index to the index ID you want to target.
- Support of phrase prefix queries in the query language.
- Fix timestamp field which was not allowed when defined in an object mapping.
- Fix querying of integers on a JSON field (no documents were returned).
- Elasticsearch/Opensearch compatible API.
- New columnar format:
  - Fast fields can now have any cardinality (Optional, Multivalued, restricted). In fact cardinality is now only used to format the output.
  - Dynamic Fields are now fast fields.
  - String fast fields now can be normalized.
- Various parameters of object storages can now be configured.
- The ingest API makes it possible to force a commit, or wait for a scheduled commit to occur.
- Ability to parse non-JSON data using VRL to extract some structure from documents.
- Object storage can now use the `virtual-hosted–style`.
- `date_histogram` aggregation.
- `percentiles` aggregation.
- Added support for Prefix Phrase query.
- Added support for range queries.
- The query language now supports different date formats.
- Added support for base16 input/output configuration for bytes field. You can search for bytes fields using base16 encoded values.
- Autotagging: fields used in the partition key are automatically added to tags.
- Added arm64 docker image.
- Added CORS configuration for the REST API.
- Fix a major bug that required restarting Quickwit when deleting and recreating an index with the same name.
- The number of concurrent GET requests to object stores is now limited. This fixes a bug observed when requesting a lot of documents from MinIO.
- Quickwit now searches into resource attributes when receiving a Jaeger request carrying tags
- Object storage can be configured to:
  - Avoid Bulk delete API (workaround for Google Cloud Storage).
  - Use virtual-host style addresses (workaround for Alibaba Object Storage Service).
- Fix aggregation min doc_count empty merge bug.
- Fix: Sort order for term aggregations.
- Switch to ms in histogram for date type (aligning with ES).
- Search performance improvement.
- Aggregation performance improvement.
- Aggregation memory improvement.
More details in tantivy's changelog.
- Datetimes now have up to nanosecond precision.
- Quickwit now uses the node's hostname as the default node ID.
- By default, Quickwit is in dynamic mode and all dynamic fields are marked as fast fields.
- JSON fields use the raw tokenizer by default and are set as fast fields.
- Various performance/compression improvements.
- OTEL indexes Trace ID and Span ID are now bytes fields.
- OTEL indexes store timestamps with nanosecond precision.
- Span status is now indexed in the OTEL trace index.
- Default and raw tokenizers filter tokens longer than 255 bytes instead of 40 bytes.
- gRPC OpenTelemetry Protocol support for traces
- gRPC OpenTelemetry Protocol support for logs
- Control plane (indexing tasks scheduling)
- Ingest API rate limiter
- Pulsar source
- VRL transform for data sources
- REST API enhanced to fully manage indexes, sources, and splits
- OpenAPI specification and swagger UI for all REST available endpoints
- Large responses from REST API can be compressed
- Add bulk stage splits method to metastore
- MacOS M1 binary
- Doc mapping field names starting with `_` are now valid
- Fix UI index completion on search page
- Fix CLI index describe command to show stats on published splits
- Fix REST API to always return, on error, a body formatted as `{"message": "error message"}`
- Fixed REST status code when deleting a nonexistent index or source, and when fetching splits on a nonexistent index
- Source config schema (breaking or not? use serde rename to be not breaking?)
- RocksDB replaced by mrecordlog to store ingest API queues records
- (Breaking) Indexing partition key new DSL
- (Breaking) Helm chart updated with the new CLI
- (Breaking) CLI indexes, sources, and splits commands use the REST API
- (Breaking) Index new format: you need to reindex all your data
- Boolean, datetime, and IP address fields
- Chinese tokenizer
- Distributed indexing (Kafka only)
- gRPC metastore server
- Index partitioning
- Kubernetes
- Node config templating
- Prometheus metrics
- Retention policies
- REST API for CRUD operations on indexes/sources
- Support for Azure Blob Storage
- Support for BM25 document scoring
- Support for deletions
- Support for slop in phrase queries
- Support for snippeting
- Fixed cache misses during search fetch docs phase
- Fixed credentials leak in metastore URI
- Fixed GC scalability issues
- Fixed support for multi-source
- Changed default docstore block size to 1 MiB and compression algorithm to ZSTD
- Quickwit now relies on sqlx rather than Diesel for PostgreSQL interactions. Migrating from 0.3 should work as expected. Migrating from earlier versions, however, is not supported.
- Removed support for i64 as timestamp field
- Removed support for sorting index by field
- Forbid access to paths with `..` at storage level
- Add support for Google Cloud Storage
- Sort hits by timestamp desc by default in search UI
- Add `description` attribute to field mappings
- Display split state in output of `quickwit split list` command
- Clean up local split cache after index deletion
- Fix API URLs displayed for copy and paste in UI
- Fix custom S3 endpoint with trailing `/`
- Fix `quickwit index create` command with `--overwrite` option
- Embedded UI for displaying search hits and cluster state
- Schemaless indexing with JSON field
- Ingest API (Elasticsearch-compatible)
- Aggregation queries
- Support for Amazon Kinesis
- Switched cluster membership algorithm from S.W.I.M. to Chitchat
- u64 as date field
- Query validation against index schema before dispatch to leaf nodes (#1109, @linxGnu)
- Support for custom S3 endpoint (#1108)
- Warm up terms and fastfields concurrently (#1147)
- Minor bug in leaf search stream (#1110)
- Default index root URI and metastore URI correctly default to data dir (#1140, @ddelemeny)
- QW_ENV environment variable
- Compiled binaries with Rust 1.58.1, which fixes CVE-2022-21658