Commit a5527b4

Merge pull request #613 from apache/6.1.X_align_with_master

This aligns 6.1.X with master

2 parents: 6724a39 + 9fa8799

48 files changed, +173 -93 lines changed

.github/workflows/check_cpp_files.yml (+7 -2)

@@ -12,12 +12,17 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
       - name: Checkout C++
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
         with:
           repository: apache/datasketches-cpp
           path: cpp
+      - name: Setup Java
+        uses: actions/setup-java@v2
+        with:
+          java-version: '11'
+          distribution: 'temurin'
       - name: Configure C++ build
         run: cd cpp/build && cmake .. -DGENERATE=true
       - name: Build C++ unit tests

pom.xml (+1 -1)

@@ -33,7 +33,7 @@ under the License.
 
 <groupId>org.apache.datasketches</groupId>
 <artifactId>datasketches-java</artifactId>
-<version>6.1.1</version>
+<version>6.2.0-SNAPSHOT</version>
 <packaging>jar</packaging>
 
 <name>${project.artifactId}</name>

src/main/java/org/apache/datasketches/filters/bloomfilter/BloomFilter.java (+2 -2)

@@ -33,8 +33,8 @@
 import org.apache.datasketches.memory.XxHash;
 
 /**
- * <p>A Bloom filter is a data structure that can be used for probabilistic
- * set membership.</p>
+ * A Bloom filter is a data structure that can be used for probabilistic
+ * set membership.
  *
  * <p>When querying a Bloom filter, there are no false positives. Specifically:
  * When querying an item that has already been inserted to the filter, the filter will
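
The property a Bloom filter guarantees is the absence of false negatives: an item that has been inserted is always reported as present, while an item that was never inserted may occasionally be reported as present (a false positive). The following is a minimal sketch of that behavior for `long` items, assuming a `java.util.BitSet` backing store and double hashing built on MurmurHash3's `fmix64` finalizer. `TinyBloomFilter` is a hypothetical name; this is not the library's `BloomFilter` (which, per the imports above, uses `XxHash`).

```java
import java.util.BitSet;

/** Hypothetical minimal Bloom filter for illustration; not the DataSketches BloomFilter. */
final class TinyBloomFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  TinyBloomFilter(final int numBits, final int numHashes) {
    if (numBits <= 0 || numHashes <= 0) {
      throw new IllegalArgumentException("numBits and numHashes must be positive");
    }
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  /** MurmurHash3 fmix64 finalizer, used here as a cheap, well-mixed hash of a long item. */
  private static long mix(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }

  /** Bit index of the i-th probe via double hashing: (h1 + i * h2) mod numBits. */
  private int index(final long h1, final long h2, final int i) {
    return (int) Math.floorMod(h1 + i * h2, (long) numBits);
  }

  void update(final long item) {
    final long h1 = mix(item);
    final long h2 = mix(h1 ^ 0x9E3779B97F4A7C15L) | 1L; // second hash, forced odd
    for (int i = 0; i < numHashes; i++) {
      bits.set(index(h1, h2, i));
    }
  }

  /** Returns false only if the item was definitely never inserted; true may be a false positive. */
  boolean query(final long item) {
    final long h1 = mix(item);
    final long h2 = mix(h1 ^ 0x9E3779B97F4A7C15L) | 1L;
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(index(h1, h2, i))) {
        return false;
      }
    }
    return true;
  }
}
```

With 1,000 items in 65,536 bits and 5 probes, the textbook estimate (1 - e^(-kn/m))^k puts the false-positive rate on the order of 10^-6, while every inserted item still queries as present.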

src/main/java/org/apache/datasketches/filters/bloomfilter/BloomFilterBuilder.java (+2 -2)

@@ -25,8 +25,8 @@
 import org.apache.datasketches.memory.WritableMemory;
 
 /**
- * <p>This class provides methods to help estimate the correct parameters when
- * creating a Bloom filter, and methods to create the filter using those values.</p>
+ * This class provides methods to help estimate the correct parameters when
+ * creating a Bloom filter, and methods to create the filter using those values.
  *
  * <p>The underlying math is described in the
  * <a href='https://en.wikipedia.org/wiki/Bloom_filter#Optimal_number_of_hash_functions'>
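
The linked Wikipedia section gives the textbook sizing formulas: for n expected distinct items and a target false-positive probability p, the bit count is m = -n ln(p) / (ln 2)^2 and the number of hash functions is k = (m / n) ln 2. The helpers below are a hedged sketch of those formulas only. `BloomSizing` and its method names are hypothetical, and `BloomFilterBuilder`'s own methods may round or bound the results differently.

```java
/** Hypothetical helpers for the textbook Bloom filter sizing formulas. */
final class BloomSizing {
  private static final double LN2 = Math.log(2.0);

  /** m = ceil(-n * ln(p) / (ln 2)^2): bits needed for n items at false-positive rate p. */
  static long optimalNumBits(final long n, final double p) {
    if (n <= 0 || p <= 0.0 || p >= 1.0) {
      throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
    }
    return (long) Math.ceil(-n * Math.log(p) / (LN2 * LN2));
  }

  /** k = max(1, round((m / n) * ln 2)): hash count that minimizes the false-positive rate. */
  static int optimalNumHashes(final long n, final long m) {
    return (int) Math.max(1, Math.round((double) m / n * LN2));
  }

  /** Approximate false-positive rate (1 - e^(-k * n / m))^k. */
  static double falsePositiveRate(final long n, final long m, final int k) {
    return Math.pow(1.0 - Math.exp(-(double) k * n / m), k);
  }
}
```

For example, n = 1,000,000 and p = 0.01 give about 9.6 million bits (roughly 1.2 MB) and k = 7, for a predicted false-positive rate close to 1%.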

src/main/java/org/apache/datasketches/frequencies/ItemsSketch.java (+2 -2)

@@ -55,10 +55,10 @@
 import org.apache.datasketches.memory.WritableMemory;
 
 /**
- * <p>This sketch is useful for tracking approximate frequencies of items of type <i>&lt;T&gt;</i>
+ * This sketch is useful for tracking approximate frequencies of items of type <i>&lt;T&gt;</i>
  * with optional associated counts (<i>&lt;T&gt;</i> item, <i>long</i> count) that are members of a
  * multiset of such items. The true frequency of an item is defined to be the sum of associated
- * counts.</p>
+ * counts.
  *
  * <p>This implementation provides the following capabilities:</p>
  * <ul>
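
The Javadoc defines an item's true frequency as the sum of the counts associated with it across the stream. The exact (unbounded-memory) baseline below makes that definition concrete and could serve as ground truth when testing an approximate sketch. `ExactFrequencies` is a hypothetical class, not part of the library.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical exact baseline: true frequency = sum of associated counts per item. */
final class ExactFrequencies<T> {
  private final Map<T, Long> counts = new HashMap<>();
  private long streamWeight; // sum of all counts seen so far

  /** Adds count occurrences of item. */
  void update(final T item, final long count) {
    if (count < 0) {
      throw new IllegalArgumentException("count must be non-negative");
    }
    counts.merge(item, count, Long::sum);
    streamWeight += count;
  }

  /** Adds a single occurrence of item. */
  void update(final T item) {
    update(item, 1L);
  }

  long getFrequency(final T item) {
    return counts.getOrDefault(item, 0L);
  }

  long getStreamWeight() {
    return streamWeight;
  }
}
```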

src/main/java/org/apache/datasketches/frequencies/LongsSketch.java (+2 -2)

@@ -54,9 +54,9 @@
 import org.apache.datasketches.memory.WritableMemory;
 
 /**
- * <p>This sketch is useful for tracking approximate frequencies of <i>long</i> items with optional
+ * This sketch is useful for tracking approximate frequencies of <i>long</i> items with optional
  * associated counts (<i>long</i> item, <i>long</i> count) that are members of a multiset of
- * such items. The true frequency of an item is defined to be the sum of associated counts.</p>
+ * such items. The true frequency of an item is defined to be the sum of associated counts.
  *
  * <p>This implementation provides the following capabilities:</p>
  * <ul>
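
Approximate frequency sketches bound memory by keeping a fixed number of counters. As a hedged illustration of the idea, the classic Misra-Gries algorithm is shown below for unit-count `long` items. It is a swapped-in textbook technique, not a reproduction of `LongsSketch` (which accepts weighted updates), and the class name `MisraGries` is hypothetical. With k counters it never overestimates, and it underestimates any frequency by at most N / (k + 1), where N is the stream length.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical textbook Misra-Gries frequent-items counter for long items. */
final class MisraGries {
  private final int maxCounters;
  private final Map<Long, Long> counters = new HashMap<>();

  MisraGries(final int maxCounters) {
    this.maxCounters = maxCounters;
  }

  void update(final long item) {
    final Long c = counters.get(item);
    if (c != null) {
      counters.put(item, c + 1);
    } else if (counters.size() < maxCounters) {
      counters.put(item, 1L);
    } else {
      // Table full and item untracked: decrement every counter, dropping those that reach zero.
      final Iterator<Map.Entry<Long, Long>> it = counters.entrySet().iterator();
      while (it.hasNext()) {
        final Map.Entry<Long, Long> e = it.next();
        if (e.getValue() == 1L) {
          it.remove();
        } else {
          e.setValue(e.getValue() - 1);
        }
      }
    }
  }

  /** Lower bound on the true frequency; the shortfall is at most N / (maxCounters + 1). */
  long getEstimate(final long item) {
    return counters.getOrDefault(item, 0L);
  }
}
```

Each decrement step removes maxCounters + 1 units of weight (every counter plus the discarded arrival), so there can be at most N / (maxCounters + 1) such steps, which is where the error bound comes from.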

src/main/java/org/apache/datasketches/frequencies/PreambleUtil.java (+3 -4)

@@ -31,12 +31,11 @@
 /**
  * This class defines the preamble data structure and provides basic utilities for some of the key
  * fields.
- * <p>
- * The intent of the design of this class was to isolate the detailed knowledge of the bit and byte
+ *
+ * <p>The intent of the design of this class was to isolate the detailed knowledge of the bit and byte
  * layout of the serialized form of the sketches derived from the Sketch class into one place. This
  * allows the possibility of the introduction of different serialization schemes with minimal impact
- * on the rest of the library.
- * </p>
+ * on the rest of the library.</p>
  *
  * <p>
  * MAP: Low significance bytes of this <i>long</i> data structure are on the right. However, the
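
The preamble map draws each `long` with its low-significance bytes on the right, and accessor utilities read or write a field by shifting and masking at a fixed byte offset. The sketch below illustrates that style with a hypothetical four-field layout (byte 0 preamble longs, byte 1 serial version, byte 2 family ID, byte 3 flags). Those offsets and field values are assumptions for illustration, not the actual frequencies preamble. It also assumes little-endian serialization, under which byte 0 of the `long` is the first byte written.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical preamble accessors; offsets are illustrative, not the real layout. */
final class PreambleSketch {
  static final int PRE_LONGS_BYTE = 0; // least significant byte
  static final int SER_VER_BYTE = 1;
  static final int FAMILY_BYTE = 2;
  static final int FLAGS_BYTE = 3;

  /** Returns pre0 with the byte at byteOffset replaced by the low 8 bits of value. */
  static long insertByte(final long pre0, final int byteOffset, final int value) {
    final int shift = byteOffset * 8;
    final long mask = 0xFFL << shift;
    return (pre0 & ~mask) | (((long) value & 0xFFL) << shift);
  }

  /** Extracts the byte at byteOffset as an unsigned value in 0..255. */
  static int extractByte(final long pre0, final int byteOffset) {
    return (int) ((pre0 >>> (byteOffset * 8)) & 0xFFL);
  }

  /** Serializes the preamble long little-endian, so byte 0 is written first. */
  static byte[] toBytes(final long pre0) {
    return ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN).putLong(pre0).array();
  }
}
```

Setting the four fields to 4, 1, 10 and 5 yields `0x050A0104`: printed in hex, the flags byte sits on the left and byte 0 on the right, matching the map convention.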

src/main/java/org/apache/datasketches/hash/MurmurHash3.java (-2)

@@ -29,10 +29,8 @@
 import org.apache.datasketches.memory.Memory;
 
 /**
- * <p>
  * The MurmurHash3 is a fast, non-cryptographic, 128-bit hash function that has
  * excellent avalanche and 2-way bit independence properties.
- * </p>
  *
  * <p>
  * Austin Appleby's C++
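
Good avalanche means that flipping any single input bit flips each output bit with probability close to 1/2, so about 32 of 64 output bits change on average. The check below applies MurmurHash3's 64-bit finalization mix (`fmix64`, from Appleby's reference code) to a range of inputs and measures that average. It exercises only the finalizer, not the full 128-bit hash, and `AvalancheCheck` is a hypothetical name.

```java
/** Hypothetical avalanche measurement for the MurmurHash3 fmix64 finalizer. */
final class AvalancheCheck {
  /** MurmurHash3 64-bit finalization mix (fmix64). */
  static long fmix64(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }

  /** Average number of output bits that change when one input bit is flipped. */
  static double averageFlippedBits(final int samples) {
    long totalFlipped = 0;
    long trials = 0;
    long x = 0x9E3779B97F4A7C15L; // arbitrary starting input
    for (int s = 0; s < samples; s++) {
      x += 0x9E3779B97F4A7C15L; // step to the next distinct input
      final long h = fmix64(x);
      for (int bit = 0; bit < 64; bit++) {
        totalFlipped += Long.bitCount(h ^ fmix64(x ^ (1L << bit)));
        trials++;
      }
    }
    return (double) totalFlipped / trials;
  }
}
```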

src/main/java/org/apache/datasketches/hash/package-info.java (+1 -2)

@@ -18,12 +18,11 @@
  */
 
 /**
- * <p>The hash package contains a high-performing and extended Java implementations
+ * The hash package contains a high-performing and extended Java implementations
  * of Austin Appleby's 128-bit MurmurHash3 hash function originally coded in C.
  * This core MurmurHash3.java class is used throughout many of the sketch classes for consistency
  * and as long as the user specifies the same seed will result in coordinated hash operations.
  * This package also contains an adaptor class that extends the basic class with more functions
  * commonly associated with hashing.
- * </p>
  */
 package org.apache.datasketches.hash;
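
The package Javadoc notes that hashing is coordinated across sketches only when the same seed is used. The class below is a compact sketch of MurmurHash3_x64_128 over a `long[]` key, treating each `long` as eight little-endian bytes as in Appleby's reference algorithm. The C reference takes a 32-bit seed; here a 64-bit seed initializes both state words, which is an assumption of this sketch. `Murmur3Longs` is a hypothetical name, and the code is not claimed to be bit-for-bit identical to `org.apache.datasketches.hash.MurmurHash3`.

```java
/** Hypothetical MurmurHash3_x64_128 over long[] keys, for illustrating seed coordination. */
final class Murmur3Longs {
  private static final long C1 = 0x87c37b91114253d5L;
  private static final long C2 = 0x4cf5ad432745937fL;

  static long[] hash(final long[] key, final long seed) {
    long h1 = seed;
    long h2 = seed;
    final int nblocks = key.length >>> 1; // each 16-byte block holds two longs
    for (int i = 0; i < nblocks; i++) {
      h1 ^= mixK1(key[2 * i]);
      h1 = Long.rotateLeft(h1, 27);
      h1 += h2;
      h1 = h1 * 5 + 0x52dce729L;
      h2 ^= mixK2(key[2 * i + 1]);
      h2 = Long.rotateLeft(h2, 31);
      h2 += h1;
      h2 = h2 * 5 + 0x38495ab5L;
    }
    if ((key.length & 1) != 0) { // tail: one leftover long feeds only the k1 lane
      h1 ^= mixK1(key[key.length - 1]);
    }
    final long lengthInBytes = (long) key.length * Long.BYTES;
    h1 ^= lengthInBytes;
    h2 ^= lengthInBytes;
    h1 += h2;
    h2 += h1;
    h1 = fmix64(h1);
    h2 = fmix64(h2);
    h1 += h2;
    h2 += h1;
    return new long[] {h1, h2};
  }

  private static long mixK1(long k1) {
    k1 *= C1;
    k1 = Long.rotateLeft(k1, 31);
    return k1 * C2;
  }

  private static long mixK2(long k2) {
    k2 *= C2;
    k2 = Long.rotateLeft(k2, 33);
    return k2 * C1;
  }

  private static long fmix64(long k) {
    k ^= k >>> 33;
    k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33;
    k *= 0xc4ceb9fe1a85ec53L;
    k ^= k >>> 33;
    return k;
  }
}
```

Two sketches that hash the same key with the same seed land on the same 128-bit value and can therefore be merged or compared; a different seed yields an unrelated value.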

src/main/java/org/apache/datasketches/kll/KllDoublesHelper.java (+1)

@@ -312,6 +312,7 @@ private static void randomlyHalveUpDoubles(final double[] buf, final int start,
 
 /**
  * Compression algorithm used to merge higher levels.
+ *
  * <p>Here is what we do for each level:</p>
  * <ul><li>If it does not need to be compacted, then simply copy it over.</li>
  * <li>Otherwise, it does need to be compacted, so...
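
In a KLL compaction, a level that must be compacted is sorted and halved, keeping either the even- or the odd-indexed items at random; the survivors move up one level, where each stands for two original items. The sketch below shows only that halving step on a `double[]` level. It is a simplified illustration with a hypothetical class name: the real helpers also manage in-place buffer layout, level capacities, and odd-length levels, none of which are modeled here (this version rejects odd lengths).

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration of the KLL "sort, then randomly keep every other item" step. */
final class KllHalving {
  /** Returns the items promoted to the next level from an even-length level. */
  static double[] compactLevel(final double[] level, final Random rnd) {
    if ((level.length & 1) != 0) {
      throw new IllegalArgumentException("this sketch handles even-length levels only");
    }
    final double[] sorted = level.clone();
    Arrays.sort(sorted);
    final int offset = rnd.nextBoolean() ? 1 : 0; // keep odd- or even-indexed items
    final double[] promoted = new double[sorted.length / 2];
    for (int i = 0; i < promoted.length; i++) {
      promoted[i] = sorted[2 * i + offset];
    }
    return promoted;
  }
}
```

Because each survivor replaces a pair of adjacent sorted items, one compaction changes any rank by at most the weight of a single promoted item, and the random offset makes that change zero in expectation.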

src/main/java/org/apache/datasketches/kll/KllDoublesSketch.java (+1)

@@ -278,6 +278,7 @@ public final void merge(final KllSketch other) {
 
 /**
  * {@inheritDoc}
+ *
  * <p>The parameter <i>k</i> will not change.</p>
  */
 @Override

src/main/java/org/apache/datasketches/kll/KllFloatsHelper.java (+1)

@@ -312,6 +312,7 @@ private static void randomlyHalveUpFloats(final float[] buf, final int start, fi
 
 /**
  * Compression algorithm used to merge higher levels.
+ *
  * <p>Here is what we do for each level:</p>
  * <ul><li>If it does not need to be compacted, then simply copy it over.</li>
  * <li>Otherwise, it does need to be compacted, so...

src/main/java/org/apache/datasketches/kll/KllFloatsSketch.java (+1)

@@ -278,6 +278,7 @@ public final void merge(final KllSketch other) {
 
 /**
  * {@inheritDoc}
+ *
  * <p>The parameter <i>k</i> will not change.</p>
  */
 @Override

src/main/java/org/apache/datasketches/kll/KllItemsHelper.java (+1)

@@ -346,6 +346,7 @@ static <T> void updateItem(final KllItemsSketch<T> itmSk, final T item, final lo
 
 /**
  * Compression algorithm used to merge higher levels.
+ *
  * <p>Here is what we do for each level:</p>
  * <ul><li>If it does not need to be compacted, then simply copy it over.</li>
  * <li>Otherwise, it does need to be compacted, so...

src/main/java/org/apache/datasketches/kll/KllLongsHelper.java (+1)

@@ -312,6 +312,7 @@ private static void randomlyHalveUpLongs(final long[] buf, final int start, fina
 
 /**
  * Compression algorithm used to merge higher levels.
+ *
  * <p>Here is what we do for each level:</p>
  * <ul><li>If it does not need to be compacted, then simply copy it over.</li>
  * <li>Otherwise, it does need to be compacted, so...

src/main/java/org/apache/datasketches/kll/KllLongsSketch.java (+1)

@@ -278,6 +278,7 @@ public final void merge(final KllSketch other) {
 
 /**
  * {@inheritDoc}
+ *
  * <p>The parameter <i>k</i> will not change.</p>
  */
 @Override

src/main/java/org/apache/datasketches/quantiles/DoublesSketch.java (+1)

@@ -506,6 +506,7 @@ public QuantilesDoublesSketchIterator iterator() {
 
 /**
  * {@inheritDoc}
+ *
  * <p>The parameter <i>k</i> will not change.</p>
  */
 @Override

src/main/java/org/apache/datasketches/quantiles/package-info.java (+1 -2)

@@ -18,9 +18,8 @@
  */
 
 /**
- * <p>The quantiles package contains stochastic streaming algorithms that enable single-pass
+ * The quantiles package contains stochastic streaming algorithms that enable single-pass
  * analysis of the distribution of a stream of quantiles.
- * </p>
  *
  * @see org.apache.datasketches.quantiles.DoublesSketch
  * @see org.apache.datasketches.quantiles.ItemsSketch

src/main/java/org/apache/datasketches/quantilescommon/DoublesSortedView.java (+4 -4)

@@ -38,7 +38,7 @@ public interface DoublesSortedView extends SortedView {
  * @param splitPoints an array of <i>m</i> unique, monotonically increasing items
  * (of the same type as the input items)
  * that divide the item input domain into <i>m+1</i> overlapping intervals.
- *
+ * <blockquote>
  * <p>The start of each interval is below the lowest item retained by the sketch
  * corresponding to a zero rank or zero probability, and the end of the interval
  * is the rank or cumulative probability corresponding to the split point.</p>
@@ -55,7 +55,7 @@ public interface DoublesSortedView extends SortedView {
  * </ul>
  *
  * <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
  * @param searchCrit the desired search criteria.
  * @return a discrete CDF array of m+1 double ranks (or cumulative probabilities) on the interval [0.0, 1.0].
  * @throws IllegalArgumentException if sketch is empty.
@@ -100,7 +100,7 @@ default double[] getCDF(double[] splitPoints, QuantileSearchCriteria searchCrit)
  * @param splitPoints an array of <i>m</i> unique, monotonically increasing items
  * (of the same type as the input items)
  * that divide the item input domain into <i>m+1</i> consecutive, non-overlapping intervals.
- *
+ * <blockquote>
  * <p>Each interval except for the end intervals starts with a split point and ends with the next split
  * point in sequence.</p>
  *
@@ -124,7 +124,7 @@ default double[] getCDF(double[] splitPoints, QuantileSearchCriteria searchCrit)
  * </ul>
  *
  * <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
  * @param searchCrit the desired search criteria.
  * @return a PMF array of m+1 probability masses as doubles on the interval [0.0, 1.0].
  * @throws IllegalArgumentException if sketch is empty.
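
The getCDF and getPMF Javadoc describe m split points producing m+1 values. The exact reference below computes both from all of the data rather than from a sketch's retained items, which is what a sketch approximates. It assumes that the inclusive criterion counts items less than or equal to a split point and the exclusive criterion counts items strictly less than it; a `boolean` stands in for the library's `QuantileSearchCriteria`, and `ExactDistribution` is a hypothetical class. The last CDF entry is always 1.0, and each PMF entry is the difference of consecutive CDF entries.

```java
import java.util.Arrays;

/** Hypothetical exact CDF/PMF over split points, for comparison with sketch estimates. */
final class ExactDistribution {
  /** Number of items in sorted[] that are <= v (inclusive) or < v (exclusive). */
  static int countBelow(final double[] sorted, final double v, final boolean inclusive) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) { // binary search for the first index that is not counted
      final int mid = (lo + hi) >>> 1;
      final boolean counted = inclusive ? sorted[mid] <= v : sorted[mid] < v;
      if (counted) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  static double[] getCDF(final double[] data, final double[] splitPoints, final boolean inclusive) {
    if (data.length == 0) {
      throw new IllegalArgumentException("empty input");
    }
    final double[] sorted = data.clone();
    Arrays.sort(sorted);
    final double[] cdf = new double[splitPoints.length + 1];
    for (int i = 0; i < splitPoints.length; i++) {
      cdf[i] = (double) countBelow(sorted, splitPoints[i], inclusive) / sorted.length;
    }
    cdf[splitPoints.length] = 1.0; // the last interval covers everything above the top split point
    return cdf;
  }

  static double[] getPMF(final double[] data, final double[] splitPoints, final boolean inclusive) {
    final double[] cdf = getCDF(data, splitPoints, inclusive);
    final double[] pmf = new double[cdf.length];
    pmf[0] = cdf[0];
    for (int i = 1; i < cdf.length; i++) {
      pmf[i] = cdf[i] - cdf[i - 1];
    }
    return pmf;
  }
}
```

For the items 1 through 10 with split points {3, 7}, the inclusive CDF is {0.3, 0.7, 1.0} and the exclusive CDF is {0.2, 0.6, 1.0}: an item equal to a split point is counted in the interval ending there under the inclusive rule and in the next interval under the exclusive rule.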

src/main/java/org/apache/datasketches/quantilescommon/FloatsSortedView.java (+4 -4)

@@ -38,7 +38,7 @@ public interface FloatsSortedView extends SortedView {
  * @param splitPoints an array of <i>m</i> unique, monotonically increasing items
  * (of the same type as the input items)
  * that divide the item input domain into <i>m+1</i> overlapping intervals.
- *
+ * <blockquote>
  * <p>The start of each interval is below the lowest item retained by the sketch
  * corresponding to a zero rank or zero probability, and the end of the interval
  * is the rank or cumulative probability corresponding to the split point.</p>
@@ -55,7 +55,7 @@ public interface FloatsSortedView extends SortedView {
  * </ul>
  *
  * <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
  * @param searchCrit the desired search criteria.
  * @return a discrete CDF array of m+1 double ranks (or cumulative probabilities) on the interval [0.0, 1.0].
  * @throws IllegalArgumentException if sketch is empty.
@@ -100,7 +100,7 @@ default double[] getCDF(float[] splitPoints, QuantileSearchCriteria searchCrit)
  * @param splitPoints an array of <i>m</i> unique, monotonically increasing items
  * (of the same type as the input items)
  * that divide the item input domain into <i>m+1</i> consecutive, non-overlapping intervals.
- *
+ * <blockquote>
  * <p>Each interval except for the end intervals starts with a split point and ends with the next split
  * point in sequence.</p>
  *
@@ -124,7 +124,7 @@ default double[] getCDF(float[] splitPoints, QuantileSearchCriteria searchCrit)
  * </ul>
  *
  * <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
  * @param searchCrit the desired search criteria.
  * @return a PMF array of m+1 probability masses as doubles on the interval [0.0, 1.0].
  * @throws IllegalArgumentException if sketch is empty.

src/main/java/org/apache/datasketches/quantilescommon/GenericSortedView.java (+4 -4)

@@ -47,7 +47,7 @@ public interface GenericSortedView<T> extends PartitioningFeature<T>, SketchPar
  * @param splitPoints an array of <i>m</i> unique, monotonically increasing items
  * (of the same type as the input items)
  * that divide the item input domain into <i>m+1</i> overlapping intervals.
- *
+ * <blockquote>
  * <p>The start of each interval is below the lowest item retained by the sketch
  * corresponding to a zero rank or zero probability, and the end of the interval
  * is the rank or cumulative probability corresponding to the split point.</p>
@@ -64,7 +64,7 @@ public interface GenericSortedView<T> extends PartitioningFeature<T>, SketchPar
  * </ul>
  *
  * <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
  * @param searchCrit the desired search criteria.
  * @return a discrete CDF array of m+1 double ranks (or cumulative probabilities) on the interval [0.0, 1.0].
  * @throws IllegalArgumentException if sketch is empty.
@@ -116,7 +116,7 @@ default double[] getCDF(final T[] splitPoints, final QuantileSearchCriteria sear
  * @param splitPoints an array of <i>m</i> unique, monotonically increasing items
  * (of the same type as the input items)
  * that divide the item input domain into <i>m+1</i> consecutive, non-overlapping intervals.
- *
+ * <blockquote>
  * <p>Each interval except for the end intervals starts with a split point and ends with the next split
  * point in sequence.</p>
  *
@@ -140,7 +140,7 @@ default double[] getCDF(final T[] splitPoints, final QuantileSearchCriteria sear
  * </ul>
  *
  * <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
  * @param searchCrit the desired search criteria.
  * @return a PMF array of m+1 probability masses as doubles on the interval [0.0, 1.0].
  * @throws IllegalArgumentException if sketch is empty.
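
For generic items ordered by a `Comparator`, the split-point contract above (m unique, monotonically increasing items) is what makes the m+1 intervals well defined. The hypothetical helpers below check that contract and map an item to its interval index 0..m. They assume the same boundary convention as an inclusive or exclusive CDF (an item equal to a split point falls in the interval ending there when inclusive, and in the next interval when exclusive) and are not the library's own validation code.

```java
import java.util.Comparator;

/** Hypothetical split-point helpers for generic item types. */
final class SplitPoints {
  /** Throws if splitPoints are not unique and strictly increasing under comp. */
  static <T> void check(final T[] splitPoints, final Comparator<? super T> comp) {
    for (int i = 1; i < splitPoints.length; i++) {
      if (comp.compare(splitPoints[i - 1], splitPoints[i]) >= 0) {
        throw new IllegalArgumentException("split points must be unique and increasing at index " + i);
      }
    }
  }

  /** Returns which of the m+1 intervals (0..m) contains item. */
  static <T> int intervalIndex(final T item, final T[] splitPoints,
      final Comparator<? super T> comp, final boolean inclusive) {
    int idx = 0;
    while (idx < splitPoints.length) {
      final int c = comp.compare(splitPoints[idx], item);
      // Inclusive: a split point equal to the item closes the item's interval.
      // Exclusive: a split point equal to the item is already behind it.
      final boolean splitIsBelowItem = inclusive ? c < 0 : c <= 0;
      if (!splitIsBelowItem) {
        break;
      }
      idx++;
    }
    return idx;
  }
}
```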

src/main/java/org/apache/datasketches/quantilescommon/LongsSortedView.java (+4 -4)

@@ -38,7 +38,7 @@ public interface LongsSortedView extends SortedView {
  * @param splitPoints an array of <i>m</i> unique, monotonically increasing items
  * (of the same type as the input items)
  * that divide the item input domain into <i>m+1</i> overlapping intervals.
- *
+ * <blockquote>
  * <p>The start of each interval is below the lowest item retained by the sketch
  * corresponding to a zero rank or zero probability, and the end of the interval
  * is the rank or cumulative probability corresponding to the split point.</p>
@@ -55,7 +55,7 @@ public interface LongsSortedView extends SortedView {
  * </ul>
  *
  * <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
  * @param searchCrit the desired search criteria.
  * @return a discrete CDF array of m+1 double ranks (or cumulative probabilities) on the interval [0.0, 1.0].
  * @throws IllegalArgumentException if sketch is empty.
@@ -100,7 +100,7 @@ default double[] getCDF(long[] splitPoints, QuantileSearchCriteria searchCrit) {
  * @param splitPoints an array of <i>m</i> unique, monotonically increasing items
  * (of the same type as the input items)
  * that divide the item input domain into <i>m+1</i> consecutive, non-overlapping intervals.
- *
+ * <blockquote>
  * <p>Each interval except for the end intervals starts with a split point and ends with the next split
  * point in sequence.</p>
  *
@@ -124,7 +124,7 @@ default double[] getCDF(long[] splitPoints, QuantileSearchCriteria searchCrit) {
  * </ul>
  *
  * <p>It is not recommended to include either the minimum or maximum items of the input stream.</p>
- *
+ * </blockquote>
  * @param searchCrit the desired search criteria.
  * @return a PMF array of m+1 probability masses as doubles on the interval [0.0, 1.0].
  * @throws IllegalArgumentException if sketch is empty.

src/main/java/org/apache/datasketches/quantilescommon/QuantilesAPI.java (+2 -2)

@@ -20,12 +20,12 @@
 package org.apache.datasketches.quantilescommon;
 
 /**
- * <p>This is a stochastic streaming sketch that enables near-real time analysis of the
+ * This is a stochastic streaming sketch that enables near-real time analysis of the
  * approximate distribution of items from a very large stream in a single pass, requiring only
  * that the items are comparable.
  * The analysis is obtained using the <i>getQuantile()</i> function or the
  * inverse functions getRank(), getPMF() (the Probability Mass Function), and getCDF()
- * (the Cumulative Distribution Function).</p>
+ * (the Cumulative Distribution Function).
  *
  * <p>Given an input stream of <i>N</i> items, the <i>natural rank</i> of any specific
  * item is defined as its index <i>(1 to N)</i> in the hypothetical sorted stream of all
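
The Javadoc defines an item's natural rank as its 1-based position in the hypothetical sorted stream of all N items; dividing by N gives a normalized rank in [0, 1], and getQuantile() is the inverse mapping. The exact sketch below (hypothetical class `ExactRanks`, operating on the whole stream, whereas a real quantiles sketch only approximates these values) assumes the inclusive convention: getRank(q) counts items less than or equal to q, and getQuantile(r) returns the smallest item whose inclusive normalized rank is at least r.

```java
import java.util.Arrays;

/** Hypothetical exact rank/quantile functions over a fully stored stream. */
final class ExactRanks {
  private final double[] sorted;

  ExactRanks(final double[] data) {
    if (data.length == 0) {
      throw new IllegalArgumentException("empty input");
    }
    sorted = data.clone();
    Arrays.sort(sorted);
  }

  /** Inclusive normalized rank: (natural rank of the last item <= item) / N. */
  double getRank(final double item) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      final int mid = (lo + hi) >>> 1;
      if (sorted[mid] <= item) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return (double) lo / sorted.length; // lo is the count of items <= item
  }

  /** Inverse of getRank: smallest item whose inclusive normalized rank is >= rank. */
  double getQuantile(final double rank) {
    if (rank < 0.0 || rank > 1.0) {
      throw new IllegalArgumentException("rank must be in [0, 1]");
    }
    final int n = sorted.length;
    int lo = 0;
    int hi = n - 1;
    while (lo < hi) { // search on (index + 1) / n so results agree exactly with getRank
      final int mid = (lo + hi) >>> 1;
      if ((double) (mid + 1) / n >= rank) {
        hi = mid;
      } else {
        lo = mid + 1;
      }
    }
    return sorted[lo];
  }
}
```

Searching on (index + 1) / N, rather than computing ceil(rank * N), avoids floating-point cases such as 0.3 * 10 evaluating slightly above 3 and skipping an item.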
