Commit 50bc415

committed
BiStream readme
1 parent ca8679c commit 50bc415

File tree

1 file changed

+256
-0
lines changed
  • mug/src/main/java/com/google/mu/util/stream

@@ -0,0 +1,256 @@
1+
## Why BiStream?
2+
3+
[BiStream](https://google.github.io/mug/apidocs/com/google/mu/util/stream/BiStream.html) is a rapidly growing API in Google's internal "labs" library.
4+
5+
What is it for? Current usage data shows that most people use it to stream through Map or Multimap entries fluently.
6+
7+
Java 8's Stream is a hugely popular API and paradigm. It nicely combines functional programming with conventional Java pragmatism. Every `.map()` or `.filter()` line cleanly expresses one thing and one thing only, greatly improving readability while reducing the bug rate.
8+
9+
But when it comes to Map and Multimap entries, things become muddy. For example, transforming and then filtering the keys of a Map takes this boilerplate:
10+
```java
map.entrySet().stream()
    .map(e -> Map.entry(transform(e.getKey()), e.getValue()))
    .filter(e -> isGood(e.getKey()))
    .collect(toImmutableMap(Map.Entry::getKey, Map.Entry::getValue));
```
16+
17+
This is equivalent to the following BiStream code:
18+
```java
BiStream.from(map)
    .mapKeys(this::transform)
    .filterKeys(this::isGood)
    .toMap();
```
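
For readers who want to run the plain-Stream version on its own, here is a minimal JDK-only sketch. The class name `EntryPipelineSketch` and the bodies of `transform()` and `isGood()` are hypothetical stand-ins, and `Collectors.toMap` replaces Guava's `toImmutableMap` so that no extra dependency is needed.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.stream.Collectors;

final class EntryPipelineSketch {
  // Hypothetical stand-in for transform(): normalizes a key.
  static String transform(String key) {
    return key.trim().toLowerCase();
  }

  // Hypothetical stand-in for isGood(): keeps non-empty keys.
  static boolean isGood(String key) {
    return !key.isEmpty();
  }

  // Every step has to wrap and unwrap Map.Entry by hand.
  // Note: Collectors.toMap throws if two transformed keys collide.
  static Map<String, Integer> transformThenFilterKeys(Map<String, Integer> map) {
    return map.entrySet().stream()
        .map(e -> new SimpleImmutableEntry<>(transform(e.getKey()), e.getValue()))
        .filter(e -> isGood(e.getKey()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }
}
```

The entry wrapping in `map()` and the `getKey()`/`getValue()` calls in the other steps are the noise that `mapKeys()`, `filterKeys()` and `toMap()` remove.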
24+
25+
If you need to flatten a nested Map, it becomes even more awkward:
26+
```java
// Flatten a Map<String, Map<String, V>> to Map<String, V>
// by concatenating the two string keys
map.entrySet().stream()
    .flatMap(e -> e.getValue().entrySet().stream()
        .map(innerEntry ->
            Map.entry(e.getKey() + innerEntry.getKey(), innerEntry.getValue())))
    .collect(toImmutableMap(Map.Entry::getKey, Map.Entry::getValue));
```
35+
And the equivalent BiStream code is:
36+
```java
BiStream.from(map)
    .flatMap((r, m) -> BiStream.from(m).mapKeys(r::concat))
    .toMap();
```
41+
42+
## How to use a BiStream?
43+
44+
Most BiStream operations are straightforward, natural extensions of their Stream counterparts.
45+
46+
You can `filter()`:
47+
```java
Map<PhoneNumber, Address> phoneBook = ...;

// by key
BiStream.from(phoneBook)
    .filterKeys(phoneNumber -> phoneNumber.startsWith("312"))
    ...

// by value
BiStream.from(phoneBook)
    .filterValues(address -> address.state().equals("IL"))
    ...

// by both key and value
BiStream.from(phoneBook)
    .filter((phoneNumber, address) -> isExpired(phoneNumber, address))
    ...
```
65+
66+
You can `map()`:
67+
```java
Map<Address, Household> households = ...;

// by key
BiStream.from(households)
    .mapKeys(Address::state)
    ...;

// by value
BiStream.from(households)
    .mapValues(Household::income)
    ...;

// from both key and value
BiStream.from(households)
    .mapValues((address, household) -> fiveYearAverageIncome(address, household))
    ...;
```
85+
86+
You can `flatMap()`:
87+
```java
Map<Address, Household> households = ...;

// by key
BiStream.from(households)
    .flatMapKeys(address -> address.getPhoneNumbers().stream())
    ...;

// by value
BiStream.from(households)
    .flatMapValues(household -> household.members().stream())
    ...;

// by both key and value
BiStream.from(households)
    .flatMap((address, household) -> BiStream.from(household.getMemberMap()))
    ...;
```
105+
106+
There are also `anyMatch()`, `allMatch()` and `noneMatch()`:
107+
```java
Map<PhoneNumber, Address> phoneBook = ...;
BiStream.from(phoneBook)
    .anyMatch((phoneNumber, address) -> isInvalid(phoneNumber, address));
```
113+
114+
## How to create a BiStream?
115+
116+
You can create it from a JDK collection or stream:
117+
```java
// From Map
BiStream.from(map);

// From Multimap
BiStream.from(multimap.entries());

// From any collection or stream
Map<Id, String> idToName = BiStream.from(students, Student::id, Student::name).toMap();
Map<Id, Student> studentMap = BiStream.biStream(students)
    .mapKeys(Student::id)
    .toMap();
```
130+
131+
Or zip two collections or streams:
132+
```java
// If you have a list of requests and responses to pair up:
BiStream.zip(requests, responses)
    .mapKeys(Request::fingerprint)
    ...;
```
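
To make the pairing idea concrete without the library, the following is a rough JDK-only sketch of zipping two lists by index. `ZipSketch` and `zipWith` are hypothetical names, and stopping at the shorter list is an assumption of this sketch, not a statement about how `BiStream.zip()` treats inputs of different sizes.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class ZipSketch {
  /** Pairs elements by index, stops at the shorter list, and combines each pair. */
  static <A, B, R> List<R> zipWith(List<A> left, List<B> right, BiFunction<A, B, R> combiner) {
    int size = Math.min(left.size(), right.size());
    return IntStream.range(0, size)
        .mapToObj(i -> combiner.apply(left.get(i), right.get(i)))
        .collect(Collectors.toList());
  }
}
```

With plain streams the pair has to be combined into a single object right away, whereas a BiStream keeps both halves available to later steps such as `mapKeys()`.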
138+
139+
Through concatenation:
140+
```java
// a handful of Maps
Map<Request, Response> cached = ...;
Map<Request, Response> onDemand = ...;
Map<Request, Response> all = BiStream.concat(cached, onDemand).toMap();

// a stream of maps
BiStream<K, V> biStream = maps.stream()
    .collect(concatenating(BiStream::from));

// a stream of multimaps
BiStream<K, V> biStream = multimaps.stream()
    .collect(concatenating(multimap -> BiStream.from(multimap.entries())));
```
154+
155+
With `groupingBy()`:
156+
```java
import static com.google.mu.util.stream.BiStream.groupingBy;
import static java.util.stream.Collectors.counting;

// groupingBy() collects into a BiStream, so finish with toMap() to get a Map
Map<City, Long> cityHouseholds = addresses.stream()
    .collect(groupingBy(Address::city, counting()))
    .toMap();

// Using a BiFunction to reduce group members is more convenient than JDK's groupingBy()
Map<City, Household> richestHouseholds = households.stream()
    .collect(groupingBy(Household::city, this::richerHousehold))
    .toMap();
```
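
For comparison, reducing each group with a binary function in the plain JDK is usually written with the three-argument `Collectors.toMap`. The sketch below uses only JDK calls; `GroupReduceSketch` and `groupAndReduce` are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

final class GroupReduceSketch {
  /** Groups items by key and folds each group into a single item using the reducer. */
  static <T, K> Map<K, T> groupAndReduce(
      List<T> items, Function<? super T, ? extends K> toKey, BinaryOperator<T> reducer) {
    // toMap's merge function is applied whenever two items share a key.
    return items.stream()
        .collect(Collectors.toMap(toKey, Function.identity(), reducer));
  }
}
```

For example, `groupAndReduce(words, String::length, (a, b) -> a.compareTo(b) >= 0 ? a : b)` keeps the lexicographically greatest word for each length.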
167+
168+
By splitting strings:
169+
```java
Map<String, String> keyValues =
    BiStream.from(flags, flag -> Substring.first('=').splitThenTrim(flag).orElseThrow(...));

// or via a Collector in the middle of a stream pipeline
import static com.google.mu.util.stream.BiStream.toBiStream;

Map<String, String> keyValues = lines.stream()
    ...
    .collect(toBiStream(kv -> Substring.first('=').splitThenTrim(kv).orElseThrow(...)))
    .toMap();
```
181+
182+
## How to get data out of a BiStream?
183+
184+
Obviously you can call `.toMap()` to create a Map, or use `.mapToObj()` to convert back to a Stream. But the library is extensible and supports flexible options through the concept of `BiCollector`.
185+
186+
You can collect to a Guava ImmutableMap:
187+
```java
ImmutableMap<K, V> all = BiStream.concat(cached, onDemand)
    .collect(ImmutableMap::toImmutableMap);
```
191+
192+
> **_TIP:_** At first glance this shouldn't work, because Guava's `toImmutableMap()` collector requires two Function parameters to extract the key and the value. But if you try it, it compiles and works. Why? Because what's required here is a `BiCollector`, a functional interface whose method signature is compatible with `toImmutableMap(Function, Function)`.
193+
194+
> **_TIP:_** In general, any such Collector factory method can be used as a BiCollector through a method reference. Examples include `ImmutableBiMap::toImmutableBiMap`, `ImmutableListMultimap::toImmutableListMultimap`, `Collectors::toConcurrentMap` etc.
195+
196+
> **_TIP:_** The Google internal library has special BiCollector methods that look like the following to make them more discoverable and also static-import friendly:
197+
```java
class BiCollectors {
  // BiCollector takes the result type as its third type parameter
  public static <K, V> BiCollector<K, V, ImmutableMap<K, V>> toImmutableMap() {
    return ImmutableMap::toImmutableMap;
  }
}
```
204+
> This is useful because in the same .java file you can static import both `ImmutableMap.toImmutableMap` and `BiCollectors.toImmutableMap`. And when you call `toImmutableMap()` in a context that requires BiCollector, or when you call `toImmutableMap(Foo::id, Foo::name)` in a context that requires Collector, the compiler will seamlessly figure out which one to use as if both imports were overloads in the same class.
205+
206+
> To avoid dependency, Mug doesn't include these Guava-specific utilities. But it's trivial to create them when you need them. Eventually when BiStream is consolidated into Guava, these internal Guava-specific BiCollector utilities such as `toImmutableTable()` will become available.
207+
208+
And more...
209+
210+
As implied above, you can also collect to a Guava `ImmutableSetMultimap`:
211+
```java
ImmutableSetMultimap<K, V> all = BiStream.concat(cached, onDemand)
    .collect(ImmutableSetMultimap::toImmutableSetMultimap);
```
215+
216+
Or to a custom data structure:
217+
```java
// compatible with BiCollector signature
<T, V> Collector<T, ?, Ledger<V>> toLedger(
    Function<T, Instant> toTime, Function<T, V> toValue) {...}

Ledger<V> ledger = ledger.timeseries()
    .filterKeys(...)
    .collect(this::toLedger);  // method-ref as a BiCollector
```
226+
227+
228+
Or do further grouping:
229+
```java
Multimap<Address, PhoneNumber> phoneBook = ...;
ImmutableMap<State, ImmutableSet<AreaCode>> stateAreaCodes =
    BiStream.from(phoneBook)
        .mapValues(PhoneNumber::areaCode)
        .collect(BiCollectors.groupingBy(Address::state, toImmutableSet()))
        .collect(toImmutableMap());
```
237+
238+
## Design Considerations
239+
240+
#### Why doesn't BiStream<K, V> extend `Stream<Map.Entry<K, V>>`?
241+
242+
* We expect that with BiStream users will rarely need to deal with Entry objects.
243+
* The Stream class has a large API surface, and future versions will likely add more methods. It would be hard to stay clear of accidental conflicts with existing or future JDK methods.
244+
* It's not clear that JDK meant for 3rd-party libraries to extend from Stream directly.
245+
246+
#### Why is BiCollector designed with that weird signature?
247+
248+
This is mainly for the ease of reusing existing Collector factory methods as BiCollector implementations. This way, users can freely collect from BiStream to custom data with ease, like `collect(Collectors::toConcurrentMap)`, `collect(ImmutableSetMultimap::toImmutableSetMultimap)`.
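
The mechanism can be sketched with the JDK alone. In the example below, `PairCollector` is a hypothetical stand-in for BiCollector (its method name and exact generics are assumptions, not the library's definition), and `collectEntries` is a hypothetical helper. The point is that a functional interface whose single method takes two Functions and returns a Collector can be satisfied directly by a method reference such as `Collectors::toMap`.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class PairCollectorSketch {
  /** Hypothetical stand-in for BiCollector: same shape as a two-Function Collector factory. */
  @FunctionalInterface
  interface PairCollector<K, V, R> {
    <E> Collector<E, ?, R> splitting(Function<E, K> toKey, Function<E, V> toValue);
  }

  /** Collects map entries by handing key and value extractors to the PairCollector. */
  static <K, V, R> R collectEntries(Map<K, V> map, PairCollector<K, V, R> collector) {
    return map.entrySet().stream()
        .collect(collector.<Map.Entry<K, V>>splitting(Map.Entry::getKey, Map.Entry::getValue));
  }

  static Map<String, Integer> copyOf(Map<String, Integer> map) {
    // The two-argument Collectors.toMap(Function, Function) matches splitting().
    return collectEntries(map, Collectors::toMap);
  }
}
```

Because the interface method is itself generic, a lambda cannot implement it, but a method reference can. That is why the signature looks unusual and also why existing Collector factories plug in without adapters.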
249+
250+
#### Why doesn't BiStream implement Iterable?
251+
252+
We believe it's better to stay consistent with JDK Stream. The likely reason Stream doesn't implement Iterable is a quasi-standard expectation that an Iterable is idempotent (it can be iterated repeatedly), while streams are one-use-only.
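
The one-use-only behavior of JDK streams is easy to observe. The following small sketch (the class name `OneUseStreamSketch` is hypothetical) runs two terminal operations on the same Stream and reports whether the second one is rejected.

```java
import java.util.List;
import java.util.stream.Stream;

final class OneUseStreamSketch {
  /** Returns true if a second terminal operation on the same Stream is rejected. */
  static boolean secondTraversalFails(List<String> items) {
    Stream<String> stream = items.stream();
    stream.forEach(item -> {}); // the first terminal operation consumes the stream
    try {
      stream.forEach(item -> {}); // reusing the consumed stream
      return false;
    } catch (IllegalStateException expected) {
      return true;
    }
  }
}
```

A `List`, by contrast, can be looped over with `for (String item : items)` any number of times, which is the behavior callers tend to assume of any Iterable.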
253+
254+
#### Why not provide a toPairs() method that returns List<Pair<K, V>>?
255+
256+
See [Perusing Pair](https://github.com/google/mug/wiki/Perusing-Pair-(and-BiStream)); we believe it's better not to add library support for Pair.
