Order of Intermediate Operations – Streams
The order of intermediate operations can affect the performance of a stream pipeline. If operations that reduce the size of the stream are performed earlier in the pipeline, the subsequent operations have fewer elements to process.
Moving intermediate operations such as filter(), distinct(), dropWhile(), limit(), skip(), and takeWhile() earlier in the pipeline can be beneficial, as they can all decrease the number of elements passed downstream, provided that the reordering does not change the result of the pipeline. Example 16.4 implements two stream pipelines at (1) and (2) that create a list of CD titles, skipping the first three CDs. The map() operation transforms each CD to its title, resulting in an output stream with element type String. The example shows how the number of elements processed by the map() operation is reduced when the skip() operation is performed before the map() operation (p. 921).
Example 16.4 Order of Intermediate Operations
import java.util.List;

public final class OrderOfOperations {
  public static void main(String[] args) {
    List<CD> cdList = CD.cdList;

    // Map before skip.
    List<String> cdTitles1 = cdList
        .stream()                                        // (1)
        .map(cd -> {                                     // Map applied to all elements.
          System.out.println("Mapping: " + cd.title());
          return cd.title();
        })
        .skip(3)                                         // Skip afterwards.
        .toList();
    System.out.println(cdTitles1);
    System.out.println();

    // Skip before map preferable.
    List<String> cdTitles2 = cdList
        .stream()                                        // (2)
        .skip(3)                                         // Skip first.
        .map(cd -> {                                     // Map not applied to the first 3 elements.
          System.out.println("Mapping: " + cd.title());
          return cd.title();
        })
        .toList();
    System.out.println(cdTitles2);
  }
}
Output from the program:
Mapping: Java Jive
Mapping: Java Jam
Mapping: Lambda Dancing
Mapping: Keep on Erasing
Mapping: Hot Generics
[Keep on Erasing, Hot Generics]
Mapping: Keep on Erasing
Mapping: Hot Generics
[Keep on Erasing, Hot Generics]
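The same effect can be observed with a size-reducing filter() operation. The following is a minimal sketch, not part of Example 16.4: the class name, the sample words, and the counter are assumptions made for illustration. It counts how often the mapping function runs when map() is placed before or after a filter() whose outcome is the same in both positions.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class FilterBeforeMapDemo {

  // Hypothetical sample data; any list of strings would do.
  static final List<String> WORDS = List.of("java", "jive", "jam", "lambda", "gen");

  // map() first: the mapping function runs for every element.
  static int mapCallsWhenMappingFirst() {
    AtomicInteger calls = new AtomicInteger();   // Counter used only to observe the pipeline.
    WORDS.stream()
        .map(w -> { calls.incrementAndGet(); return w.toUpperCase(); })
        .filter(w -> w.length() > 3)
        .toList();
    return calls.get();
  }

  // filter() first: the mapping function runs only for elements that pass the filter.
  static int mapCallsWhenFilteringFirst() {
    AtomicInteger calls = new AtomicInteger();
    WORDS.stream()
        .filter(w -> w.length() > 3)
        .map(w -> { calls.incrementAndGet(); return w.toUpperCase(); })
        .toList();
    return calls.get();
  }

  public static void main(String[] args) {
    System.out.println(mapCallsWhenMappingFirst());    // 5
    System.out.println(mapCallsWhenFilteringFirst());  // 3
  }
}
```

Both pipelines produce the same three uppercase words, because converting these words to uppercase does not change their length. The reordering is only safe when the filter gives the same answer before and after the mapping.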
Non-interfering and Stateless Behavioral Parameters
One of the main goals of the Stream API is that the code for a stream pipeline should execute and produce the same results whether the stream elements are processed sequentially or in parallel. In order to achieve this goal, certain constraints are placed on the behavioral parameters—that is, on the lambda expressions and method references that are implementations of the functional interface parameters in stream operations. These behavioral parameters, as the name implies, allow the behavior of a stream operation to be customized. For example, the predicate supplied to the filter() operation defines the criteria for filtering the elements.
Most stream operations require that their behavioral parameters are non-interfering and stateless. A non-interfering behavioral parameter does not change the stream data source during the execution of the pipeline, as doing so can lead to non-deterministic results or exceptions. The exception is when the data source is concurrent, since such a source is thread-safe and designed to tolerate modification during traversal. A stateless behavioral parameter does not access any state that can change during the execution of the pipeline, as access to such state might not be thread-safe and the result could then depend on timing.
If the constraints are violated, all bets are off: the pipeline may compute incorrect results or fail with an exception. In addition to these constraints, care should be taken when introducing side effects via behavioral parameters, as these might introduce other concurrency-related problems during parallel execution of the pipeline.
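To make the non-interference requirement concrete, here is a small sketch. The class and method names are hypothetical. It assumes an ArrayList as the data source, for which interference typically surfaces as a ConcurrentModificationException on current JDKs; the Stream API does not promise that an exception is thrown, so other sources may fail differently or silently produce wrong results.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class InterferenceDemo {

  // Interfering: the lambda adds elements to the list that is the stream's source.
  static boolean interferingPipelineFails() {
    List<String> titles = new ArrayList<>(List.of("Java Jive", "Java Jam"));
    try {
      titles.stream().forEach(t -> titles.add(t + " (copy)"));
      return false;
    } catch (ConcurrentModificationException e) {
      return true;                       // Observed with ArrayList as the source.
    }
  }

  // Non-interfering: the pipeline only reads the source; the source is
  // modified after the terminal operation has completed.
  static List<String> nonInterferingCopy() {
    List<String> titles = new ArrayList<>(List.of("Java Jive", "Java Jam"));
    List<String> copies = titles.stream().map(t -> t + " (copy)").toList();
    titles.addAll(copies);
    return titles;
  }

  public static void main(String[] args) {
    System.out.println(interferingPipelineFails());
    System.out.println(nonInterferingCopy());
  }
}
```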
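The risk posed by side effects can be sketched as follows. The class and method names are assumptions chosen for illustration. The first method lets the terminal operation assemble the result, whereas the second has a lambda add to a shared ArrayList, which is not thread-safe, from a parallel stream.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class SideEffectDemo {

  // Preferred: no side effects; the terminal operation builds the result list.
  static List<Integer> squares(int n) {
    return IntStream.rangeClosed(1, n)
        .parallel()
        .map(i -> i * i)
        .boxed()
        .toList();
  }

  // Discouraged: the lambda mutates shared state from several threads.
  // Elements may be lost or appear in any order, or an exception may be thrown.
  static List<Integer> unsafeSquares(int n) {
    List<Integer> shared = new ArrayList<>();
    IntStream.rangeClosed(1, n)
        .parallel()
        .forEach(i -> shared.add(i * i));
    return shared;
  }

  public static void main(String[] args) {
    System.out.println(squares(5));      // [1, 4, 9, 16, 25]
  }
}
```

The unsafeSquares() method is deliberately not called in main(), as its outcome can differ from run to run.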
The aspects of intermediate operations mentioned in this subsection will become clearer as we fill in the details in subsequent sections.
Streams from Collections – Streams
The default methods stream() and parallelStream() of the Collection interface create streams with collections as the data source. Collections are the only data source that provides the parallelStream() method to create a parallel stream directly. For other data sources, the parallel() intermediate operation must be used in the stream pipeline.
The following default methods for building streams from collections are defined in the java.util.Collection interface:
default Stream<E> stream()
default Stream<E> parallelStream()
Return a finite sequential stream or a possibly parallel stream with this collection as its source, respectively. Whether it is ordered or not depends on the collection used as the data source.
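As a brief illustration of these methods, the sketch below queries the execution mode of the streams they create, using the isParallel() method of the stream. The class name and the sample titles are hypothetical. Note that parallelStream() is only specified to return a possibly parallel stream; the value shown for it is what the default implementation in the Collection interface yields.

```java
import java.util.List;
import java.util.stream.Stream;

public class CollectionStreamsDemo {

  // Hypothetical sample data.
  static final List<String> TITLES = List.of("Java Jive", "Java Jam", "Lambda Dancing");

  // stream() creates a sequential stream.
  static boolean streamIsParallel() {
    return TITLES.stream().isParallel();
  }

  // parallelStream() creates a possibly parallel stream.
  static boolean parallelStreamIsParallel() {
    return TITLES.parallelStream().isParallel();
  }

  // A stream not created from a collection is switched to parallel mode with parallel().
  static boolean parallelOperationIsParallel() {
    return Stream.of("Java Jive", "Java Jam").parallel().isParallel();
  }

  public static void main(String[] args) {
    System.out.println(streamIsParallel());             // false
    System.out.println(parallelStreamIsParallel());     // true
    System.out.println(parallelOperationIsParallel());  // true
  }
}
```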
We have already seen examples of creating streams from lists and sets, and several more examples can be found in the subsequent sections.
The code below illustrates two points about streams and their data sources. If the data source is modified before the terminal operation is initiated, the changes will be reflected in the stream. A stream is created at (2) with a list of CDs as the data source. Before a terminal operation is initiated on this stream at (4), an element is added to the underlying data source list at (3). Note that the list created at (1) is modifiable. The count() operation correctly reports the number of elements processed in the stream pipeline.
List<CD> listOfCDS = new ArrayList<>(List.of(CD.cd0, CD.cd1)); // (1)
Stream<CD> cdStream = listOfCDS.stream(); // (2)
listOfCDS.add(CD.cd2); // (3)
System.out.println(cdStream.count()); // (4) 3
// System.out.println(cdStream.count()); // (5) IllegalStateException
Trying to initiate an operation on a stream whose elements have already been consumed results in a java.lang.IllegalStateException. This case is illustrated at (5). The elements in the cdStream were consumed after the terminal operation at (4). A new stream must be created on the data source before any stream operations can be run.
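Both points can be tried out without the CD class. The following self-contained sketch uses a list of strings instead; the class and method names are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class StreamReuseDemo {

  // The element added after the stream was created is still counted,
  // since the source is not read until the terminal operation starts.
  static long countAfterLateAdd() {
    List<String> titles = new ArrayList<>(List.of("Java Jive", "Java Jam"));
    Stream<String> stream = titles.stream();
    titles.add("Lambda Dancing");
    return stream.count();
  }

  // A second terminal operation on the same stream is rejected.
  static boolean secondTerminalOperationFails() {
    Stream<String> stream = List.of("Java Jive", "Java Jam").stream();
    stream.count();
    try {
      stream.count();
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(countAfterLateAdd());             // 3
    System.out.println(secondTerminalOperationFails());  // true
  }
}
```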
To create a stream on the entries in a Map, a collection view can be used. In the code below, a Map is created at (1) and populated with some entries. An entry view on the map is obtained at (2) and used as a data source at (3) to create an unordered sequential stream. The terminal operation at (4) returns the number of entries in the map.
Map<Integer, String> dataMap = new HashMap<>();            // (1)
dataMap.put(1, "en");  dataMap.put(2, "to");
dataMap.put(3, "tre"); dataMap.put(4, "fire");
long numOfEntries = dataMap
    .entrySet()                                            // (2)
    .stream()                                              // (3)
    .count();                                              // (4) 4
In the examples in this subsection, the call to the stream() method can be replaced by a call to the parallelStream() method. The stream will then execute in parallel, without the need for any additional synchronization code (p. 1009).
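A runnable sketch of this substitution is shown below, using the same map entries as above; the class and method names are hypothetical. It also streams the values view of the map, sorting the values because a HashMap does not define an encounter order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapViewStreamsDemo {

  static Map<Integer, String> sampleMap() {
    Map<Integer, String> dataMap = new HashMap<>();
    dataMap.put(1, "en");  dataMap.put(2, "to");
    dataMap.put(3, "tre"); dataMap.put(4, "fire");
    return dataMap;
  }

  // Same count as with stream(), now computed on a possibly parallel stream.
  static long parallelEntryCount() {
    return sampleMap().entrySet().parallelStream().count();
  }

  // The values view is also a collection and can serve as a data source.
  static List<String> sortedValues() {
    return sampleMap().values().stream().sorted().toList();
  }

  public static void main(String[] args) {
    System.out.println(parallelEntryCount());  // 4
    System.out.println(sortedValues());        // [en, fire, to, tre]
  }
}
```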