Java streams vs. collections: when and how to return a stream instead of a collection from a Java API - Tomasz Kiełbowicz

Jiedao jdon 2021-05-04 17:29:04


This article shows scenarios in which Java streams are easy to apply, along with examples of how to use them.

The article is based on the standard Java library java.util.stream. It is unrelated to reactive streams, and also to other stream implementations such as Vavr. In addition, I won't go into advanced stream topics such as parallel execution.

First, let's briefly discuss what makes streams unique compared to collections. Although there are some similarities, the differences are significant, and you should not think of a stream as just another kind of collection in the library.

According to the java.util.stream documentation, the most important characteristics are:

  • No storage, possibly unbounded - A collection is a ready-made data structure, whereas a stream represents the ability to produce data, which usually does not even exist when the stream is created. Because a stream does not store its data, we can create practically unbounded streams. Put more realistically, we can let the consumer decide how many elements to read from the stream, while from the producer's point of view the stream can be unbounded (for example new Random().ints()).
  • Lazy evaluation - Many operations declared when defining a stream (such as filtering and mapping) are deferred and executed only when the consumer decides to consume data from the stream.
  • Functional in nature - If you have worked with streams, you have probably noticed that each processing step (such as filter or map) creates a new stream instead of modifying the source data.
  • Consumable - A stream can be read only once; after that it is "consumed", unlike a collection, which can be read many times.
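
Three of these characteristics (unbounded sources, laziness and single consumption) can be observed with a few lines of code. The following is only a minimal sketch; the class and method names (StreamTraitsDemo, firstEvens, lazyTrace, secondReadFails) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamTraitsDemo {

    // Takes the first `count` even numbers from an unbounded stream.
    static List<Integer> firstEvens(int count) {
        return Stream.iterate(0, n -> n + 2)   // unbounded: there is no storage behind it
                .limit(count)                  // the consumer decides how much to read
                .collect(Collectors.toList());
    }

    // Records which elements the map step actually touched.
    static List<String> lazyTrace() {
        List<String> trace = new ArrayList<>();
        Stream<String> pipeline = Stream.of("a", "b", "c")
                .map(s -> { trace.add("map " + s); return s + s; });
        trace.add("pipeline defined");         // nothing has been mapped at this point
        pipeline.findFirst();                  // the terminal operation pulls a single element
        return trace;
    }

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondReadFails() {
        Stream<String> stream = Stream.of("a", "b");
        stream.forEach(s -> { });
        try {
            stream.forEach(s -> { });          // the stream is already consumed
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstEvens(5));      // [0, 2, 4, 6, 8]
        System.out.println(lazyTrace());        // [pipeline defined, map a]
        System.out.println(secondReadFails());  // true
    }
}
```

Note that lazyTrace() never maps "b" or "c": findFirst() is satisfied by the first element, so the remaining work is simply not done.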

Now let's see what we can do with streams.

Processing large amounts of data

Suppose we have to copy data from an external service into our database. The amount of data to copy can be arbitrarily large. We cannot fetch all the data, store it in a collection and then save it to the database, because that could exhaust heap memory. We have to process the data in batches and design the interface between the external service client and the database storage. Because a stream does not store data, we can use it to safely handle any amount of data we need.

In this example (and in all the following ones) we will use the java.util.stream.Stream interface to build streams. The most powerful and flexible way to build a stream in Java is to implement the Spliterator interface and then wrap it as a stream with the StreamSupport class. However, as we will see, in many cases the static factory methods of the Stream interface are enough.
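
For comparison, here is roughly what the Spliterator route looks like. This is a minimal sketch rather than a recommended implementation: RangeSpliterator and range are hypothetical names, and the spliterator is deliberately trivial (it emits a range of integers) so that the wiring through StreamSupport stays visible.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class RangeSpliteratorDemo {

    // Hypothetical spliterator emitting from, from + 1, ..., to - 1.
    static class RangeSpliterator extends Spliterators.AbstractSpliterator<Integer> {
        private int next;
        private final int end;

        RangeSpliterator(int from, int to) {
            // Long.MAX_VALUE means "size unknown"; ORDERED and NONNULL describe the elements.
            super(Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL);
            this.next = from;
            this.end = to;
        }

        @Override
        public boolean tryAdvance(Consumer<? super Integer> action) {
            if (next >= end) {
                return false;          // no more elements
            }
            action.accept(next++);     // hand exactly one element to the consumer
            return true;
        }
    }

    static Stream<Integer> range(int from, int to) {
        // The second argument selects a sequential (false) or parallel (true) stream.
        return StreamSupport.stream(new RangeSpliterator(from, to), false);
    }

    public static void main(String[] args) {
        List<Integer> values = range(3, 7).collect(Collectors.toList());
        System.out.println(values);    // [3, 4, 5, 6]
    }
}
```

The only method that has to be written is tryAdvance(); AbstractSpliterator supplies the rest, which is usually enough for a sequential source.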

Suppose we have a simple API for getting data from an external service that supports paging (for example a REST service or a database). The API fetches at most limit items starting from offset. By calling the API iteratively, we can fetch as much data as we need.

interface ExternalService<T> {
  // Returns at most `limit` items starting at `offset`.
  List<T> fetch(int offset, int limit);
}

Now we can use this API to provide a stream of data and decouple the API's users from the paging API:

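import static java.util.function.Predicate.not;

import java.util.List;
import java.util.stream.Stream;

class Service<T> {
  private final ExternalService<T> externalService;

  Service(ExternalService<T> externalService) {
    this.externalService = externalService;
  }

  // Streams at most `size` elements, fetched lazily in pages of up to `batchSize`.
  public Stream<T> stream(int size, int batchSize) {
    var cursor = new Cursor();
    return Stream
      .generate(() -> next(cursor, size, batchSize))
      .takeWhile(not(List::isEmpty))
      .flatMap(List::stream);
  }

  // Fetches the next page; relies on fetch() returning an empty list when the limit drops to 0.
  private List<T> next(Cursor cursor, int size, int batchSize) {
    var fetchSize = Math.min(size - cursor.offset, batchSize);
    var result = externalService.fetch(cursor.offset, fetchSize);
    cursor.inc(result.size());
    return result;
  }

  // The nested Cursor class is listed separately below.
}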

Cursor, a nested class of Service, holds the current offset:

private static class Cursor {
  private int offset;
   
  void inc(int by) {
    offset += by;
  }
}

We use the Stream.generate() method to build an infinite stream in which each element is created by the supplied Supplier. The stream's elements are the pages (List<T>) fetched from the REST API. A Cursor instance is created for each stream to track how far fetching has progressed.

The Stream.takeWhile() method is used to detect the last page. Finally, the stream should return T rather than List<T>, so we use flatMap to flatten it. In some cases, though, keeping the batches (for example, to save a whole page in one transaction) could be useful.

Now we can use Service.stream(size, batchSize) to obtain a stream of arbitrary length without any knowledge of the paging API (we decided to expose the batchSize parameter, but that is a design decision). At any moment, memory consumption is bounded by the batch size. The consumer can process the data one by one, save it to a database, or batch it again (possibly with a different batch size).
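
To see how many pages are actually requested, the same generate / takeWhile / flatMap shape can be run against an in-memory stand-in for the external service. Everything in this sketch (PagingDemo, InMemoryService, the call log) is hypothetical and only meant to make the paging behaviour observable; a plain int array replaces the Cursor class to keep the example short.

```java
import static java.util.function.Predicate.not;

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class PagingDemo {

    // Hypothetical in-memory stand-in for a paged external service.
    static class InMemoryService {
        private final List<Integer> data;
        final List<String> calls = new ArrayList<>();   // log of "offset/limit" per fetch

        InMemoryService(int total) {
            this.data = IntStream.range(0, total).boxed().collect(Collectors.toList());
        }

        List<Integer> fetch(int offset, int limit) {
            calls.add(offset + "/" + limit);
            int from = Math.min(offset, data.size());
            int to = Math.min(from + Math.max(limit, 0), data.size());
            return new ArrayList<>(data.subList(from, to));
        }
    }

    // Same shape as Service.stream(): pages are generated lazily, then flattened.
    static Stream<Integer> stream(InMemoryService service, int size, int batchSize) {
        int[] offset = {0};   // mutable cursor captured by the lambda
        return Stream
                .generate(() -> {
                    int fetchSize = Math.min(size - offset[0], batchSize);
                    List<Integer> page = service.fetch(offset[0], fetchSize);
                    offset[0] += page.size();
                    return page;
                })
                .takeWhile(not(List::isEmpty))   // an empty page ends the stream
                .flatMap(List::stream);
    }

    public static void main(String[] args) {
        InMemoryService service = new InMemoryService(10);
        System.out.println(stream(service, 7, 3).collect(Collectors.toList()));
        // [0, 1, 2, 3, 4, 5, 6]
        System.out.println(service.calls);
        // [0/3, 3/3, 6/1, 7/0]
    }
}
```

The final call with a limit of 0 is what produces the empty page that stops takeWhile(), so this design depends on the service answering such a request with an empty list rather than an error.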

Fast access to (incomplete) data

Suppose we have a time-consuming operation that must be performed on each element of the data and takes time t to compute. For n elements, the consumer has to wait t * n to receive the results. This can be a problem if, for example, a user is waiting for a table of computed results. We want to show the first results as soon as they are available, instead of waiting for all results to be computed and then filling the table at once.

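import java.util.stream.Stream;

public class Producer1 {
  private Stream<String> buildStream() {
    return Stream.of("a", "b", "c");
  }

  // Stands in for a time-consuming operation; logs each call so the laziness is visible.
  private String expensiveComputation(String input) {
    System.out.println("Processing of: " + input);
    return input + input;
  }

  public Stream<String> stream() {
    return buildStream().map(this::expensiveComputation);
  }
}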

Consumer:

new Producer1().stream().forEach(System.out::println);

Output:

Processing of: a
aa
Processing of: b
…

As we can see, the result for the first element, "aa", is available to the consumer before processing of the next element starts, yet the computation remains the responsibility of the stream's producer. In other words, the consumer decides when and whether the computation is performed, but the producer is still responsible for how it is performed.

You might think that this is easy and does not require streams. Of course, you are right. Let's take a look:

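import java.util.List;

public class Producer1Classic {
  public List<String> data() {
    return List.of("a", "b", "c", "d", "e", "f");
  }

  // Called by the consumer for each element; logs each call like the stream version.
  public String expensiveComputation(String input) {
    System.out.println("Processing of: " + input);
    return input + input;
  }
}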

Consumer:

var producer = new Producer1Classic();
for (String element : producer.data()) {
  System.out.println(producer.expensiveComputation(element));
}

The effect is the same, but we have actually reinvented the wheel: our implementation mimics the ancestor of the stream, the Iterator, and we lose the advantages of the stream API.

Avoiding premature computation

Again, let's assume that we have to perform a time-consuming operation on each element of the stream. In some cases the users of the API cannot say in advance how much data they will need. For example:

  • The user cancels the data loading
  • An error occurs during data processing, so the rest of the data does not need to be processed
  • The consumer reads data until a condition is met, for example until the first positive value

Because streams are lazy, some of the computation can be avoided in these cases.

private Stream<Double> buildStream() {
  return new Random().doubles().boxed();
}
private Double expensiveComputation(Double input) {
  return input / 2;
}
public Stream<Double> stream() {
  return buildStream().map(this::expensiveComputation);
}

Consumer:

stream().peek(System.out::println).filter(value -> value > 0.4).findFirst();

In this example the consumer reads data until a value is greater than 0.4. The producer knows nothing about the consumer's logic, yet it computes only the necessary items. The logic (for example the condition) can be changed independently on the consumer side.
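
Because the example above uses random numbers, the amount of saved work differs from run to run. A deterministic variant makes it countable. This is only a sketch with made-up names (LazyComputationDemo, a computations counter), and squaring stands in for the expensive step.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyComputationDemo {

    // Counts how often the "expensive" step really runs.
    static final AtomicInteger computations = new AtomicInteger();

    // Stand-in for an expensive computation.
    static int expensiveComputation(int input) {
        computations.incrementAndGet();
        return input * input;
    }

    // Producer: an unbounded stream 1, 2, 3, ... with the expensive step attached.
    static Stream<Integer> stream() {
        return Stream.iterate(1, n -> n + 1).map(LazyComputationDemo::expensiveComputation);
    }

    public static void main(String[] args) {
        // Consumer: stops at the first square greater than 50.
        Optional<Integer> first = stream().filter(value -> value > 50).findFirst();
        System.out.println(first.get() + " after " + computations.get() + " computations");
        // 64 after 8 computations
    }
}
```

The source is unbounded, but only the eight elements needed to reach the first match are ever computed; a different condition on the consumer side changes that number without touching the producer.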

Ease of use of the API

There is another reason to use streams instead of a custom API design. Streams are part of the standard library and are familiar to many developers. Using streams in our API makes it easier for other developers to use.

Other considerations

Error handling

Traditional error handling does not work well with streams. Because the actual processing is deferred until it is needed, exceptions from the processing steps are not thrown when the stream is constructed. Basically, we have two options:

  • Throw a RuntimeException - the terminal operation (for example forEach) will throw the exception
  • Wrap each element in an object that represents the state of its processing, for example the Try class from the Vavr library (details on the blog)
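
The second option can be sketched without any library. The Result class below is a hypothetical, heavily simplified stand-in for a type such as Vavr's Try (it is not Vavr's API); it only shows the idea that a failure becomes a stream element instead of aborting the terminal operation.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ResultDemo {

    // Hypothetical minimal wrapper: holds either a value or the exception that replaced it.
    static final class Result<T> {
        final T value;
        final RuntimeException error;

        private Result(T value, RuntimeException error) {
            this.value = value;
            this.error = error;
        }

        static <T> Result<T> success(T value) {
            return new Result<>(value, null);
        }

        static <T> Result<T> failure(RuntimeException error) {
            return new Result<>(null, error);
        }

        boolean isSuccess() {
            return error == null;
        }
    }

    // The processing step captures its own failure instead of throwing.
    static Result<Integer> tryParse(String input) {
        try {
            return Result.success(Integer.parseInt(input));
        } catch (NumberFormatException e) {
            return Result.failure(e);
        }
    }

    static Stream<Result<Integer>> parseAll(Stream<String> inputs) {
        return inputs.map(ResultDemo::tryParse);
    }

    public static void main(String[] args) {
        List<Result<Integer>> results =
                parseAll(Stream.of("1", "x", "3")).collect(Collectors.toList());
        for (Result<Integer> r : results) {
            System.out.println(r.isSuccess() ? "ok " + r.value : "failed: " + r.error.getMessage());
        }
    }
}
```

With this approach the element "3" is still processed after "x" fails, and the consumer decides what to do with failed elements. With a thrown RuntimeException, processing would stop at "x".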

Resource management

Sometimes we need a resource to provide the stream's data (for example, a session in an external service), and we want to release it when the stream is finished. Fortunately, Stream implements the AutoCloseable interface, so we can use a stream in a try-with-resources statement, which makes resource management very easy. All we have to do is register a hook on the stream with the onClose method. The hook is called automatically when the stream is closed.

private Stream<Double> buildStream() {
  return new Random().doubles().boxed();
}
private Double expensiveComputation(Double input) {
  if (input > 0.8) throw new RuntimeException("Data processing exception");
  return input / 2;
}
public Stream<Double> stream() {
  return buildStream().map(this::expensiveComputation).onClose(()-> System.out.println("Releasing resources…"));
}

Consumer:

try (Stream<Double> stream = stream()){
  stream.forEach(System.out::println);
}

Output:

0.2264004802916616
0.32777949557515484
Releasing resources…
Exception in thread "main" java.lang.RuntimeException: Data processing exception

In this example, when the data processing exception occurs, the stream is closed automatically by the try-with-resources statement and the registered handler is called. In the sample output we can see the "Releasing resources…" message printed by the handler.
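
One detail deserves attention: the hook runs only when close() is called, and a terminal operation such as forEach does not call it on its own. The following sketch (hypothetical names OnCloseDemo, hookRunsWithoutClose, hookRunsWithTryWithResources) contrasts the two cases.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

class OnCloseDemo {

    // Consumes the stream without closing it; reports whether the close hook ran.
    static boolean hookRunsWithoutClose() {
        AtomicBoolean released = new AtomicBoolean(false);
        Stream<String> stream = Stream.of("a", "b").onClose(() -> released.set(true));
        stream.forEach(s -> { });       // a terminal operation alone does not close the stream
        return released.get();
    }

    // Consumes the stream inside try-with-resources; reports whether the close hook ran.
    static boolean hookRunsWithTryWithResources() {
        AtomicBoolean released = new AtomicBoolean(false);
        try (Stream<String> stream = Stream.of("a", "b").onClose(() -> released.set(true))) {
            stream.forEach(s -> { });
        }                               // close() is called here, which runs the hook
        return released.get();
    }

    public static void main(String[] args) {
        System.out.println(hookRunsWithoutClose());          // false
        System.out.println(hookRunsWithTryWithResources());  // true
    }
}
```

An API that returns a stream backed by a resource should therefore make it clear to its consumers that the stream has to be closed.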

Summary

  1. Streams are not collections.
  2. Streams can help us with the following problems:
     • Processing large amounts of data
     • Fast access to (incomplete) data
     • Avoiding premature computation
  3. Building streams is not hard.
  4. We have to pay attention to error handling.
  5. Resource management is supported.

Copyright notice
This article was written by [Jiedao jdon]. Please include a link to the original when reposting. Thank you.
https://javamana.com/2021/05/20210504172656538K.html
