Storm and Spark Streaming are both open-source frameworks for distributed stream processing. Here we compare them side by side and point out their important differences.
Processing model, latency
Although both frameworks provide scalability and fault tolerance, they differ fundamentally in their processing model. Storm processes incoming events one at a time, whereas Spark Streaming batches up the events that arrive within a short time window and processes each batch together. As a result, Storm can process an event with sub-second latency, while Spark Streaming has a latency of several seconds.
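The latency difference between the two processing models can be illustrated with a toy simulation. This is a sketch under simplifying assumptions (integer milliseconds, a fixed processing cost, no queueing, a fixed 2-second batch interval); the function names are hypothetical and do not correspond to any Storm or Spark Streaming API.

```python
# Toy latency model, not Storm or Spark code. Times are integer milliseconds.
# Assumptions: a fixed processing cost per event (or per batch) and no queueing.

def record_at_a_time_latency(arrivals_ms, service_ms):
    """Record-at-a-time model: each event is processed as soon as it arrives,
    so its latency is just the processing cost."""
    return [service_ms for _ in arrivals_ms]


def micro_batch_latency(arrivals_ms, batch_interval_ms, service_ms):
    """Micro-batch model: events are buffered until their time window
    [k * interval, (k + 1) * interval) closes, then the batch is processed."""
    latencies = []
    for t in arrivals_ms:
        window_end = (t // batch_interval_ms + 1) * batch_interval_ms
        latencies.append(window_end - t + service_ms)  # wait for window, then process
    return latencies


if __name__ == "__main__":
    arrivals_ms = [100, 500, 1200, 1900]  # event arrival times
    print(record_at_a_time_latency(arrivals_ms, service_ms=10))  # [10, 10, 10, 10]
    print(micro_batch_latency(arrivals_ms, batch_interval_ms=2000, service_ms=10))  # [1910, 1510, 810, 110]
```

In the micro-batch model an event that arrives just after a window opens waits almost the full interval, so latency is governed by the batch interval rather than by the per-event processing cost.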
Fault tolerance, data guarantees
The trade-off lies in the fault-tolerance data guarantees: Spark Streaming provides better support for stateful computation that is fault tolerant. In Storm, each individual record must be tracked as it moves through the system, so Storm only guarantees that each record will be processed at least once, and duplicates may appear during recovery from a fault. This means that mutable state may be incorrectly updated twice.
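Why at-least-once delivery can corrupt mutable state is easiest to see in a small model. The sketch below is a hypothetical simulation, not the Storm API: it collapses record tracking and replay into a single "crash before acknowledgement" flag, and assumes state is updated before the record is acknowledged.

```python
# Toy model of at-least-once delivery, not the Storm API.
# Each event increments a counter; the counter is updated *before* the event
# is acknowledged, so a crash between the two steps causes a replay.

def count_at_least_once(events, crash_before_ack_of=None):
    """Return the final counter value.

    If `crash_before_ack_of` is an index, the worker crashes once after updating
    state for that event but before acknowledging it; the source then redelivers
    the unacknowledged event.
    """
    count = 0
    index = 0
    crashed = False
    while index < len(events):
        count += 1  # mutable state updated for events[index]
        if index == crash_before_ack_of and not crashed:
            crashed = True
            continue  # no ack recorded, so the same event is delivered again
        index += 1  # ack succeeded: move on to the next event
    return count


if __name__ == "__main__":
    events = ["a", "b", "c", "d"]
    print(count_at_least_once(events))                         # 4
    print(count_at_least_once(events, crash_before_ack_of=2))  # 5: one event counted twice
```

No record is lost, but the replayed event is counted twice, which is exactly the double update of mutable state described above.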
Spark Streaming, on the other hand, only needs to track processing at the batch level, so it can efficiently guarantee that each mini-batch is processed exactly once, even if a fault such as a node failure occurs. (Storm's Trident library also provides exactly-once processing. However, it relies on transactions to update state, which is slower and often has to be implemented by the user.)
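One common way batch-level bookkeeping yields exactly-once state updates is to store the ID of the last applied batch alongside the state and skip any batch whose ID matches it. The sketch below illustrates that generic technique (similar in spirit to transactional state that stores a transaction ID with each value); it is not Spark Streaming's internal mechanism, and all names are hypothetical. It assumes batch IDs arrive in increasing order.

```python
# Toy model of exactly-once batch updates; names are hypothetical.

def apply_batch(state, batch_id, batch):
    """Apply one mini-batch to the state exactly once.

    `state` holds the running count and the ID of the last batch that was
    applied. Because batch IDs are assumed to arrive in increasing order, a
    batch whose ID equals the stored one must be a replay and is skipped.
    """
    if state["last_batch_id"] == batch_id:
        return state  # replayed after a failure: the update was already applied
    return {"count": state["count"] + len(batch), "last_batch_id": batch_id}


def run_batches(batches):
    """Fold a sequence of (batch_id, records) pairs into the state."""
    state = {"count": 0, "last_batch_id": None}
    for batch_id, batch in batches:
        state = apply_batch(state, batch_id, batch)
    return state


if __name__ == "__main__":
    # Batch 1 is delivered twice, as it might be after a node failure.
    batches = [(0, ["a", "b"]), (1, ["c"]), (1, ["c"]), (2, ["d", "e"])]
    print(run_batches(batches)["count"])  # 5: the replayed batch is not double-counted
```

Only one ID comparison is needed per batch rather than per record, which is why tracking at batch granularity keeps the cost of this guarantee low.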
In short, Storm is a good choice if you need sub-second latency and no data loss. Spark Streaming is better if you need stateful computation with the guarantee that each event is processed exactly once. Spark Streaming programming logic may also be easier, because it resembles batch programming (as in Hadoop): you are working with batches, albeit very small ones.
Implementation, programming API
Storm is implemented mainly in Clojure, while Spark Streaming is implemented in Scala. Keep this in mind if you want to read the code to understand how each system works, or to make your own customizations. Storm was developed at BackType and Twitter; Spark Streaming was developed at the University of California, Berkeley.
Storm comes with a Java API and also supports other languages. Spark Streaming can be programmed in Scala as well as Java.
One nice feature of Spark Streaming is that it runs on Spark. You can therefore reuse the same (or very similar) code you write for batch processing in Spark, instead of writing separate code to process real-time streaming data and historical data.
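The benefit of sharing code between batch and streaming jobs can be sketched in plain Python. Spark Streaming represents a stream as a sequence of per-interval batches (RDDs); here ordinary lists stand in for both the historical data set and the mini-batches, and every function name is hypothetical rather than part of Spark's API.

```python
from collections import Counter


def word_counts(lines):
    """Shared business logic: count words in any collection of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def batch_job(historical_lines):
    """Batch path: run the logic once over the full historical data set."""
    return word_counts(historical_lines)


def streaming_job(micro_batches):
    """Streaming path: run the same logic on each mini-batch and merge results."""
    totals = Counter()
    for batch in micro_batches:
        totals.update(word_counts(batch))  # update() adds counts from a mapping
    return totals


if __name__ == "__main__":
    lines = ["a b", "b c", "c c"]
    print(batch_job(lines) == streaming_job([lines[:1], lines[1:]]))  # True
```

This reuse works cleanly here because word counts from separate batches can simply be added together; logic that needs a global view of the data (a median, for example) would need more than a per-batch merge.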
Storm has been around for several years and has run in production at Twitter since 2011, as well as at many other companies. Spark Streaming is a newer project; it has been running in production at Sharethrough since 2013.
Storm is the streaming solution in the Hortonworks Hadoop data platform, whereas Spark Streaming is included in both MapR's distribution and Cloudera's enterprise data platform. In addition, Databricks provides support for Spark.
Cluster management integration
Although both systems can run on their own clusters, Storm can also run on Mesos, while Spark Streaming can run on both YARN and Mesos.