Kafka was originally developed at LinkedIn in Scala as a multi-partition, multi-replica distributed messaging system coordinated by ZooKeeper, and was later donated to the Apache Foundation. Today Kafka is positioned as a distributed streaming platform, prized for its high throughput, persistence, and horizontal scalability, and it is widely used to support stream processing. More and more open-source distributed processing systems, such as Cloudera, Storm, Spark, and Flink, integrate with Kafka.
Kafka's growing popularity is inseparable from the three roles it plays:
- Messaging system: Kafka offers the same capabilities as traditional messaging systems (also called message middleware): system decoupling, redundant storage, traffic peak shaving, buffering, asynchronous communication, extensibility, and recoverability. In addition, Kafka provides message ordering guarantees and the ability to rewind and re-consume messages, features that most messaging systems find difficult to implement.
- Storage system: Kafka persists messages to disk, which greatly reduces the risk of data loss compared with memory-based storage systems. Thanks to this persistence and its multi-replica mechanism, Kafka can serve as a long-term data store: simply set the data retention policy to "permanent" or enable log compaction for the topic.
- Streaming platform: Kafka not only provides a reliable data source for every popular stream-processing framework, but also ships with a complete stream-processing library supporting operations such as windowing, joins, transformations, and aggregations.
1|0 Basic concepts
A typical Kafka architecture includes several Producers, several Brokers, several Consumers, and a ZooKeeper cluster, as shown in the figure below. ZooKeeper is what Kafka uses to manage cluster metadata and perform operations such as controller election. Producers send messages to Brokers, Brokers are responsible for storing received messages on disk, and Consumers subscribe to and consume messages from Brokers.
The overall Kafka architecture introduces the following three terms:
- Producer: the party that sends messages. A producer is responsible for creating messages and delivering them to Kafka.
- Consumer: the party that receives messages. A consumer connects to Kafka, receives messages, and then performs the corresponding business-logic processing.
- Broker: a service proxy node. For Kafka, a Broker can simply be regarded as an independent Kafka service node or Kafka service instance. In most cases a Broker can also be thought of as a Kafka server, provided only one Kafka instance is deployed on that server. One or more Brokers form a Kafka cluster. By convention, the lowercase broker is generally used to denote a service proxy node.
Kafka also has two particularly important concepts: the topic and the partition. Messages in Kafka are categorized by topic: producers are responsible for sending messages to specific topics (every message sent to the Kafka cluster must be assigned a topic), and consumers are responsible for subscribing to topics and consuming from them.
2|0 Client development
A typical producer workflow involves the following steps:
- Configure the producer client parameters and create the corresponding producer instance.
- Build the message to be sent.
- Send the message.
- Close the producer instance.
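The steps above can be sketched as follows. This is a minimal fire-and-forget example; the broker address `localhost:9092` and the topic name `topic-demo` are illustrative, not taken from the original listing:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerQuickStart {
    public static void main(String[] args) {
        // Step 1: configure the client parameters and create the producer instance.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // Step 2: build the message to be sent.
        ProducerRecord<String, String> record =
                new ProducerRecord<>("topic-demo", "Hello, Kafka!");
        try {
            // Step 3: send the message (fire-and-forget).
            producer.send(record);
        } finally {
            // Step 4: close the producer instance, flushing any buffered records.
            producer.close();
        }
    }
}
```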
The message object built here is a ProducerRecord. It is not just the message itself: it contains multiple attributes, and the business-related message body to be sent is only its value attribute. For example, "Hello, Kafka!" is just one attribute of a ProducerRecord object. The ProducerRecord class is defined as follows (showing only the member variables):
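The member variables of ProducerRecord, as they appear in recent kafka-clients releases, look roughly like this (constructors and accessors omitted):

```java
public class ProducerRecord<K, V> {
    private final String topic;      // the topic the record is sent to
    private final Integer partition; // the target partition number, or null
    private final Headers headers;   // the message headers
    private final K key;             // the message key
    private final V value;           // the message body
    private final Long timestamp;    // the message timestamp
    // constructors and accessor methods omitted
}
```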
The topic and partition fields represent the topic and the partition number to which the message is to be sent. The headers field holds the message headers; this attribute was introduced in Kafka 0.11.x and is mostly used to carry application-specific metadata, so it can be left unset if not needed. The key field specifies the message key. It is not only additional information attached to the message; it can also be used to compute the partition number, so that the message is sent to a specific partition. As mentioned earlier, messages are categorized by topic, and the key classifies them further: all messages with the same key are routed to the same partition.
3|0 Necessary parameter setting
Before creating an actual producer instance, the corresponding parameters need to be configured, such as the addresses of the Kafka cluster to connect to. Referring to the initConfig() method in the client code above, the Kafka producer client KafkaProducer has three required parameters.
- bootstrap.servers: specifies the list of broker addresses the producer client uses to connect to the Kafka cluster, in the format host1:port1,host2:port2. One or more addresses can be set, separated by commas; the default value is "". Note that not all broker addresses are needed here, because the producer will discover the other brokers from any given broker. However, it is recommended to set at least two broker addresses, so that the producer can still connect to the cluster when any one of them goes down.
- key.serializer and value.serializer: brokers require messages to arrive in the form of byte arrays (byte[]). In code listing 3-1 the producer uses KafkaProducer<String, String> and ProducerRecord<String, String>; the generic parameters <String, String> correspond to the types of the message key and value. Using generics this way keeps the code readable, but before a message is sent to the broker, its key and value must be serialized into byte arrays. The key.serializer and value.serializer parameters specify the serializers that perform these operations, and neither parameter has a default value.
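A sketch of an initConfig() method that sets the three required parameters (plus client.id, discussed next); the broker address and client id values are illustrative:

```java
import java.util.Properties;

public class ProducerConfigDemo {
    // Illustrative cluster address; replace with your own broker list.
    public static final String brokerList = "localhost:9092";

    public static Properties initConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokerList);
        // Serializers for the message key and value; these have no default values.
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("client.id", "producer.client.id.demo");
        return props;
    }
}
```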
The initConfig() method in the client code above also sets the client.id parameter, which specifies the client id of the KafkaProducer; its default value is "". If the client does not set it, KafkaProducer automatically generates a non-empty string of the form "producer-1", "producer-2", and so on: the string "producer-" followed by a number.
KafkaProducer has many more parameters; the four shown in the example initConfig() method are far from all of them. Developers can adjust the default values of these parameters according to the actual needs of the business application, which allows for flexible deployment. In general, ordinary developers cannot remember every parameter name and retain only a general impression of them.
In practice, strings such as "key.serializer", "max.request.size", and "interceptor.classes" are often mistyped. As a precaution, we can use the org.apache.kafka.clients.producer.ProducerConfig class directly: every parameter has a corresponding constant in ProducerConfig. Taking the initConfig() method in code listing 3-1 as an example, introducing ProducerConfig yields the following:
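A sketch of the method rewritten with the ProducerConfig constants (broker address and client id remain illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class ProducerConfigConstantsDemo {
    public static final String brokerList = "localhost:9092";

    public static Properties initConfig() {
        Properties props = new Properties();
        // Constants instead of raw strings guard against typos.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "producer.client.id.demo");
        return props;
    }
}
```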
Notice that in the code above the fully qualified class names corresponding to the key.serializer and value.serializer parameters are fairly long and still easy to get wrong. This can be improved further with plain Java, as follows:
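Using StringSerializer.class.getName() instead of spelling out the fully qualified name, a typo in the class name becomes a compilation error rather than a runtime failure (a sketch; address and client id are illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerConfigClassNameDemo {
    public static final String brokerList = "localhost:9092";

    public static Properties initConfig() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerList);
        // The class literal resolves to the fully qualified class name,
        // so the compiler verifies that the class actually exists.
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                StringSerializer.class.getName());
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "producer.client.id.demo");
        return props;
    }
}
```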
This makes the code much simpler and further reduces the possibility of human error. Once the parameters are configured, we can use them to create a producer instance, for example:
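Creating the producer instance from the configured properties is a one-liner (assuming a Properties object built by an initConfig() method like the one described above, with String serializers for both key and value):

```java
// props holds the configuration returned by initConfig().
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
```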
KafkaProducer is thread safe: a single KafkaProducer instance can be shared across multiple threads, or KafkaProducer instances can be pooled for other threads to use.
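Because KafkaProducer is thread safe, one instance can be handed to several sending threads without extra synchronization; a minimal sketch (broker address and topic name illustrative):

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SharedProducerDemo {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // One producer instance shared by all worker threads.
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            final int id = i;
            pool.submit(() -> producer.send(
                    new ProducerRecord<>("topic-demo", "message from thread " + id)));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        producer.close();
    }
}
```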
KafkaProducer has several constructors. For example, if the key.serializer and value.serializer configuration parameters are not set when creating the KafkaProducer instance, the corresponding serializers need to be passed to the constructor instead, for example:
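A sketch using the constructor overload that takes serializer objects directly (broker address illustrative):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerCtorDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // No key.serializer / value.serializer entries here; the serializer
        // instances are supplied through the constructor instead.
        KafkaProducer<String, String> producer = new KafkaProducer<>(
                props, new StringSerializer(), new StringSerializer());
        producer.close();
    }
}
```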