Link to the original text: https://blog.csdn.net/qq_43323585/article/details/105824989
Kafka Overview
What is Kafka?
Kafka is Apache's open-source stream-processing platform, providing message publishing and subscription. It features high throughput, simplicity, and easy deployment.
What is Kafka used for?
- Message queue: system decoupling, asynchronous communication, traffic peak-shaving and valley-filling, etc.
- Kafka Streams: real-time online stream processing.
Message queue working modes
A message queue has two working modes: 1. point-to-point, where a message is consumed at most once; 2. publish/subscribe, where there is no limit on how many consumers receive it. (Diagram in the original post.)
So how does Kafka work? Read on.
Kafka Architecture and concepts
A few concepts are still worth knowing: they come up when coding and help with understanding. I will attach my Git address at the end; the Kafka scenarios I want to talk about can basically all be reproduced with my code base.
Producing and consuming messages in Kafka
Kafka manages the messages (Records) in the cluster classified by Topic. Each Record belongs to a Topic, and each Topic has multiple partitions (partition) that store Records. Each partition has one server (Broker) acting as its leader, and the Brokers holding the partition's replicas act as followers. **Note that only the leader can read and write.** This naturally brings ZooKeeper to mind, whose consensus protocol belongs to the Paxos family of distributed consistency algorithms, and which also underpins the reliability of Kafka's data. It may be a little abstract; see the diagram (in the original post):
Partitions and logs
The "log" here just means the data itself, a bit like the log in Redis. It is not the printed application log we usually talk about, ha-ha.
3. Each Topic can be subscribed to by multiple consumers; Kafka manages partitions through Topics.
4. The producer can write Records into partitions round-robin or by load balancing (taking the Record's key modulo the partition count).
5. A partition is an ordered, immutable log sequence; every Record has a unique offset, used to track consumption progress and to support the persistence strategy.
6. Kafka's default configuration is log.retention.hours=168: whether or not a message has been consumed, the Record is kept for 168 hours on disk. The persistence policy can be customized in the configuration file.
7. Order is guaranteed only within a partition; with multiple partitions, do not expect what was produced first to be consumed first. Keep this in mind when writing business code; a keyed-send sketch follows this list.
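To see the per-key guarantee in code: with the default partitioner, Records carrying the same key hash to the same partition, so their relative order is preserved. A minimal sketch with the Java client, assuming a hypothetical broker at localhost:9092 and the debug topic used later in this post:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class KeyedProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records sharing the key "order-42" hash to the same partition,
            // so consumers of that partition see event-0, event-1, event-2 in order.
            for (int i = 0; i < 3; i++) {
                producer.send(new ProducerRecord<>("debug", "order-42", "event-" + i));
            }
        }
    }
}
```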
Why does Kafka give up global order in favor of local (per-partition) order?
For the same reason as Hadoop: a distributed cluster breaks single-machine physical limits, so capacity, performance, and concurrency all improve qualitatively; one machine cannot beat a hundred. Kafka is, after all, a big-data framework.
Consumer groups
Concept: a consumer group is one logical consumer made up of multiple consumer instances. If 4 independent consumers all subscribe to topic1 (4 partitions), each partition gets consumed 4 times; introducing consumer groups avoids this repeated consumption.
Code: a consumer can use either subscribe or assign; when subscribing to a topic with subscribe, a consumer group must be specified! A minimal sketch follows.
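A minimal subscribe-based consumer sketch with the Java client, assuming a hypothetical broker at localhost:9092, the debug topic, and the group name group1 (the same name used with the console consumer later):

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class GroupConsumerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");                  // required when using subscribe()
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("debug"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```

Instances in the same group divide the topic's partitions among themselves; instances in different groups each receive every message.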
How Kafka achieves high performance
Why is Kafka's throughput so high?
Kafka easily supports millions of write requests while persisting the data to disk, which is impressive. Come to think of it, a high-performance technology is usually a wrapper around the kernel; Redis, for example, calls epoll() at the bottom. The real heavyweight is the OS.
Sequential writes and mmap
Sequential writes: a hard disk is a mechanical device, and seeking is an extremely time-consuming mechanical action, so Kafka uses sequential I/O, avoiding a great deal of seek time and overhead.
mmap (Memory Mapped Files): using the operating system's PageCache, a file is mapped directly into physical memory. Writes go straight into the page cache, and the OS flushes the written pages to disk on its own, greatly reducing I/O cost. mmap effectively gives user space direct access to a region shared with the kernel, saving the user-to-kernel mode switch and the extra copy.
ZeroCopy
When the Kafka server responds to a client read, the bottom layer uses zero-copy: instead of copying the data from disk into user space, the data is transferred directly through kernel space and never reaches user space. A small illustration follows.
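In Java (which the Kafka broker runs on), zero-copy surfaces as FileChannel.transferTo(), which maps to sendfile on Linux. A minimal sketch of serving a file over a socket this way; the port 9000 and the file name segment.log are made up for illustration:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(9000)); // hypothetical port
            try (SocketChannel client = server.accept();
                 FileChannel file = FileChannel.open(Paths.get("segment.log"), // hypothetical file
                         StandardOpenOption.READ)) {
                long pos = 0, size = file.size();
                // transferTo hands the copy to the kernel: bytes move
                // disk -> page cache -> socket without entering user space.
                while (pos < size) {
                    pos += file.transferTo(pos, size - pos, client);
                }
            }
        }
    }
}
```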
I will not go into I/O models here; I will write about them later.
Building a Kafka cluster
Please see my two articles on building ZooKeeper and Kafka clusters. Reference: Kafka cluster setup
Topic management
- bootstrap-server: the consumer side, which pulls data. Older versions used the -zookeeper parameter here, which pushed a lot of data into ZooKeeper; that design was quite unreasonable.
- broker-list: the producer side, which pushes data
- partitions: number of partitions
- replication-factor: partition replication factor
Create a topic
bin/kafka-topics.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --create --partitions 3 --replication-factor 2 --topic debug
List topics
bin/kafka-topics.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --list
Describe a topic
bin/kafka-topics.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --describe --topic debug
Delete a topic
bin/kafka-topics.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --delete --topic debug
Produce messages on a topic
bin/kafka-console-producer.sh --broker-list node01:9092,node02:9092,node03:9092 --topic debug
Consume messages on a topic
bin/kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --topic debug --from-beginning
bin/kafka-console-consumer.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --topic debug --group group1
These commands are very handy; they help us test.
List consumer groups
bin/kafka-consumer-groups.sh --bootstrap-server node01:9092,node02:9092,node03:9092 --list
Kafka API
The examples come from reference books and are quite comprehensive. They have been uploaded to Git; I have not had time to tidy them up yet, but I will later. If you find them useful, give the repo a star! They cover: production and consumption, custom partitioners, serialization, interceptors, transactions, Kafka stream processing, and more.
Git address: Kafka API
Kafka features
The acks and retries mechanisms
The acks acknowledgement setting:
- acks=1: the leader returns success once its own write succeeds, without waiting for follower confirmation. Does not survive a single point of failure.
- acks=0: success is reported once the message reaches the socket buffer. Lowest message safety.
- acks=-1 (all): the leader returns success only after the in-sync replicas have acknowledged (combined with min.insync.replicas, this means at least one follower besides the leader). For high-availability clusters.
The retries retry mechanism:
request.timeout.ms (default 30000): how long to wait for a response before the request times out and is retried.
retries: the retry count, up to 2147483647 (Integer.MAX_VALUE). A producer sketch of these settings follows.
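A sketch of how these knobs appear on the producer. The broker address, topic name, and the retry count of 3 are illustrative assumptions, not recommendations:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class AcksRetriesDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        props.put(ProducerConfig.ACKS_CONFIG, "all");               // wait for all in-sync replicas
        props.put(ProducerConfig.RETRIES_CONFIG, 3);                // retry transient failures a few times
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000); // per-request timeout before retrying

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("debug", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace(); // retries exhausted or fatal error
                        }
                    });
        }
    }
}
```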
Kafka message delivery semantics: with these settings, a message is delivered at least once (at-least-once).
Idempotent writes
Idempotency: executing a request multiple times gives the same result as executing it once.
Kafka's idempotent-write solution: the broker assigns each producer a PID, the producer attaches an increasing sequence number to every batch, and the broker discards any batch whose sequence number it has already accepted.
This approach is worth borrowing for some business-level idempotency problems, though truly solving business idempotency takes techniques like blacklists, signatures, and tokens.
Note:
enable.idempotence defaults to false, i.e. idempotence is off by default.
Prerequisites for turning it on: retries greater than 0 and acks=all. A configuration sketch follows.
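A minimal configuration sketch, assuming the same hypothetical broker and topic; the client itself rejects settings that conflict with idempotence:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class IdempotentProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Idempotence plus its prerequisites.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Even if an ack is lost and the send is retried, the broker
            // deduplicates by producer ID + sequence number.
            producer.send(new ProducerRecord<>("debug", "key", "written-once-per-partition"));
        }
    }
}
```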
Transactions
Because Kafka's idempotent writes provide no guarantee across multiple partitions or across sessions, we need a stronger transactional guarantee: atomic writes spanning multiple partitions, where either all of the data is written successfully or none of it is. This is the Kafka transaction.
The producer provides five transaction methods: initTransactions, beginTransaction, sendOffsetsToTransaction, commitTransaction, and abortTransaction.
The consumer provides the isolation levels read_committed and read_uncommitted. A transactional sketch follows.
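A minimal transactional-producer sketch following the canonical client pattern; the broker address and the transactional id demo-tx-1 are illustrative assumptions:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class TransactionalProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // A stable transactional.id enables transactions (and implies idempotence).
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-tx-1"); // hypothetical id

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        try {
            producer.beginTransaction();
            // These writes commit or abort together, even across partitions.
            producer.send(new ProducerRecord<>("debug", "k1", "v1"));
            producer.send(new ProducerRecord<>("debug", "k2", "v2"));
            producer.commitTransaction();
        } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
            producer.close(); // fatal errors: this producer instance cannot continue
        } catch (KafkaException e) {
            producer.abortTransaction(); // abortable: nothing becomes visible to read_committed consumers
        }
        producer.close();
    }
}
```

On the consuming side, setting isolation.level=read_committed makes the consumer skip records from aborted transactions.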
Kafka Eagle monitoring software
Open source address :https://github.com/smartloli/kafka-eagle
The book I read when I first started learning Kafka was also written by its author.
Setup steps
- Download kafka-eagle-bin-1.4.0.tar.gz and extract it
- mv kafka-eagle-bin-1.4.0 /opt/
- There is another archive inside after extraction; extract that one too
- Edit the environment variables
vim /etc/profile
------------------------
# kafka-eagle
export KE_HOME=/opt/kafka-eagle
# PATH
export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$KE_HOME/bin
------------------------
source /etc/profile
echo $PATH
OK~
- Edit the kafka-eagle configuration
vim system-config.properties
-----------------------------------------------------
kafka.eagle.zk.cluster.alias=cluster1
cluster1.zk.list=node01:2181,node02:2181,node03:2181
#cluster2.zk.list=xdn10:2181,xdn11:2181,xdn12:2181
cluster1.kafka.eagle.offset.storage=kafka
#cluster2.kafka.eagle.offset.storage=zk
kafka.eagle.metrics.charts=true
cluster1.kafka.eagle.sasl.enable=false
cluster1.kafka.eagle.sasl.protocol=SASL_PLAINTEXT
cluster1.kafka.eagle.sasl.mechanism=SCRAM-SHA-256
cluster1.kafka.eagle.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="kafka" password="kafka-eagle";
cluster1.kafka.eagle.sasl.client.id=
#cluster2.kafka.eagle.sasl.enable=false
#cluster2.kafka.eagle.sasl.protocol=SASL_PLAINTEXT
#cluster2.kafka.eagle.sasl.mechanism=PLAIN
#cluster2.kafka.eagle.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="kafka" password="kafka-eagle";
#cluster2.kafka.eagle.sasl.client.id=
######################################
# kafka sqlite jdbc driver address
######################################
#kafka.eagle.driver=org.sqlite.JDBC
#kafka.eagle.url=jdbc:sqlite:/hadoop/kafka-eagle/db/ke.db
#kafka.eagle.username=root
#kafka.eagle.password=www.kafka-eagle.org
######################################
# kafka mysql jdbc driver address
######################################
kafka.eagle.driver=com.mysql.jdbc.Driver
kafka.eagle.url=jdbc:mysql://127.0.0.1:3306/ke?useUnicode=true&characterEncoding=UTF-8&zeroDateTimeBehavior=convertToNull
kafka.eagle.username=root
kafka.eagle.password=123456
- Edit Kafka's startup script to enable JMX
vim kafka-server-start.sh
-------------------------------------
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
export JMX_PORT="7379"
export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
fi
- Finally, start it up. The script had no execute permission at first, so grant it:
chmod u+x ke.sh
./ke.sh
Version 1.4.0
*******************************************************************
* Kafka Eagle Service has started success.
* Welcome, Now you can visit 'http://192.168.83.11:8048/ke'
* Account:admin ,Password:123456
*******************************************************************
* <Usage> ke.sh [start|status|stop|restart|stats] </Usage>
* <Usage> https://www.kafka-eagle.org/ </Usage>
*******************************************************************
Summary
That's it: 3 ZooKeeper nodes + 3 Kafka brokers + Kafka Eagle monitoring can already do a lot of things, such as log aggregation, messaging, activity tracking, operational metrics, stream processing, and so on. I hope this helps whoever reads it!
Related articles:
Kafka cluster: link
ZooKeeper cluster: link