Kafka is the most widely used message queue in the industry. A common business scenario is the data pipeline: clients push collected logs into Kafka, and downstream consumers either land the data in HDFS for offline analysis or process it with Spark or Flink for real-time computation. Kafka acts as the hub of this pipeline and decouples log collection from the data processing systems.
This article walks through the detailed steps of building a Kafka cluster and, based on problems encountered in day-to-day operations, tunes both the Linux system and the Kafka broker configuration.
1. Environment
Component | Version | Notes
---|---|---
Kafka | 2.12-2.5.1 |
Zookeeper | 3.5.8 | 5 nodes
JDK | 1.8.0_144 |
Server configuration:
CPU: 2 × Intel(R) Xeon(R) Silver 4214 (12 cores / 24 threads, 2.20 GHz)
Memory: 8 × 16 GB DDR4-2666 ECC 1.2 V RDIMM
Disks: 12 × 4 TB 7200 RPM 3.5" SATA (mechanical)
NIC: 10 Gigabit Ethernet
OS: CentOS 7.6
2. Initialize the base environment on each node
- Install JDK 1.8
Download jdk1.8.0_144 and unpack it into the /usr/local directory.
- Disable swap
If swap is left enabled, memory pages are frequently swapped out to disk, which risks longer GC pauses.
# Takes effect immediately
swapoff -a
# Persists across reboots
echo 'swapoff -a' >> /etc/rc.d/rc.local
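Alternatively, instead of disabling swap outright, some operators keep swap configured but set `vm.swappiness` very low so the kernel swaps only under severe memory pressure. A sketch of that variant as a config fragment (standard kernel sysctl, nothing Kafka-specific):

```
# /etc/sysctl.conf (excerpt) -- apply with `sysctl -p`
vm.swappiness=1
```

The goal in both cases is the same: keep the broker heap and the page cache out of swap.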
- Raise the maximum number of open files
Linux defaults to a limit of 1024 open file descriptors per process. With many producers and consumers attached, Kafka easily exceeds this default, causing the broker to shut down abnormally.
# Takes effect immediately
ulimit -n 102400
# Check the current open files limit
ulimit -a | grep 'open files'
# Persists across reboots: add the two lines below
vim /etc/security/limits.conf
* soft nofile 102400
* hard nofile 102400
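The `limits.conf` change only applies to new login sessions, so it is worth confirming that the limit is actually inherited by child processes. A quick check from the shell that will launch Kafka:

```shell
# Show the open-file limits the current shell would pass to a child
# process such as the Kafka broker; after re-logging in, this should
# report the values configured in /etc/security/limits.conf.
grep 'open files' /proc/self/limits
```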
3. Build the Zookeeper cluster
A 5-node Zookeeper cluster keeps serving requests normally even with up to two nodes down.
For the cluster setup steps, see the previous post: zookeeper-3.5.8 cluster setup.
4. Prepare a Kafka copy on the staging machine
Download and unpack
Download the kafka_2.12-2.5.1 tarball from the official website and unpack it into the current directory.
Modify the configuration
- Modify bin/kafka-server-start.sh
Configure the log directory, the JMX port, the JDK to use, and the JVM heap size.
vim bin/kafka-server-start.sh
export LOG_DIR="/var/log/kafka"
export JMX_PORT="2020"
export JAVA_HOME="/usr/local/jdk1.8.0_144"
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
export KAFKA_HEAP_OPTS="-Xmx6G -Xms6G"
fi
- Modify bin/kafka-run-class.sh
Configure the parameters for the JVM G1 garbage collector.
vim bin/kafka-run-class.sh
export JAVA_HOME="/usr/local/jdk1.8.0_144" # other scripts invoke this one and need the JDK environment
KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80"
- Modify config/server.properties
vim config/server.properties
broker.id=10*
listeners=PLAINTEXT://host_name:9090
# With ~150 MB/s of writes and ~300 MB/s of reads, the two settings below keep the average idle ratio of the network threads and disk I/O threads around 30%.
num.network.threads=6
num.io.threads=12
log.dirs=/data*/kafka-logs # adjust to the actual disk layout
log.retention.hours=48
zookeeper.connect=zk1.bjehp.com:2181,zk2.bjehp.com:2181,zk3.bjehp.com:2181,zk4.bjehp.com:2181,zk5.bjehp.com:2181/kafka/talos # remember to change the zookeeper addresses
auto.create.topics.enable=false
default.replication.factor=2
# The commented-out parameters below are Kafka defaults. Note that the zk connection and session timeouts were raised from 6s in version 0.8 to 18s in the current version.
#offsets.topic.replication.factor=3
#transaction.state.log.replication.factor=3
#transaction.state.log.min.isr=2
#group.initial.rebalance.delay.ms=3000
#zookeeper.connection.timeout.ms=18000
#zookeeper.session.timeout.ms=18000
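The retention setting drives disk sizing. As a back-of-envelope sketch using this article's numbers (150 MB/s of writes, 48-hour retention, replication factor 2; the 12-broker count is an assumption for illustration):

```shell
# Rough capacity estimate: bytes written * retention * replicas,
# spread evenly across the brokers. Integer arithmetic, so results
# round down.
write_mb_per_s=150
retention_h=48
replication=2
brokers=12
total_tb=$(( write_mb_per_s * 3600 * retention_h * replication / 1024 / 1024 ))
per_broker_tb=$(( total_tb / brokers ))
echo "total=${total_tb}TB per_broker=${per_broker_tb}TB"
```

This works out to roughly 49 TB in total, about 4 TB per broker at this load, which fits on the 12 × 4 TB disks per server with headroom for bursts. Note that log.retention.hours is a lower bound: segments are deleted only after they roll and fully age out.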
5. Install and start Kafka on each node
Install
- Sync the installation package
Copy the Kafka installation package from the staging machine to the local /usr/local/ directory.
- Adjust server.properties to each machine's actual situation
vim /usr/local/kafka_2.12-2.5.1/config/server.properties
broker.id=10* # change the broker id
log.dirs=/data*/kafka-logs # adjust the log directories to the actual disks
listeners=SASL_PLAINTEXT://hostname:9090 # change the hostname
Start and verify
nohup /usr/local/kafka_2.12-2.5.1/bin/kafka-server-start.sh /usr/local/kafka_2.12-2.5.1/config/server.properties > /dev/null 2>&1 &
ps aux | grep kafka
tailf /var/log/kafka/server.log
netstat -tnlp | grep 9090
netstat -tnlp | grep 2020
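The two port checks above can be wrapped into a small probe, which is also handy for the port alarms configured in section 6. A sketch (9090 and 2020 are the listener and JMX ports from this article's config; `ss` is used as the modern replacement for `netstat`):

```shell
# Report whether a given TCP port has a listener on this host.
check_port() {
  if ss -tnl 2>/dev/null | grep -q ":$1 "; then
    echo "port $1: listening"
  else
    echo "port $1: NOT listening"
  fi
}
check_port 9090   # broker listener
check_port 2020   # JMX
```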
Stop
/usr/local/kafka_2.12-2.5.1/bin/kafka-server-stop.sh
6. Operations
Periodic log cleanup
vim /etc/cron.d/kafka-logclean
# every day, delete kafka log files older than one day
5 4 * * * root find /var/log/kafka/*.log.* -type f -mtime +1 | xargs rm -f
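Here `-mtime +1` matches files whose age, truncated to whole days, is greater than one day. To see exactly what the expression will match before letting cron delete anything, a dry run against a scratch directory (file names are made up for illustration; GNU `touch -d` is assumed):

```shell
# Create one fresh and one stale log file, then apply the same find
# expression the cron job uses, printing matches instead of deleting.
dir=$(mktemp -d)
touch "$dir/server.log.2021-01-02"                 # fresh file: kept
touch -d '3 days ago' "$dir/server.log.2021-01-01" # >1 day old: matched
matched=$(find "$dir"/*.log.* -type f -mtime +1)
echo "would delete: $matched"
rm -rf "$dir"
```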
Configure monitoring and alerting
- Alert on server disk, memory, and CPU load
- Alert on the Kafka node ports
Summary
This article covered the detailed steps of building a Kafka cluster, along with Linux system and Kafka broker parameter tuning. As Kafka has iterated in recent years, bugs in older versions (such as 0.8) have been steadily fixed while new features such as traffic quotas and exactly-once semantics keep arriving, making Kafka clusters more and more stable and cluster failures significantly rarer. Kafka's implementation is full of clever design that is well worth studying.