This post summarizes the process of building a Hadoop cluster. It covers release notes, an overview of Hadoop clusters, server preparation, network environment preparation, server system settings, and JDK installation. Let's take a look, for anyone who needs to learn this.
1. Release Notes
Hadoop distributions fall into two camps: the open-source community edition and commercial editions. The community edition is the version maintained by the Apache Software Foundation and is the official release line. Commercial Hadoop editions are built by third-party companies on top of the community edition, with modifications, integration, and compatibility testing of the various service components. Well-known examples include Cloudera's CDH, MapR, and Hortonworks.
What we will use later is a commercial edition: Cloudera's CDH. Unless otherwise specified, CDH is the edition meant. Hadoop versioning is unusual in that several branches are developed in parallel; broadly, there are three major series: 1.x, 2.x, and 3.x. Hadoop 1.0 consists of a distributed file system, HDFS, and an offline computing framework, MapReduce.
Hadoop 2.0 contains an HDFS that supports horizontal scaling of the NameNode, a resource management system, YARN, and an offline computing framework, MapReduce, that runs on YARN. Compared with Hadoop 1.0, Hadoop 2.0 is more powerful, offers better scalability and performance, and supports multiple computing frameworks. Hadoop 3.0 adds a series of enhancements over Hadoop 2.0. It has stabilized by now, but upgrading and integration across the whole ecosystem is not yet complete, so its readiness for production use remains open to question. The cluster build described here uses the most stable release of the 2.x series: Hadoop 2.6.0-cdh5.14.0.
2. Hadoop Cluster Overview
A Hadoop cluster is really two clusters: an HDFS cluster and a YARN cluster. The two are logically separate but are normally deployed together physically. The HDFS cluster is responsible for storing massive amounts of data; its main roles are NameNode, DataNode, and SecondaryNameNode. The YARN cluster is responsible for resource scheduling for computation over that data; its main roles are ResourceManager and NodeManager.
So what is MapReduce, then? It is actually a distributed-computing programming framework, a development package for applications: users write programs against its programming model, package them, run them on the HDFS cluster, and accept resource scheduling from the YARN cluster.
Hadoop can be deployed in three ways: Standalone mode, Pseudo-Distributed mode, and Cluster mode; the first two are single-machine deployments. Standalone mode, also called local mode, runs a single Java process on one machine and is mainly used for debugging. Pseudo-distributed mode also runs on one machine, but HDFS's NameNode and DataNode and YARN's ResourceManager and NodeManager each run as separate Java processes; it, too, is mainly used for debugging. Cluster mode is used for production deployments: N hosts form a Hadoop cluster, and in this mode the master and slave roles are deployed on separate machines. We will build a 3-node cluster as an example, with roles assigned as follows:
node-01: NameNode, DataNode, ResourceManager
node-02: DataNode, NodeManager, SecondaryNameNode
node-03: DataNode, NodeManager
3. Server Preparation
This walkthrough uses VMware Workstation Pro virtual machines as the servers for the Hadoop cluster. The software and versions used are:
VMware Workstation Pro 12.0
CentOS 6.9 64-bit
4. Network Environment Preparation
Use NAT networking. If you installed the desktop edition of CentOS, you can configure the network through the graphical settings after installation. With the minimal edition, configure it by editing the ifcfg-eth* file. Pay particular attention to BOOTPROTO, GATEWAY, and NETMASK.
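As a reference, here is a minimal sketch of a static-IP interface file. The addresses below are assumptions for a typical VMware NAT subnet; substitute the subnet, gateway, and address of your own NAT network.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 -- minimal static-IP sketch.
# IPADDR/GATEWAY are assumed example values for a VMware NAT subnet.
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.33.101
NETMASK=255.255.255.0
GATEWAY=192.168.33.1
DNS1=192.168.33.1
```

After editing the file, restart networking (`service network restart` on CentOS 6) for the change to take effect.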
5. Server System Settings
# Synchronize the clocks of all machines in the cluster.
# Option 1: set the time manually on every node:
date -s "2019-03-03 03:03:03"
# Option 2 (preferred): synchronize over the network with NTP:
yum install -y ntpdate
ntpdate pool.ntp.org
Set the hostname (on every node, using that node's own name):
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node-01
Configure the IP-to-hostname mapping:
vi /etc/hosts
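The mapping itself is three lines, one per node. The sketch below writes a sample to a temp file and counts the entries; the 192.168.33.x addresses are assumptions, so use the IPs you actually assigned on your NAT network, and append the lines to /etc/hosts on every node so each machine can resolve every other machine by hostname.

```shell
# Sample host mappings for the three-node cluster (IPs are assumed examples).
cat > /tmp/hosts.cluster <<'EOF'
192.168.33.101 node-01
192.168.33.102 node-02
192.168.33.103 node-03
EOF
# Sanity check: all three node entries are present.
grep -c 'node-0' /tmp/hosts.cluster   # prints 3
```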
Configure passwordless ssh login:
# Generate an ssh key pair (press Enter four times to accept the defaults):
ssh-keygen -t rsa
# This creates id_rsa (the private key) and id_rsa.pub (the public key) in ~/.ssh/.
# Copy the public key to each target machine to enable passwordless login:
ssh-copy-id node-01
ssh-copy-id node-02
ssh-copy-id node-03
# View firewall status
service iptables status
# Turn off firewall
service iptables stop
# Check whether the firewall is set to start at boot
chkconfig iptables --list
# Disable the firewall at boot
chkconfig iptables off
6. JDK Installation
# Upload the JDK tarball to the server (e.g. with scp or an sftp tool)
# Unpack the tarball
tar zxvf jdk-8u65-linux-x64.tar.gz
# Configure environment variables in /etc/profile
export JAVA_HOME=/root/apps/jdk1.8.0_65   # adjust to wherever you unpacked the JDK
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar
# Reload the configuration
source /etc/profile
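After reloading the profile, it is worth sanity-checking that the PATH was actually extended. The sketch below mimics the profile lines and verifies the result; the JAVA_HOME path is an assumed unpack location derived from the jdk-8u65 tarball name, so point it at your actual JDK directory.

```shell
# Mimic the /etc/profile additions and check that PATH picked up the JDK.
JAVA_HOME=/root/apps/jdk1.8.0_65   # assumed unpack location -- adjust as needed
PATH=$PATH:$JAVA_HOME/bin
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) echo "PATH ok" ;;        # prints "PATH ok"
  *)                    echo "PATH missing JDK bin" ;;
esac
# On a real node, running `java -version` should now report the 1.8.0_65 JDK.
```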
That wraps up this summary of the Hadoop cluster build process. Have you mastered it all? More detailed big-data video tutorials are available at Boxuegu; you are welcome to apply for a trial spot and experience a course for free!