Hadoop cluster building process summary

sxxbxh 2021-04-16 18:06:18
This article summarizes the process of building a Hadoop cluster. It covers release notes, an introduction to the Hadoop cluster, server preparation, network environment preparation, server system settings, and JDK installation. Let's walk through it together.

1、 Release notes

Hadoop distributions fall into two groups: the open-source community version and commercial versions. The community version is the one maintained by the Apache Software Foundation and is the official release line. Commercial versions are built by third-party companies on top of community Hadoop, with modifications, integration, and compatibility testing of the various service components. Well-known examples include Cloudera's CDH, MapR, and Hortonworks.

The version used in the rest of this series is a commercial one: Cloudera's CDH. Unless stated otherwise, version numbers refer to CDH. Hadoop's versioning is unusual in that several branches are developed in parallel. Broadly there are three major series: 1.x, 2.x, and 3.x. Hadoop 1.0 consists of the distributed file system HDFS and the offline computing framework MapReduce.

Hadoop 2.0 consists of an HDFS that supports horizontal scaling of the NameNode, the resource management system YARN, and the offline computing framework MapReduce running on YARN. Compared with Hadoop 1.0, Hadoop 2.0 is more powerful, scales and performs better, and supports multiple computing frameworks. Hadoop 3.0 adds a series of enhancements over Hadoop 2.0. It has stabilized, but at the time of writing the upgrade and integration of the wider ecosystem was not yet complete, so its suitability for commercial use was still open to question. The cluster built here uses the most stable release of the 2.x series: CDH 2.6.0-CDH14.0.

2、 Introduction to the Hadoop cluster

Strictly speaking, a Hadoop cluster consists of two clusters: the HDFS cluster and the YARN cluster. The two are logically separate but physically usually deployed together. The HDFS cluster handles massive data storage; its main roles are NameNode, DataNode, and SecondaryNameNode. The YARN cluster handles resource scheduling for computation over that data; its main roles are ResourceManager and NodeManager.

So what is MapReduce? It is a distributed computing programming framework, delivered as an application development package. Users develop programs according to its programming conventions, package them, and run them on the HDFS cluster under the resource scheduling of the YARN cluster.

Hadoop can be deployed in three modes: Standalone mode, Pseudo-Distributed mode, and Cluster mode. The first two run on a single machine. In standalone mode, one machine runs a single Java process; it is mainly used for debugging. In pseudo-distributed mode, one machine runs the HDFS NameNode and DataNode and the YARN ResourceManager and NodeManager, each as a separate Java process; it is also mainly used for debugging. Cluster mode is used for production deployments: N hosts form a Hadoop cluster, with the master and slave nodes deployed on different machines.

This article builds a 3-node cluster as an example, with the roles assigned as follows:

node-1    NameNode    DataNode       ResourceManager

node-2    DataNode    NodeManager    SecondaryNameNode

node-3    DataNode    NodeManager

3、 Server preparation

This example uses virtual machines created with VMware Workstation Pro as the servers for the Hadoop cluster. The software versions are as follows:

VMware Workstation Pro 12.0

CentOS 6.9 64-bit

4、 Network environment preparation

Use NAT networking. If you installed the desktop edition of CentOS, you can edit the network settings through the graphical interface after installation. If you installed the minimal edition, configure the network by editing the ifcfg-eth* configuration file. Pay particular attention to BOOTPROTO, GATEWAY, and NETMASK.
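
For the minimal edition, a static configuration might look like the sketch below. Every value in it is a hypothetical placeholder (the interface name eth0, the 192.168.100.0/24 subnet, the .2 gateway that VMware NAT networks commonly use); check the Virtual Network Editor for the subnet of your own NAT network. The helper name write_ifcfg is also just for illustration.

```shell
# Writes a static-IP interface configuration to the file given as $1.
# All addresses are hypothetical placeholders; replace them with the
# values of your own VMware NAT network.
write_ifcfg() {
  cat > "$1" <<'EOF'
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.100.101
NETMASK=255.255.255.0
GATEWAY=192.168.100.2
EOF
}
```

On CentOS 6 the target file would typically be /etc/sysconfig/network-scripts/ifcfg-eth0, followed by service network restart to apply the change.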

5、 Server system settings

Time synchronization

# Set the time manually on each machine in the cluster

date -s "2019-03-03 03:03:03"

# Or synchronize the time over the network

yum install ntpdate

ntpdate cn.pool.ntp.org

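Set the host name

vi /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=node-1

Configure the IP-to-host-name mapping

vi /etc/hosts

# One line per node, in the form: <IP address> <host name>

<IP of node-1> node-1

<IP of node-2> node-2

<IP of node-3> node-3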
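
Because the same mapping has to be present on every node, it can help to generate the entries instead of typing them three times. The following is a minimal sketch, assuming the three nodes have consecutive addresses in one subnet; the function name hosts_entries and the example subnet are hypothetical.

```shell
# Prints /etc/hosts lines for node-1, node-2 and node-3.
# $1: first three octets of the subnet (hypothetical, e.g. 192.168.100)
# $2: last octet of node-1; the other nodes follow consecutively
hosts_entries() {
  prefix="$1"
  first="$2"
  i=0
  for name in node-1 node-2 node-3; do
    echo "$prefix.$((first + i)) $name"
    i=$((i + 1))
  done
}
```

Running hosts_entries 192.168.100 101 >> /etc/hosts on each machine would append the three lines; substitute your own subnet and starting address.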

Configure passwordless SSH login

# Generate the SSH key pair used for passwordless login

ssh-keygen -t rsa    (press Enter at each prompt to accept the defaults)

After this command runs, id_rsa (the private key) and id_rsa.pub (the public key) are generated.

Copy the public key to the target machine to enable passwordless login

ssh-copy-id node-2
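
The master usually needs passwordless access to every node, itself included, so the copy step is repeated per host. A minimal sketch of that loop follows; the NODES variable and the distribute_key helper are hypothetical names, and the optional argument exists only so the loop can be dry-run with echo.

```shell
# Host names as mapped in /etc/hosts.
NODES="node-1 node-2 node-3"

# Copies the local public key to every node in NODES.
# $1 (optional): command to run per host; defaults to ssh-copy-id.
distribute_key() {
  copy_cmd="${1:-ssh-copy-id}"
  for host in $NODES; do
    $copy_cmd "$host"
  done
}
```

Calling distribute_key echo only prints the host names; calling distribute_key with no argument runs ssh-copy-id against each node and prompts for each password once.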

Configure firewall

# View firewall status

service iptables status

# Turn off firewall

service iptables stop

# Check whether the firewall is set to start at boot

chkconfig --list iptables

# Disable the firewall at boot

chkconfig iptables off

6、 JDK environment installation

# Upload the JDK installation package

# Extract the installation package

tar zxvf jdk-8u65-linux-x64.tar.gz

# Configure the environment variables in /etc/profile

export JAVA_HOME=/export/servers/jdk1.8.0_65

export PATH=$PATH:$JAVA_HOME/bin

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

# Refresh configuration

source /etc/profile
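
After reloading the profile it is worth confirming that the variables took effect, for example with java -version. The sketch below adds one more check, assuming JAVA_HOME is set as above; the function name java_home_on_path is hypothetical.

```shell
# Returns 0 if "$JAVA_HOME/bin" is one of the entries in PATH, 1 otherwise.
java_home_on_path() {
  case ":$PATH:" in
    *":$JAVA_HOME/bin:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A typical use is java_home_on_path && java -version, which prints the JDK version only when the PATH entry is in place.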

That concludes this summary of the Hadoop cluster building process.
