1. Configure the Linux system environment
  1. CentOS 6.4 download address: http://pan.baidu.com/s/1geoSWuv【VMWare special CentOS.rar】(a pre-packaged VM image to import into VMware)
  2. Configure the virtual machine's network connection as "host-only mode" (so the host and the VM can reach each other)
  3. Give the virtual machine a static IP (a minimal config sketch follows), then restart the network: service network restart
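     A minimal sketch of the static-IP settings, assuming the NIC is eth0 and using the 192.168.17.100 address this guide relies on later:
     # /etc/sysconfig/network-scripts/ifcfg-eth0
     DEVICE=eth0
     BOOTPROTO=static
     ONBOOT=yes
     IPADDR=192.168.17.100
     NETMASK=255.255.255.0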
          

2. Turn off the firewall
su root
service iptables stop  # stop the firewall now
service iptables status  # verify that it is stopped
chkconfig iptables off  # keep the firewall from starting automatically at boot
chkconfig --list | grep iptables  # verify the firewall is no longer set to start at boot
vim /etc/sysconfig/selinux  # disable SELinux: set SELINUX=disabled

3. Modify the hostname (set it to hadoop here, to match the configuration below)
hostname  # view the current hostname
hostname hadoop  # set the hostname; takes effect immediately (but not across reboots)
vim /etc/sysconfig/network  # set HOSTNAME=hadoop to make the change permanent

4. Bind the hostname to the IP address (set up local name resolution)
vim /etc/hosts  # add the line: 192.168.17.100    hadoop
reboot


5. Create a user (in production, root access is generally not handed out directly)
  1. adduser hadoop01
  2. passwd hadoop01, and set the password: hadoop01
  3. As root, grant hadoop01 sudo privileges so it can run commands as root:
su root
chmod 751 /etc/sudoers  # make the file writable (editing with visudo instead avoids touching permissions)
vim /etc/sudoers
hadoop01        ALL=(ALL)       ALL  # add this line under "Allow root to run any commands anywhere"
Alternatively, simply keep working as root directly.
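A quick check that the new sudo entry works (a sketch; enter hadoop01's own password when prompted):
su - hadoop01
sudo whoami  # should print: root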
6. Passwordless SSH login
ssh-keygen -t rsa  # press Enter through the prompts; the key pair is generated under ~/.ssh
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys  # (or: ssh-copy-id localhost)
ssh localhost  # verify: should log in without asking for a password

7. Install the JDK (64-bit: http://pan.baidu.com/s/1nua41ol【jdk-7u79-linux-x64.gz】, 32-bit: http://pan.baidu.com/s/1dDJPDNr【jdk-7u79-linux-i586.gz】)
     (Hadoop 2.7.x requires Java 1.7 or later, so installing JDK 1.7 is recommended; see http://wiki.apache.org/hadoop/HadoopJavaVersions)
cd /usr/local/src
mkdir java
cd java
mkdir jdk  # unpack the JDK here so its root is /usr/local/src/java/jdk: sudo tar -zxvf jdk-7u79-linux-x64.gz
vim /etc/profile  # configure the environment variables; append:
     export JAVA_HOME=/usr/local/src/java/jdk
     export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile  # make the configuration take effect
java -version  # check whether the installation succeeded
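Expected output, assuming the 7u79 package above (build details may differ):
     java version "1.7.0_79"
     Java(TM) SE Runtime Environment (build 1.7.0_79-b15)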
8. Install Hadoop (single-node, pseudo-distributed)
    Download Hadoop from http://apache.fayea.com/hadoop/common/hadoop-2.7.2/ (on a 64-bit system, compile the source yourself; see the reference video below for details. A pre-compiled 64-bit build: http://pan.baidu.com/s/1c0TuAgo)
    1. Configure HDFS
cd /usr/local/src
mkdir hadoop  # unpack the Hadoop tarball here
cd hadoop
mkdir data
cd data
mkdir tmp  # holds the temporary directory and the data generated at runtime
vim etc/hadoop/hadoop-env.sh  # change JAVA_HOME to the actual path: JAVA_HOME=/usr/local/src/java/jdk
vim etc/hadoop/core-site.xml  # add the HDFS access address and port 8020 (supported since 2.x):
         <property>
                 <name>fs.defaultFS</name>
                 <value>hdfs://hadoop:8020</value>
         </property>
# also redirect the runtime temporary directory (the default is under /tmp, which Linux may clean out):
         <property>
                 <name>hadoop.tmp.dir</name>
                 <value>/usr/local/src/hadoop/data/tmp</value>
         </property>
vim etc/hadoop/hdfs-site.xml  # configure the replication factor (normally 3; 1 is enough for pseudo-distributed)
         <property>
                 <name>dfs.replication</name>
                 <value>1</value>
         </property>
bin/hdfs namenode -format  # format the file system
sbin/start-dfs.sh  # start HDFS; run jps to check how the startup went — three processes come up (the NameNode stores metadata, the DataNode stores the data)
sbin/stop-dfs.sh  # stop HDFS
            If anything fails, check the logs directory.
            View the HDFS status page: http://192.168.17.100:50070/
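A quick jps sanity check after sbin/start-dfs.sh (a sketch; your PIDs will differ):
jps
# 2481 NameNode
# 2574 DataNode
# 2722 SecondaryNameNode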

    2. Install YARN (resource scheduling)
mv etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
vim etc/hadoop/mapred-site.xml  # make MapReduce use YARN for scheduling
         <property>
                 <name>mapreduce.framework.name</name>
                 <value>yarn</value>
         </property>
vim etc/hadoop/yarn-site.xml  # configure shuffle as the way Reduce fetches its data
         <property>
                  <name>yarn.nodemanager.aux-services</name>
                  <value>mapreduce_shuffle</value>
         </property>
sbin/start-yarn.sh  # start YARN (jps now shows two more processes: the ResourceManager handles cluster-wide resource allocation, the NodeManager manages its own node's resources)
sbin/stop-yarn.sh  # stop YARN
             YARN monitoring UI: http://192.168.17.100:8088/cluster
Stop everything at once: stop-all.sh
Start processes one at a time with "hadoop-daemon.sh start [process name]"; this startup style is suited to adding nodes (a sketch follows).
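A sketch of starting the daemons individually with the standard Hadoop 2.x scripts:
sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode
sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager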

9. Test Hadoop (word-frequency count)
cd /usr/local/src/hadoop/data
vim words  # enter two lines: "hello a" and "hello b"
cd /usr/local/src/hadoop
bin/hadoop fs -put /usr/local/src/hadoop/data/words /words  # upload words to HDFS;
# it is then visible at http://192.168.17.100:50070/ — the HDFS block size defaults to 128M, so larger files are split into 128M blocks, each processed separately
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /words /out  # run the demo
# watch the job run at http://192.168.17.100:8088
bin/hadoop fs -ls /out
bin/hadoop fs -cat /out/part-r-00000
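For the two-line input above, the -cat should show (tab-separated):
a       1
b       1
hello   2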
10. A brief look at how the word-count MapReduce works
Map stage:
     Input data (byte offset, line):
         <0, "hello a">
         <8, "hello b">

     Map function (simplified; the real Hadoop signature uses LongWritable/Text/IntWritable):
         map(key, value, context) {
             // value holds one line, e.g. "hello a"
             String[] words = value.toString().split(" ");
             for (String word : words) {
                 // the first line emits <hello,1> and <a,1>,
                 // the second line emits <hello,1> and <b,1>
                 context.write(word, 1);
             }
         }

     Output data:
         <hello,1>
         <a,1>
         <hello,1>
         <b,1>

Reduce stage (after grouping and sorting by key):
     Input data:
         <a,{1}>
         <b,{1}>
         <hello,{1,1}>

     Reduce function:
         reduce(key, values, context) {
             int sum = 0;
             for (int i : values) {
                 sum += i;
             }
             context.write(key, sum);
         }

     Output data:
         <a,1>
         <b,1>
         <hello,2>

11. Other questions
    1. The NameNode process did not start successfully. Common causes:
        1. The file system was not formatted.
        2. The configuration files were only copied, never modified.
        3. The hostname was not bound to the IP address.
        4. Passwordless SSH login was not configured successfully.
    2. Is formatting Hadoop multiple times also a problem?
        1. Yes — reformatting gives the NameNode a new namespace ID that no longer matches the data the DataNode already holds. Delete the /usr/local/src/hadoop/data/tmp folder, then reformat (a sketch follows).
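A sketch of the recovery steps, using the paths configured above:
sbin/stop-dfs.sh
rm -rf /usr/local/src/hadoop/data/tmp/*
bin/hdfs namenode -format
sbin/start-dfs.sh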

References:
1. http://www.jikexueyuan.com/course/2475_3.html?ss=1【hadoop introduction】; course materials: http://pan.baidu.com/s/1hrh0mhA, password: w779【press-2949-package-v1.zip】
