无情 @ 2015-12-02 17:57:42
Hadoop · Big Data


Installing and Configuring Hadoop 2.7.1 on Linux



1. Environment and installation tools


  Three Alibaba Cloud servers:

  Role    IP
  Master  10.161.217.220  (this host acts as the Master)
  Slave1  10.162.80.105
  Slave2  10.117.7.209



 Install the JDK or OpenJDK; note that the JAVA_HOME path differs between the two.

 I am using OpenJDK.

 Edit /etc/profile (vim /etc/profile) and add:

   export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91.x86_64

 

 Save the file, then run source /etc/profile.
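 A quick check that the variable is picked up (the exact OpenJDK version suffix may differ on your system):

    source /etc/profile
    echo $JAVA_HOME
    java -version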




 Download

    wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

    tar zxvf hadoop-2.7.1.tar.gz


    My installation directory is /usr/local/hadoop-2.7.1/.
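    If the tarball was downloaded and extracted elsewhere, the extracted folder can simply be moved there (a sketch, assuming it sits in the current directory):

    mv hadoop-2.7.1 /usr/local/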


2. Passwordless SSH


Hadoop manages the cluster by logging in to each node over SSH, so passwordless SSH must be set up. I work as the root user: generate a key pair on every server, then merge the public keys into authorized_keys on the Master.


(1) CentOS does not enable public-key SSH login by default. On every server, remove the '#' from the following two lines in /etc/ssh/sshd_config (skip this if it is already enabled):

#RSAAuthentication yes

#PubkeyAuthentication yes
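After editing sshd_config, restart sshd so the change takes effect (the command differs by CentOS version):

    service sshd restart      # CentOS 6
    systemctl restart sshd    # CentOS 7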


(2) Run ssh-keygen -t rsa to generate a key pair. Do not enter a passphrase, just press Enter at every prompt; a .ssh directory will be created under /root. Do this on every server.
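An equivalent non-interactive form of the same command (the empty -P passphrase skips the prompts):

    ssh-keygen -t rsa -P "" -f /root/.ssh/id_rsa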


(3) Merge the public keys into the authorized_keys file. On the Master, go to /root/.ssh and merge them with the following commands (the last two pull the Slaves' keys over SSH):

cat id_rsa.pub >> authorized_keys

ssh root@10.162.80.105 cat ~/.ssh/id_rsa.pub >> authorized_keys

ssh root@10.117.7.209 cat ~/.ssh/id_rsa.pub >> authorized_keys


(4) Copy the Master's authorized_keys and known_hosts to /root/.ssh on each Slave.
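A sketch of the copy with scp, run on the Master (the /root/.ssh directories on the Slaves already exist from step (2)):

    scp /root/.ssh/authorized_keys /root/.ssh/known_hosts root@10.162.80.105:/root/.ssh/
    scp /root/.ssh/authorized_keys /root/.ssh/known_hosts root@10.117.7.209:/root/.ssh/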


(5) Done. ssh root@10.162.80.105 and ssh root@10.117.7.209 should now log in without prompting for a password.




3. Install Hadoop 2.7.1. Extract and configure it only on the Master, then copy the whole directory to the Slaves.


(1) Under the installation directory, create the folders used for data storage: tmp, hdfs, hdfs/data, hdfs/name.
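A sketch of creating them, assuming the install directory above:

    cd /usr/local/hadoop-2.7.1
    mkdir -p tmp hdfs/name hdfs/data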

 (2) Edit etc/hadoop/core-site.xml and add:


 <property>
        <name>fs.defaultFS</name>
        <value>hdfs://10.161.217.220:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop-2.7.1/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>


    (3) Edit etc/hadoop/hdfs-site.xml and add:

  <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop-2.7.1/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop-2.7.1/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>10.161.217.220:9001</value>
    </property>
    <property>
       <name>dfs.webhdfs.enabled</name>
       <value>true</value>
    </property>


    (4) Create mapred-site.xml from mapred-site.xml.template; Hadoop reads mapred-site.xml, not the .template file.
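    A sketch of creating the file from the template, assuming the install directory above:

    cd /usr/local/hadoop-2.7.1/etc/hadoop
    cp mapred-site.xml.template mapred-site.xml

    Then add the following to mapred-site.xml: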

    

 <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>10.161.217.220:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>10.161.217.220:19888</value>
    </property>



    (5) Edit etc/hadoop/yarn-site.xml and add:


  <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>10.161.217.220:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>10.161.217.220:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>10.161.217.220:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>10.161.217.220:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>10.161.217.220:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>768</value>
    </property>
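Before copying to the Slaves, two related settings are worth checking. For start-all.sh on the Master to bring up the DataNode and NodeManager processes on the Slaves, etc/hadoop/slaves should list the Slave addresses, one per line. And if the start scripts complain that JAVA_HOME is not set, export it in etc/hadoop/hadoop-env.sh as well, since shells launched over SSH do not necessarily source /etc/profile. A minimal sketch, assuming the paths used above:

    # /usr/local/hadoop-2.7.1/etc/hadoop/slaves
    10.162.80.105
    10.117.7.209

    # appended to /usr/local/hadoop-2.7.1/etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91.x86_64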


  (6) After the configuration is complete, copy the whole directory to the two Slaves (run from /usr/local on the Master):

    scp -r hadoop-2.7.1 root@10.162.80.105:/usr/local/
    scp -r hadoop-2.7.1 root@10.117.7.209:/usr/local/


 (7) Start Hadoop (run these commands only on the Master; the Slaves need nothing, because the start scripts launch their daemons automatically).

    On the Master, run bin/hdfs namenode -format to initialize HDFS.

    In the sbin directory, run ./start-all.sh.
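    start-all.sh still works but is marked deprecated in Hadoop 2.x; the equivalent is to start HDFS and YARN separately:

    sbin/start-dfs.sh
    sbin/start-yarn.sh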

    Run jps to check that the daemons are up.
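    With the configuration above, jps on the Master should show roughly the following (plus Jps itself); the Slaves should show DataNode and NodeManager:

    NameNode
    SecondaryNameNode
    ResourceManager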

    To stop everything, run sbin/stop-all.sh.

    The ResourceManager web UI is available on port 8088 (internal network only here; for external access, expose it through nginx or another reverse proxy).
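    The HDFS NameNode web UI is on port 50070 by default in Hadoop 2.7, with the same internal-network caveat:

    http://10.161.217.220:50070/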