Detailed Installation of Hadoop 2.3.0 on CentOS 6.4

Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant, is designed to run on low-cost hardware, and provides high-throughput access to application data, which makes it well suited to applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be read as a stream (streaming access).

The two core pieces of the Hadoop framework are HDFS and MapReduce: HDFS provides storage for massive amounts of data, and MapReduce provides computation over that data.


1. System architecture
Cluster roles:
Hostname    IP address        Role
name01      192.168.52.128    NameNode, ResourceManager (JobTracker)
data01      192.168.52.129    DataNode, NodeManager (TaskTracker)
data02      192.168.52.130    DataNode, NodeManager (TaskTracker)

System environment:
CentOS 6.5 x64, VMware virtual machine
Disk: 30 GB
Memory: 1 GB

Hadoop version: hadoop-2.3.0

2. Environment preparation
2.1 System settings
Disable iptables:
/sbin/service iptables stop

/sbin/chkconfig iptables off
Disable SELinux: setenforce 0
sed -i "s@^SELINUX=enforcing@SELINUX=disabled@g" /etc/sysconfig/selinux

Set the node names.
Run on every node:
/bin/cat <<EOF > /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.52.128 name01
192.168.52.129 data01
192.168.52.130 data02
EOF

hostname name01   # use data01 or data02 on the corresponding node
sed -i "s@HOSTNAME=localhost.localdomain@HOSTNAME=name01@g" /etc/sysconfig/network   # likewise data01 / data02

2.2 Creating the user and directories
Create the account that will run Hadoop:
Log in to every machine as root and create the hadoop user on each of them (a loop-based shortcut is sketched after these commands):
useradd hadoop  # create the hadoop user
passwd hadoop
#sudo useradd -s /bin/bash -d /home/hadoop -m hadoop -g hadoop -G admin  # alternative: create a hadoop user in the hadoop group that also has admin-group privileges
#su hadoop  # switch to the hadoop user
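If root can already ssh to all three machines, a loop-based sketch of the same user creation; the password value hadoop matches what the later steps assume, and passwd --stdin is CentOS/RHEL specific:
for h in name01 data01 data02; do
    ssh root@$h 'useradd hadoop && echo hadoop | passwd --stdin hadoop'
done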

Create the Hadoop-related directories:
Define the paths for data and working directories, and the paths where source code and tools are kept:
mkdir -p /home/hadoop/src
mkdir -p /home/hadoop/tools
chown -R hadoop.hadoop /home/hadoop/*

Define the data-node storage path under /data/hadoop at the root of the filesystem; this is where the data nodes store their blocks, so it needs enough free space:
mkdir -p /data/hadoop/hdfs
mkdir -p /data/hadoop/tmp
mkdir -p /var/logs/hadoop

Make the directories writable:
chmod -R 777 /data/hadoop
chown -R hadoop.hadoop /data/hadoop/*
chown -R hadoop.hadoop /var/logs/hadoop

Define the Java installation path:
mkdir -p /usr/lib/jvm/
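The directory above is only an empty home for the JDK. A minimal sketch of unpacking a JDK there and exporting JAVA_HOME; the tarball name and version are assumptions, not from the original:
# run as root; assumes the JDK tarball was downloaded to /home/hadoop/tools
tar -zxf /home/hadoop/tools/jdk-7u60-linux-x64.tar.gz -C /usr/lib/jvm/
cat >> /home/hadoop/.bashrc <<'EOF'
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_60
export PATH=$JAVA_HOME/bin:$PATH
EOF
su - hadoop -c 'java -version'   # verify the login shell picks up the new JAVA_HOME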

2.3 Configuring passwordless SSH login
SSH generates a public/private key pair (RSA or DSA) and encrypts data in transit to keep it secure and reliable. The public key is the shared half that any node on the network may hold; the private key never leaves its owner, so others cannot steal the data or impersonate the node. In short, it is an asymmetric scheme that is very hard to break. The nodes of a Hadoop cluster constantly access one another, and the node being accessed must verify the node that is connecting, so Hadoop uses SSH key authentication plus encryption for secure remote logins. If every single access required a password, efficiency would drop sharply, which is why passwordless SSH is configured so that nodes can connect to each other directly.
The namenode must be able to log in to the other nodes without a password. Every node generates its own key pair: id_dsa.pub is the public key and id_dsa is the private key. The public key is then appended to an authorized_keys file; this step is mandatory. The process is as follows:

2.3.1 Generate a key pair on each node
# Notes:
(1) The .ssh directory needs 755 permissions and authorized_keys needs 644 (the 700/600 used in the steps below also work);
(2) If the Linux firewall is left running, the ports Hadoop needs must be opened; otherwise turn the firewall off (see the sketch after this list);
(3) If a data node cannot connect to the master, it may be because hostnames are being used; plain IP addresses are more reliable.
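For note (2), if you would rather keep iptables running than disable it, here is a rough sketch; the port list is an assumption based on common Hadoop 2.x defaults and must match whatever your *-site.xml files end up using:
# Illustrative only: open typical Hadoop 2.x ports instead of stopping iptables
for p in 9000 50070 50010 50020 50075 50090 8030 8031 8032 8033 8088 8040 8042; do
    iptables -I INPUT -p tcp --dport $p -j ACCEPT
done
/sbin/service iptables save   # persist the rules on CentOS 6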

On the master, name01 (192.168.52.128):
Create the ssh login key pair for the hadoop account on the namenode:
mkdir -p /home/hadoop/.ssh
chown hadoop.hadoop -R /home/hadoop/.ssh
chmod 755 /home/hadoop/.ssh
su - hadoop
cd /home/hadoop/.ssh
ssh-keygen -t dsa -P '' -f id_dsa
[hadoop@name01 .ssh]$ ssh-keygen -t dsa -P '' -f id_dsa
Generating public/private dsa key pair.
open id_dsa failed: Permission denied.
Saving the key failed: id_dsa.
[hadoop@name01 .ssh]$
The key generation fails as shown above; the fix is to relax SELinux with setenforce 0:
[root@name01 .ssh]# setenforce 0
su - hadoop
[hadoop@name01 .ssh]$ ssh-keygen -t dsa -P '' -f id_dsa
Generating public/private dsa key pair.
Your identification has been saved in id_dsa.
Your public key has been saved in id_dsa.pub.
The key fingerprint is:
52:69:9a:ff:07:f4:fc:28:1e:48:18:fe:93:ca:ff:1d hadoop@name01
The key's randomart image is:
+--[ DSA 1024]----+
| |
| . |
| . + |
| . B . |
| * S. o |
| = o. o |
| * ..Eo |
| . . o.oo.. |
| o..o+o. |
+-----------------+
[hadoop@name01 .ssh]$ ll
total 12
-rw-------. 1 hadoop hadoop 668 Aug 20 23:58 id_dsa
-rw-r--r--. 1 hadoop hadoop 603 Aug 20 23:58 id_dsa.pub
drwxrwxr-x. 2 hadoop hadoop 4096 Aug 20 23:48 touch
[hadoop@name01 .ssh]$
id_dsa.pub is the public key and id_dsa is the private key. Next, append the public key into an authorized_keys file; this step is mandatory. The process is as follows:
[hadoop@name01 .ssh]$ cat id_dsa.pub >> authorized_keys
[hadoop@name01 .ssh]$ ll
total 16
-rw-rw-r--. 1 hadoop hadoop 603 Aug 21 00:00 authorized_keys
-rw-------. 1 hadoop hadoop 668 Aug 20 23:58 id_dsa
-rw-r--r--. 1 hadoop hadoop 603 Aug 20 23:58 id_dsa.pub
drwxrwxr-x. 2 hadoop hadoop 4096 Aug 20 23:48 touch
[hadoop@name01 .ssh]$
Repeat exactly the same steps on the remaining two nodes.

2.3.2 On data01 (192.168.52.129), run:
useradd hadoop  # create the hadoop user
passwd hadoop   # set the hadoop user's password to hadoop
setenforce 0
su - hadoop
mkdir -p /home/hadoop/.ssh
cd /home/hadoop/.ssh
ssh-keygen -t dsa -P '' -f id_dsa
cat id_dsa.pub >> authorized_keys

2.3.3 On data02 (192.168.52.130), run:
useradd hadoop  # create the hadoop user
passwd hadoop   # set the hadoop user's password to hadoop
setenforce 0
su - hadoop
mkdir -p /home/hadoop/.ssh
cd /home/hadoop/.ssh
ssh-keygen -t dsa -P '' -f id_dsa
cat id_dsa.pub >> authorized_keys

2.3.4 Build one shared authorized_keys for all three nodes
On name01 (192.168.52.128):
su - hadoop
cd /home/hadoop/.ssh
scp hadoop@data01:/home/hadoop/.ssh/id_dsa.pub ./id_dsa.pub.data01
scp hadoop@data02:/home/hadoop/.ssh/id_dsa.pub ./id_dsa.pub.data02
cat id_dsa.pub.data01 >> authorized_keys
cat id_dsa.pub.data02 >> authorized_keys

As shown below:
[hadoop@name01 .ssh]$ scp hadoop@data01:/home/hadoop/.ssh/id_dsa.pub ./id_dsa.pub.data01
The authenticity of host 'data01 (192.168.52.129)' can't be established.
RSA key fingerprint is 5b:22:7b:dc:0c:b8:bf:5c:92:aa:ff:93:3c:59:bd:d3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'data01,192.168.52.129' (RSA) to the list of known hosts.
hadoop@data01's password:
Permission denied, please try again.
hadoop@data01's password:
id_dsa.pub 100% 603 0.6KB/s 00:00
[hadoop@name01 .ssh]$
[hadoop@name01 .ssh]$ scp hadoop@data02:/home/hadoop/.ssh/id_dsa.pub ./id_dsa.pub.data02
The authenticity of host 'data02 (192.168.52.130)' can't be established.
RSA key fingerprint is 5b:22:7b:dc:0c:b8:bf:5c:92:aa:ff:93:3c:59:bd:d3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'data02,192.168.52.130' (RSA) to the list of known hosts.
hadoop@data02's password:
id_dsa.pub 100% 603 0.6KB/s 00:00
[hadoop@name01 .ssh]$
[hadoop@name01 .ssh]$ cat id_dsa.pub.data01 >> authorized_keys
[hadoop@name01 .ssh]$ cat id_dsa.pub.data02 >> authorized_keys
[hadoop@name01 .ssh]$ cat authorized_keys
ssh-dss AAAAB3NzaC1kc3MAAACBAI2jwEdOWNFFcpys/qB4OercYLY5o5XvBn8a5iy9K/WqYcaz35SimzxQxGtVxWq6AKoKaO0nfjE3m1muTP0grVd5i+HLzysRcpomdFc6z2PXnh4b8pA4QbFyYjxEAp5HszypYChEGGEgpBKoeOei5aA1+ufF1S6b8yEozskITUi7AAAAFQDff2ntRh50nYROstA6eV3IN2ql9QAAAIAtOFp2bEt9QGvWkkeiUV3dcdGd5WWYSHYP0jUULU4Wz0gvCmbpL6uyEDCAiF88GBNKbtDJKE0QN1/U9NtxL3RpO7brTOV7fH0insZ90cnDed6qmZTK4zXITlPACLafPzVB2y/ltH3z0gtctQWTydn0IzppS2U5oe39hWDmBBcYEwAAAIBbV8VEEwx9AUrv8ltbcZ3eUUScFansiNch9QNKZ0LeUEd4pjkvMAbuEAcJpdSqhgLBHsQxpxo3jXpM17vy+AiCds+bINggkvayE6ixRTBvQMcY6j1Bu7tyRmsGlC998HYBXbv/XyC9slCmzbPhvvTk4tAwHvlLkozP3sWt0lDtsw== hadoop@name01
ssh-dss AAAAB3NzaC1kc3MAAACBAJsVCOGZbKkL5gRMapCObhd1ndv1UHUCp3ZC89BGQEHJPKOz8DRM9wQYFLK7pWeCzr4Vt5ne8iNBVJ6LdXFt703b6dYZqp5zpV41R0wdh2wBAhfjO/FI8wUskAGDpnuqer+5XvbDFZgbkVlI/hdrOpKHoekY7hzX2lPO5gFNeU/dAAAAFQDhSINPQqNMjSnyZm5Zrx66+OEgKwAAAIBPQb5qza7EKbGnOL3QuP/ozLX73/7R6kxtrgfskqb8ejegJbeKXs4cZTdlhNfIeBew1wKQaASiklQRqYjYQJV5x5MaPHTvVwoWuSck/3oRdmvKVKBASElhTiiGLQL3Szor+eTbLU76xS+ydILwbeVh/MGyDfXdXRXfRFzSsOvCsAAAAIAeCGgfT8xjAO2M+VIRTbTA51ml1TqLRHjHoBYZmg65oz1/rnYfReeM0OidMcN0yEjUcuc99iBIE5e8DUVWPsqdDdtRAne5oXK2kWVu3PYIjx9l9f9A825gNicC4GAPRg0OOL54vaOgr8LFDJ9smpqK/p7gojCiSyzXltGqfajkpg== hadoop@data01
ssh-dss AAAAB3NzaC1kc3MAAACBAOpxZmz7oWUnhAiis2TiVWrBPaEtMZoxUYf8lmKKxtP+hM/lTDQyIK10bfKekJa52wNCR6q3lVxbFK0xHP04WHeb4Z0WjqNLNiE/U7h0gYCVG2M10sEFycy782jmBDwdc0R8MEy+nLRPmU5oPqcWBARxj0obg01PAj3wkfV+28zDAAAAFQC6a4yeCNX+lzIaGTd0nnxszMHhvwAAAIAevFSuiPi8Axa2ePP+rG/VS8QWcwmGcFZoR+K9TUdFJ4ZnfdKb4lqu78f9n68up2oJtajqXYuAzD08PerjWhPcLJAs/2qdGO1Ipiw/cXN2TyfHrnMcDr3+aEf7cUGHfWhwW4+1JrijHQ4Z9UeHNeEP6nU4I38FmS7gf9/f9MOVVwAAAIBlL1NsDXZUoEUXOws7tpMFfIaL7cXs7p5R+qk0BLdfllwUIwms++rKI9Ymf35l1U000pvaI8pz8s7I8Eo/dcCbWrpIZD1FqBMIqWhdG6sFP1qr9Nn4RZ00DxCz34ft4M8g+0CIn4Bg3pp4ZZES435R40F+jlrsnbLaXI+ixCzpqw== hadoop@data02
[hadoop@name01 .ssh]$

authorized_keys now contains three entries, the public keys for name01, data01, and data02 respectively. Copy this authorized_keys file to the same directory on data01 and data02.

After that, name01, data01, and data02 can ssh to one another as the hadoop user without a password:
scp authorized_keys hadoop@data01:/home/hadoop/.ssh/
scp authorized_keys hadoop@data02:/home/hadoop/.ssh/
Then, on each of name01, data01, and data02, set the permissions as the hadoop user:
su - hadoop
chmod 600 /home/hadoop/.ssh/authorized_keys
chmod 700 -R /home/hadoop/.ssh
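Before the interactive test below, a quick non-interactive check (a sketch, not from the original) that every node can reach every other node without a password prompt:
# run as the hadoop user on each node; every line should print a hostname and never ask for a password
for h in name01 data01 data02; do ssh -o BatchMode=yes hadoop@$h hostname; done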


Test the passwordless ssh login. On the first connection you have to type yes to accept the host key; after that you can ssh over without entering a password.
[hadoop@name01 .ssh]$ ssh hadoop@data01
Last login: Thu Aug 21 01:53:24 2014 from name01
[hadoop@data01 ~]$ ssh hadoop@data02
The authenticity of host 'data02 (192.168.52.130)' can't be established.
RSA key fingerprint is 5b:22:7b:dc:0c:b8:bf:5c:92:aa:ff:93:3c:59:bd:d3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'data02,192.168.52.130' (RSA) to the list of known hosts.
[hadoop@data02 ~]$ ssh hadoop@name01
The authenticity of host 'name01 (::1)' can't be established.
RSA key fingerprint is 5b:22:7b:dc:0c:b8:bf:5c:92:aa:ff:93:3c:59:bd:d3.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'name01' (RSA) to the list of known hosts.
Last login: Thu Aug 21 01:56:12 2014 from data01
[hadoop@data02 ~]$ ssh hadoop@name01
Last login: Thu Aug 21 01:56:22 2014 from localhost.localdomain
[hadoop@data02 ~]$
Notice the problem: the ssh from data01 and data02 to name01 did not actually succeed; the prompt never leaves [hadoop@data02 ~]$. Where does the problem lie?
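The output above already points at the likely cause: name01 was resolved to ::1, a loopback address, so the "remote" login simply landed back on the local machine. A hedged check to run on data01/data02 (the root cause is not confirmed in the original text):
getent hosts name01        # should return 192.168.52.128, not ::1 or 127.0.0.1
grep -n name01 /etc/hosts  # if name01 sits on a loopback line, remove it there and keep only "192.168.52.128 name01"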
