
Hadoop 3.1.1: Dynamically Adding Nodes, a Worked Example

2019-01-23 18:05 | Hadoop | Original
Author: dave

1 Basic Environment

The existing Hadoop cluster environment is as follows:
Operating system: Red Hat 7.6
Hadoop version: 3.1.1
Java version: 1.8

NameNode: 192.168.20.80
DataNode 1: 192.168.20.81
DataNode 2: 192.168.20.82

Here we add three DataNode nodes:

DataNode 3: 192.168.20.83
DataNode 4: 192.168.20.84
DataNode 5: 192.168.20.85

The main preparation work for this part includes the following:

  1. Disable the operating system firewall and SELinux
  2. Modify the hosts file
  3. Create the Hadoop user
  4. Install the JDK
  5. Configure passwordless SSH login

The preparation for adding nodes is the same as for building the original Hadoop environment, so refer directly to the setup guide:
Hadoop 3.1.1 Cluster Setup Guide for Linux 7.6
https://www.cndba.cn/download/dave/6
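
As a quick recap, a minimal sketch of the hostname and passwordless-SSH preparation for the three new nodes might look like the following. It assumes the cndba user and default SSH key locations; adjust to your own environment:

# Append to /etc/hosts on every node in the cluster
192.168.20.83   hadoopslave3
192.168.20.84   hadoopslave4
192.168.20.85   hadoopslave5

# On the Master, push the existing public key to each new node
[cndba@hadoopmaster ~]$ ssh-copy-id cndba@hadoopslave3
[cndba@hadoopmaster ~]$ ssh-copy-id cndba@hadoopslave4
[cndba@hadoopmaster ~]$ ssh-copy-id cndba@hadoopslave5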

2 Dynamically Adding Nodes

2.1 Modify the $HADOOP_HOME/etc/hadoop/workers File on the Master Node

[cndba@hadoopmaster hadoop]$ pwd
/home/cndba/hadoop/etc/hadoop
[cndba@hadoopmaster hadoop]$ cat workers 
hadoopslave1
hadoopslave2
hadoopslave3
hadoopslave4
hadoopslave5
[cndba@hadoopmaster hadoop]$
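
For reference, one way to append the three new hostnames instead of editing the file by hand is sketched below; this is only an illustration, and the resulting file should match the listing above:

[cndba@hadoopmaster hadoop]$ echo -e "hadoopslave3\nhadoopslave4\nhadoopslave5" >> workers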

2.2 Copy Hadoop to the Slave Nodes

[cndba@hadoopmaster ~]$ scp -r hadoop hadoopslave3:`pwd`
[cndba@hadoopmaster ~]$ scp -r hadoop hadoopslave4:`pwd`
[cndba@hadoopmaster ~]$ scp -r hadoop hadoopslave5:`pwd`

Alternatively, you can first extract the Hadoop installation archive on the new nodes and then scp only the corresponding configuration files from the Master over to them.
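
A minimal sketch of that alternative, assuming the Apache binary tarball hadoop-3.1.1.tar.gz has already been copied to the new node's home directory:

# On the new node: extract the archive into the same path used on the Master
[cndba@hadoopslave3 ~]$ tar -xzf hadoop-3.1.1.tar.gz && mv hadoop-3.1.1 hadoop

# On the Master: copy only the configuration files over
[cndba@hadoopmaster ~]$ cd hadoop/etc/hadoop
[cndba@hadoopmaster hadoop]$ scp core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml workers hadoopslave3:~/hadoop/etc/hadoop/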

2.3 Configure Environment Variables

Add the following settings at the end of the configuration file; this must be done on every new node:


[root@hadoopslave3 ~]# vim /etc/profile

#HADOOP
export HADOOP_HOME=/home/cndba/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
[root@hadoopmaster /]# source /etc/profile

Verify the Hadoop version:


[root@hadoopslave3 ~]# hdfs version
Hadoop 3.1.1
Source code repository https://github.com/apache/hadoop -r 2b9a8c1d3a2caf1e733d57f346af3ff0d5ba529c
Compiled by leftnoteasy on 2018-08-02T04:26Z
Compiled with protoc 2.5.0
From source with checksum f76ac55e5b5ff0382a9f7df36a3ca5a0
This command was run using /home/cndba/hadoop/share/hadoop/common/hadoop-common-3.1.1.jar
[root@hadoopslave3 ~]#

2.4 Start the DataNode Process on the New Nodes

In a previous post we already covered the relevant commands:
Summary of Common Hadoop HDFS Commands
https://www.cndba.cn/dave/article/3258

Start the datanode process:
[cndba@hadoopslave3 ~]$ hadoop-daemon.sh start datanode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.
[cndba@hadoopslave3 ~]$ hdfs --daemon stop datanode
[cndba@hadoopslave3 ~]$ hdfs --daemon start datanode
[cndba@hadoopslave3 ~]$ jps
23754 Jps
23692 DataNode
[cndba@hadoopslave3 ~]$

Since we added three DataNode nodes, the same commands have to be run on every new node; a looped version is sketched below.
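
To avoid repeating the step by hand, a hedged sketch of running it from the Master over SSH (assuming the cndba user and the same paths on every new node) could be:

[cndba@hadoopmaster ~]$ for h in hadoopslave3 hadoopslave4 hadoopslave5; do ssh $h "source /etc/profile; hdfs --daemon start datanode"; done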

After the operation is complete, check the HDFS report:

[cndba@hadoopslave5 ~]$ hdfs dfsadmin -report
Configured Capacity: 449682350080 (418.80 GB)
Present Capacity: 425212128150 (396.01 GB)
DFS Remaining: 416107417600 (387.53 GB)
DFS Used: 9104710550 (8.48 GB)
DFS Used%: 2.14%
Replicated Blocks:
    Under replicated blocks: 2
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    Pending deletion blocks: 0
Erasure Coded Block Groups: 
    Low redundancy block groups: 0
    Block groups with corrupt internal blocks: 0
    Missing block groups: 0
    Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (5):

Name: 192.168.20.81:9866 (hadoopslave1)
Hostname: hadoopslave1
Decommission Status : Normal
Configured Capacity: 89936470016 (83.76 GB)
DFS Used: 2973914066 (2.77 GB)
Non DFS Used: 4972280878 (4.63 GB)
DFS Remaining: 81990275072 (76.36 GB)
DFS Used%: 3.31%
DFS Remaining%: 91.16%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jan 24 01:53:00 CST 2019
Last Block Report: Thu Jan 24 01:49:46 CST 2019
Num of Blocks: 38891


Name: 192.168.20.82:9866 (hadoopslave2)
Hostname: hadoopslave2
Decommission Status : Normal
Configured Capacity: 89936470016 (83.76 GB)
DFS Used: 3080699676 (2.87 GB)
Non DFS Used: 4902387940 (4.57 GB)
DFS Remaining: 81953382400 (76.33 GB)
DFS Used%: 3.43%
DFS Remaining%: 91.12%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jan 24 01:53:00 CST 2019
Last Block Report: Wed Jan 23 23:00:33 CST 2019
Num of Blocks: 38884


Name: 192.168.20.83:9866 (hadoopslave3)
Hostname: hadoopslave3
Decommission Status : Normal
Configured Capacity: 89936470016 (83.76 GB)
DFS Used: 3018645831 (2.81 GB)
Non DFS Used: 4979719865 (4.64 GB)
DFS Remaining: 81938104320 (76.31 GB)
DFS Used%: 3.36%
DFS Remaining%: 91.11%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jan 24 01:53:00 CST 2019
Last Block Report: Thu Jan 24 01:23:30 CST 2019
Num of Blocks: 39047


Name: 192.168.20.84:9866 (hadoopslave4)
Hostname: hadoopslave4
Decommission Status : Normal
Configured Capacity: 89936470016 (83.76 GB)
DFS Used: 18900703 (18.03 MB)
Non DFS Used: 4811819297 (4.48 GB)
DFS Remaining: 85105750016 (79.26 GB)
DFS Used%: 0.02%
DFS Remaining%: 94.63%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jan 24 01:53:01 CST 2019
Last Block Report: Thu Jan 24 01:52:37 CST 2019
Num of Blocks: 295


Name: 192.168.20.85:9866 (hadoopslave5)
Hostname: hadoopslave5
Decommission Status : Normal
Configured Capacity: 89936470016 (83.76 GB)
DFS Used: 12550274 (11.97 MB)
Non DFS Used: 4804013950 (4.47 GB)
DFS Remaining: 85119905792 (79.27 GB)
DFS Used%: 0.01%
DFS Remaining%: 94.64%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jan 24 01:53:01 CST 2019
Last Block Report: Thu Jan 24 01:52:52 CST 2019
Num of Blocks: 103


[cndba@hadoopslave5 ~]$

Since HDFS already contains data, adding nodes leaves the data unevenly distributed across the DataNodes, so a rebalance operation is needed.

The default balancer transfer bandwidth is quite low; it can be set to 64 MB with the following command:


[cndba@hadoopmaster ~]$ hdfs dfsadmin -setBalancerBandwidth 67108864
Balancer bandwidth is set to 67108864

By default the balancer threshold is 10%, meaning each node's storage utilization may differ from the cluster-wide utilization by at most 10%; when balancing we can set this value to 5%:

[cndba@hadoopmaster ~]$ start-balancer.sh -threshold 5
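
Equivalently, the balancer can also be run in the foreground with the hdfs balancer command, which prints its progress directly to the terminal:

[cndba@hadoopmaster ~]$ hdfs balancer -threshold 5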

2.5 Start the NodeManager Process on the New Nodes

Hadoop introduced the YARN framework in 2.x, so every compute node is managed through a NodeManager; likewise, once the NodeManager process is started on a new node, it joins the cluster.

[cndba@hadoopslave3 data]$ yarn-daemon.sh start nodemanager
WARNING: Use of this script to start YARN daemons is deprecated.
WARNING: Attempting to execute replacement "yarn --daemon start" instead.
[cndba@hadoopslave3 data]$

[cndba@hadoopslave4 ~]$ yarn --daemon start nodemanager

[cndba@hadoopslave5 ~]$ yarn --daemon start nodemanager
[cndba@hadoopslave5 ~]$ jps
22229 DataNode
23658 NodeManager
23756 Jps
[cndba@hadoopslave5 ~]$

View the cluster information via YARN:


[cndba@hadoopslave5 ~]$ yarn node -list
2019-01-24 01:59:33,613 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.20.80:8032
Total Nodes:5
         Node-Id         Node-State    Node-Http-Address    Number-of-Running-Containers
hadoopslave2:36452            RUNNING    hadoopslave2:8042                               0
hadoopslave3:39882            RUNNING    hadoopslave3:8042                               0
hadoopslave4:43669            RUNNING    hadoopslave4:8042                               0
hadoopslave1:44641            RUNNING    hadoopslave1:8042                               0
hadoopslave5:37795            RUNNING    hadoopslave5:8042                               0
[cndba@hadoopslave5 ~]$
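
The node list can also be cross-checked on the ResourceManager web UI (port 8088 by default, e.g. http://hadoopmaster:8088/cluster/nodes, assuming default settings), or by listing the nodes in every state:

[cndba@hadoopslave5 ~]$ yarn node -list -all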

The new slave nodes are now running both DataNode and NodeManager processes, which means they have been added to the cluster dynamically. This completes the Hadoop dynamic node addition procedure.

Copyright notice: This is an original article by the author; reproduction without the author's permission is prohibited.
