TDSQL cluster HDFS startup errors "Connection refused" / "NameNode is not formatted": how to fix them
Author: dave
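Installing HDFS on a newer version of the TDSQL cluster went without problems. The full procedure is described here:
TDSQL Distributed Cluster (10.3.16.2.0) Setup Manual, Detailed Screenshot Edition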
https://www.cndba.cn/dave/article/4595
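Today, however, while installing an older version of the TDSQL cluster, HDFS would not start after installation. Two errors came up.
1 Error 1: tdsql1:9002 failed on connection exception
The HDFS installation itself completed without problems, but any access was refused: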
[tdsql@www.cndba.cn ~]$ hadoop fs -ls /
ls: Call From tdsql1/10.206.0.16 to tdsql1:9002 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[tdsql@www.cndba.cn ~]$
Suspecting that HDFS had not started successfully, I restarted it manually, but the problem persisted:
[tdsql@www.cndba.cn ~]$ hdfs --daemon stop datanode
[tdsql@www.cndba.cn ~]$ hdfs --daemon stop namenode
[tdsql@www.cndba.cn ~]$ hdfs --daemon start namenode
[tdsql@www.cndba.cn ~]$ hdfs --daemon start datanode
[tdsql@www.cndba.cn ~]$ hadoop fs -ls /
ls: Call From tdsql1/10.206.0.16 to tdsql1:9002 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[tdsql@www.cndba.cn ~]$
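A restart that changes nothing suggests the NameNode exits right after it is started. One way to confirm this before reading the logs is to look for its JVM in the process list. The helper below is only a sketch: the function name namenode_running is made up for illustration, and it assumes the NameNode JVM carries a -Dproc_namenode flag, by analogy with the -Dproc_datanode flag of the DataNode process.

```shell
# Reads `ps -ef` style output on stdin and succeeds if a NameNode JVM is listed.
# Assumption: the NameNode is started with -Dproc_namenode, in the same way
# the DataNode is started with -Dproc_datanode.
namenode_running() {
    grep -q -e '-Dproc_namenode'
}

# Usage on the cluster host:
#   ps -ef | namenode_running && echo "NameNode up" || echo "NameNode down"
```

If this reports the NameNode as down immediately after `hdfs --daemon start namenode`, the cause is most likely recorded in the NameNode log.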
Check the running processes and the logs (the process list shows only the DataNode, not the NameNode):
[root@www.cndba.cn tdsql_full_install_ansible]# ps -ef|grep hdfs
tdsql 64204 1 1 20:09 pts/0 00:00:04 /data/home/tdsql/jdk1.8.0_51/bin/java -Dproc_datanode -Djava.library.path=/data/home/tdsql/hadoop-3.2.1/lib -Dhadoop.security.logger=ERROR,RFAS -Dyarn.log.dir=/data/home/tdsql/hadoop-3.2.1/logs -Dyarn.log.file=hadoop-tdsql-datanode-tdsql1.log -Dyarn.home.dir=/data/home/tdsql/hadoop-3.2.1 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/data/home/tdsql/hadoop-3.2.1/logs -Dhadoop.log.file=hadoop-tdsql-datanode-tdsql1.log -Dhadoop.home.dir=/data/home/tdsql/hadoop-3.2.1 -Dhadoop.id.str=tdsql -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.datanode.DataNode
root 76802 3738 0 20:13 pts/1 00:00:00 grep --color=auto hdfs
[root@www.cndba.cn tdsql_full_install_ansible]# cd /data/home/tdsql/hadoop-3.2.1/logs
[root@www.cndba.cn logs]#
[root@www.cndba.cn logs]# ls
hadoop-tdsql-datanode-tdsql1.log hadoop-tdsql-namenode-tdsql1.log SecurityAuth-tdsql.audit
hadoop-tdsql-datanode-tdsql1.out hadoop-tdsql-namenode-tdsql1.out
hadoop-tdsql-datanode-tdsql1.out.1 hadoop-tdsql-namenode-tdsql1.out.1
[root@www.cndba.cn logs]#
[root@www.cndba.cn logs]# more hadoop-tdsql-namenode-tdsql1.log
……
2021-07-26 20:09:43,669 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data/hadoop/tmp/dfs is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:391)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:242)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1105)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:720)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:648)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:710)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:953)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:926)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1692)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1759)
2021-07-26 20:09:43,671 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data/hadoop/tmp/dfs is in an inconsistent state: storage directory does not exist or is not accessible.
2021-07-26 20:09:43,672 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at tdsql1/10.206.0.16
************************************************************/
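Paging through the whole log with more works, but the line that matters is the last ERROR entry. As a rough sketch (last_log_error is a hypothetical helper name, and the pattern assumes the timestamp and level layout seen in the excerpt above), it can be pulled out directly:

```shell
# Print the most recent ERROR or FATAL line from a Hadoop daemon log.
# Assumes lines look like: "2021-07-26 20:09:43,669 ERROR <class>: <message>".
last_log_error() {
    grep -E '^[0-9-]+ [0-9:,]+ (ERROR|FATAL) ' "$1" | tail -n 1
}

# Usage with the log file from this article:
#   last_log_error /data/home/tdsql/hadoop-3.2.1/logs/hadoop-tdsql-namenode-tdsql1.log
```

The exception message itself is on the line after the ERROR entry, so it is still worth opening the log at that timestamp.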
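The log says that the directory /data/hadoop/tmp/dfs does not exist or is not accessible. A check confirmed that it was indeed missing, so I created it manually: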
[tdsql@www.cndba.cn ~]$ mkdir -p /data/hadoop/tmp/dfs
[tdsql@www.cndba.cn ~]$ hdfs --daemon start namenode
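The error message covers two cases: a missing directory, and a directory that the tdsql user cannot access. A small check like the following can tell them apart before the NameNode is started. It is a sketch only; check_storage_dir is an illustrative name, and the path is the one reported in the log above.

```shell
# Report whether a storage directory exists and is readable, writable and
# searchable by the current user. Run it as the user that starts the NameNode.
check_storage_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
        echo "not accessible: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Usage with the directory from the log:
#   check_storage_dir /data/hadoop/tmp/dfs
```

In the case described here the check would have reported the directory as missing, which is why mkdir -p was enough.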
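After that, a second error appeared.
2 Error 2: NameNode is not formatted.
With the directory problem from step 1 fixed, the NameNode now failed at startup with a different error: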
2021-07-26 20:21:49,593 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@7161d8d1{/static,file:///data/home/tdsql/hadoop-3.2.1/share/hadoop/hdfs/webapps/static/,UNAVAILABLE}
2021-07-26 20:21:49,593 INFO org.eclipse.jetty.server.handler.ContextHandler: Stopped o.e.j.s.ServletContextHandler@75d3a5e0{/logs,file:///data/home/tdsql/hadoop-3.2.1/logs/,UNAVAILABLE}
2021-07-26 20:21:49,595 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2021-07-26 20:21:49,598 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2021-07-26 20:21:49,598 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2021-07-26 20:21:49,598 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:252)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1105)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:720)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:648)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:710)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:953)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:926)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1692)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1759)
2021-07-26 20:21:49,599 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.io.IOException: NameNode is not formatted.
2021-07-26 20:21:49,600 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at tdsql1/10.206.0.16
************************************************************/
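This one is a common HDFS problem. Format the NameNode manually, then start it, and the problem is resolved. Note that in the session below the first hadoop fs -ls still fails, because the NameNode has been formatted but not yet started.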
[tdsql@www.cndba.cn ~]$
[tdsql@www.cndba.cn ~]$ hdfs namenode -format
Formatting using clusterid: CID-4ee7cec9-5cb2-4c5d-afd3-20b1769cd8ef
[tdsql@www.cndba.cn ~]$ hadoop fs -ls /
ls: Call From tdsql1/10.206.0.16 to tdsql1:9002 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[tdsql@www.cndba.cn ~]$ hdfs --daemon start namenode
[tdsql@www.cndba.cn ~]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - tdsql supergroup 0 2021-07-26 20:23 /tdsqlbackup
[tdsql@www.cndba.cn ~]$
Note that these commands must be run as the tdsql user; otherwise the shell reports that the command does not exist.
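Formatting erases any existing NameNode metadata, so on anything other than a fresh install it is worth checking first whether the storage directory has already been formatted. The sketch below relies on the fact that a formatted NameNode directory contains a current/VERSION file. The function name namenode_formatted is hypothetical, and the path in the usage comment assumes the storage directory is the one named in the earlier log message.

```shell
# Succeeds if the given NameNode storage directory has already been formatted,
# i.e. it contains a current/VERSION file.
namenode_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Usage as the tdsql user (directory assumed from the earlier log message):
#   namenode_formatted /data/hadoop/tmp/dfs || hdfs namenode -format -nonInteractive
#   hdfs --daemon start namenode
```

The -nonInteractive option makes the format command abort instead of prompting if the directory already exists, which adds a second safeguard against wiping a directory that holds data.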
Copyright notice: this is an original article by the author and may not be reproduced without the author's permission.