HBase Master fails to start with "check the config value of 'hbase.procedure.store.wal.use.hsync'": how to fix it

2019-03-05 13:07 | Original | HBase
Author: dave

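After setting up an HBase cluster and running the start command, the primary HMaster process failed to start. Only the masters listed in the backup-masters configuration came up.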

http://www.cndba.cn/dave/article/3321

The master log shows the following:

2019-03-05 20:37:56,379 INFO  [master/hadoopslave1:16000:becomeActiveMaster] coordination.SplitLogManagerCoordination: Found 0 orphan tasks and 0 rescan nodes
2019-03-05 20:37:56,456 INFO  [ReadOnlyZKClient-hadoopmaster:2181,hadoopslave1:2181,hadoopslave2:2181@0x76359dbf] zookeeper.ZooKeeper: Initiating client connection, connectString=hadoopmaster:2181,hadoopslave1:2181,hadoopslave2:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$79/479258815@59927549
2019-03-05 20:37:56,459 INFO  [ReadOnlyZKClient-hadoopmaster:2181,hadoopslave1:2181,hadoopslave2:2181@0x76359dbf-SendThread(hadoopslave1:2181)] zookeeper.ClientCnxn: Opening socket connection to server hadoopslave1/192.168.20.81:2181. Will not attempt to authenticate using SASL (unknown error)
2019-03-05 20:37:56,459 INFO  [ReadOnlyZKClient-hadoopmaster:2181,hadoopslave1:2181,hadoopslave2:2181@0x76359dbf-SendThread(hadoopslave1:2181)] zookeeper.ClientCnxn: Socket connection established to hadoopslave1/192.168.20.81:2181, initiating session
2019-03-05 20:37:56,465 INFO  [ReadOnlyZKClient-hadoopmaster:2181,hadoopslave1:2181,hadoopslave2:2181@0x76359dbf-SendThread(hadoopslave1:2181)] zookeeper.ClientCnxn: Session establishment complete on server hadoopslave1/192.168.20.81:2181, sessionid = 0x20011e3dcbb000a, negotiated timeout = 40000
2019-03-05 20:37:56,525 INFO  [master/hadoopslave1:16000:becomeActiveMaster] procedure2.ProcedureExecutor: Starting 16 core workers (bigger of cpus/4 or 16) with max (burst) worker count=160, start 1 urgent thread(s)
2019-03-05 20:37:56,545 ERROR [master/hadoopslave1:16000:becomeActiveMaster] master.HMaster: Failed to become active master
java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1086)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:423)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:611)
    at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1458)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:890)
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2272)
    at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:581)
    at java.lang.Thread.run(Thread.java:748)
2019-03-05 20:37:56,547 ERROR [master/hadoopslave1:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master hadoopslave1,16000,1551789470904: Unhandled exception. Starting shutdown. *****
java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1086)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:423)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:611)
    at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1458)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:890)
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2272)
    at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:581)
    at java.lang.Thread.run(Thread.java:748)
2019-03-05 20:37:56,547 INFO  [master/hadoopslave1:16000:becomeActiveMaster] regionserver.HRegionServer: ***** STOPPING region server 'hadoopslave1,16000,1551789470904' *****
2019-03-05 20:37:56,547 INFO  [master/hadoopslave1:16000:becomeActiveMaster] regionserver.HRegionServer: STOPPED: Stopped by master/hadoopslave1:16000:becomeActiveMaster
2019-03-05 20:37:56,967 INFO  [master/hadoopslave1:16000] ipc.NettyRpcServer: Stopping server on /192.168.20.81:16000
2019-03-05 20:37:56,982 INFO  [master/hadoopslave1:16000] regionserver.HRegionServer: Stopping infoServer

The log states why the master process aborted:

java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
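
When several master logs have to be checked, the exception above can be located programmatically. The following is a minimal sketch, not part of HBase: the function name `find_hsync_failures` is hypothetical, and the only assumption about the log format is that the message text matches the one quoted above.

```python
import re

# Substring copied from the exception message quoted above.
HSYNC_MARKER = "The procedure WAL relies on the ability to hsync"
# Captures the property names that the message puts in single quotes.
PROPERTY_RE = re.compile(r"config value of '([^']+)'")


def find_hsync_failures(log_lines):
    """Return (line_number, property_names) for each line carrying the hsync error."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if HSYNC_MARKER in line:
            hits.append((number, PROPERTY_RE.findall(line)))
    return hits
```

Passing the lines of a master log file to `find_hsync_failures` yields the positions of the error together with the two property names the message asks you to check.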

Solution:
One approach is to add the following to the hbase-site.xml configuration file:

<property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
    <description>
        Controls whether HBase will check for stream capabilities (hflush/hsync).
        Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
        with the 'file://' scheme, but be mindful of the NOTE below.
        WARNING: Setting this to false blinds you to potential data loss and
        inconsistent system state in the event of process and/or node failures. If
        HBase is complaining of an inability to use hsync or hflush it's most
        likely not a false positive.
    </description>
</property>
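
If the same change has to be applied on several nodes, the edit can be scripted. The sketch below is an illustration only and assumes the file has the usual `<configuration>` root with `<property>` children. The helper name `set_property` is hypothetical. Python's `xml.etree.ElementTree` drops XML comments and processing instructions when parsing, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_property(site_xml_text, name, value):
    """Return hbase-site.xml text with property `name` set to `value`."""
    root = ET.fromstring(site_xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Property already present: overwrite its value in place.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property missing: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `set_property(text, "hbase.unsafe.stream.capability.enforce", "false")` on the contents of hbase-site.xml returns the updated XML text, which can then be written back to the file.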

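hbase.unsafe.stream.capability.enforce: set it to false when running on the local filesystem and to true when running on HDFS. However, according to the official HBase reference guide, HBase has used asyncfs as the default WAL provider since 2.0.0.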
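
The rule of thumb above can be written down as a small helper. This is a sketch of that rule only, not HBase logic: the function name `suggested_enforce_value` is hypothetical, and the host and port in the usage note are assumed values.

```python
from urllib.parse import urlparse


def suggested_enforce_value(rootdir):
    """Apply the rule of thumb above to an hbase.rootdir value."""
    scheme = urlparse(rootdir).scheme
    if scheme == "file":
        return "false"  # local filesystem: disable the capability check
    if scheme == "hdfs":
        return "true"   # HDFS: keep the capability check enabled
    return None         # the rule of thumb does not cover other schemes
```

For example, `suggested_enforce_value("file:///tmp/hbase")` gives `"false"`, while `suggested_enforce_value("hdfs://hadoopmaster:9000/hbase")` gives `"true"`.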

137.1.3. Master fails to become active due to lack of hsync for filesystem
HBase’s internal framework for cluster operations requires the ability to durably save state in a write ahead log. When using a version of Apache Hadoop Common’s filesystem API that supports checking on the availability of needed calls, HBase will proactively abort the cluster if it finds it can’t operate safely.

asyncfs: The default. New since hbase-2.0.0 (HBASE-15536, HBASE-14790). This AsyncFSWAL provider, as it identifies itself in RegionServer logs, is built on a new non-blocking dfsclient implementation. It is currently resident in the hbase codebase but intent is to move it back up into HDFS itself. WALs edits are written concurrently (“fan-out”) style to each of the WAL-block replicas on each DataNode rather than in a chained pipeline as the default client does. Latencies should be better. See Apache HBase Improvements and Practices at Xiaomi at slide 14 onward for more detail on implementation.

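Our test environment runs HBase 2.1.3, so even though this is a cluster deployment, we set the parameter to false and restarted the HBase Master, after which it started normally. Alternatively, using an HBase version earlier than 2.0.0 also avoids this error.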

Copyright notice: this is an original article by the author and may not be reproduced without the author's permission.