HBase Master startup error: check the config value of 'hbase.procedure.store.wal.use.hsync' (solution) -- cnDBA.cn

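After setting up an HBase cluster and running the start command, the primary HMaster process fails to start; only the masters listed in the backup-masters configuration come up.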

The master log shows the following:

2019-03-05 20:37:56,379 INFO  [master/hadoopslave1:16000:becomeActiveMaster] coordination.SplitLogManagerCoordination: Found 0 orphan tasks and 0 rescan nodes
2019-03-05 20:37:56,456 INFO  [ReadOnlyZKClient-hadoopmaster:2181,hadoopslave1:2181,hadoopslave2:2181@0x76359dbf] zookeeper.ZooKeeper: Initiating client connection, connectString=hadoopmaster:2181,hadoopslave1:2181,hadoopslave2:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$79/479258815@59927549
2019-03-05 20:37:56,459 INFO  [ReadOnlyZKClient-hadoopmaster:2181,hadoopslave1:2181,hadoopslave2:2181@0x76359dbf-SendThread(hadoopslave1:2181)] zookeeper.ClientCnxn: Opening socket connection to server hadoopslave1/192.168.20.81:2181. Will not attempt to authenticate using SASL (unknown error)
2019-03-05 20:37:56,459 INFO  [ReadOnlyZKClient-hadoopmaster:2181,hadoopslave1:2181,hadoopslave2:2181@0x76359dbf-SendThread(hadoopslave1:2181)] zookeeper.ClientCnxn: Socket connection established to hadoopslave1/192.168.20.81:2181, initiating session
2019-03-05 20:37:56,465 INFO  [ReadOnlyZKClient-hadoopmaster:2181,hadoopslave1:2181,hadoopslave2:2181@0x76359dbf-SendThread(hadoopslave1:2181)] zookeeper.ClientCnxn: Session establishment complete on server hadoopslave1/192.168.20.81:2181, sessionid = 0x20011e3dcbb000a, negotiated timeout = 40000
2019-03-05 20:37:56,525 INFO  [master/hadoopslave1:16000:becomeActiveMaster] procedure2.ProcedureExecutor: Starting 16 core workers (bigger of cpus/4 or 16) with max (burst) worker count=160, start 1 urgent thread(s)
2019-03-05 20:37:56,545 ERROR [master/hadoopslave1:16000:becomeActiveMaster] master.HMaster: Failed to become active master
java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1086)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:423)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:611)
    at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1458)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:890)
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2272)
    at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:581)
    at java.lang.Thread.run(Thread.java:748)
2019-03-05 20:37:56,547 ERROR [master/hadoopslave1:16000:becomeActiveMaster] master.HMaster: ***** ABORTING master hadoopslave1,16000,1551789470904: Unhandled exception. Starting shutdown. *****
java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.rollWriter(WALProcedureStore.java:1086)
    at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.recoverLease(WALProcedureStore.java:423)
    at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:611)
    at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1458)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:890)
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2272)
    at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:581)
    at java.lang.Thread.run(Thread.java:748)
2019-03-05 20:37:56,547 INFO  [master/hadoopslave1:16000:becomeActiveMaster] regionserver.HRegionServer: ***** STOPPING region server 'hadoopslave1,16000,1551789470904' *****
2019-03-05 20:37:56,547 INFO  [master/hadoopslave1:16000:becomeActiveMaster] regionserver.HRegionServer: STOPPED: Stopped by master/hadoopslave1:16000:becomeActiveMaster
2019-03-05 20:37:56,967 INFO  [master/hadoopslave1:16000] ipc.NettyRpcServer: Stopping server on /192.168.20.81:16000
2019-03-05 20:37:56,982 INFO  [master/hadoopslave1:16000] regionserver.HRegionServer: Stopping infoServer

The log states why the master process aborted:

java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.
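
To confirm that a given master log contains this failure, the check can be scripted. The following is a minimal sketch, assuming the log uses the timestamp and level layout shown above; the function name `find_hsync_failures` is hypothetical and not part of HBase.

```python
import re

# Message fragment taken from the IllegalStateException quoted above.
HSYNC_MARKER = "The procedure WAL relies on the ability to hsync"

# Matches the leading timestamp and level of a log line in the layout shown above.
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+)")


def find_hsync_failures(log_text):
    """Return timestamps of ERROR entries that are followed by the hsync exception."""
    failures = []
    last_error_ts = None
    for line in log_text.splitlines():
        m = LOG_LINE.match(line)
        if m:
            # Remember the most recent ERROR entry; forget it on any other level.
            last_error_ts = m.group(1) if m.group(2) == "ERROR" else None
        elif HSYNC_MARKER in line and last_error_ts is not None:
            failures.append(last_error_ts)
    return failures
```

Passing the contents of the master log to `find_hsync_failures` returns one timestamp per ERROR entry that carries the exception; an empty list suggests the master failed for a different reason.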

Solution:
One approach is to add the following to the hbase-site.xml configuration file:

<property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
    <description>
        Controls whether HBase will check for stream capabilities (hflush/hsync).
        Disable this if you intend to run on LocalFileSystem, denoted by a rootdir
        with the 'file://' scheme, but be mindful of the NOTE below.
        WARNING: Setting this to false blinds you to potential data loss and
        inconsistent system state in the event of process and/or node failures. If
        HBase is complaining of an inability to use hsync or hflush it's most
        likely not a false positive.
    </description>
</property>
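
If the same change has to be applied on several nodes, the edit can be scripted instead of done by hand. The following is a minimal sketch using only the Python standard library; the helper name `set_property` is hypothetical, and it assumes the file has the usual `<configuration>` root with `<property>` children.

```python
import xml.etree.ElementTree as ET


def set_property(site_xml, name, value):
    """Return hbase-site.xml text with property `name` set to `value`."""
    root = ET.fromstring(site_xml)  # expects a <configuration> root element
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Property already present: overwrite its value in place.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property absent: append a new <property> block.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `set_property(text, "hbase.unsafe.stream.capability.enforce", "false")` returns the updated XML as a string. The default ElementTree parser drops XML comments, so a license header in the original file would not survive this round trip; keep a backup of the file before overwriting it.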

hbase.unsafe.stream.capability.enforce: set it to false when running on the local filesystem and to true when running on HDFS. However, according to the HBase reference guide, HBase has used asyncfs by default since 2.0.0.
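
The rule of thumb stated above (false for the local filesystem, true for HDFS) can be sketched as a small lookup on the scheme of hbase.rootdir. This is only an illustration of that rule; the function name `suggested_enforce_value` and the example URIs are hypothetical.

```python
from urllib.parse import urlparse


def suggested_enforce_value(rootdir):
    """Map an hbase.rootdir URI to the value suggested by the rule above."""
    scheme = urlparse(rootdir).scheme
    if scheme == "file":
        return "false"  # local filesystem: disable the capability check
    if scheme == "hdfs":
        return "true"   # HDFS: keep the capability check enabled
    return None         # other schemes are not covered by the rule


# Hypothetical rootdir values, for illustration only.
print(suggested_enforce_value("file:///tmp/hbase"))
print(suggested_enforce_value("hdfs://hadoopmaster:9000/hbase"))
```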

137.1.3. Master fails to become active due to lack of hsync for filesystem
HBase’s internal framework for cluster operations requires the ability to durably save state in a write ahead log. When using a version of Apache Hadoop Common’s filesystem API that supports checking on the availability of needed calls, HBase will proactively abort the cluster if it finds it can’t operate safely.

asyncfs: The default. New since hbase-2.0.0 (HBASE-15536, HBASE-14790). This AsyncFSWAL provider, as it identifies itself in RegionServer logs, is built on a new non-blocking dfsclient implementation. It is currently resident in the hbase codebase but intent is to move it back up into HDFS itself. WALs edits are written concurrently (“fan-out”) style to each of the WAL-block replicas on each DataNode rather than in a chained pipeline as the default client does. Latencies should be better. See Apache HBase Improvements and Practices at Xiaomi at slide 14 onward for more detail on implementation.

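Our test environment runs HBase 2.1.3, so even though it is a cluster environment, we simply set this parameter to false and restarted the HBase Master, which then came up normally. Alternatively, using an HBase version earlier than 2.0.0 also avoids this error.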

dave

QQ交流群

注册联系QQ