Hadoop HA HDFS startup error: org.apache.hadoop.ipc.Client: Re

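There are many blog posts about HA online, but the best reference is really the official documentation, which is already very detailed. So I will not repeat how to set up HA here.
This post only aims to give a fix for the org.apache.hadoop.ipc.Client: Retrying connect to server error.
When I typed the error into a search engine, I could not find a single post that solved the problem. I am writing this as a note to myself, and as a hint for anyone who runs into the same issue.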

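1. Problem description
With HA configured as planned, the NameNode does not start properly. Right after startup, jps shows the NameNode process, but a minute or two later it is gone.
After some testing, however, I found the following two cases:
1) Start the JournalNodes first, then start HDFS: the NameNode starts and runs normally.
2) Start with start-dfs.sh: all the services come up, but the NameNode exits after about two minutes. Starting it again on its own with hadoop-daemon.sh start namenode succeeds, and the NameNode then runs stably.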
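
Case 1 suggests that the NameNode mainly needs the JournalNodes to be reachable before it starts. As a minimal sketch of that idea (not a fix taken from this post), the Python helper below polls the JournalNode RPC port until a majority of the nodes accept TCP connections. The host names and port 8485 are taken from the NameNode log further down; the function names, the use of a simple majority as the readiness test, and the timeouts are assumptions made for illustration.

```python
import socket
import time

# JournalNode endpoints as they appear in the NameNode log (node2..node4, port 8485).
JOURNAL_NODES = [("node2", 8485), ("node3", 8485), ("node4", 8485)]


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def wait_for_quorum(endpoints, deadline_s=60.0, poll_s=1.0):
    """Poll until a majority of `endpoints` accept connections, or the deadline passes.

    Hypothetical helper: "majority reachable" is used here only as a rough
    readiness signal before launching the NameNode.
    """
    needed = len(endpoints) // 2 + 1
    end = time.monotonic() + deadline_s
    while True:
        reachable = [ep for ep in endpoints if port_is_open(*ep)]
        if len(reachable) >= needed:
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(poll_s)
```

A wrapper script could call wait_for_quorum(JOURNAL_NODES) after starting the JournalNodes and only then run hadoop-daemon.sh start namenode. An open port only shows that the process is listening, so this is a coarse check rather than a guarantee that the JournalNode is fully ready.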

Now look at the NameNode log. Do not be put off by its length; all the clues to the failure are in there:
2016-03-09 10:50:27,123 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:  host = node1/192.168.56.201
STARTUP_MSG:  args = []
STARTUP_MSG:  version = 2.5.1
STARTUP_MSG:  build = Unknown -r Unknown; compiled by 'root' on 2014-10-20T05:53Z
STARTUP_MSG:  java = 1.7.0_09
************************************************************/
2016-03-09 10:50:27,132 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2016-03-09 10:50:27,138 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode []
2016-03-09 10:50:27,465 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2016-03-09 10:50:27,623 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2016-03-09 10:50:27,623 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2016-03-09 10:50:27,625 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: fs.defaultFS is hdfs://hadoopha
2016-03-09 10:50:27,626 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Clients are to use hadoopha to access this namenode/service.
2016-03-09 10:50:28,048 INFO org.apache.hadoop.hdfs.DFSUtil: Starting web server as: ${dfs.web.authentication.kerberos.principal}
2016-03-09 10:50:28,048 INFO org.apache.hadoop.hdfs.DFSUtil: Starting Web-server for hdfs at: :50070
2016-03-09 10:50:28,121 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2016-03-09 10:50:28,128 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.namenode is not defined
2016-03-09 10:50:28,145 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2016-03-09 10:50:28,149 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context hdfs
2016-03-09 10:50:28,149 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2016-03-09 10:50:28,149 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2016-03-09 10:50:28,209 INFO org.apache.hadoop.http.HttpServer2: Added filter 'org.apache.hadoop.hdfs.web.AuthFilter' (class=org.apache.hadoop.hdfs.web.AuthFilter)
2016-03-09 10:50:28,211 INFO org.apache.hadoop.http.HttpServer2: addJerseyResourcePackage: packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources, pathSpec=/webhdfs/v1/*
2016-03-09 10:50:28,268 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 50070
2016-03-09 10:50:28,269 INFO org.mortbay.log: jetty-6.1.26
2016-03-09 10:50:28,580 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: 'signature.secret' configuration not set, using a random value as secret
2016-03-09 10:50:28,648 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@node1:50070
2016-03-09 10:50:28,687 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Only one image storage directory (dfs.namenode.name.dir) configured. Beware of data loss due to lack of redundant storage directories!
2016-03-09 10:50:28,741 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsLock is fair:true
2016-03-09 10:50:28,802 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
2016-03-09 10:50:28,802 INFO org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2016-03-09 10:50:28,805 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2016-03-09 10:50:28,807 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: The block deletion will start around 2016 Mar 09 10:50:28
2016-03-09 10:50:28,810 INFO org.apache.hadoop.util.GSet: Computing capacity for map BlocksMap
2016-03-09 10:50:28,810 INFO org.apache.hadoop.util.GSet: VM type      = 64-bit
2016-03-09 10:50:28,813 INFO org.apache.hadoop.util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
2016-03-09 10:50:28,813 INFO org.apache.hadoop.util.GSet: capacity      = 2^21 = 2097152 entries
2016-03-09 10:50:28,852 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: dfs.block.access.token.enable=false
2016-03-09 10:50:28,852 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: defaultReplication        = 3
2016-03-09 10:50:28,852 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplication            = 512
2016-03-09 10:50:28,852 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: minReplication            = 1
2016-03-09 10:50:28,853 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxReplicationStreams      = 2
2016-03-09 10:50:28,853 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
2016-03-09 10:50:28,853 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: replicationRecheckInterval = 3000
2016-03-09 10:50:28,853 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: encryptDataTransfer        = false
2016-03-09 10:50:28,853 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
2016-03-09 10:50:28,859 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner            = hadoop (auth:SIMPLE)
2016-03-09 10:50:28,859 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup          = supergroup
2016-03-09 10:50:28,859 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled = true
2016-03-09 10:50:28,865 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Determined nameservice ID: hadoopha
2016-03-09 10:50:28,865 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: HA Enabled: true
2016-03-09 10:50:28,866 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Append Enabled: true
2016-03-09 10:50:29,120 INFO org.apache.hadoop.util.GSet: Computing capacity for map INodeMap
2016-03-09 10:50:29,120 INFO org.apache.hadoop.util.GSet: VM type      = 64-bit
2016-03-09 10:50:29,120 INFO org.apache.hadoop.util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
2016-03-09 10:50:29,120 INFO org.apache.hadoop.util.GSet: capacity      = 2^20 = 1048576 entries
2016-03-09 10:50:29,174 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2016-03-09 10:50:29,186 INFO org.apache.hadoop.util.GSet: Computing capacity for map cachedBlocks
2016-03-09 10:50:29,186 INFO org.apache.hadoop.util.GSet: VM type      = 64-bit
2016-03-09 10:50:29,186 INFO org.apache.hadoop.util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
2016-03-09 10:50:29,186 INFO org.apache.hadoop.util.GSet: capacity      = 2^18 = 262144 entries
2016-03-09 10:50:29,188 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2016-03-09 10:50:29,188 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
2016-03-09 10:50:29,188 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.namenode.safemode.extension    = 30000
2016-03-09 10:50:29,190 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Retry cache on namenode is enabled
2016-03-09 10:50:29,190 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2016-03-09 10:50:29,194 INFO org.apache.hadoop.util.GSet: Computing capacity for map NameNodeRetryCache
2016-03-09 10:50:29,194 INFO org.apache.hadoop.util.GSet: VM type      = 64-bit
2016-03-09 10:50:29,194 INFO org.apache.hadoop.util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
2016-03-09 10:50:29,194 INFO org.apache.hadoop.util.GSet: capacity      = 2^15 = 32768 entries
2016-03-09 10:50:29,199 INFO org.apache.hadoop.hdfs.server.namenode.NNConf: ACLs enabled? false
2016-03-09 10:50:29,199 INFO org.apache.hadoop.hdfs.server.namenode.NNConf: XAttrs enabled? true
2016-03-09 10:50:29,199 INFO org.apache.hadoop.hdfs.server.namenode.NNConf: Maximum size of an xattr: 16384
2016-03-09 10:50:29,208 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /home/hadoop/hadoop/tmp/dfs/name/in_use.lock acquired by nodename 4394@node1
2016-03-09 10:50:29,610 WARN org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded
2016-03-09 10:50:31,053 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/192.168.56.202:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-09 10:50:31,054 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node3/192.168.56.203:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-09 10:50:31,054 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node4/192.168.56.204:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-03-09 10:50:32,055 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/192.168.56.202:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
(N repeated lines omitted here)
2016-03-09 10:50:35,807 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
(N repeated lines omitted here)
2016-03-09 10:50:39,812 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10006 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
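
The retry lines above carry the retry policy in plain text: maxRetries=10 with sleepTime=1000 MILLISECONDS, which gives each JournalNode about 10 × 1000 ms = 10,000 ms of sleep between connection attempts before the client gives up. These values look like the defaults of the `ipc.client.connect.max.retries` and `ipc.client.connect.retry.interval` settings, though that mapping is an assumption here and is not stated in the log. As a rough sketch, the following Python snippet extracts the policy from such lines and computes that sleep budget per server. The regular expression and the function name are illustrative, and the budget ignores the time each connection attempt itself takes.

```python
import re

# Matches the "Retrying connect to server" lines as they appear in the log above.
RETRY_RE = re.compile(
    r"Retrying connect to server: (?P<host>[^/\s]+)/(?P<ip>[\d.]+):(?P<port>\d+)\. "
    r"Already tried (?P<tried>\d+) time\(s\); retry policy is "
    r"RetryUpToMaximumCountWithFixedSleep\(maxRetries=(?P<max>\d+), "
    r"sleepTime=(?P<sleep>\d+) MILLISECONDS\)"
)


def summarize_retries(log_lines):
    """Map 'host:port' to (highest 'Already tried' count seen, sleep budget in ms)."""
    summary = {}
    for line in log_lines:
        m = RETRY_RE.search(line)
        if m is None:
            continue  # not a retry line
        key = f"{m['host']}:{m['port']}"
        budget_ms = int(m["max"]) * int(m["sleep"])  # maxRetries * sleepTime
        previous = summary.get(key, (-1, budget_ms))[0]
        summary[key] = (max(previous, int(m["tried"])), budget_ms)
    return summary


# Three lines copied from the log above.
SAMPLE = [
    "2016-03-09 10:50:31,053 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/192.168.56.202:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)",
    "2016-03-09 10:50:31,054 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node3/192.168.56.203:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)",
    "2016-03-09 10:50:32,055 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node2/192.168.56.202:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)",
]

retry_summary = summarize_retries(SAMPLE)
```

Running this over the full log file shows, for each JournalNode, how far the client got through its retries, which helps confirm that all three JournalNodes were unreachable while the NameNode was starting.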
