
Sahara cluster failed to start - Operation timed out after 300 second(s)

asked 2014-12-02 16:32:08 -0500 by belle
Red Hat 7.0
OpenStack Juno
Sahara version that ships with Red Hat 7.0

When I start a Sahara cluster, all instances (master and worker nodes) boot without problems, but the cluster stays in the "Starting" state for a long time, then times out and fails. Besides the error in sahara.log shown below, I also see the following message in nova-scheduler.log. It appears only when I start a Sahara cluster, not when I launch a single instance. I don't know whether it is related to the Sahara failure, and I can't figure out what causes this disk space message:

nova.scheduler.host_manager [req-ef4b107d-8575-4989-a3a4-3b569fa7a7cb None] Host has more disk space than database expected (1391gb > 1346gb)

From /var/log/sahara/sahara.log:

ERROR sahara.service.ops [-] Error during operating cluster 'test-cluster' (reason: Operation timed out after 300 second(s))                  
TRACE sahara.service.ops Traceback (most recent call last):                                                                                   
TRACE sahara.service.ops   File "/usr/lib/python2.7/site-packages/sahara/service/ops.py", line 113, in wrapper                                
TRACE sahara.service.ops     f(cluster_id, *args, **kwds)                                                                                     
TRACE sahara.service.ops   File "/usr/lib/python2.7/site-packages/sahara/service/ops.py", line 206, in _provision_cluster                     
TRACE sahara.service.ops     plugin.start_cluster(cluster)                                                                                    
TRACE sahara.service.ops   File "/usr/lib/python2.7/site-packages/sahara/plugins/vanilla/plugin.py", line 52, in start_cluster                
TRACE sahara.service.ops     cluster.hadoop_version).start_cluster(cluster)                                                                   
TRACE sahara.service.ops   File "/usr/lib/python2.7/site-packages/sahara/plugins/vanilla/v1_2_1/versionhandler.py", line 131, in start_cluster
TRACE sahara.service.ops     run.oozie_share_lib(r, nn_instance.hostname())                                                                   
TRACE sahara.service.ops   File "/usr/lib/python2.7/site-packages/sahara/plugins/vanilla/v1_2_1/run_scripts.py", line 60, in oozie_share_lib  
TRACE sahara.service.ops     'sudo su - -c "mkdir /tmp/oozielib && '                                                                          
TRACE sahara.service.ops   File "/usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py", line 411, in execute_command                   
TRACE sahara.service.ops     get_stderr, raise_when_error)                                                                                    
TRACE sahara.service.ops   File "/usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py", line 480, in _run_s                            
TRACE sahara.service.ops     return self._run_with_log(func, timeout, *args, **kwargs)                                                        
TRACE sahara.service.ops   File "/usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py", line 368, in _run_with_log                     
TRACE sahara.service.ops     return self._run(func, *args, **kwargs)                                                                          
TRACE sahara.service.ops   File "/usr/lib/python2.7/site-packages/sahara/utils/ssh_remote.py", line 477, in _run                              
TRACE sahara.service.ops     return procutils.run_in_subprocess(self.proc, func, args, kwargs)                                                
TRACE sahara.service.ops   File "/usr/lib/python2.7/site-packages/sahara/utils/procutils.py", line 49, in run_in_subprocess                   
TRACE sahara.service.ops     result = pickle.load(proc.stdout)                                                                                
TRACE sahara.service.ops   File "/usr/lib64/python2.7/pickle.py", line 1378, in load                                                          
TRACE sahara.service.ops     return Unpickler(file).load()                                                                                    
TRACE sahara.service.ops   File "/usr/lib64/python2.7/pickle.py", line 857, in load                                                           
TRACE sahara.service.ops     key = read(1)                                                                                                    
TRACE sahara.service.ops   File "/usr/lib64/python2.7/socket.py", line 380, in read                                                           
TRACE sahara.service.ops ...

1 answer


answered 2019-12-13 01:33:33 -0500 by nicholas666

The timeout happens while ssh_remote runs 'sudo su - -c "mkdir /tmp/oozielib && ' on the VM (instance). It seems the management network needs to be able to communicate with the business network the instances are on. Alternatively, use floating IPs so that ssh_remote can reach the instances and execute the command.
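One way to check this diagnosis is to test, from the host running the Sahara service, whether the instance's SSH port answers on its fixed IP and on its floating IP. The Python 3 sketch below is not part of Sahara; the function names and any addresses you pass in are hypothetical. It only tests TCP reachability of port 22 and does not authenticate or run a command the way ssh_remote does.

```python
import socket
from typing import Dict


def ssh_port_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only checks that the port accepts connections,
    it does not log in or execute anything on the instance.
    """
    try:
        # create_connection resolves the host and applies the timeout to the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, DNS failures and
        # timeouts (socket.timeout is a subclass of OSError in Python 3).
        return False


def report_reachability(addresses: Dict[str, str], port: int = 22,
                        timeout: float = 5.0) -> Dict[str, bool]:
    """Check each labelled address and print whether its SSH port is reachable."""
    results = {}
    for label, address in addresses.items():
        ok = ssh_port_reachable(address, port, timeout)
        results[label] = ok
        print(f"{label} {address}: port {port} {'reachable' if ok else 'NOT reachable'}")
    return results
```

For example, calling `report_reachability({"fixed": "10.0.0.5", "floating": "172.24.4.10"})` on the Sahara host, with your master node's real addresses substituted for these placeholder ones, shows which path works. If only the floating IP answers, that points to the missing route between the management network and the instance network described in the answer.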



Stats

Seen: 412 times

Last updated: Dec 02 '14