Savanna with Nova-network on Grizzly

asked 2013-08-06 10:36:02 -0500

sarita18narwal

I deployed a Hadoop cluster using the Savanna API. After launch, the cluster remains in the Waiting state and after some time goes into the Error state, while the nodes remain in the ACTIVE state.

Suppose I have 3 nodes (1 master and 2 slaves) in the cluster. When I launch the cluster using the respective cluster template, the scenario looks like this.

Cluster

Name    State  Instance Count
Sample  Error  3

And in Instances

Name      IP        State
1-master  10.0.0.X  ACTIVE
2-slave   10.0.0.X  ACTIVE
3-slave   10.0.0.X  ACTIVE

Log Description:

WARNING savanna.service.instances [-] Can't start cluster 'tstcluster' (reason: Unauthorized (HTTP 401))

ERROR root [-] Original exception being dropped: ['Traceback (most recent call last):\n',
 ' File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 38, in create_cluster\n _await_instances(cluster)\n',
 ' File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 206, in _await_instances\n if not _check_if_up(instance):\n',
 ' File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 215, in _check_if_up\n server = instance.nova_info\n',
 ' File "/usr/local/lib/python2.7/dist-packages/savanna/db/models.py", line 226, in nova_info\n return nova.client().servers.get(self.instance_id)\n',
 ' File "/usr/local/lib/python2.7/dist-packages/novaclient/v1_1/servers.py", line 350, in get\n return self._get("/servers/%s" % base.getid(server), "server")\n',
 ' File "/usr/local/lib/python2.7/dist-packages/novaclient/base.py", line 140, in _get\n _resp, body = self.api.client.get(url)\n',
 ' File "/usr/local/lib/python2.7/dist-packages/novaclient/client.py", line 230, in get\n return self._cs_request(url, \'GET\', **kwargs)\n',
 ' File "/usr/local/lib/python2.7/dist-packages/novaclient/client.py", line 227, in _cs_request\n raise e\n',
 'Unauthorized: Unauthorized (HTTP 401)\n']
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/poll.py", line 97, in wait
    readers.get(fileno, noop).cb(fileno)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 194, in main
    result = function(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/savanna/context.py", line 127, in wrapper
    func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/savanna/service/api.py", line 111, in _provision_cluster
    i.create_cluster(cluster)
  File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 51, in create_cluster
    _rollback_cluster_creation(cluster, ex)
  File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 274, in _rollback_cluster_creation
    _shutdown_instances(cluster, True)
  File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 303, in _shutdown_instances
    _shutdown_instance(instance)
  File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 309, in _shutdown_instance
    nova.client().servers.delete(instance.instance_id)
  File "/usr/local/lib/python2.7/dist-packages/novaclient/v1_1/servers.py", line 630, in delete
    self._delete("/servers/%s" % base.getid(server))
  File "/usr/local/lib/python2.7/dist-packages/novaclient/base.py", line 154, in _delete
    _resp, _body = self.api.client.delete(url)
  File "/usr/local/lib/python2.7/dist-packages/novaclient/client.py ...


29 answers


answered 2013-08-07 05:35:28 -0500

Hello Sarita,

Please make sure that your OpenStack environment is configured correctly. To do this, try to launch a test instance (e.g. https://launchpad.net/cirros/ ) without Savanna. Perhaps Nova can't communicate with Keystone, because I see a message about an authorization problem in the traceback. In that case you need to check the credentials specified in /etc/nova/nova-api.ini.
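One way to check a set of credentials independently of Nova and Savanna is to ask Keystone for a token directly. The following is only a minimal sketch using the Python 3 standard library and the Keystone v2.0 `POST /tokens` call; the endpoint URL, user name, password and tenant name are placeholders, not values taken from this thread.

```python
import json
import urllib.error
import urllib.request


def build_token_request(auth_url, username, password, tenant_name):
    """Build (but do not send) a Keystone v2.0 POST /tokens request."""
    body = {
        "auth": {
            "passwordCredentials": {"username": username, "password": password},
            "tenantName": tenant_name,
        }
    }
    return urllib.request.Request(
        auth_url.rstrip("/") + "/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def credentials_work(request, timeout=10):
    """Return True if Keystone issues a token, False if it answers HTTP 401."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return False
        raise  # any other HTTP error is a different problem
    return "token" in payload.get("access", {})


if __name__ == "__main__":
    # Placeholder endpoint and credentials: replace with your own values.
    req = build_token_request("http://127.0.0.1:5000/v2.0", "admin", "secret", "admin")
    print("credentials accepted:", credentials_work(req))
```

If this returns False for the same user and tenant that the service is configured with, the 401 is likely a plain credentials problem rather than something specific to Savanna.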


answered 2013-08-08 10:30:21 -0500

sarita18narwal

Hi Alexander Rubtsov,

The OpenStack environment is configured correctly; I have checked it once again. I also launched a test instance successfully without Savanna, and I went through the credentials specified in /etc/nova/nova-api.ini.

But I am still facing the same problem, i.e. the cluster remains in the Waiting state for some hours and then changes to Error, with two launched instances: master and slave.


answered 2013-08-08 12:06:48 -0500

Sarita,

Please attach the full Savanna log. To create it, launch savanna-api with the flag "--log-file <path>". Also, let's check the state of the instances: attach the output of the command "nova console-log <instance_id>" after the cluster goes into the error state.


answered 2013-08-12 04:33:26 -0500

sarita18narwal

The instances are in the active state, but the cluster goes into the error state.

The Savanna log is as follows:

127.0.0.1 - - [08/Aug/2013 20:18:30] "GET /v1.0/fd6e0af3983444bbaa41124740f373d9/clusters/1935935b-b8a2-4525-b2ed-000baf723c22 HTTP/1.1" 200 1790 0.012766
(23643) accepted ('127.0.0.1', 59464)
127.0.0.1 - - [08/Aug/2013 20:19:01] "GET /v1.0/fd6e0af3983444bbaa41124740f373d9/clusters/1935935b-b8a2-4525-b2ed-000baf723c22 HTTP/1.1" 200 1790 0.040173
(23643) accepted ('127.0.0.1', 59781)
127.0.0.1 - - [08/Aug/2013 20:19:31] "GET /v1.0/fd6e0af3983444bbaa41124740f373d9/clusters/1935935b-b8a2-4525-b2ed-000baf723c22 HTTP/1.1" 200 1790 0.034558
(23643) accepted ('127.0.0.1', 60114)
127.0.0.1 - - [08/Aug/2013 20:20:02] "GET /v1.0/fd6e0af3983444bbaa41124740f373d9/clusters/1935935b-b8a2-4525-b2ed-000baf723c22 HTTP/1.1" 200 1790 0.018801
(23643) accepted ('127.0.0.1', 60432)
127.0.0.1 - - [08/Aug/2013 20:20:32] "GET /v1.0/fd6e0af3983444bbaa41124740f373d9/clusters/1935935b-b8a2-4525-b2ed-000baf723c22 HTTP/1.1" 200 1790 0.025333
(23643) accepted ('127.0.0.1', 60756)
127.0.0.1 - - [08/Aug/2013 20:21:02] "GET /v1.0/fd6e0af3983444bbaa41124740f373d9/clusters/1935935b-b8a2-4525-b2ed-000baf723c22 HTTP/1.1" 200 1790 0.015351
(23643) accepted ('127.0.0.1', 32849)
127.0.0.1 - - [08/Aug/2013 20:21:33] "GET /v1.0/fd6e0af3983444bbaa41124740f373d9/clusters/1935935b-b8a2-4525-b2ed-000baf723c22 HTTP/1.1" 200 1790 0.018000
(23643) accepted ('127.0.0.1', 33179)
127.0.0.1 - - [08/Aug/2013 20:22:04] "GET /v1.0/fd6e0af3983444bbaa41124740f373d9/clusters/1935935b-b8a2-4525-b2ed-000baf723c22 HTTP/1.1" 200 1790 0.015437
(23643) accepted ('127.0.0.1', 33501)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenpool.py", line 80, in _spawn_n_impl
    func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 584, in process_request
    proto.__init__(socket, address, self)
  File "/usr/lib/python2.7/SocketServer.py", line 638, in __init__
    self.handle()
  File "/usr/lib/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 226, in handle_one_request
    self.raw_requestline = self.rfile.readline(self.server.url_length_limit)
  File "/usr/lib/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenio.py", line 262, in recv
    timeout_exc=socket.timeout("timed out"))
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/__init__.py", line 151, in trampoline
    listener = hub.add(hub.READ, fileno, current.switch)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/epolls.py", line 48, in add
    listener = BaseHub.add(self, evtype, fileno, cb)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 126, in add
    evtype, fileno, evtype))
RuntimeError: Second simultaneous read on fileno 23 detected. Unless you really know what you're doing, make sure that only one greenthread can read any particular socket. Consider using a pools.Pool. If you do know what you're doing and want to disable this error, call eventlet.debug.hub_prevent_multiple_readers(False)
2013-08-09 ...


answered 2013-08-12 09:31:09 -0500

Sarita,

Did you install Savanna in a separate Python virtual environment? To do that, run: "cd ~; virtualenv savanna-venv". This will create a new virtual environment in the savanna-venv directory in your home directory.

Then you can install Savanna inside this virtual environment: "savanna-venv/bin/pip install savanna" (for more information, please visit http://savanna.readthedocs.org/en/latest/userdoc/installation.guide.html ).

After that, try to create the cluster again.

Also, the "ACTIVE" state is set before the instance has completely booted. Therefore I need the output of "nova console-log <instance_id>" after the cluster goes into the error state.
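Because ACTIVE is reported before the guest has finished booting, a rough readiness check is to poll a TCP port on the instance until it accepts connections. The sketch below assumes that reaching the SSH port (22) on the instance's fixed IP is a reasonable signal; the function names are made up for illustration and are not part of Savanna or novaclient.

```python
import socket
import time


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts and timeouts.
        return False


def wait_for_port(host, port=22, attempts=30, delay=10.0):
    """Poll host:port up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # 10.0.0.2 is a placeholder fixed IP, not an address from this thread.
    print("ssh reachable:", wait_for_port("10.0.0.2", attempts=3, delay=5.0))
```

If the port never opens while Nova still shows ACTIVE, the console log is the next place to look.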


answered 2013-08-12 09:50:58 -0500

sarita18narwal

Alexander Rubtsov,

Yes, I installed Savanna in a separate Python virtual environment, following the same reference: http://savanna.readthedocs.org/en/latest/userdoc/installation.guide.html

The cluster remains in the Waiting state after the instances are launched, and after 1 day it goes into the Error state. So I will only be able to give you the output of "nova console-log <instance_id>" tomorrow, when the cluster goes into the error state.


answered 2013-08-12 12:02:33 -0500

sarita18narwal

Alexander Rubtsov,

The cluster state switches directly from Spawning to Waiting.

I can send you the console log of the launched ACTIVE instances now, if you want.

Sorry, but I'll only be able to give you the output of "nova console-log <instance_id>" tomorrow, when the cluster goes into the error state.


answered 2013-08-12 13:19:26 -0500

I suppose that the cluster creation ends with the "Unauthorized" error exactly 1 day later because the token provided by Keystone expires (by default a token is valid for exactly 24 hours).

To test this, next time, before attempting to create a cluster, temporarily decrease the value of the "expiration" parameter in /etc/keystone/keystone.conf.

If this assumption is correct, you will not be forced to wait so long every time.
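To see what token lifetime a given keystone.conf would produce, the "expiration" value can be read and converted to a duration. This is only a sketch: it assumes the option lives in the [token] section and is expressed in seconds, and the sample text passed in is illustrative rather than a real configuration file.

```python
import configparser
from datetime import timedelta

# 24 hours in seconds, the default lifetime mentioned above.
DEFAULT_EXPIRATION = 86400


def token_lifetime(conf_text):
    """Return the token lifetime configured in keystone.conf-style text."""
    # interpolation=None keeps '%' characters in other options from raising errors.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    seconds = parser.getint("token", "expiration", fallback=DEFAULT_EXPIRATION)
    return timedelta(seconds=seconds)


if __name__ == "__main__":
    sample = "[token]\nexpiration = 3600\n"  # illustrative value only
    print("tokens expire after:", token_lifetime(sample))
```

With a shortened lifetime, a cluster that fails with "Unauthorized" after roughly that interval would support the expired-token hypothesis.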


answered 2013-08-13 08:56:05 -0500

sarita18narwal

I decreased the value of expiration from 86400 to 864 in /etc/keystone/keystone.conf. Then I ran the database synchronization and started all the nova services and the apache2 server.

None of this helped. My cluster has been in the Waiting state for the last 4 hours. The nova console-log for the active instance (after the cluster goes into the error state) is as follows:

Instance1:
<------------------------------------------------------------------------->
..............................................................................
[ 0.407413] pnp: PnP ACPI: found 8 devices
[ 0.408832] ACPI: ACPI bus type pnp unregistered
[ 0.446954] NET: Registered protocol family 2
[ 0.448505] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.450843] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.453757] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.456131] TCP: Hash tables configured (established 16384 bind 16384)
[ 0.459077] TCP: reno registered
[ 0.460269] UDP hash table entries: 256 (order: 1, 8192 bytes)
[ 0.462055] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[ 0.464050] NET: Registered protocol family 1
[ 0.465487] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[ 0.467332] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[ 0.469206] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[ 0.471212] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
[ 0.473788] audit: initializing netlink socket (disabled)
[ 0.475512] type=2000 audit(1376301101.472:1): initialized
[ 0.477389] Trying to unpack rootfs image as initramfs...
[ 0.529422] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[ 0.533234] VFS: Disk quotas dquot_6.5.2
[ 0.534622] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.544243] fuse init (API version 7.19)
[ 0.545667] msgmni has been set to 939
[ 0.560223] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[ 0.562686] io scheduler noop registered
[ 0.563987] io scheduler deadline registered (default)
[ 0.572293] io scheduler cfq registered
[ 0.573729] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[ 0.575454] pciehp: PCI Express Hot Plug Controller Driver version: 0.4
[ 0.577614] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[ 0.580044] ACPI: Power Button [PWRF]
[ 0.583208] GHES: HEST is not enabled!
[ 0.592651] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10
[ 0.600195] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[ 0.624673] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 0.674575] serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[ 0.734591] 00:05: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 0.779096] 00:06: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[ 0.804373] Linux agpgart interface v0.103
[ 0.813653] brd: module loaded
[ 0.815867] loop: module loaded
[ 0.884614] vda: vda1
[ 0.912109] scsi0 : ata_piix
[ 0.913322] scsi1 : ata_piix
[ 0.914465] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc180 irq 14
[ 0.916462] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc188 irq 15
...


answered 2013-08-13 10:25:09 -0500

Sarita,

First, please describe which image you use. Did you build it or download it (if you downloaded it, please give the link)? Can you connect via SSH from the host on which Savanna is installed to an instance by its fixed IP address (for example, for the instance you showed it is 10.1.1.4)? Also attach the Savanna log in DEBUG mode. To create it, launch savanna-api with the flags "--log-file <path> -d".

To apply the new expiration setting, the keystone service must be restarted.


Stats

Asked: 2013-08-06 10:36:02 -0500

Seen: 246 times

Last updated: Nov 04 '13