
Savanna with nova-network on Grizzly

asked 2013-08-06 10:36:02 -0500

sarita18narwal

I deployed a Hadoop cluster using the Savanna API. After launch, the cluster remains in the Waiting state and after some time goes into the Error state, while the nodes remain in the ACTIVE state.

Suppose I have 3 nodes (1 master and 2 slaves) in the cluster. When I launch the cluster using the respective cluster template, the result looks like this.

Cluster

Name      State    Instance Count
Sample    Error    3

Instances

Name        IP          State
1-master    10.0.0.X    ACTIVE
2-slave     10.0.0.X    ACTIVE
3-slave     10.0.0.X    ACTIVE

Log Description: WARNING savanna.service.instances [-] Can't start cluster 'tstcluster' (reason: Unauthorized (HTTP 401))

ERROR root [-] Original exception being dropped:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 38, in create_cluster
    _await_instances(cluster)
  File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 206, in _await_instances
    if not _check_if_up(instance):
  File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 215, in _check_if_up
    server = instance.nova_info
  File "/usr/local/lib/python2.7/dist-packages/savanna/db/models.py", line 226, in nova_info
    return nova.client().servers.get(self.instance_id)
  File "/usr/local/lib/python2.7/dist-packages/novaclient/v1_1/servers.py", line 350, in get
    return self._get("/servers/%s" % base.getid(server), "server")
  File "/usr/local/lib/python2.7/dist-packages/novaclient/base.py", line 140, in _get
    _resp, body = self.api.client.get(url)
  File "/usr/local/lib/python2.7/dist-packages/novaclient/client.py", line 230, in get
    return self._cs_request(url, 'GET', *kwargs)
  File "/usr/local/lib/python2.7/dist-packages/novaclient/client.py", line 227, in _cs_request
    raise e
Unauthorized: Unauthorized (HTTP 401)

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/poll.py", line 97, in wait
    readers.get(fileno, noop).cb(fileno)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 194, in main
    result = function(args, *kwargs)
  File "/usr/local/lib/python2.7/dist-packages/savanna/context.py", line 127, in wrapper
    func(args, *kwargs)
  File "/usr/local/lib/python2.7/dist-packages/savanna/service/api.py", line 111, in _provision_cluster
    i.create_cluster(cluster)
  File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 51, in create_cluster
    _rollback_cluster_creation(cluster, ex)
  File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 274, in _rollback_cluster_creation
    _shutdown_instances(cluster, True)
  File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 303, in _shutdown_instances
    _shutdown_instance(instance)
  File "/usr/local/lib/python2.7/dist-packages/savanna/service/instances.py", line 309, in _shutdown_instance
    nova.client().servers.delete(instance.instance_id)
  File "/usr/local/lib/python2.7/dist-packages/novaclient/v1_1/servers.py", line 630, in delete
    self._delete("/servers/%s" % base.getid(server))
  File "/usr/local/lib/python2.7/dist-packages/novaclient/base.py", line 154, in _delete
    _resp, _body = self.api.client.delete(url)
  File "/usr/local/lib/python2.7/dist-packages/novaclient/client.py ... (more)


29 answers


answered 2013-11-04 17:34:33 -0500

alazarev

dikshith,

According to your config, you are using Neutron (use_neutron=True) and not using floating IPs (use_floating_ips=False). In this case the host running Savanna needs direct access to the VMs via the private network (usually it does not have it). You can always see the IP address Savanna is trying to use in the cluster details (management IP column). Please check that you can ssh to the VMs from the Savanna host after the VMs have started up.
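The reachability check suggested here can be scripted. The following is a minimal sketch, assuming it runs on the Savanna host with Python 3 and that SSH on the VMs listens on the default port 22. The function name `can_reach_ssh` is hypothetical, and the check only confirms that a TCP connection can be opened; it does not attempt a login.

```python
import socket


def can_reach_ssh(ip, port=22, timeout=5.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        # create_connection only performs the TCP handshake; it does not log in.
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

Calling `can_reach_ssh("<management IP>")` for each address shown in the management IP column should return True for every VM; a False result suggests the Savanna host has no route into the private network.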

What version of Savanna are you using? The master branch contains a fix that adds netns proxy support ( https://review.openstack.org/#/c/52997/ ). Enabling it (use_namespaces=True) could help with this issue.


answered 2013-11-04 13:54:21 -0500

I am facing the same problem. I am using Savanna 0.3 with Neutron networking. The cluster stays in the Waiting state for 24 hours, after which the Keystone token becomes invalid. Sometimes I can log in to the instances, but HDFS is not working on them. Below is the output of the Hadoop report command:

hadoop dfsadmin -report
report: FileSystem file:/// is not a distributed file system
Usage: java DFSAdmin [-report]

Here is my Savanna configuration file:

host="localhost"
port=8386

# Address and credentials that will be used to check auth tokens
os_auth_host=10.2.1.3
os_auth_port=5000
os_admin_username=admin
os_admin_password=***
os_admin_tenant_name=admin

use_floating_ips=False
use_neutron=True
debug=true
verbose=true
plugins=vanilla,hdp

[plugin:vanilla]
plugin_class=savanna.plugins.vanilla.plugin:VanillaProvider

[plugin:hdp]
plugin_class=savanna.plugins.hdp.ambariplugin:AmbariPlugin

[database]
connection=sqlite:///savanna.sqlite
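The option combination in this file (use_neutron=True with use_floating_ips=False) is the one that requires direct access from the Savanna host to the private network. A minimal sketch of how that combination could be flagged automatically, assuming Python 3: the function name `check_network_options` and the fallback values used for missing options are assumptions made for illustration, not Savanna's documented defaults, and use_namespaces is only meaningful on versions that include the netns proxy change.

```python
import configparser


def check_network_options(conf_text):
    """Return a list of warnings about the networking options in conf_text."""
    parser = configparser.ConfigParser(interpolation=None)
    # The posted file has no section header before its first options, so a
    # [DEFAULT] header is prepended here purely to make it parseable.
    parser.read_string("[DEFAULT]\n" + conf_text)
    opts = parser["DEFAULT"]

    # Fallbacks are illustrative assumptions, not Savanna's real defaults.
    use_neutron = opts.getboolean("use_neutron", fallback=False)
    use_floating_ips = opts.getboolean("use_floating_ips", fallback=True)
    use_namespaces = opts.getboolean("use_namespaces", fallback=False)

    warnings = []
    if use_neutron and not use_floating_ips and not use_namespaces:
        warnings.append(
            "Neutron without floating IPs or namespaces: the Savanna host "
            "must be able to reach the VMs' private IPs directly."
        )
    return warnings
```

Passing the contents of the configuration file as a string returns an empty list when no problematic combination is found.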


answered 2013-08-16 08:39:11 -0500

sarita18narwal

What should I do if I have to use a floating IP as the management IP?

When I set the option use_floating_ips=True, the cluster goes into the Waiting state. What should I do to get it out of that state?


answered 2013-08-16 06:39:36 -0500

Sarita,

You can access the instances using the SSH key that you specified when you created the cluster. Using Swift is optional; you don't have to use it if you don't want to.


answered 2013-08-16 06:26:57 -0500

sarita18narwal

Alexander,

I made the change in savanna.conf (use_floating_ips=False) and launched a new cluster named Cluster-2.

Now my cluster is in the ACTIVE state.

Thanks a lot for your patience and for helping me out. :)

Can you please help me with access to the instances? What is the password for the username ubuntu in this image: http://savanna-files.mirantis.com/savanna-0.2-vanilla-1.1.2-ubuntu-12.10.qcow2

I am also unable to access the instances when I use the web UI for MapReduce and HDFS.

Is it necessary to use Swift?


answered 2013-08-16 03:35:50 -0500

sarita18narwal

Yes, the cluster created through REST was in the Waiting state.

I had successfully created one instance before launching the cluster. When I created my new cluster through REST with 1 master and 1 worker node, the master and worker nodes launched successfully as instances with internal IPs, and the cluster went into the Waiting state. I then launched another instance successfully.

Thus I have 4 successfully launched instances in the ACTIVE state:

1) created before the cluster
2 & 3) the cluster's master node and slave node
4) created after the cluster went into the Waiting state

I created instances before and after the cluster just to check whether Nova is able to contact Keystone.

The nova console-log output of the cluster's master node is as follows:

//////////////////////////////////////////////////////////////////////////////// [......

[ 0.526863] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[ 0.528036] vgaarb: loaded
[ 0.529064] vgaarb: bridge control possible 0000:00:02.0
[ 0.532400] SCSI subsystem initialized
[ 0.534088] ACPI: bus type usb registered
[ 0.535480] usbcore: registered new interface driver usbfs
[ 0.536051] usbcore: registered new interface driver hub
[ 0.537928] usbcore: registered new device driver usb
[ 0.540386] PCI: Using ACPI for IRQ routing
[ 0.542470] NetLabel: Initializing
[ 0.544043] NetLabel: domain hash size = 128
[ 0.545438] NetLabel: protocols = UNLABELED CIPSOv4
[ 0.547009] NetLabel: unlabeled traffic allowed by default
[ 0.548201] HPET: 3 timers in total, 0 timers will be used for per-cpu timer
[ 0.550288] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
[ 0.553013] hpet0: 3 comparators, 64-bit 100.000000 MHz counter
[ 0.560122] Switching to clocksource kvm-clock
[ 0.579695] AppArmor: AppArmor Filesystem Enabled
[ 0.970489] pnp: PnP ACPI init
[ 0.971648] ACPI: bus type pnp registered
[ 0.974063] pnp: PnP ACPI: found 8 devices
[ 0.975446] ACPI: ACPI bus type pnp unregistered
[ 0.983793] NET: Registered protocol family 2
[ 0.985417] IP route cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.989392] TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
[ 0.998635] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[ 1.003642] TCP: Hash tables configured (established 262144 bind 65536)
[ 1.005719] TCP: reno registered
[ 1.006884] UDP hash table entries: 1024 (order: 3, 32768 bytes)
[ 1.008784] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
[ 1.010837] NET: Registered protocol family 1
[ 1.012308] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[ 1.014125] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[ 1.015896] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[ 1.017977] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
[ 1.033502] audit: initializing netlink socket (disabled)
[ 1.035208] type=2000 audit(1376481113.032:1): initialized
[ 1.037043] Trying to unpack rootfs image as initramfs...
[ 1.320896] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[ 1.328782] VFS: Disk quotas dquot_6.5.2
[ 1.330348] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 1.338159] fuse init (API version 7.19)
[ 1.339641] msgmni has been set to 3963
[ 1 ... (more)


answered 2013-08-14 14:16:06 -0500

Is the cluster in the "Waiting" state the one you just created through REST? Do I understand correctly that after creating this cluster you launched several instances in addition to those that had already been created? To see whether anything changed after the token update, please show the nova console-log of the cluster instances (there is no need to wait for the transition to the Error state; if the instances have launched, the last line will be "Cloud-init v. 0.7 finished ...").

Also, I'd still like to see the debug log file (for example, it can be posted on http://paste.openstack.org/ , with any private information hidden beforehand if necessary).
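The console-log check described in this answer can be sketched in a few lines of Python 3. The function name `cloud_init_finished` is hypothetical, and the match on a last line starting with "Cloud-init" and containing "finished" is an assumption based only on the line quoted above; other images may print a different final line.

```python
def cloud_init_finished(console_log):
    """Return True if the last non-empty console line reports cloud-init finished."""
    # Ignore blank lines so trailing newlines do not hide the final message.
    lines = [line.strip() for line in console_log.splitlines() if line.strip()]
    if not lines:
        return False
    last = lines[-1]
    # Assumed pattern, taken from the "Cloud-init v. 0.7 finished ..." example.
    return last.startswith("Cloud-init") and "finished" in last
```

The text returned by nova console-log for an instance can be passed in as a single string.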


answered 2013-08-14 12:08:32 -0500

sarita18narwal

OK, thanks for correcting me. Now I am able to create the cluster again using REST.

But my cluster is still in the Waiting state. :(


answered 2013-08-14 11:59:39 -0500

Savanna 0.2.x doesn't need endpoint creation. Also, I noticed that you missed a comma at the end of the line "cluster_template_id": "a9dc1023-ddb9-476d-a3f5-a6e9c643e614". Fix it and try to create the cluster again.
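A missing comma like this one can be caught before the request is sent by parsing the body locally. A minimal sketch using Python 3's standard json module; the function name `find_json_error` is hypothetical.

```python
import json


def find_json_error(text):
    """Return None if text is valid JSON, else a (line, column, message) tuple."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno/colno point at where parsing failed, which for a missing
        # comma is usually the start of the next key, not the line lacking it.
        return (exc.lineno, exc.colno, exc.msg)
    return None
```

Running it on the contents of cluster_create.json returns None once the file is well-formed; otherwise the reported position is a starting point for finding the mistake.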


answered 2013-08-14 11:43:47 -0500

sarita18narwal

I created the cluster both ways: through the UI and through REST requests.

I also refreshed the token, but it did not help.

Through REST the cluster also shows the Waiting state.

Now when I try to create a new cluster using REST, it gives me an error:

http http://0.0.0.0:8386/v1.0/11f94438f0f1466d814f2cf5a6f2839a/clusters X-Auth-Token:dcb533e8342c44389cdd91569ff2663b < cluster_create.json

HTTP/1.1 500 INTERNAL SERVER ERROR
Content-Length: 81
Content-Type: application/json
Date: Wed, 14 Aug 2013 11:19:05 GMT

{
    "error": 500,
    "error_message": "Malformed message body: cannot understand JSON"
}

My cluster_create.json is:

{
    "name": "cluster-1",
    "plugin_name": "vanilla",
    "hadoop_version": "1.1.2",
    "cluster_template_id": "a9dc1023-ddb9-476d-a3f5-a6e9c643e614"
    "user_keypair_id": "stack",
    "default_image_id": "8afb092a-f69e-4d26-aea5-09451ac11d8d"
}

One more question: in the Savanna 0.2 documentation there is no endpoint for Hadoop. In the earlier implementation I had created a service and an endpoint for Hadoop, but it does not work.

Is it necessary? Should I declare this service and endpoint or not?


Stats

Asked: 2013-08-06 10:36:02 -0500

Seen: 223 times

Last updated: Nov 04 '13