Startup problems with a compute node in a multi-node cluster

asked 2011-06-28 21:21:29 -0500

cubranic

I am setting up a two-node cluster, with one node running all services (call it nova1) and the other running only nova-compute (nova2). The first node works fine, but on nova2 the nova-compute service does not start properly. In the syslog, I see lines like:

Jun 28 06:25:14 compute1 init: nova-compute main process (2914) terminated with status 1
Jun 28 06:25:14 compute1 init: nova-compute main process ended, respawning

(repeating every second)

In /var/log/nova/nova-compute.log, I see the following CRITICAL error with a stack trace: (OperationalError) (1054, "Unknown column 'instances.image_ref' in 'field list'")

Probably because of this error, Nova never sets up the networking bridges and routes, and the compute node cannot access guest instances running on the nova1 node.

Interestingly, the command line tools that I tried on nova2 ("euca-describe-instances", "nova-manage network list", etc.) still show correct information about the Nova cluster and instances running on nova1.


21 answers


answered 2011-06-29 23:10:54 -0500

cubranic

I see it with 2011.3~d3~20110629.150-0ubuntu0ppa1~natty1. After restarting all the Nova services (network, compute, api, objectstore, scheduler, in that order), euca-describe-images still throws an UnknownError. There is a stack trace in nova-api.log:

2011-06-29 16:03:25,080 DEBUG nova.api [-] action: DescribeImages from (pid=24227) __call__ /usr/lib/pymodules/python2.7/nova/api/ec2/__init__.py:214
2011-06-29 16:03:25,080 DEBUG nova.api [-] arg: Owner.1 val: self from (pid=24227) __call__ /usr/lib/pymodules/python2.7/nova/api/ec2/__init__.py:216
2011-06-29 16:03:25,081 ERROR nova.api [35N3X4O8-AL1RI34AL0M prj1admin testprj1] Unexpected error raised: Unable to connect to server. Got error: [Errno 111] ECONNREFUSED
(nova.api): TRACE: Traceback (most recent call last):
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.7/nova/api/ec2/__init__.py", line 320, in __call__
(nova.api): TRACE:     result = api_request.invoke(context)
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.7/nova/api/ec2/apirequest.py", line 78, in invoke
(nova.api): TRACE:     result = method(context, **args)
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.7/nova/api/ec2/cloud.py", line 1097, in describe_images
(nova.api): TRACE:     images = self.image_service.detail(context)
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.7/nova/image/s3.py", line 75, in detail
(nova.api): TRACE:     return self.service.detail(context)
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.7/nova/image/glance.py", line 106, in detail
(nova.api): TRACE:     limit=limit)
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.7/glance/client.py", line 84, in get_images_detailed
(nova.api): TRACE:     res = self.do_request("GET", "/images/detail", params=params)
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.7/glance/client.py", line 54, in do_request
(nova.api): TRACE:     headers, params)
(nova.api): TRACE:   File "/usr/lib/pymodules/python2.7/glance/common/client.py", line 148, in do_request
(nova.api): TRACE:     "server. Got error: %s" % e)
(nova.api): TRACE: ClientConnectionError: Unable to connect to server. Got error: [Errno 111] ECONNREFUSED
(nova.api): TRACE:


answered 2011-06-29 23:39:04 -0500

Davor: It looks like your glance server is either not running, or your glance-api-servers flag in your nova config is incorrect. Check on those two things.
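One quick way to tell those two cases apart is to try a plain TCP connection from the Nova API host to the Glance API endpoint. The snippet below is only a minimal sketch: the function name `can_connect` is made up for illustration, and the host and port you pass in are assumptions about your setup (Glance's API listens on port 9292 by default, but check your own configuration).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A refused connection (the ECONNREFUSED seen in the trace) or a timeout
    both yield False.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

For example, `can_connect("192.168.11.1", 9292)` (the address here is hypothetical) returning False would suggest that nothing is listening where Nova expects Glance to be, so either the service is down or the configured address is wrong.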


answered 2011-06-29 20:06:24 -0500

blamar

Hey Davor, the bug fix went into r148 of Glance, so it seems you should have it (I think; I'm not an avid user of the PPAs). If you've restarted all applicable services and this is still happening, feel free to submit a bug report or paste the latest error stack.


answered 2011-06-29 17:03:42 -0500

cubranic

Also, still no Nova-related networking is set up on the second node (br100 and routes/iptables rules to ping the VMs).


answered 2011-06-29 18:53:20 -0500

cubranic

Thanks Brian.

It looks like there is a new set of updates today, but it didn't fix it yet. Do you know which release your fix will be in? I have python-glance 2011.3~d3~20110629.149-0ubuntu0ppa1~natty1.


answered 2011-06-28 21:38:49 -0500

cubranic

Information about my setup:

  • each host has two NICs: one on the private management subnet (192.168.11.x), and another on the public internet
  • FlatDHCPManager
  • guest instances run on a virtual network 10.0.0.0/12, starting at 10.0.1.2
  • nova1 is the network controller and has an address on the guest network: 10.0.1.1

answered 2011-06-28 21:50:21 -0500

everett-toews

I've found that any time you see "Unknown column" errors in your logs, it usually means you have mismatched versions.

Confirm that you're running the same version of Nova on both nodes.

dpkg -l '*nova*'
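If eyeballing the two package listings is error-prone, the comparison can be scripted. The following is a rough sketch, not anything shipped with Nova: it assumes you have captured the output of `dpkg-query -W -f='${Package} ${Version}\n'` on each node as text, and the function names are invented for illustration.

```python
def parse_versions(dpkg_output):
    """Parse lines of '<package> <version>' into a {package: version} dict."""
    versions = {}
    for line in dpkg_output.splitlines():
        parts = line.split()
        # Skip blank lines and packages listed without an installed version.
        if len(parts) == 2:
            versions[parts[0]] = parts[1]
    return versions


def version_mismatches(node_a, node_b):
    """Return {package: (version_on_a, version_on_b)} for every difference.

    A package missing on one node is reported with None for that node.
    """
    mismatches = {}
    for pkg in sorted(set(node_a) | set(node_b)):
        if node_a.get(pkg) != node_b.get(pkg):
            mismatches[pkg] = (node_a.get(pkg), node_b.get(pkg))
    return mismatches
```

An empty result from `version_mismatches(parse_versions(out1), parse_versions(out2))` means the listed packages match on both nodes; anything else names the packages to look at first.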

Everett


answered 2011-07-04 22:45:02 -0500

cubranic

I opened a separate bug about the use of trunk PPA in Cactus docs: https://bugs.launchpad.net/bugs/805711


answered 2011-06-28 21:51:57 -0500

vishvananda

You need to make sure the hosts are talking to the same database. It sounds like the compute host is talking to a local (older) database.
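One way to check this is to compare the `--sql_connection` flag in each host's Nova flag file. The sketch below assumes the flag-file format of this era (one `--flag=value` per line) and that you have the contents of each host's file as a string; the function names are hypothetical.

```python
def sql_connection(flagfile_text):
    """Return the value of --sql_connection from flag-file text, or None.

    If the flag appears more than once, the last occurrence is returned.
    None means the flag is absent, so the service uses its built-in default.
    """
    value = None
    for line in flagfile_text.splitlines():
        line = line.strip()
        if line.startswith("--sql_connection="):
            value = line.split("=", 1)[1]
    return value


def same_database(flagfile_a, flagfile_b):
    """True only if both flag files set the same explicit sql_connection."""
    conn_a = sql_connection(flagfile_a)
    conn_b = sql_connection(flagfile_b)
    return conn_a is not None and conn_a == conn_b
```

If `sql_connection` returns None for the compute host, or a value that points at localhost rather than the controller, that would be consistent with the compute host using its own database.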

Vish


answered 2011-06-28 22:07:31 -0500

cubranic

Everett, there were updates on node1, while node2 was up to date. However, once I updated node1 and rebooted it, everything stopped working. I get numerous DB-related errors in various services' logs:

  • nova-network.log: (OperationalError) (1054, "Unknown column 'instances_1.image_ref' in 'field list'")
  • nova-compute.log: (OperationalError) (1054, "Unknown column 'instances.image_ref' in 'field list'")
  • nova-api.log: Unexpected error raised: 'NoneType' object does not support item assignment
  • nova-manage.log: CRITICAL nova [-] enable() takes exactly 3 arguments (1 given)

Did some migration not run on the database after packages were upgraded? Is there a way to recover the database, or at least to reset it so that services can start running again?

