
ram5391's profile - activity

2016-11-08 12:08:32 -0500 received badge  Popular Question (source)
2016-11-02 15:13:34 -0500 answered a question Mirantis Fuel KVM migration causes DNS to fail

This ended up being a simple fix: after clearing the Fuel VM's ARP table and restarting the dnsmasq service, DNS names now resolve correctly.
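For reference, the fix described above can be sketched as the following commands on the Fuel master. They are shown behind a print-only wrapper so the sketch is safe to paste; remove the wrapper to actually apply it. The service name dnsmasq comes from the answer; exact paths and service management may vary by Fuel release.

```shell
# Sketch of the fix above. run() only prints each command; drop it to execute.
run() { echo "+ $*"; }

run ip -s -s neigh flush all     # clear stale ARP/neighbor entries on the Fuel VM
run systemctl restart dnsmasq    # restart the DNS forwarder Fuel relies on
run systemctl status dnsmasq     # confirm the service came back up
```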

2016-10-30 09:31:03 -0500 asked a question Mirantis Fuel KVM migration causes DNS to fail

Has anyone had experience migrating a KVM-based Fuel 9.0 installation?

I copied the working .qcow2 disk image to the new host, took a virsh dumpxml, and defined the domain on the new host. That part works flawlessly.

Fuel comes up and everything works fine except DNS resolution to the nodes; it looks like whatever Fuel is using as a DNS server is not working.

nslookup will not resolve node names; it falls through to 8.8.8.8 (which suggests it is failing to use itself as the resolver).

The new instance does have internet access and can ping the nodes on their internal IPs, but I cannot SSH into the nodes on those IPs.

What service does Fuel use for DNS to its nodes? It looks like that is what has been corrupted somehow.
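A few checks can narrow down where resolution breaks, sketched here behind a print-only wrapper (remove it to run for real). The node name node-1 is a placeholder; substitute a real node from your environment.

```shell
# Diagnostic sketch (dry-run): which resolver is configured, does the local
# DNS forwarder answer for node names, and is anything listening on port 53?
run() { echo "+ $*"; }

run cat /etc/resolv.conf          # which nameserver is the master using?
run nslookup node-1 127.0.0.1     # query the local forwarder directly (node-1 is a placeholder)
run ss -ulpn                      # check what is bound to UDP port 53
```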

2016-10-28 15:20:38 -0500 marked best answer arp requests from compute nodes bogging network

My OpenStack installation is semi-down at the moment due to high network traffic. Running a tcpdump from the head node, I see the following:

15:47:01.732351 IP node-32 > node-20: GREv0, key=0x4, length 50: ARP, Request who-has 192.168.1.2 tell 192.168.1.1, length 28
15:47:01.732378 IP node-32 > node-20: GREv0, key=0x4, length 50: ARP, Request who-has 192.168.1.2 tell 192.168.1.1, length 28
15:47:01.732419 IP node-19 > node-20: GREv0, key=0x4, length 50: ARP, Request who-has 192.168.1.2 tell 192.168.1.1, length 28
15:47:01.732422 IP node-19 > node-20: GREv0, key=0x4, length 50: ARP, Request who-has 192.168.1.2 tell 192.168.1.1, length 28
15:47:01.732506 IP node-19 > node-20: GREv0, key=0x4, length 50: ARP, Request who-has 192.168.1.2 tell 192.168.1.1, length 28
15:47:01.732509 IP node-19 > node-20: GREv0, key=0x4, length 50: ARP, Request who-has 192.168.1.2 tell 192.168.1.1, length 28

The network is saturated (about 1,800,000 packets in 5 seconds) with this kind of traffic. Can someone please help me diagnose what is going on?

I'm using neutron with GRE tunneling. I'm also using a ceph backend.
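To quantify the flood, one approach is to capture only GRE traffic (IP protocol 47) for a fixed window and then inspect the inner ARP requests, which tcpdump decodes by default as in the output above. Sketched here behind a print-only wrapper; eth1 is a placeholder for the tunnel-bearing NIC.

```shell
# Dry-run sketch: measure the ARP-in-GRE storm seen above.
run() { echo "+ $*"; }

# Capture GRE-encapsulated packets for 5 seconds (IP proto 47 = GRE)
run timeout 5 tcpdump -i eth1 -nn proto 47 -w /tmp/gre.pcap
# Read the capture back; tcpdump decodes the inner ARP requests
run tcpdump -nn -r /tmp/gre.pcap
```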

Thanks

2016-10-28 15:20:38 -0500 received badge  Scholar (source)
2014-09-28 21:03:28 -0500 received badge  Famous Question (source)
2014-09-27 12:29:51 -0500 received badge  Notable Question (source)
2014-09-27 12:29:51 -0500 received badge  Popular Question (source)
2014-09-13 10:07:28 -0500 received badge  Nice Question (source)
2014-06-25 02:10:42 -0500 received badge  Famous Question (source)
2014-05-20 13:20:06 -0500 received badge  Famous Question (source)
2014-04-22 12:40:51 -0500 received badge  Notable Question (source)
2014-04-22 00:13:16 -0500 received badge  Popular Question (source)
2014-04-21 15:17:10 -0500 commented question cannot communicate with openstack VMs via neutron

We cannot console into the machine to try the standard username or password, but "cannot communicate" means that despite allowing all ICMP and TCP connections, we cannot ping or VNC into the instance.
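For comparison, the security-group rules referred to in the comment would look roughly like this in the nova client syntax of that era, sketched behind a print-only wrapper; the group name default is an assumption.

```shell
# Dry-run sketch of the ICMP/TCP allow rules mentioned above.
run() { echo "+ $*"; }

run nova secgroup-add-rule default icmp -1 -1 0.0.0.0/0   # allow all ICMP (ping)
run nova secgroup-add-rule default tcp 1 65535 0.0.0.0/0  # allow all TCP
run nova secgroup-list-rules default                      # verify the rules
```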

2014-04-21 15:01:38 -0500 asked a question cannot communicate with openstack VMs via neutron

We deployed an OpenStack installation using Juju with all default configurations; we are essentially using a flat network at the moment. When we boot an instance (CirrOS 0.3.0), we see the following output in the instance's log:

Apr 21 12:57:55 cirros kern.info kernel: [    1.130507] acpiphp: Slot [28] registered
Apr 21 12:57:55 cirros kern.info kernel: [    1.130520] acpiphp: Slot [29] registered
Apr 21 12:57:55 cirros kern.info kernel: [    1.130534] acpiphp: Slot [30] registered
Apr 21 12:57:55 cirros kern.info kernel: [    1.130550] acpiphp: Slot [31] registered
Apr 21 12:57:55 cirros kern.info kernel: [    1.133679] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
Apr 21 12:57:55 cirros kern.info kernel: [    1.133682] e1000: Copyright (c) 1999-2006 Intel Corporation.
Apr 21 12:57:55 cirros kern.info kernel: [    1.135678] ne2k-pci.c:v1.03 9/22/2003 D. Becker/P. Gortmaker
Apr 21 12:57:55 cirros kern.info kernel: [    1.137350] 8139cp: 8139cp: 10/100 PCI Ethernet driver v1.3 (Mar 22, 2004)
Apr 21 12:57:55 cirros kern.info kernel: [    1.139077] pcnet32: pcnet32.c:v1.35 21.Apr.2008 tsbogend@alpha.franken.de
Apr 21 12:57:55 cirros kern.info kernel: [    1.142400] ip_tables: (C) 2000-2006 Netfilter Core Team
Apr 21 12:57:55 cirros kern.info kernel: [    1.252120] Refined TSC clocksource calibration: 2000.004 MHz.
Apr 21 12:57:55 cirros kern.info kernel: [    1.676491] eth0: IPv6 duplicate address fe80::f816:3eff:fe7e:660e detected!
############ debug end   ##############
cloud-setup: failed to read iid from metadata. tried 30
WARN: /etc/rc3.d/S45-cloud-setup failed
Starting dropbear sshd: generating rsa key... generating dsa key... OK
===== cloud-final: system completely up in 138.14 seconds ====
wget: can't connect to remote host (169.254.169.254): No route to host
wget: can't connect to remote host (169.254.169.254): No route to host
wget: can't connect to remote host (169.254.169.254): No route to host
  instance-id: 
  public-ipv4: 
  local-ipv4 : 
wget: can't connect to remote host (169.254.169.254): No route to host
cloud-userdata: failed to read instance id
WARN: /etc/rc3.d/S99-cloud-userdata failed
  ____               ____  ____
 / __/ __ ____ ____ / __ \/ __/
/ /__ / // __// __// /_/ /\ \ 
\___//_//_/  /_/   \____/___/ 
 http://launchpad.net/cirros


login as 'cirros' user. default password: 'cubswin:)'. use 'sudo' for root.
cirros login:
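The wget failures in the log above mean the instance could not reach the metadata service at 169.254.169.254. Once console access works, checks from inside the instance might look like the following, sketched behind a print-only wrapper.

```shell
# Dry-run sketch: verify networking from inside the instance. The metadata
# address 169.254.169.254 is the one failing in the log above.
run() { echo "+ $*"; }

run ip addr show eth0    # did DHCP assign an address?
run ip route             # is there a default route toward the metadata service?
run wget -qO- http://169.254.169.254/latest/meta-data/instance-id
```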
2014-04-17 02:06:00 -0500 received badge  Notable Question (source)
2014-04-15 02:50:48 -0500 received badge  Popular Question (source)
2014-04-14 14:25:38 -0500 received badge  Editor (source)
2014-04-14 14:25:16 -0500 asked a question JuJu bootstrap fails, connection refused port 22

I have a MAAS installation with its interface at 10.12.1.3/MAAS. It is responsible for DHCP/DNS, and the configuration is as follows:

ip: 10.12.1.4
subnet mask: 255.255.255.0
broadcast ip: 10.12.1.255
router ip: 10.12.1.1
ip range low: 10.12.1.10
ip range high: 10.12.1.100

I am booting VirtualBox machines on the same machine as the cluster controller/region controller, and they enlist fine. I change their name in MAAS, accept, and commission them. The enlistment and commissioning process takes a very long time (just food for thought). Once the machine has been commissioned (it says "Ready" in the MAAS nodes list), I run juju bootstrap. Juju has the following environments.yaml:

environments:
    maas:
        type: maas
        maas-server: 'http://10.12.1.3/MAAS/'
        maas-oauth: pVreWZYhzaAFmqNjV3:W96PuEtANsr3n2SkGR:xF8dccH7NPUjhpejauzek$
        admin-secret: 'whatever'
        default-series: precise
        #authorized-keys-path: ~/.ssh/id_rsa.pub

When I run juju bootstrap and then power on the VirtualBox VM, I get this:

picked arbitrary tools &{"1.18.1-precise-amd64" "https://streams.canonical.com/juju/tools/releases/juju-1.18.1-   precise-amd64.tgz" "65ea92cd8812bff3e49df78f9e8e964e91c44af0abd49d880c4333a78c8abfda" %!q(int64=5368375)} - /MAAS/api/1.0/nodes/node-f942c69e-c400-11e3-bc28-0025906c5dd6/
Waiting for address
 Attempting to connect to test.draco:22

This screen hangs for 10 minutes, and then I receive the following:

 ERROR juju.provider.common bootstrap.go:123 bootstrap failed: waited for 10m0s without being able to connect: ssh: connect to host test.draco port 22: Connection refused

If I run juju status while it is attempting to bootstrap, I get the following:

ERROR state/api: websocket.Dial wss://nova1.draco:17070/: dial tcp 10.12.1.10:17070: connection refused

over and over again.

I am using the 12.04.4 MAAS Ubuntu install, and Juju version 1.18.1-precise-amd64.

The SSH keys have been generated, uploaded to MAAS, regenerated, and uploaded to MAAS again. Also, the machine that Juju is supposed to build finishes building despite the bootstrap failure (its status in MAAS changes to "Allocated to bla"), and then it powers off. The status in MAAS goes back to "Ready", and if I turn the virtual machine back on, it seems to go back into enlisting.

Anyone know what could be going wrong?
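When bootstrap hangs at "Attempting to connect", it can help to test each link in the chain separately, sketched here behind a print-only wrapper. test.draco is the node name from the output above; the MAAS DNS address 10.12.1.3 comes from the configuration.

```shell
# Dry-run sketch: isolate where the bootstrap connection fails.
run() { echo "+ $*"; }

run host test.draco 10.12.1.3      # does MAAS DNS resolve the node name?
run ssh -v ubuntu@test.draco true  # verbose SSH shows where port 22 is refused
run juju bootstrap --debug         # rerun with debug logging for more detail
```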

2014-03-18 10:13:38 -0500 received badge  Famous Question (source)
2014-02-05 15:06:48 -0500 received badge  Notable Question (source)
2014-01-30 09:59:26 -0500 received badge  Popular Question (source)
2014-01-27 12:03:05 -0500 received badge  Student (source)
2014-01-24 14:40:17 -0500 asked a question 401 errors on glance/nova image list

Seemingly overnight, none of my users nor my admin account can log in to the Horizon dashboard. I tailed the nova-api logs and got what is essentially a 401 error. I then issued the command

   nova --debug image-list

and received the following output:

REQ: curl -i http://10.10.1.10:8774/v2/f7b27e8db30645dda873d4c772a528e6/images/detail -X GET -H "X-Auth-Project-Id: admin" -H "User-Agent: python-novaclient" -H "Accept: application/json" -H "X-Auth-Token: [token code]"

send: u'GET /v2/f7b27e8db30645dda873d4c772a528e6/images/detail HTTP/1.1\r\nHost: 10.10.1.10:8774\r\nx-auth-project-id: admin\r\nx-auth-token: [tokencode]=\r\naccept-encoding: gzip, deflate\r\naccept: application/json\r\nuser-agent: python-novaclient\r\n\r\n'
reply: 'HTTP/1.1 401 Unauthorized\r\n'
header: Www-Authenticate: Keystone uri='http://127.0.0.1:5000/'
header: Content-Length: 276
header: Content-Type: text/plain; charset=UTF-8
header: Date: Fri, 24 Jan 2014 20:34:20 GMT
RESP:{'date': 'Fri, 24 Jan 2014 20:34:20 GMT', 'status': '401', 'content-length': '276', 'content-type': 'text/plain; charset=UTF-8', 'www-authenticate': "Keystone uri='http://127.0.0.1:5000/'"} 401 Unauthorized

This server could not verify that you are authorized to access the document you requested. Either you supplied the wrong credentials (e.g., bad password), or your browser does not understand how to supply the credentials required.

 Authentication required

DEBUG (shell:543) n/a (HTTP 401)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/novaclient/shell.py", line 540, in main
    OpenStackComputeShell().main(sys.argv[1:])
  File "/usr/local/lib/python2.7/dist-packages/novaclient/shell.py", line 476, in main
    args.func(self.cs, args)
  File "/usr/local/lib/python2.7/dist-packages/novaclient/v1_1/shell.py", line 517, in do_image_list
    image_list = cs.images.list()
  File "/usr/local/lib/python2.7/dist-packages/novaclient/v1_1/images.py", line 47, in list
    return self._list("/images/detail", "images")
  File "/usr/local/lib/python2.7/dist-packages/novaclient/base.py", line 62, in _list
    _resp, body = self.api.client.get(url)
  File "/usr/local/lib/python2.7/dist-packages/novaclient/client.py", line 241, in get
    return self._cs_request(url, 'GET', **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/novaclient/client.py", line 238, in _cs_request
    raise ex
Unauthorized: n/a (HTTP 401)
ERROR: n/a (HTTP 401)

Both glance and nova output similar errors; however, I can still issue keystone commands. Running keystone endpoint-list returns:

+----------------------------------+-----------+-----------------------------------------+-----------------------------------------+-----------------------------------------+----------------------------------+
|                id                |   region  |                publicurl                |               internalurl               |                 adminurl                |            service_id            |
+----------------------------------+-----------+-----------------------------------------+-----------------------------------------+-----------------------------------------+----------------------------------+
| 170adbc342774125bbb3501ff75a77b6 | RegionOne |        http://10.10.1.10:9292/v1        |        http://10.10.1.10:9292/v1        |        http://10.10.1.10:9292/v1        | 9eb322e90a894dbcb6ccb2273df7b080 |
| 3c59ced5b1f542b6a66f83fc0cda7feb | RegionOne |       http://10.10.1.10:5000/v2.0       |       http://10.10.1.10:5000/v2.0       |       http://10.10.1.10:35357/v2.0      | a3f470c0e4dc4926bb5456354182e922 |
| 8a79e7b7331e44058d1f51deaaad87e9 | RegionOne | http://10.10.1.10:8774/v2/%(tenant_id)s | http://10.10.1.10:8774/v2/%(tenant_id)s | http://10.10.1.10:8774/v2/%(tenant_id)s | cf35bb2df8324710b9fcd1a91901f353 |
+----------------------------------+-----------+-----------------------------------------+-----------------------------------------+-----------------------------------------+----------------------------------+

What has gone wrong? Where can I start looking?
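A reasonable first step is to confirm that keystone still issues tokens at all, bypassing nova. A sketch using the v2.0 endpoint from the list above, behind a print-only wrapper; the tenant, username, and password are placeholders.

```shell
# Dry-run sketch: request a token directly from keystone v2.0
# (5000/v2.0 endpoint taken from the endpoint-list output above).
run() { echo "+ $*"; }

run curl -s -X POST http://10.10.1.10:5000/v2.0/tokens \
    -H 'Content-Type: application/json' \
    -d '{"auth": {"tenantName": "admin", "passwordCredentials": {"username": "admin", "password": "secret"}}}'
```

If this returns a token but nova still answers 401, the mismatch between the Www-Authenticate header pointing at 127.0.0.1:5000 and the endpoint list's 10.10.1.10:5000 would be worth a look in nova's auth configuration.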