
nova scheduler unfairly favoring host - filters and weights

asked 2016-10-11 14:36:14 -0500 by shorton

updated 2016-10-11 17:28:22 -0500

I have a 3-node OpenStack cluster (Mitaka on Ubuntu 16.04) where my controller node also serves as compute host #1; my other two hosts are compute #2 and compute #3. I am using Ceph (Jewel) cluster pools to back Nova ephemeral storage, Cinder, Glance, and Manila, and all of that is working well. I have launched a number of large and small VMs, and 80% of them (18) were provisioned on host #1, while hosts #2 and #3 have 4 VMs each. I originally suspected Ceph was the cause: in the hypervisor summary under System, host #1's local storage total shows the combined storage of all three hosts (42 TB), while hosts #2 and #3 show only their own physical storage (14 TB).

To remedy this, I added the following to /etc/nova/nova.conf: scheduler_default_filters = RetryFilter, AvailabilityZoneFilter, RamFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter (the default filter list with DiskFilter removed), and I restarted all of the Nova services on the controller. However, new VMs are still being placed on host #1. Does anyone have guidance on what I need to do to balance the allocation better?
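For reference, the relevant part of my /etc/nova/nova.conf on the controller now looks roughly like this (a sketch; the filter list is the only line I changed, and in Mitaka this option lives under [DEFAULT]):

[DEFAULT]
  # default filter list with DiskFilter removed
  scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter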

Note that my three hosts are identical: 32 CPUs, 256 GB RAM, 14 TB RAID disk. Thank you.

Update: Host1:


ceph -s
    cluster 6e647506-631a-457e-a52a-f21a3866a023
     health HEALTH_OK
     monmap e1: 3 mons at {arccloud01=10.155.92.128:6789/0,arccloud02=10.155.92.129:6789/0,arccloud03=10.155.92.130:6789/0}
            election epoch 5152, quorum 0,1,2 arccloud01,arccloud02,arccloud03
      fsmap e1858: 1/1/1 up {0=arccloud01=up:active}
     osdmap e1532: 3 osds: 3 up, 3 in
            flags sortbitwise
      pgmap v1982739: 384 pgs, 6 pools, 661 GB data, 2135 kobjects
            2529 GB used, 39654 GB / 42184 GB avail
                 384 active+clean
  client io 807 kB/s wr, 0 op/s rd, 301 op/s wr


cat /etc/ceph/ceph.conf
[global]
  fsid = 6e647506-631a-457e-a52a-f21a3866a023
  mon_initial_members = arccloud01, arccloud02, arccloud03
  mon_host = 10.155.92.128,10.155.92.129,10.155.92.130
  mon_pg_warn_max_per_osd = 400
  mon_lease = 50
  mon_lease_renew_interval = 30
  mon_lease_ack_timeout = 100
  auth_cluster_required = cephx
  auth_service_required = cephx
  auth_client_required = cephx
  public_network = 10.155.92.0/22
  cluster_network = 192.168.92.0/22
[client.glanceimages]
  keyring = /etc/ceph/ceph.client.glanceimages.keyring
[client.novapool]
  keyring = /etc/ceph/ceph.client.novapool.keyring
[client.cindervolumes]
  keyring = /etc/ceph/ceph.client.cindervolumes.keyring
[client.manila]
  client_mount_uid = 0
  client_mount_gid = 0
  log_file = /opt/stack/logs/ceph-client.manila.log
  admin_socket = /opt/stack/status/stack/ceph-$name.$pid.asok
  keyring = /etc/ceph/ceph.client.manila.keyring
[mon.arccloud01]
  host = arccloud01
  mon addr = 10.155.92.128:6789
[mon.arccloud02]
  host = arccloud02
  mon addr = 10.155.92.129:6789
[mon.arccloud03]
  host = arccloud03
  mon addr = 10.155.92.130:6789
[osd.2]
  host = arccloud01
  public addr = 10.155.92.128
  cluster addr = 192.168.92.128
[osd.1]
  host = arccloud02
  public addr = 10.155.92.129
  cluster addr = 192.168.92.129
[osd.0]
  host = arccloud03
  public addr = 10.155.92.130
  cluster addr = 192.168 ...

2 answers


answered 2016-10-11 16:30:56 -0500

updated 2016-10-12 14:30:52 -0500

First, if you are using Ceph there is no need for RAID. It will hurt performance, and you'll pay the overhead of RAID _and_ the overhead of Ceph.

http://www.techrepublic.com/article/why-ceph-could-be-the-raid-replacement-the-enterprise-needs/

What do you see if you run ceph -s?

You may also need to update the volumes (Cinder) and images (Glance) sections to use RBD as well.
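For example, the RBD-related settings usually end up looking something like the sketch below. The pool and user names (cindervolumes, glanceimages) are guesses based on the client sections in your ceph.conf, the backend section name "ceph" is arbitrary, and the UUID is a placeholder:

# /etc/cinder/cinder.conf (RBD backend section; referenced by
# enabled_backends = ceph under [DEFAULT])
[ceph]
  volume_driver = cinder.volume.drivers.rbd.RBDDriver
  rbd_pool = cindervolumes
  rbd_ceph_conf = /etc/ceph/ceph.conf
  rbd_user = cindervolumes
  rbd_secret_uuid = xxxxxxxxx

# /etc/glance/glance-api.conf
[glance_store]
  stores = rbd
  default_store = rbd
  rbd_store_pool = glanceimages
  rbd_store_user = glanceimages
  rbd_store_ceph_conf = /etc/ceph/ceph.conf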

Can you post the nova.conf for host2 and/or host3 in its entirety? Or compare host1's nova.conf to host2's and host3's and see if there are any differences.

It probably wouldn't hurt to post your /etc/ceph/ceph.conf as well, so we can see how Ceph is set up.

UPDATE:

Your nova.conf for host1 is different from host2's and host3's.

The host2 and host3 nova.conf are missing the [libvirt] settings necessary to use RBD:

[libvirt]
images_rbd_pool=novapool
images_type=rbd
rbd_secret_uuid=xxxxxxxxx
rbd_user=novapool

Please add that to nova.conf on host2 and host3 and restart the nova-compute service. That should fix it.
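On Ubuntu 16.04 with the standard packages, that restart is usually just (service name assumed, run on host2 and host3):

  sudo systemctl restart nova-compute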

ETA: There are other differences between host1's nova.conf and host2's/host3's. Go through nova.conf line by line to make sure all the settings are correct; the files should be virtually identical across all hosts, except for a few host-specific settings such as my_ip = x.x.x.x or vncserver_listen = x.x.x.x.
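A quick way to spot the remaining differences (hostnames assumed; run from host1 with bash and ssh access to the other nodes):

  diff /etc/nova/nova.conf <(ssh host2 cat /etc/nova/nova.conf)
  diff /etc/nova/nova.conf <(ssh host3 cat /etc/nova/nova.conf)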

ETA2:

When you sync up the settings in all the nova.conf files, you'll need to restart the scheduler service as well. Your host2 and host3 nova.conf are missing the scheduler settings.
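On the controller, assuming the standard Ubuntu service name, that would be roughly:

  sudo systemctl restart nova-scheduler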

ETA3:

Did that work for you?


answered 2016-10-11 17:45:54 -0500 by shorton

Hi Rick, see the file contents in the update above.

