Help with failing Mirantis Fuel 9.1 deployment

asked 2017-01-25

racingferret

updated 2017-01-25 11:12:20 -0500

Hi All,

I'm currently trying to deploy a basic 4-node PoC with Mirantis Fuel 9.1. All hardware is identical and the comms check passes on the Fuel dashboard. I have one controller node and three compute/OSD nodes. Cinder, Glance, Nova and Swift are all backed by Ceph with an object replication factor of 3.

The controller node finishes deployment without issue, but all the compute/OSD nodes fail because they time out running ceph commands. On the controller node, if I run "ceph -s", I get the following output:

cluster a7f64266-0894-4f1e-a635-d0aeaca0e993
  health HEALTH_ERR 192 pgs stuck inactive; 192 pgs stuck unclean; no osds
  monmap e1: 1 mons at {node1=}, election epoch 1, quorum 0 node1
  osdmap e1: 0 osds: 0 up, 0 in
  pgmap v2: 192 pgs, 3 pools, 0 bytes data, 0 objects
     0 kB used, 0 kB / 0 kB avail
     192 creating

The ceph log file on the controller node also shows the following:

mon.node-14@0(leader).auth v33 caught error when trying to handle auth request, probably malformed request

Running any ceph command on any of the OSD nodes results in a pause, after which the following is output continuously:

monclient: hunting for new mon

Any help with how to start debugging this would be great.


answered 2017-02-15

racingferret

In case anyone else stumbles across this issue: it turns out the non-default MTU of 9000 I set for the storage network wasn't being applied on the vswitch. Setting it back to "default" allowed the install to complete without issue.
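For anyone who wants to check for this kind of MTU mismatch before redeploying, here is a minimal sketch of the usual test: send a ping with the "don't fragment" flag set and a payload sized to exactly fill the MTU. It assumes IPv4 and the Linux iputils ping (the `-M do` option); the function names and the example address are made up for illustration and are not part of Fuel or Ceph.

```python
import subprocess

IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError("MTU must be larger than %d bytes" % overhead)
    return mtu - overhead


def ping_command(host, mtu, count=3):
    """Build a ping command that forbids fragmentation (Linux iputils syntax)."""
    return [
        "ping",
        "-M", "do",                           # set DF bit, never fragment
        "-c", str(count),                     # number of echo requests
        "-s", str(icmp_payload_for_mtu(mtu)), # payload that fills the MTU
        host,
    ]


def path_supports_mtu(host, mtu):
    """Return True if full-size, unfragmented pings to host succeed."""
    result = subprocess.run(
        ping_command(host, mtu), capture_output=True, text=True
    )
    return result.returncode == 0


# Example (hypothetical storage-network address of the controller):
#   path_supports_mtu("192.0.2.10", 9000)
```

Run from an OSD node against the controller's storage-network address, a failure at 9000 combined with a success at 1500 would point to something in the path (such as the vswitch) not passing jumbo frames, which is consistent with small packets getting through while larger ceph traffic stalls.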

Last updated: Feb 15 '17