Small HA installation: ceph nodes not considered?

asked 2020-04-28 12:04:30 -0500

tekkafedora

Hello, sorry for the long post; I have tried to give as much detail as possible below. I'm setting up a small test lab for an HA OpenStack environment based on Queens and CentOS 7 (to be as similar as possible to OSP 13, my target). The OpenStack nodes will be oVirt VMs, and the director is also a VM in oVirt.

The idea is to have 2 compute nodes, 3 controllers, and 3 Ceph storage nodes (for image, block, object and Manila). Each node has one 60 GB root disk; the Ceph nodes have 2 more disks (100 GB for the journal and 150 GB for the OSD).

I have installed the undercloud and tried several variants of the instackenv.json file for introspection; all nodes are introspected correctly, with the VMs powered on and then off. I have 4 questions:

  • which values should I use for this small storage cluster, and how do I override the default Ceph parameters (pg_num, mon_max_pg_per_osd, etc.) without getting errors during deploy?

  • what is the correct parameter to set in instackenv.json, or through the "openstack baremetal node set --property .." command, to map the Ceph OSD role to the 3 designated hosts?

  • at which stage of the overcloud deploy are the Ceph nodes expected to be powered on and installed?

  • is it correct that in this architecture mon, mgr and mds are deployed on the controller nodes as Docker containers, while only the OSDs run on the dedicated storage nodes?

Thanks, Gianluca

Details: For the Ceph OSD nodes I have tried setting these capabilities in the instackenv.json file:

"name": "ostack-ceph2",
"capabilities": "profile:ceph-storage,node:ceph-2,boot_option:local"

together with a scheduler_hints_env.yaml file like this:

parameter_defaults:
  ControllerSchedulerHints:
    'capabilities:node': 'controller-%index%'
  ComputeSchedulerHints:
    'capabilities:node': 'compute-%index%'
  CephStorageSchedulerHints:
    'capabilities:node': 'ceph-%index%'
  HostnameMap:
    overcloud-controller-0: ostack-ctrl0
    overcloud-controller-1: ostack-ctrl1
    overcloud-controller-2: ostack-ctrl2
    overcloud-novacompute-0: ostack-compute0
    overcloud-novacompute-1: ostack-compute1
    overcloud-ceph-storage-0: ostack-ceph0
    overcloud-ceph-storage-1: ostack-ceph1
    overcloud-ceph-storage-2: ostack-ceph2

But while the compute and controller nodes are deployed fine and their hostnames are mapped correctly, the Ceph nodes remain untouched, not even powered on. I don't know whether this is the expected workflow and they are only set up at a final stage that is never reached.

For Ceph, I'm passing these environment files to overcloud deploy:

-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-mds.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/manila-cephfsnative-config.yaml \

I'm also using this env file:

parameter_defaults:
  ControllerCount: 3
  ComputeCount: 2
  CephCount: 3
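One thing to check in this file, offered as a sketch rather than a confirmed diagnosis: tripleo-heat-templates names each count parameter after its role, and the stock storage role is called CephStorage, so the count would be CephStorageCount rather than CephCount. That role's count defaults to 0, which would leave the storage nodes undeployed. Whether this is the only issue in this deployment is not something the post confirms.

```yaml
parameter_defaults:
  ControllerCount: 3
  ComputeCount: 2
  # Assumption: the stock CephStorage role is in use, so its count
  # parameter is <RoleName>Count, i.e. CephStorageCount (default 0).
  CephStorageCount: 3
```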

BTW: at the very beginning of the deploy I get a message that the CephCount parameter is ignored (?).

Initially I received errors during Ceph setup due to the default PG numbers:

"stderr": "Error ERANGE:  pg_num 128 size 3 would mean 768 total pgs, which exceeds max 750 (mon_max_pg_per_osd 250 * num_in_osds 3)"
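A rough reading of the numbers in this message, as a sketch rather than a statement of Ceph's exact internals: the limit is mon_max_pg_per_osd times the number of "in" OSDs, and the total counts PG replicas (pg_num times size) for the pools that already exist plus the one being created. The split of 768 into an existing 384 and a new 384 is an inference from the message, not something shown in the log.

```latex
\begin{align*}
\text{limit} &= \texttt{mon\_max\_pg\_per\_osd} \times \texttt{num\_in\_osds} = 250 \times 3 = 750 \\
\text{new pool} &= \texttt{pg\_num} \times \texttt{size} = 128 \times 3 = 384 \\
\text{reported total} &= 768 = 384 + 384 > 750
\end{align*}
```

Under the same assumptions, pools of 64 PGs at size 3 cost 192 each, so four such pools (768) would again exceed 750, while pools of 16 PGs cost 48 each and fifteen of them (720) would fit.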

So I'm trying to change this with the following env file:

parameter_defaults:
  CephPoolDefaultSize: 3
  CephPoolDefaultPgNum: 64
  CephConfigOverrides:
    mon_max_pg_per_osd: 400

Right now the deploy seems stuck after this step:

2020-04-28 14:04:27Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step5.2]: CREATE_COMPLETE  state changed

and on controller nodes I have:

[root@ostack-ctrl0 ~]# ceph -s
  cluster:
    id:     5d194678-8950-11ea-b8c5-566f3d480013
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            Reduced data ...

Comments

Don't co-locate Ceph services with OpenStack services; the mon and mgr services should also run on your OSD nodes if you only have 3 of them. With so few OSDs you should choose a very low pg_num value, 8 or 16 at most. mon_max_pg_per_osd should not be increased; just lower CephPoolDefaultPgNum to 16.

eblock (2020-04-29 02:31:47 -0500)
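Applied to the env file from the question, the suggestion in the comment above might look like the following sketch. It keeps CephPoolDefaultSize as posted and drops the mon_max_pg_per_osd override; whether 16 suits every pool in this lab is an assumption taken from the comment.

```yaml
parameter_defaults:
  CephPoolDefaultSize: 3
  # Lowered from 64, per the comment above (8 or 16 at most).
  CephPoolDefaultPgNum: 16
  # No CephConfigOverrides entry for mon_max_pg_per_osd: the default is kept.
```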

Your OSD daemons don't seem to work (osd: 0 osds: 0 up, 0 in). Have you deployed OSDs before trying to create pools? Without OSDs you can't store data.

eblock (2020-04-29 02:33:32 -0500)

1 answer

answered 2020-04-29 03:41:21 -0500

tekkafedora

Thanks for answering.

I have no problem running the mon and mgr services on the OSD nodes; the problem is how to write instackenv.json and how to instruct the deploy command (through env.yaml files) to do so.

I was working from links such as:
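One way this is commonly approached in TripleO, sketched here under assumptions the thread does not confirm: service placement is driven by the roles file rather than by instackenv.json, so a custom copy of roles_data.yaml could list the mon and mgr services under the CephStorage role (and drop them from the Controller role), with that file passed to the deploy command via -r. The excerpt below is abbreviated and hypothetical; a real role entry keeps the other services from the stock definition.

```yaml
# Excerpt of a hypothetical custom roles file; not a complete role definition.
- name: CephStorage
  ServicesDefault:
    - OS::TripleO::Services::CephOSD
    # Assumption: moved here from the Controller role's service list.
    - OS::TripleO::Services::CephMon
    - OS::TripleO::Services::CephMgr
    # ...remaining services from the stock CephStorage role unchanged...
```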

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installation_and_usage/chap-requirements#sect-Environment_Requirements

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/profile_matching.html

But it is not clear to me how to create the mapping that tells the installer to use the 3 dedicated Ceph nodes I have chosen. Can you give me a tip about it?

You say "without OSDs you can't store data", but I would expect the deploy command to set up the OSD nodes. My 3 nodes chosen as Ceph nodes are introspected "correctly" (in the sense that they are powered on and then off), but afterwards they seem to be excluded from the whole workflow. Can you confirm that the Ceph nodes should be powered on at the same time as the compute and controller nodes? Or are they expected to be powered on only at a later stage?

Are my settings below in instackenv.json correct for the candidate Ceph nodes? If not, what do I have to change?

"capabilities": "profile:ceph-storage,node:ceph-2,boot_option:local"

Comments

Unfortunately I'm not familiar with tripleo and the respective mappings, sorry. Maybe someone else can chime in here.

eblock (2020-04-29 08:26:51 -0500)
