I'm setting up a small test lab for an HA OpenStack environment based on Queens on CentOS 7 (to stay as close as possible to OSP 13, which is my real target). The director node is CentOS 7.8 with ansible-2.6.19-1.el7.ans.noarch, and the undercloud is already deployed.
The idea is to have 2 compute, 3 controller, and 3 Ceph storage nodes.
As the first step of the overcloud deploy, the correct 3 Ceph node candidates for the OSD role are powered on and begin to be configured, but at a certain point the deploy fails with this error:
2020-05-03 22:31:07Z [AllNodesDeploySteps.CephStorageDeployment_Step1.0]: CREATE_COMPLETE  state changed
2020-05-03 22:31:07Z [AllNodesDeploySteps.CephStorageDeployment_Step1]: CREATE_COMPLETE  Stack CREATE completed successfully
2020-05-03 22:31:08Z [AllNodesDeploySteps.CephStorageDeployment_Step1]: CREATE_COMPLETE  state changed
2020-05-03 22:31:08Z [AllNodesDeploySteps.WorkflowTasks_Step2]: CREATE_IN_PROGRESS  state changed
2020-05-03 22:31:09Z [AllNodesDeploySteps.WorkflowTasks_Step2]: CREATE_COMPLETE  state changed
2020-05-03 22:31:09Z [AllNodesDeploySteps.WorkflowTasks_Step2_Execution]: CREATE_IN_PROGRESS  state changed
2020-05-03 22:33:49Z [AllNodesDeploySteps.WorkflowTasks_Step2_Execution]: CREATE_FAILED  resources.WorkflowTasks_Step2_Execution: Failure caused by error in tasks: ceph_base_ansible_workflow
  ceph_base_ansible_workflow [task_ex_id=b9eee634-5f66-47f9-bef2-57e46cd6d80f] -> Failure caused by error in tasks: ceph_install
    ceph_install [task_e
2020-05-03 22:33:49Z [AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: Failure caused by error in tasks: ceph_base_ansible_workflow
...
overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::TripleO::WorkflowSteps
  physical_resource_id: 71e920fe-b781-4d41-b082-cf27c7bbdb4c
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: Failure caused by error in tasks: ceph_base_ansible_workflow
      ceph_base_ansible_workflow [task_ex_id=b9eee634-5f66-47f9-bef2-57e46cd6d80f] -> Failure caused by error in tasks: ceph_install
        ceph_install [task_ex_id=567a8737-4abe-4b24-89e9-6116d4e8eff2] -> One or more actions had failed.
...
Unexpected error while running command.
Command: ansible-playbook /usr/share/ceph-ansible/site-docker.yml.sample --user tripleo-admin --become --become-user root --extra-vars {"ireallymeanit": "yes", "osd_pool_default_pgp_num": 16, "osd_pool_default_pg_num": 16} --inventory-file /tmp/ansible-mistral-actionV7vbpj/inventory.yaml --private-key /tmp/ansible-mistral-actionV7vbpj/ssh_private_key --skip-tags package-install,with_pkg
Exit code: 2
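To dig a bit deeper than the heat events above, this is how I am inspecting the failed step on the undercloud (a sketch; the task_ex_id is the one reported in the output above, and I'm assuming the mistralclient "task execution" commands are available there):

    # Long-form listing of what failed in the overcloud plan
    openstack stack failures list overcloud --long

    # Drill into the failed Mistral task (task_ex_id taken from the heat output above)
    openstack task execution show 567a8737-4abe-4b24-89e9-6116d4e8eff2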
And this is what I find in /var/log/mistral/ceph-install-workflow.log:
2020-05-04 00:33:45,311 p=17037 u=mistral | TASK [ceph-osd : include_tasks common.yml] *************************************
2020-05-04 00:33:45,311 p=17037 u=mistral | Monday 04 May 2020  00:33:45 +0200 (0:00:00.145)       0:01:48.161 ************
2020-05-04 00:33:45,517 p=17037 u=mistral | included: /usr/share/ceph-ansible/roles/ceph-osd/tasks/common.yml for 172.23.0.239, 172.23.0.229, 172.23.0.234
2020-05-04 00:33:45,562 p=17037 u=mistral | TASK [ceph-osd : create bootstrap-osd and osd directories] *********************
2020-05-04 00:33:45,562 p=17037 u=mistral | Monday 04 May 2020  00:33:45 +0200 (0:00:00.251)       0:01:48.413 ************
2020-05-04 00:33:45,793 p=17037 u=mistral | ok: [172.23.0.239] => (item=/var/lib/ceph/bootstrap-osd/)
2020-05-04 00:33:45,845 p=17037 u=mistral | ok: [172.23.0.229] => (item=/var/lib/ceph/bootstrap-osd/)
2020-05-04 00:33:45,891 p=17037 u=mistral | ok: [172.23.0.234] => (item=/var/lib/ceph/bootstrap-osd/)
2020-05-04 00:33:45,995 p=17037 u=mistral | ok: [172.23.0.239] => (item=/var/lib/ceph/osd/)
2020-05-04 00:33:46,044 p=17037 u=mistral | ok: [172.23.0.229] => (item=/var/lib/ceph/osd/)
2020-05-04 00:33:46,086 p=17037 u=mistral | ok: [172.23.0.234] => (item=/var/lib/ceph/osd/)
2020-05-04 00:33:46,122 p=17037 u=mistral | TASK [ceph-osd : copy ceph key(s) if needed] ***********************************
2020-05-04 00:33:46,122 p=17037 u=mistral | Monday 04 May 2020  00:33:46 +0200 (0:00:00.559)       0:01:48.973 ************
2020-05-04 00:33:46,294 p=17037 u=mistral | An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
2020-05-04 00:33:46,295 p=17037 u=mistral | failed: [172.23.0.239] (item={u'name': u'/var/lib/ceph/bootstrap-osd/ceph.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "name": "/var/lib/ceph/bootstrap-osd/ceph.keyring"}, "msg": "Could not find or access '/tmp/file-mistral-actionn7KBrg/b5f90676-8d82-11ea-b301-566f3d480013//var/lib/ceph/bootstrap-osd/ceph.keyring' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
2020-05-04 00:33:46,296 p=17037 u=mistral | skipping: [172.23.0.239] => (item={u'name': u'/etc/ceph/ceph.client.admin.keyring', u'copy_key': False})
2020-05-04 00:33:46,314 p=17037 u=mistral | An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
2020-05-04 00:33:46,314 p=17037 u=mistral | failed: [172.23.0.229] (item={u'name': u'/var/lib/ceph/bootstrap-osd/ceph.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "name": "/var/lib/ceph/bootstrap-osd/ceph.keyring"}, "msg": "Could not find or access '/tmp/file-mistral-actionn7KBrg/b5f90676-8d82-11ea-b301-566f3d480013//var/lib/ceph/bootstrap-osd/ceph.keyring' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
2020-05-04 00:33:46,320 p=17037 u=mistral | skipping: [172.23.0.229] => (item={u'name': u'/etc/ceph/ceph.client.admin.keyring', u'copy_key': False})
2020-05-04 00:33:46,353 p=17037 u=mistral | An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
2020-05-04 00:33:46,353 p=17037 u=mistral | failed: [172.23.0.234] (item={u'name': u'/var/lib/ceph/bootstrap-osd/ceph.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "name": "/var/lib/ceph/bootstrap-osd/ceph.keyring"}, "msg": "Could not find or access '/tmp/file-mistral-actionn7KBrg/b5f90676-8d82-11ea-b301-566f3d480013//var/lib/ceph/bootstrap-osd/ceph.keyring' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
2020-05-04 00:33:46,361 p=17037 u=mistral | skipping: [172.23.0.234] => (item={u'name': u'/etc/ceph/ceph.client.admin.keyring', u'copy_key': False})
2020-05-04 00:33:46,363 p=17037 u=mistral | NO MORE HOSTS LEFT *************************************************************
2020-05-04 00:33:46,364 p=17037 u=mistral | PLAY RECAP *********************************************************************
2020-05-04 00:33:46,364 p=17037 u=mistral | 172.23.0.229 : ok=66 changed=7 unreachable=0 failed=1
2020-05-04 00:33:46,364 p=17037 u=mistral | 172.23.0.234 : ok=66 changed=7 unreachable=0 failed=1
2020-05-04 00:33:46,364 p=17037 u=mistral | 172.23.0.239 : ok=69 changed=7 unreachable=0 failed=1
2020-05-04 00:33:46,364 p=17037 u=mistral | INSTALLER STATUS ***************************************************************
2020-05-04 00:33:46,366 p=17037 u=mistral | Install Ceph OSD : In Progress (0:01:32)
2020-05-04 00:33:46,366 p=17037 u=mistral | This phase can be restarted by running: roles/ceph-osd/tasks/main.yml
2020-05-04 00:33:46,366 p=17037 u=mistral | Monday 04 May 2020  00:33:46 +0200 (0:00:00.244)       0:01:49.217 ************
2020-05-04 00:33:46,367 p=17037 u=mistral | ===============================================================================
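From the error message, the missing file is expected on the Ansible controller side, that is on the undercloud where Mistral runs, under a per-run temp directory used as fetch_directory. This is what I can check there (a sketch; the glob is my assumption, modeled on the /tmp/file-mistral-actionn7KBrg path in the error, and those directories may already have been cleaned up after the failure):

    # On the undercloud: did anything get fetched into the per-run fetch_directory?
    sudo ls -lR /tmp/file-mistral-*/ 2>/dev/null

    # Look for the mon-side tasks that should have populated it
    sudo grep -nE "fetch_directory|ceph-mon" /var/log/mistral/ceph-install-workflow.log | head -n 40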
Any hint about the reason for the failure? And what would be the exact command to run again the step that the log refers to with the phrase "This phase can be restarted by running: roles/ceph-osd/tasks/main.yml"?
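My guess, copying the "Command:" line from the heat output above, would be something like the following, run from the undercloud. The /tmp/ansible-mistral-actionV7vbpj/ paths are generated per run, so I suppose they must still exist (or be replaced by a saved copy of the inventory and key), and the --limit is only my assumption to restrict the run to the OSD group of the generated inventory:

    # Re-run the same playbook Mistral ran, taken from the "Command:" line above
    ansible-playbook /usr/share/ceph-ansible/site-docker.yml.sample \
      --user tripleo-admin --become --become-user root \
      --extra-vars '{"ireallymeanit": "yes", "osd_pool_default_pgp_num": 16, "osd_pool_default_pg_num": 16}' \
      --inventory-file /tmp/ansible-mistral-actionV7vbpj/inventory.yaml \
      --private-key /tmp/ansible-mistral-actionV7vbpj/ssh_private_key \
      --skip-tags package-install,with_pkg \
      --limit osds   # my assumption: the OSD group name in the generated inventory

Is this the right way, or is there an official way to resume the step?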
The Ansible code with the problem seems to be this task, from /usr/share/ceph-ansible/roles/ceph-osd/tasks/common.yml:
- name: copy ceph key(s) if needed
  copy:
    src: "{{ fetch_directory }}/{{ fsid }}/{{ item.name }}"
    dest: "{{ item.name }}"
    owner: "{{ ceph_uid if containerized_deployment else 'ceph' }}"
    group: "{{ ceph_uid if containerized_deployment else 'ceph' }}"
    mode: "{{ ceph_keyring_permissions }}"
  with_items:
    - { name: "/var/lib/ceph/bootstrap-osd/{{ cluster }}.keyring", copy_key: true }
    - { name: "/etc/ceph/{{ cluster }}.client.admin.keyring", copy_key: "{{ copy_admin_key }}" }
  when:
    - cephx
    - item.copy_key|bool
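Decomposing the src template with the values from my logs explains the path in the error message (and its double slash, since item.name is an absolute path):

    # fetch_directory = /tmp/file-mistral-actionn7KBrg   (per-run Mistral temp dir on the undercloud)
    # fsid            = b5f90676-8d82-11ea-b301-566f3d480013
    # item.name       = /var/lib/ceph/bootstrap-osd/ceph.keyring   (absolute, hence the "//")
    echo "/tmp/file-mistral-actionn7KBrg/b5f90676-8d82-11ea-b301-566f3d480013//var/lib/ceph/bootstrap-osd/ceph.keyring"

So the task itself looks fine to me; the problem seems to be that nothing fetched the keyring into that controller-side directory before the ceph-osd role ran.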
At this moment on the Ceph nodes I have:
[root@cephstorage-0 ~]# ll /etc/ceph/
total 8
-rw-r--r--. 1 root root 920 May  4 00:33 ceph.conf
-rw-r--r--. 1 root root  92 Feb  5  2019 rbdmap
[root@cephstorage-0 ~]# ll /var/lib/ceph/bootstrap-osd/
total 0
[root@cephstorage-0 ~]#
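For comparison, I can also check the controller nodes, where the mons are supposed to run (a sketch; the container name filter is only my assumption about how ceph-ansible names the mon containers in a containerized deployment):

    # On a controller node: is the mon container up and were the cluster keys created?
    sudo docker ps --filter name=ceph-mon
    sudo ls -l /etc/ceph/ /var/lib/ceph/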
Thanks in advance, Gianluca