Error during Ceph node setup in the "copy ceph key(s)" task

I'm setting up a small test lab for an HA OpenStack environment based on Queens and CentOS 7 (to be as close as possible to OSP 13, which is my target). The director is CentOS 7.8 with ansible-2.6.19-1.el7.ans.noarch, and the undercloud is deployed.

The idea is to have 2 compute nodes, 3 controllers, and 3 Ceph storage nodes.

As the first step of the overcloud deploy, the correct 3 Ceph node candidates for the OSD role are powered on and begin to be configured, but at a certain point I get this error:

2020-05-03 22:31:07Z [AllNodesDeploySteps.CephStorageDeployment_Step1.0]: CREATE_COMPLETE  state changed
2020-05-03 22:31:07Z [AllNodesDeploySteps.CephStorageDeployment_Step1]: CREATE_COMPLETE  Stack CREATE completed successfully
2020-05-03 22:31:08Z [AllNodesDeploySteps.CephStorageDeployment_Step1]: CREATE_COMPLETE  state changed
2020-05-03 22:31:08Z [AllNodesDeploySteps.WorkflowTasks_Step2]: CREATE_IN_PROGRESS  state changed
2020-05-03 22:31:09Z [AllNodesDeploySteps.WorkflowTasks_Step2]: CREATE_COMPLETE  state changed
2020-05-03 22:31:09Z [AllNodesDeploySteps.WorkflowTasks_Step2_Execution]: CREATE_IN_PROGRESS  state changed
2020-05-03 22:33:49Z [AllNodesDeploySteps.WorkflowTasks_Step2_Execution]: CREATE_FAILED  resources.WorkflowTasks_Step2_Execution: Failure caused by error in tasks: ceph_base_ansible_workflow

  ceph_base_ansible_workflow [task_ex_id=b9eee634-5f66-47f9-bef2-57e46cd6d80f] -> Failure caused by error in tasks: ceph_install

  ceph_install [task_e
2020-05-03 22:33:49Z [AllNodesDeploySteps]: CREATE_FAILED  Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: Failure caused by error in tasks: ceph_base_ansible_workflow
...
overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::TripleO::WorkflowSteps
  physical_resource_id: 71e920fe-b781-4d41-b082-cf27c7bbdb4c
  status: CREATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: Failure caused by error in tasks: ceph_base_ansible_workflow

      ceph_base_ansible_workflow [task_ex_id=b9eee634-5f66-47f9-bef2-57e46cd6d80f] -> Failure caused by error in tasks: ceph_install

      ceph_install [task_ex_id=567a8737-4abe-4b24-89e9-6116d4e8eff2] -> One or more actions had failed.
...
     Unexpected error while running command.
    Command: ansible-playbook /usr/share/ceph-ansible/site-docker.yml.sample --user tripleo-admin --become --become-user root --extra-vars {"ireallymeanit": "yes", "osd_pool_default_pgp_num": 16, "osd_pool_default_pg_num": 16} --inventory-file /tmp/ansible-mistral-actionV7vbpj/inventory.yaml --private-key /tmp/ansible-mistral-actionV7vbpj/ssh_private_key --skip-tags package-install,with_pkg
    Exit code: 2

And this in /var/log/mistral/ceph-install-workflow.log:

2020-05-04 00:33:45,311 p=17037 u=mistral |  TASK [ceph-osd : include_tasks common.yml] *************************************
2020-05-04 00:33:45,311 p=17037 u=mistral |  Monday 04 May 2020  00:33:45 +0200 (0:00:00.145)       0:01:48.161 ************ 
2020-05-04 00:33:45,517 p=17037 u=mistral |  included: /usr/share/ceph-ansible/roles/ceph-osd/tasks/common.yml for 172.23.0.239, 172.23.0.229, 172.23.0.234
2020-05-04 00:33:45,562 p=17037 u=mistral |  TASK [ceph-osd : create bootstrap-osd and osd directories] *********************
2020-05-04 00:33:45,562 p=17037 u=mistral |  Monday 04 May 2020  00:33:45 +0200 (0:00:00.251)       0:01:48.413 ************ 
2020-05-04 00:33:45,793 p=17037 u=mistral |  ok: [172.23.0.239] => (item=/var/lib/ceph/bootstrap-osd/)
2020-05-04 00:33:45,845 p=17037 u=mistral |  ok: [172.23.0.229] => (item=/var/lib/ceph/bootstrap-osd/)
2020-05-04 00:33:45,891 p=17037 u=mistral |  ok: [172.23.0.234] => (item=/var/lib/ceph/bootstrap-osd/)
2020-05-04 00:33:45,995 p=17037 u=mistral |  ok: [172.23.0.239] => (item=/var/lib/ceph/osd/)
2020-05-04 00:33:46,044 p=17037 u=mistral |  ok: [172.23.0.229] => (item=/var/lib/ceph/osd/)
2020-05-04 00:33:46,086 p=17037 u=mistral |  ok: [172.23.0.234] => (item=/var/lib/ceph/osd/)
2020-05-04 00:33:46,122 p=17037 u=mistral |  TASK [ceph-osd : copy ceph key(s) if needed] ***********************************
2020-05-04 00:33:46,122 p=17037 u=mistral |  Monday 04 May 2020  00:33:46 +0200 (0:00:00.559)       0:01:48.973 ************ 
2020-05-04 00:33:46,294 p=17037 u=mistral |  An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
2020-05-04 00:33:46,295 p=17037 u=mistral |  failed: [172.23.0.239] (item={u'name': u'/var/lib/ceph/bootstrap-osd/ceph.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "name": "/var/lib/ceph/bootstrap-osd/ceph.keyring"}, "msg": "Could not find or access '/tmp/file-mistral-actionn7KBrg/b5f90676-8d82-11ea-b301-566f3d480013//var/lib/ceph/bootstrap-osd/ceph.keyring' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
2020-05-04 00:33:46,296 p=17037 u=mistral |  skipping: [172.23.0.239] => (item={u'name': u'/etc/ceph/ceph.client.admin.keyring', u'copy_key': False}) 
2020-05-04 00:33:46,314 p=17037 u=mistral |  An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
2020-05-04 00:33:46,314 p=17037 u=mistral |  failed: [172.23.0.229] (item={u'name': u'/var/lib/ceph/bootstrap-osd/ceph.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "name": "/var/lib/ceph/bootstrap-osd/ceph.keyring"}, "msg": "Could not find or access '/tmp/file-mistral-actionn7KBrg/b5f90676-8d82-11ea-b301-566f3d480013//var/lib/ceph/bootstrap-osd/ceph.keyring' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
2020-05-04 00:33:46,320 p=17037 u=mistral |  skipping: [172.23.0.229] => (item={u'name': u'/etc/ceph/ceph.client.admin.keyring', u'copy_key': False}) 
2020-05-04 00:33:46,353 p=17037 u=mistral |  An exception occurred during task execution. To see the full traceback, use -vvv. The error was: If you are using a module and expect the file to exist on the remote, see the remote_src option
2020-05-04 00:33:46,353 p=17037 u=mistral |  failed: [172.23.0.234] (item={u'name': u'/var/lib/ceph/bootstrap-osd/ceph.keyring', u'copy_key': True}) => {"changed": false, "item": {"copy_key": true, "name": "/var/lib/ceph/bootstrap-osd/ceph.keyring"}, "msg": "Could not find or access '/tmp/file-mistral-actionn7KBrg/b5f90676-8d82-11ea-b301-566f3d480013//var/lib/ceph/bootstrap-osd/ceph.keyring' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
2020-05-04 00:33:46,361 p=17037 u=mistral |  skipping: [172.23.0.234] => (item={u'name': u'/etc/ceph/ceph.client.admin.keyring', u'copy_key': False}) 
2020-05-04 00:33:46,363 p=17037 u=mistral |  NO MORE HOSTS LEFT *************************************************************
2020-05-04 00:33:46,364 p=17037 u=mistral |  PLAY RECAP *********************************************************************
2020-05-04 00:33:46,364 p=17037 u=mistral |  172.23.0.229               : ok=66   changed=7    unreachable=0    failed=1   
2020-05-04 00:33:46,364 p=17037 u=mistral |  172.23.0.234               : ok=66   changed=7    unreachable=0    failed=1   
2020-05-04 00:33:46,364 p=17037 u=mistral |  172.23.0.239               : ok=69   changed=7    unreachable=0    failed=1   
2020-05-04 00:33:46,364 p=17037 u=mistral |  INSTALLER STATUS ***************************************************************
2020-05-04 00:33:46,366 p=17037 u=mistral |  Install Ceph OSD            : In Progress (0:01:32)
2020-05-04 00:33:46,366 p=17037 u=mistral |     This phase can be restarted by running: roles/ceph-osd/tasks/main.yml
2020-05-04 00:33:46,366 p=17037 u=mistral |  Monday 04 May 2020  00:33:46 +0200 (0:00:00.244)       0:01:49.217 ************ 
2020-05-04 00:33:46,367 p=17037 u=mistral |  =============================================================================== 

Any hint about the reason for the failure? What would be the exact command to rerun the step referred to by the phrase "This phase can be restarted by running: roles/ceph-osd/tasks/main.yml"?

The Ansible code with the problem seems to be:

- name: copy ceph key(s) if needed
  copy:
    src: "{{ fetch_directory }}/{{ fsid }}/{{ item.name }}"
    dest: "{{ item.name }}"
    owner: "{{ ceph_uid if containerized_deployment else 'ceph' }}"
    group: "{{ ceph_uid if containerized_deployment else 'ceph' }}"
    mode: "{{ ceph_keyring_permissions }}"
  with_items:
    - { name: "/var/lib/ceph/bootstrap-osd/{{ cluster }}.keyring", copy_key: true }
    - { name: "/etc/ceph/{{ cluster }}.client.admin.keyring", copy_key: "{{ copy_admin_key }}" }
  when:
    - cephx
    - item.copy_key|bool

At this moment, on the Ceph nodes I have:

[root@cephstorage-0 ~]# ll /etc/ceph/
total 8
-rw-r--r--. 1 root root 920 May  4 00:33 ceph.conf
-rw-r--r--. 1 root root  92 Feb  5  2019 rbdmap
[root@cephstorage-0 ~]# ll /var/lib/ceph/bootstrap-osd/
total 0
[root@cephstorage-0 ~]# 

Thanks in advance, Gianluca