race condition on subnet overlap check with multiple neutron API servers

asked 2016-06-21 19:06:26 -0600

don

So I have a heat template that creates a router, a network, and a subnet. When a single server runs the neutron API, it works every time. When 3 servers run the neutron API behind a load balancer, it fails nearly every time. The failure, shown below, is that the subnet overlaps with another on this network. But it's checking against itself, i.e. there is only a single subnet (the one we are adding).

Resources.Data Int Subnet2: Invalid Input For Operation: Requested Subnet With Cidr: 127.0.3.0/24 For Network: 3c445ce9-78c6-4489-B2c6-9d1bf814d33d Overlaps With Another Subnet

As you can see below, the CIDR it is complaining about is 127.0.3.0/24. It's already been created. The error is coming from db/ipam_backend_mixin.py, specifically _validate_subnet_cidr().

This method is called with 'thenetwork' and 'thecidr'. 'thecidr' has already been created on 'thenetwork', presumably by one of the other workers. But it's honestly this same subnet, not a second subnet.
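To illustrate the symptom, here is a minimal sketch of an overlap check. It is not the Neutron code: find_overlaps is a hypothetical helper, and it uses Python's standard ipaddress module. It only shows that if the same create request is checked a second time after the first one has been committed (whether that is what actually happens here is an assumption), the subnet is reported as overlapping with itself.

```python
import ipaddress


def find_overlaps(new_cidr, existing_cidrs):
    """Return the CIDRs in existing_cidrs that overlap new_cidr.

    Hypothetical helper for illustration only; not Neutron's
    _validate_subnet_cidr().
    """
    new_net = ipaddress.ip_network(new_cidr)
    return [
        cidr for cidr in existing_cidrs
        if ipaddress.ip_network(cidr).overlaps(new_net)
    ]


# First check: the network has no subnets yet, so nothing overlaps.
print(find_overlaps("127.0.3.0/24", []))

# Second check for the same request, after the subnet row already exists
# (assumed scenario): the subnet now "overlaps" with itself.
print(find_overlaps("127.0.3.0/24", ["127.0.3.0/24"]))
```

A check like this cannot tell "the same subnet seen twice" from "a second subnet with the same CIDR", which matches the confusing error message.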

Does anybody have a suggestion for what is going wrong? If I stop the other 2 neutron API backends, the problem is gone. If I do this manually from the CLI, allowing time between steps, it doesn't seem to show up either.

I have allow_overlapping_ips set to true (but that is not the case here; it's saying 'subnetA overlaps with subnetA on networkB').

$ os subnet list
+--------------------------------------+----------------------+--------------------------------------+-----------------+
| ID                                   | Name                 | Network                              | Subnet          |
+--------------------------------------+----------------------+--------------------------------------+-----------------+
| bef016c1-4104-4328-9656-10b17595fca3 | ext-subnet           | 3d624eca-75ae-480b-9ea3-776c9dac7da6 | 10.129.192.0/20 |
| 35b102e1-c748-4710-bb80-aa9145fd33b7 | 172.16.5.0/24        | ea53ed06-ef41-4e9a-97e4-f526ef8cd376 | 172.16.5.0/24   |
| 64688ccb-1246-4533-8e65-c0e78bde7ca8 | ddb-data_sub-subnet2 | 60553a87-e474-499b-bce8-939ee1842ec4 | 127.0.2.0/24    |
| 7860d0f9-79c0-4923-b6d8-618b7da304a7 | ddb-servce-subnet    | 20253cbc-d450-400e-9508-be80c124aad7 | 127.0.0.0/24    |
| a2607676-78ad-4bed-90ef-5ef8b591d6ca | ddb-data-int-subnet1 | bb11e0c0-45ba-4fff-b760-f09f41ee2605 | 172.16.3.0/24   |
| 141fcd08-d8d0-464d-b904-2b2b96b7aa35 | ddb-ctrl-subnet      | a08764d3-85e4-4be3-992a-6c453c98c1fb | 172.16.1.0/24   |
| e2553bc1-5d79-4c00-a91e-dae28a1f0e6d | ddb-data_int-subnet2 | 3c445ce9-78c6-4489-b2c6-9d1bf814d33d | 127.0.3.0/24    |
| 21dba339-4b44-42df-b32e-8e361552ce65 | ddb-vctrl-subnet     | ece00a54-059d-4e67-8fdd-788f52e47eb6 | 172.16.1.0/24   |
| 34c3bd1a-d907-41fc-ac0a-3de4da325b6a | ddb-data_sub-subnet1 | 4edb52c3-aba1-453d-9331-2149fbcdf83e | 127.0.1.0/24    |
+--------------------------------------+----------------------+--------------------------------------+-----------------+

d


Comments

If I comment out the check, then nothing bad happens for me. But that is not the solution. There must be some missing lock. Is no one else hitting this?

don ( 2016-06-23 11:41:16 -0600 )

As a workaround, have you tried to describe a dependency between the network and subnet resources in the heat template? Something like this:

nw_1:
  ...
  type: OS::Neutron::Net
subnet_1:
  ...
  type: OS::Neutron::Subnet
  depends_on: nw_1
Viktor Schlaffer ( 2016-06-30 05:16:26 -0600 )