neutron (Juno) l3 HA test failed

Hello. I'm testing the neutron (Juno) L3 HA feature with the following configuration: one controller node and two network nodes. All nodes are CentOS 7 minimal installations.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

With this configuration, creating a router with HA enabled produces the following output on the controller.

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, /var/log/neutron/server.log contains warnings indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
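
For context, "Failed to bind port ... on host" is what the ML2 openvswitch mechanism driver reports when no agent on the target host advertises a usable bridge mapping for the segment's physical network. A minimal sketch of that check (names and logic simplified; an assumption about the driver's behavior, not the real implementation):

```python
# Toy model of ML2 port binding for a VLAN segment: the openvswitch mechanism
# driver binds only if the host's OVS agent reported a bridge mapping for the
# segment's physical network. An agent started without its [ovs] settings
# reports no mappings, so every binding attempt on that host fails.

def try_bind(segment_physnet, agent_bridge_mappings):
    """Return True if the host's agent can wire this physical network."""
    return segment_physnet in agent_bridge_mappings

# Agent reporting no mappings -> binding_failed, as in the log above
print(try_bind("default", {}))                        # False
# Agent reporting the mapping from ovs_neutron_plugin.ini -> binding succeeds
print(try_bind("default", {"default": "br-ens2f0"}))  # True
```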

The following messages appear identically in /var/log/messages on both network nodes.

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA port named ha-7ec5cb76-94 is blocked by the dead VLAN tag 4095. I tracked down the reason in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

In short, the created ports cannot be found, and that is why they are blocked. The ports on br-int and br-ens2f0 are missing and do not appear in the output of ovs-ofctl show br-int or ovs-ofctl show br-ens2f0. Because the ports are not found, the treat_vif_port() method in plugins/openvswitch/agent/ovs_neutron_agent.py automatically blocks each one by calling self.port_dead().
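
The blocking behavior can be sketched as follows. This is a simplified stand-in, assuming 4095 is the agent's dead VLAN tag; FakeBridge and the function bodies are hypothetical, not the real agent code:

```python
DEAD_VLAN_TAG = 4095  # tag used to isolate dead ports on br-int

class FakeBridge:
    """Minimal stand-in for the agent's OVSBridge wrapper."""
    def __init__(self):
        self.tags = {}

    def set_db_attribute(self, table, port, attr, value):
        # Equivalent effect of: ovs-vsctl set Port <port> tag=<value>
        if table == "Port" and attr == "tag":
            self.tags[port] = value

def port_dead(bridge, port_name):
    # Move the port onto the dead VLAN so no traffic can flow through it.
    bridge.set_db_attribute("Port", port_name, "tag", DEAD_VLAN_TAG)

def treat_vif_port(bridge, port_name, port_found, admin_state_up):
    # If the device cannot be resolved to a usable, admin-up port, isolate it.
    if not (port_found and admin_state_up):
        port_dead(bridge, port_name)
        return "dead"
    return "wired"

br = FakeBridge()
state = treat_vif_port(br, "ha-7ec5cb76-94", port_found=False, admin_state_up=True)
print(state, br.tags["ha-7ec5cb76-94"])  # dead 4095
```

This mirrors the ovs-vsctl line in the /var/log/messages excerpt above, where the HA port is tagged 4095 right after it fails to bind.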

My question is: why are the ports not found or shown, and why is their admin state not brought up automatically?

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged ovs_neutron_plugin.ini into plugin.ini and restarted neutron-openvswitch-agent and neutron-l3-agent. That finally resolved the problem of the existing OVS ports not being found. However, another problem shows up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA network is assigned VLAN tag 1. As our network does not allow VLANs outside the range 400-1000, this value is definitely wrong. I thought the HA VLAN tag should come from the configured range 'default:400:1000', so either something is wrong or some configuration is missing.
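
For reference, here is a toy model of how the OVS agent might hand out br-int tags, under the assumption that these tags come from an agent-local pool starting at 1 rather than from network_vlan_ranges; this is a hypothetical sketch, not the actual neutron source:

```python
# Toy model: the agent keeps a per-host pool of br-int VLAN tags and assigns
# the lowest free one to each network it wires up. Under this assumption the
# first network provisioned after an agent restart would get tag 1 regardless
# of the physical range configured in network_vlan_ranges.

class LocalVlanPool:
    def __init__(self, lo=1, hi=4094):
        self._free = iter(range(lo, hi + 1))
        self.assigned = {}  # network id -> local tag

    def provision(self, net_id):
        # Idempotent: a network keeps its tag once assigned.
        if net_id not in self.assigned:
            self.assigned[net_id] = next(self._free)
        return self.assigned[net_id]

pool = LocalVlanPool()
print(pool.provision("e394b625-e420-4500-b50d-3e65c95401b6"))  # 1
print(pool.provision("another-network-id"))                    # 2
```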

Any further hints or comments?

neutron (Juno) l3 HA test failed

Hello. I'm testing neutron (Juno) L3 HA feature with following configurations, with one controller node and two network nodes. All the nodes are Centos 7 minimal installation.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

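For reference, the ha_vrrp_* settings above end up in the keepalived.conf that the L3 agent writes under ha_confs_path (one directory per router). A rough sketch of what Juno generates — the interface name, priority, and virtual_router_id here are illustrative, not taken from my deployment:

```
vrrp_instance VR_1 {
    state BACKUP
    interface ha-7ec5cb76-94        # the router's HA port
    virtual_router_id 1
    priority 50
    advert_int 2                    # ha_vrrp_advert_int
    authentication {
        auth_type PASS              # ha_vrrp_auth_type
        auth_pass ABC               # ha_vrrp_auth_password
    }
}
```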
All the nodes are CentOS 7 minimal installations. With these configurations, creating a router with HA enabled shows the following on the controller:

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, /var/log/neutron/server.log contains error messages indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed

On both network nodes, the following identical log entries appear in /var/log/messages:

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That means the HA port named ha-7ec5cb76-94 is blocked (by the dead VLAN tag 4095). I tracked down the reason in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

Simply put, the newly created ports are not found, and that is why they are blocked. Ports on br-int and br-ens2f0 are missing, and they are not shown by ovs-ofctl show br-int or ovs-ofctl show br-ens2f0. Because those ports are not found, the treat_vif_port() method in plugins/openvswitch/agent/ovs_neutron_agent.py automatically blocks each port by calling self.port_dead().
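The blocking behavior described above can be sketched as follows. This is a minimal stand-in, not the actual Juno agent code; FakeOVSBridge and the exact method shapes are illustrative:

```python
# Minimal sketch of the dead-port logic described above; NOT the actual Juno
# agent source. FakeOVSBridge stands in for the agent's OVS bridge wrapper.
DEAD_VLAN_TAG = 4095  # reserved tag used to isolate dead ports

class FakeOVSBridge:
    def __init__(self):
        self.tags = {}

    def set_db_attribute(self, table, port, attr, value):
        # In the real agent this shells out to: ovs-vsctl set Port <port> tag=...
        if table == "Port" and attr == "tag":
            self.tags[port] = value

def treat_vif_port(bridge, port_name, ofport, admin_state_up=True):
    """If the device cannot be resolved (ofport < 0, i.e. 'No such device')
    or the port is administratively down, isolate it on the dead VLAN."""
    if ofport < 0 or not admin_state_up:
        bridge.set_db_attribute("Port", port_name, "tag", DEAD_VLAN_TAG)
        return "dead"
    return "alive"

br = FakeOVSBridge()
state = treat_vif_port(br, "ha-7ec5cb76-94", ofport=-1)  # device missing
print(state, br.tags["ha-7ec5cb76-94"])  # → dead 4095
```

This matches the ovs-vsctl line in the log above, where the HA port is tagged 4095 right after the "No such device" ioctl failures.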

My question is why the ports are not found or shown, and why their admin state is not brought up automatically.

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged ovs_neutron_plugin.ini into plugin.ini and restarted neutron-openvswitch-agent and neutron-l3-agent. The problem of the existing OVS ports not being found was resolved. However, another problem showed up:
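The merge can be sketched like this. Hedged: on RDO/CentOS 7 the agents read /etc/neutron/plugin.ini; a /tmp path is used here as a stand-in so the snippet is safe to run as-is:

```shell
# Sketch of merging the [ovs] agent settings into the single plugin file the
# agents read. CONF is a stand-in path; on RDO/CentOS 7 it would be
# /etc/neutron/plugin.ini.
CONF=/tmp/plugin.ini
cat >> "$CONF" <<'EOF'
[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
EOF
grep -q 'bridge_mappings = default:br-ens2f0' "$CONF" && echo merged
# Afterwards, on the real system:
#   systemctl restart neutron-openvswitch-agent neutron-l3-agent
```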

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA network is assigned VLAN tag '1'. As our network does not allow VLAN IDs outside the range 400-1000, this value is definitely wrong. Because the two HA networks on the two network nodes cannot communicate, both routers become MASTER.

I thought the HA VLAN tag should come from the configured range 'default:400:1000', but it does not, and I don't know where the '1' came from. Looking into the neutron database on the controller node, the HA network segment's network_type is vlan and its physical_network is default.
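For anyone wanting to reproduce the check, the segment allocated for the HA network can be inspected directly in the neutron database. Hedged: the table name ml2_network_segments is per the Juno ML2 schema; adjust credentials for your setup:

```sql
-- Run against the neutron database on the controller.
SELECT network_id, network_type, physical_network, segmentation_id
FROM ml2_network_segments;
```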

Any further hints or comments?

neutron (Juno) l3 HA test failed

Hello. I'm testing neutron (Juno) L3 HA feature with following configurations, with one controller node and two network nodes. All the nodes are Centos 7 minimal installation.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

With these configurations, on creating a router with HA enabled, following messages are shown on the controller.

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, in the /var/log/neutron/server.log, you can find error messages indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed

In the network nodes, we could also found following logs (/var/log/messages) identically.

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That means, the HA port named ha-7ec5cb76-94 is blocked (by 4095 plan tag). I tracked the reason, and found out why. (/var/log/openvswitch/ovs-vswitchd.log)

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

Simply, the ports created are not found, and it was the reason that the ports are blocked. Ports on br-int and br-ens2f0 are not found, and those ports are not shown by ovs-ofctl show br-int or ova-ofctl show br-ens2f0. As those ports are not found, the plugins/openvswitch/agent/ovs-neutron-agent:treat_vif_port() method automatically block the port by calling self.port_dead().

My question is why the ports are not found, and not shown. And why there admin state is not automatically up.

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged the vs_neutron_plugin.ini into plugin.ini and restarted the neutron-openvswith-agent and neutron-l3-agent. Then, the problem that existing ova ports are not found is finally resolved. However, another problem shows up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA network is assigned a vlan number '1'. As our network does not allow outside the range 400~1000, this value is definitely wrong. As the two HA network in the two neutron network nodes cannot communicate, both becomes a master.

I thought the HA vlan number should come from the default vlan range 'default:400:1000', but I think something's' wrong, or some configuration is wrong.

Looking into neutron database in controller node, the database shows that HA network segment's network_type is vlan, and physical_network is default.

Any further hints or comments?

neutron (Juno) l3 HA test failed

Hello. I'm testing neutron (Juno) L3 HA feature with following configurations, with one controller node and two network nodes. All the nodes are Centos 7 minimal installation.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

With these configurations, on creating a router with HA enabled, following messages are shown on the controller.

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, in the /var/log/neutron/server.log, you can find error messages indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed

In the network nodes, we could also found following logs (/var/log/messages) identically.

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That means, the HA port named ha-7ec5cb76-94 is blocked (by 4095 plan tag). I tracked the reason, and found out why. (/var/log/openvswitch/ovs-vswitchd.log)

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

Simply, the ports created are not found, and it was the reason that the ports are blocked. Ports on br-int and br-ens2f0 are not found, and those ports are not shown by ovs-ofctl show br-int or ova-ofctl show br-ens2f0. As those ports are not found, the plugins/openvswitch/agent/ovs-neutron-agent:treat_vif_port() method automatically block the port by calling self.port_dead().

My question is why the ports are not found, and not shown. And why there admin state is not automatically up.

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged the vs_neutron_plugin.ini into plugin.ini and restarted the neutron-openvswith-agent and neutron-l3-agent. Then, the problem that existing ova ports are not found is finally resolved. However, another problem shows up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA network is assigned a vlan number '1'. As our network does not allow VLAN numbers outside the range 400~1000, this value is definitely wrong. As the two HA network in the two neutron network nodes cannot communicate, both becomes a master.master.

I thought the HA vlan number should come from the default vlan range 'default:400:1000', but it's not. I think something's' wrong, or some configuration is wrong.

don't know there the '1' came from. Looking into neutron database in controller node, the database shows that HA network segment's network_type is vlan, and physical_network is default. default.

Any further hints or comments?

neutron (Juno) l3 HA test failed

Hello. I'm testing the neutron (Juno) L3 HA feature with the following configuration: one controller node and two network nodes. All nodes are CentOS 7 minimal installations.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

With this configuration, creating a router with HA enabled shows the following on the controller.

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, /var/log/neutron/server.log contains warnings indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed

On both network nodes, the following log entries appear identically in /var/log/messages.

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That means the HA port named ha-7ec5cb76-94 is blocked (by VLAN tag 4095). Tracking down the reason in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

In short, the created ports cannot be found, and that is why they are blocked. The ports on br-int and br-ens2f0 are not found and are not shown by ovs-ofctl show br-int or ovs-ofctl show br-ens2f0. Because the devices cannot be found, the plugins/openvswitch/agent/ovs_neutron_agent.py:treat_vif_port() method automatically blocks each port by calling self.port_dead().
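For clarity, the blocking behavior described above can be sketched as follows. This is a simplified, hypothetical rendering of the agent's logic, not the actual Juno source; FakeBridge, treat_vif_port, and port names here are illustrative stand-ins.

```python
# Sketch of how the OVS agent "kills" a port whose device it cannot resolve.
# DEAD_VLAN_TAG mirrors the 4095 tag seen in the logs; the classes and
# functions are simplified stand-ins for the real agent code.
DEAD_VLAN_TAG = 4095

class FakeBridge:
    """Minimal stand-in for the integration bridge (br-int)."""
    def __init__(self, known_ports):
        self.known_ports = set(known_ports)
        self.tags = {}

    def port_exists(self, name):
        return name in self.known_ports

    def set_tag(self, name, tag):
        # corresponds to: ovs-vsctl set Port <name> tag=<tag>
        self.tags[name] = tag

def treat_vif_port(bridge, port_name, admin_state_up, local_vlan):
    """Wire the port into its network, or block it if it cannot be found."""
    if bridge.port_exists(port_name) and admin_state_up:
        bridge.set_tag(port_name, local_vlan)
    else:
        bridge.set_tag(port_name, DEAD_VLAN_TAG)  # i.e. port_dead()

br = FakeBridge(known_ports=[])              # device missing, as in the logs
treat_vif_port(br, "ha-7ec5cb76-94", True, 2)
print(br.tags["ha-7ec5cb76-94"])             # 4095
```

This matches the observed `set Port ha-7ec5cb76-94 tag=4095` line: the tag is a symptom of the missing device, not the root cause.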

My question is why the ports are not found or shown, and why their admin state is not automatically brought up.

Could you give me some hints or suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged ovs_neutron_plugin.ini into plugin.ini and restarted neutron-openvswitch-agent and neutron-l3-agent. The problem of the existing OVS ports not being found is resolved. However, another problem shows up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA port is assigned VLAN tag '1'. Since our network does not allow VLAN numbers outside the range 400-1000, this value is definitely wrong. Because the two HA network ports on the two network nodes cannot communicate, both routers become MASTER.

I thought the HA VLAN number would come from the range 'default:400:1000', but it doesn't. I don't know where the '1' came from. Looking into the Neutron database on the controller node, the HA network segment's network_type is vlan and its physical_network is default.
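One possible reading, sketched below: the tag on br-int is not the provider VLAN at all, but a *local* VLAN the OVS agent allocates per network from its own pool starting at 1. The class here is a hypothetical simplification of the agent's internal bookkeeping, not actual Neutron code.

```python
# Sketch: the OVS agent assigns per-network local VLANs on br-int from its
# own pool (1..4094). These are unrelated to the provider range 400:1000;
# flow rules on the physical bridge translate between the two.
class LocalVlanAllocator:
    def __init__(self):
        self.pool = iter(range(1, 4095))
        self.mapping = {}                 # network_id -> local vlan tag

    def local_vlan(self, network_id):
        if network_id not in self.mapping:
            self.mapping[network_id] = next(self.pool)
        return self.mapping[network_id]

alloc = LocalVlanAllocator()
print(alloc.local_vlan("ha-network"))     # first network seen gets tag 1
print(alloc.local_vlan("tenant-net-a"))   # next network gets tag 2
```

Under this reading, tag=1 on the HA port would be expected, not wrong in itself.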

Any further hints or comments?

UPDATE 2

After reinstalling the controller node and the two network nodes, and after digging through the source code, I found that the br-int and br-ens2f0 bridges have correct flow tables:

[root@network2 agent]# ovs-ofctl dump-flows br-ens2f0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1344.498s, table=0, n_packets=0, n_bytes=0, idle_age=1344, priority=1 actions=NORMAL
 cookie=0x0, duration=651.521s, table=0, n_packets=330, n_bytes=16832, idle_age=0, priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
 cookie=0x0, duration=1343.975s, table=0, n_packets=14, n_bytes=1164, idle_age=651, priority=2,in_port=9 actions=drop
[root@network2 agent]# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1363.640s, table=0, n_packets=671, n_bytes=34602, idle_age=1, priority=1 actions=NORMAL
 cookie=0x0, duration=670.231s, table=0, n_packets=0, n_bytes=0, idle_age=670, priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
 cookie=0x0, duration=1362.813s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=18 actions=drop
 cookie=0x0, duration=1362.032s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=19 actions=drop
 cookie=0x0, duration=1363.576s, table=23, n_packets=0, n_bytes=0, idle_age=1363, priority=0 actions=drop

According to the flow tables, traffic from br-ens2f0 with VLAN 401 is rewritten to VLAN 2, and traffic from br-int with VLAN 2 is rewritten to 401. That means local VLAN 2 (which I first thought was a wrongly assigned value) is translated to VLAN 401 before being injected into the physical network, and vice versa.
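The two mod_vlan_vid flows amount to a simple bidirectional tag rewrite, which can be sketched like this (illustrative only; the dicts and function names are mine, not OVS code):

```python
# Sketch of the VLAN rewrite the two mod_vlan_vid flows implement:
#   egress  (br-int -> br-ens2f0): local 2    -> provider 401
#   ingress (br-ens2f0 -> br-int): provider 401 -> local 2
LOCAL_TO_PROVIDER = {2: 401}
PROVIDER_TO_LOCAL = {v: k for k, v in LOCAL_TO_PROVIDER.items()}

def egress(vlan):
    # priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
    return LOCAL_TO_PROVIDER.get(vlan)

def ingress(vlan):
    # priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
    return PROVIDER_TO_LOCAL.get(vlan)

print(egress(2))     # 401: tag on the wire
print(ingress(401))  # 2: tag br-int sees on the way back
```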

However, as the n_packets values show, this works in only one direction: neither HA router receives the other's VRRP keepalive messages, so both become MASTER at the same time.
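The dual-MASTER symptom follows directly from VRRP's failover rule: a BACKUP that hears no advertisements for roughly three advertisement intervals promotes itself. A small sketch of that timer (illustrative only, not keepalived code; the priority value 50 is an assumption):

```python
# Illustrative VRRP backup timer (per RFC 3768): with ha_vrrp_advert_int = 2
# and the HA network partitioned, each router's master-down timer expires
# independently, so both promote themselves to MASTER.
def master_down_interval(advert_int, priority=50):
    skew = (256 - priority) / 256.0       # Skew_Time
    return 3 * advert_int + skew          # Master_Down_Interval

def state_after(seconds_without_adverts, advert_int):
    if seconds_without_adverts >= master_down_interval(advert_int):
        return "MASTER"
    return "BACKUP"

# Both nodes isolated from each other -> both promote after ~6.8 s:
print(state_after(7, advert_int=2))   # MASTER
print(state_after(1, advert_int=2))   # BACKUP
```

This matches the logs above, where each node enters BACKUP and then transitions to MASTER about 7 seconds later.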

I'm also still seeing the following log messages in /var/log/openvswitch/ovs-vswitchd.log.

2014-12-29T08:54:07.812Z|00101|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00102|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00103|netdev_linux|WARN|ha-d75e6f07-5e: removing policing failed: No such device

I think I'm very close to the answer, but I still need help. Any comments and hints are welcome.

neutron (Juno) l3 HA test failed

Hello. I'm testing neutron (Juno) L3 HA feature with following configurations, with one controller node and two network nodes. All the nodes are Centos 7 minimal installation.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

With these configurations, on creating a router with HA enabled, following messages are shown on the controller.

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, in the /var/log/neutron/server.log, you can find error messages indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed

In the network nodes, we could also found following logs (/var/log/messages) identically.

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That means the HA port named ha-7ec5cb76-94 is blocked (by the dead VLAN tag 4095). I tracked down the reason in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

Simply put, the created ports are not found, and that is why they are blocked. Ports on br-int and br-ens2f0 are missing, and they are not shown by ovs-ofctl show br-int or ovs-ofctl show br-ens2f0. Because those ports are not found, the plugins/openvswitch/agent/ovs-neutron-agent:treat_vif_port() method automatically blocks each port by calling self.port_dead().

My question is why the ports are not found and not shown, and why their admin state is not automatically brought up.

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged ovs_neutron_plugin.ini into plugin.ini and restarted the neutron-openvswitch-agent and neutron-l3-agent. Then, the problem that the existing ovs ports are not found is finally resolved. However, another problem shows up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA network is assigned the VLAN number '1'. As our network does not allow VLAN numbers outside the range 400~1000, this value is definitely wrong. As the two HA networks on the two neutron network nodes cannot communicate, both become masters.

I thought the HA VLAN number should come from the default VLAN range 'default:400:1000', but it does not. I don't know where the '1' came from. Looking into the neutron database on the controller node, it shows that the HA network segment's network_type is vlan and its physical_network is default.

Any further hints or comments?

UPDATE 2

After re-setting the controller node and the two network nodes, and after digging through the source code, I found that the br-int and br-ens2f0 bridges have the correct flow tables:

[root@network2 agent]# ovs-ofctl dump-flows br-ens2f0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1344.498s, table=0, n_packets=0, n_bytes=0, idle_age=1344, priority=1 actions=NORMAL
 cookie=0x0, duration=651.521s, table=0, n_packets=330, n_bytes=16832, idle_age=0, priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
 cookie=0x0, duration=1343.975s, table=0, n_packets=14, n_bytes=1164, idle_age=651, priority=2,in_port=9 actions=drop
[root@network2 agent]# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1363.640s, table=0, n_packets=671, n_bytes=34602, idle_age=1, priority=1 actions=NORMAL
 cookie=0x0, duration=670.231s, table=0, n_packets=0, n_bytes=0, idle_age=670, priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
 cookie=0x0, duration=1362.813s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=18 actions=drop
 cookie=0x0, duration=1362.032s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=19 actions=drop
 cookie=0x0, duration=1363.576s, table=23, n_packets=0, n_bytes=0, idle_age=1363, priority=0 actions=drop

As the flow tables show, traffic from br-ens2f0 with VLAN 401 is rewritten to VLAN 2, and traffic from br-int with VLAN 2 is rewritten to VLAN 401. That means local VLAN 2 (which I first thought was a wrongly assigned value) is translated to VLAN 401 before being injected into the physical network, and vice versa.

However, as the n_packets values indicate, this only works in one direction; the HA routers do not receive each other's keepalive messages, so each becomes MASTER at the same time.

And still, I'm seeing the following log messages in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T08:54:07.812Z|00101|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00102|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00103|netdev_linux|WARN|ha-d75e6f07-5e: removing policing failed: No such device

I think I have come very close to the final answer, but I still need help from others. Any comments and hints are welcome.

UPDATE 3

OK. By enabling veth support in /etc/neutron/l3_agent.ini and restarting the l3 and openvswitch agents, the log messages I showed previously (removing policing failed: No such device) have completely disappeared.

But still, communication between the two HA routers is not established. Still open for comments and hints.

neutron (Juno) l3 HA test failed

Hello. I'm testing the neutron (Juno) L3 HA feature with the following configuration: one controller node and two network nodes. All nodes are CentOS 7 minimal installations.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

With these configurations, creating a router with HA enabled prints the following on the controller:

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, /var/log/neutron/server.log contains warning messages indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
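To pull the failing ports out of the log quickly, I use a small filter (a throwaway helper matching the warning format above):

```python
import re

def binding_failures(log_text):
    """Extract (port_id, host) pairs from ml2 'Failed to bind port' warnings."""
    pattern = re.compile(r"Failed to bind port (\S+) on host (\S+)")
    return pattern.findall(log_text)

log = (
    "WARNING neutron.plugins.ml2.managers Failed to bind port "
    "7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1\n"
    "WARNING neutron.plugins.ml2.managers Failed to bind port "
    "bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2\n"
)
print(binding_failures(log))
# [('7ec5cb76-94c6-4e2e-a687-606d8fb34ce7', 'network1'),
#  ('bc42bc0d-d1f7-4440-8704-757a47cee268', 'network2')]
```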

On the network nodes, we could also find the following logs (/var/log/messages), identical on both nodes.

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That means the HA port named ha-7ec5cb76-94 is blocked (by the dead VLAN tag 4095). I tracked down the reason in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

Simply put, the created ports are not found, and that is why they are blocked. Ports on br-int and br-ens2f0 are missing, and they are not shown by ovs-ofctl show br-int or ovs-ofctl show br-ens2f0. Because those ports are not found, the plugins/openvswitch/agent/ovs-neutron-agent:treat_vif_port() method automatically blocks each port by calling self.port_dead().
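For my own understanding, here is a simplified model of that decision. This is my paraphrase in Python, not the actual agent code:

```python
DEAD_VLAN_TAG = 4095  # the tag the agent assigns to quarantine a port

def treat_vif_port(port_exists, admin_state_up):
    """Simplified model: a port whose device is missing, or whose admin
    state is down, is quarantined with the dead VLAN tag; otherwise it
    gets wired up with a local VLAN."""
    if port_exists and admin_state_up:
        return "wire up: assign local VLAN"
    return "port_dead: set tag=%d" % DEAD_VLAN_TAG

print(treat_vif_port(port_exists=False, admin_state_up=True))
# port_dead: set tag=4095
```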

My question is why the ports are not found and not shown, and why their admin state is not automatically brought up.

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged ovs_neutron_plugin.ini into plugin.ini and restarted the neutron-openvswitch-agent and neutron-l3-agent. Then, the problem that the existing ovs ports are not found is finally resolved. However, another problem shows up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA network is assigned the VLAN number '1'. As our network does not allow VLAN numbers outside the range 400~1000, this value is definitely wrong. As the two HA networks on the two neutron network nodes cannot communicate, both become masters.
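Incidentally, the timing in the log is consistent with keepalived's VRRP timers: a backup declares the master dead after 3 * advert_int + skew_time seconds. With ha_vrrp_advert_int = 2, and assuming a router priority of 50 (my assumption about the generated keepalived.conf, not confirmed), that gives roughly the 7 seconds between "Entering BACKUP STATE" and "Transition to MASTER STATE" seen above:

```python
def vrrp_master_down_interval(advert_int, priority):
    """VRRP (RFC 3768) timers: master_down = 3 * advert_int + skew_time,
    where skew_time = (256 - priority) / 256 seconds."""
    skew_time = (256 - priority) / 256.0
    return 3 * advert_int + skew_time

# advert_int = 2 comes from l3_agent.ini; priority = 50 is an assumption
print(round(vrrp_master_down_interval(2, 50), 2))  # 6.8
```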

I thought the HA VLAN number should come from the default VLAN range 'default:400:1000', but it does not. I don't know where the '1' came from. Looking into the neutron database on the controller node, it shows that the HA network segment's network_type is vlan and its physical_network is default.
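As a sanity check on my reading of the configuration, the network_vlan_ranges entry parses as physical network 'default' with segmentation IDs 400 to 1000, so tag 1 could not be a valid physical VLAN here (a throwaway check mirroring the physnet:min:max format):

```python
def parse_vlan_range(spec):
    """Parse one network_vlan_ranges entry of the form physnet:min:max."""
    physnet, vmin, vmax = spec.split(":")
    return physnet, int(vmin), int(vmax)

physnet, vmin, vmax = parse_vlan_range("default:400:1000")
print(physnet, vmin, vmax)  # default 400 1000
print(vmin <= 1 <= vmax)    # False: tag 1 is outside the configured range
```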

Any further hints or comments?

UPDATE 2

After re-setting the controller node and the two network nodes, and after digging through the source code, I found that the br-int and br-ens2f0 bridges have the correct flow tables:

[root@network2 agent]# ovs-ofctl dump-flows br-ens2f0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1344.498s, table=0, n_packets=0, n_bytes=0, idle_age=1344, priority=1 actions=NORMAL
 cookie=0x0, duration=651.521s, table=0, n_packets=330, n_bytes=16832, idle_age=0, priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
 cookie=0x0, duration=1343.975s, table=0, n_packets=14, n_bytes=1164, idle_age=651, priority=2,in_port=9 actions=drop
[root@network2 agent]# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1363.640s, table=0, n_packets=671, n_bytes=34602, idle_age=1, priority=1 actions=NORMAL
 cookie=0x0, duration=670.231s, table=0, n_packets=0, n_bytes=0, idle_age=670, priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
 cookie=0x0, duration=1362.813s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=18 actions=drop
 cookie=0x0, duration=1362.032s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=19 actions=drop
 cookie=0x0, duration=1363.576s, table=23, n_packets=0, n_bytes=0, idle_age=1363, priority=0 actions=drop

As the flow tables show, traffic from br-ens2f0 with VLAN 401 is rewritten to VLAN 2, and traffic from br-int with VLAN 2 is rewritten to VLAN 401. That means local VLAN 2 (which I first thought was a wrongly assigned value) is translated to VLAN 401 before being injected into the physical network, and vice versa.
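To convince myself the translation really is symmetric, I extracted the VLAN rewrites from the two dumps with a throwaway script (the regex is ad hoc, just enough for this output):

```python
import re

def vlan_rewrites(flow_dump):
    """Return (matched_vlan, rewritten_vlan) pairs from ovs-ofctl dump-flows output."""
    pattern = re.compile(r"dl_vlan=(\d+) actions=mod_vlan_vid:(\d+)")
    return [(int(a), int(b)) for a, b in pattern.findall(flow_dump)]

# Relevant lines from the two dumps above
br_phys = "priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL"
br_int = "priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL"
print(vlan_rewrites(br_phys))  # [(2, 401)]
print(vlan_rewrites(br_int))   # [(401, 2)]
```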

However, as the n_packets values indicate, this only works in one direction; the HA routers do not receive each other's keepalive messages, so each becomes MASTER at the same time.

And still, I'm seeing the following log messages in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T08:54:07.812Z|00101|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00102|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00103|netdev_linux|WARN|ha-d75e6f07-5e: removing policing failed: No such device

I think I have come very close to the final answer, but I still need help from others. Any comments and hints are welcome.

UPDATE 3

OK. By enabling veth support in /etc/neutron/l3_agent.ini and restarting the l3 and openvswitch agents, the log messages I showed previously (removing policing failed: No such device) have completely disappeared.

One notable difference from the previous setup is that the ovs port name "ha-xxxx" is no longer used. Instead, I can see the following logs:

Dec 29 19:45:42 network2 kernel: device tapf0afab77-ea entered promiscuous mode
Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_UP): ha-f0afab77-ea: link is not ready
Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ha-f0afab77-ea: link becomes ready
Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ha-f0afab77-ea: link becomes ready
Dec 29 19:45:43 network2 Keepalived[9432]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 19:45:43 network2 Keepalived[9433]: Starting VRRP child process, pid=9434
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink reflector
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink command channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering gratuitous ARP shared channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Opening file '/var/lib/neutron/ha_confs/ed04d4e6-5f00-425d-b856-0cec3ab69ae8/keepalived.conf'.
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Configuration is using : 65206 Bytes
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Using LinkWatch kernel netlink reflector...
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 19:45:44 network2 avahi-daemon[788]: Registering new address record for fe80::e8e5:6dff:fea5:e912 on tapf0afab77-ea.*.
Dec 29 19:45:44 network2 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port tapf0afab77-ea tag=3
Dec 29 19:45:45 network2 ntpd[899]: Listen normally on 10 tapf0afab77-ea fe80::e8e5:6dff:fea5:e912 UDP 123
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 19:45:52 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering MASTER STATE

It means, I think, that a tap device is created and the virtual link ha-xxx is created on top of it. (Of course, I'm not sure this is the correct behavior.) But still, communication between the two HA routers is not established.
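One small supporting observation: the tap device and the ha- interface in the logs carry the same truncated port-ID suffix, which is what I would expect for the two ends of one veth pair (my reading, not verified):

```python
# Device names taken from the logs above; the suffix is the truncated port ID.
tap_dev = "tapf0afab77-ea"  # OVS side
ha_dev = "ha-f0afab77-ea"   # router-namespace side
print(tap_dev[len("tap"):] == ha_dev[len("ha-"):])  # True
```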

Still open for comments and hints.

neutron (Juno) l3 HA test failed

Hello. I'm testing the neutron (Juno) L3 HA feature with the following configuration: one controller node and two network nodes. All nodes are CentOS 7 minimal installations.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

With these configurations, creating a router with HA enabled prints the following on the controller:

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, /var/log/neutron/server.log contains warning messages indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed

On the network nodes, we could also find the following logs (/var/log/messages), identical on both nodes.

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That means the HA port named ha-7ec5cb76-94 is blocked (by the dead VLAN tag 4095). I tracked down the reason in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

Simply put, the created ports are not found, and that is why they are blocked. Ports on br-int and br-ens2f0 are missing, and they are not shown by ovs-ofctl show br-int or ovs-ofctl show br-ens2f0. Because those ports are not found, the plugins/openvswitch/agent/ovs-neutron-agent:treat_vif_port() method automatically blocks each port by calling self.port_dead().

My question is why the ports are not found and not shown, and why their admin state is not automatically brought up.

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged ovs_neutron_plugin.ini into plugin.ini and restarted neutron-openvswitch-agent and neutron-l3-agent. That finally resolved the problem of the existing OVS ports not being found. However, another problem shows up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA network is assigned VLAN number '1'. As our network does not allow VLAN numbers outside the range 400-1000, this value is definitely wrong. Since the two HA ports on the two neutron network nodes cannot communicate, both routers become MASTER.

I thought the HA VLAN number should come from the configured range 'default:400:1000', but it does not. I don't know where the '1' came from. Looking into the neutron database on the controller node, the HA network segment's network_type is vlan and its physical_network is default.
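One hint about where the '1' may come from: the OVS agent assigns each network a host-local VLAN id on br-int, starting from 1, which is independent of the physical segmentation id from the configured range. A minimal sketch of that allocation (class and method names are mine, not neutron's):

```python
import itertools

class LocalVlanManager:
    """Sketch of per-host local VLAN id allocation on br-int.

    These ids are only meaningful inside br-int on one host; the
    patch-port flows translate between a local id (e.g. 1 or 2) and the
    physical segmentation id (e.g. 401) on the way in and out.
    """
    def __init__(self):
        self._free = itertools.count(1)   # local ids start at 1
        self._map = {}                    # network_id -> local vlan

    def local_vlan(self, network_id):
        if network_id not in self._map:
            self._map[network_id] = next(self._free)
        return self._map[network_id]

mgr = LocalVlanManager()
print(mgr.local_vlan("ha-net"))    # first network on this host gets local id 1
print(mgr.local_vlan("ha-net"))    # stable for the same network
```

Under this reading, tag=1 on the HA port is not itself the error; the error would be a missing or broken translation to the physical VLAN.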

Any further hints or comments?

UPDATE 2

After re-installing the controller node and the two network nodes, and after digging through the source code, I found that the br-int and br-ens2f0 bridges have correct flow tables:

[root@network2 agent]# ovs-ofctl dump-flows br-ens2f0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1344.498s, table=0, n_packets=0, n_bytes=0, idle_age=1344, priority=1 actions=NORMAL
 cookie=0x0, duration=651.521s, table=0, n_packets=330, n_bytes=16832, idle_age=0, priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
 cookie=0x0, duration=1343.975s, table=0, n_packets=14, n_bytes=1164, idle_age=651, priority=2,in_port=9 actions=drop
[root@network2 agent]# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1363.640s, table=0, n_packets=671, n_bytes=34602, idle_age=1, priority=1 actions=NORMAL
 cookie=0x0, duration=670.231s, table=0, n_packets=0, n_bytes=0, idle_age=670, priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
 cookie=0x0, duration=1362.813s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=18 actions=drop
 cookie=0x0, duration=1362.032s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=19 actions=drop
 cookie=0x0, duration=1363.576s, table=23, n_packets=0, n_bytes=0, idle_age=1363, priority=0 actions=drop

As the flow tables show, traffic entering br-int from br-ens2f0 with VLAN 401 is rewritten to local VLAN 2, and traffic leaving through br-ens2f0 with local VLAN 2 is rewritten to VLAN 401. That means local VLAN 2 (which I first thought was a wrongly assigned value) is translated to VLAN 401 before being injected into the physical network, and vice versa.
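The two rewrites above can be condensed into a tiny lookup sketch (port numbers 9 and 18 are taken from the dumps above; the function itself is only illustrative):

```python
def rewrite_vlan(bridge, in_port, vlan):
    """Mirror of the two mod_vlan_vid flows shown above.

    In this example, port 9 is the phy- patch port on br-ens2f0 and
    port 18 is the int- patch port on br-int.
    """
    if bridge == "br-ens2f0" and in_port == 9 and vlan == 2:
        return 401          # local VLAN 2 -> physical VLAN 401 (egress)
    if bridge == "br-int" and in_port == 18 and vlan == 401:
        return 2            # physical VLAN 401 -> local VLAN 2 (ingress)
    return vlan             # otherwise the NORMAL action leaves the tag alone
```

Both directions of the mapping are present; what the n_packets counters reveal is that only the egress rule is ever hit.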

However, as the n_packets counters indicate, this only works in one direction: each HA router never receives keepalive advertisements from the other, and that makes both HA routers MASTER at the same time.
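The split-brain itself is expected keepalived behavior once advertisements stop arriving: a backup promotes itself after hearing nothing for roughly three advert intervals. A small sketch of that rule (simplified from RFC 3768; the function and parameter names are mine):

```python
def vrrp_state(seconds_since_advert, advert_int=2, master_down_factor=3):
    """Why a one-way link produces two MASTERs.

    seconds_since_advert: time since the last advertisement was received,
    or None if none was ever received. A router stays BACKUP only while
    it keeps hearing a higher-priority peer within the down interval
    (roughly master_down_factor * advert_int seconds).
    """
    timeout = master_down_factor * advert_int
    if seconds_since_advert is None or seconds_since_advert > timeout:
        return "MASTER"   # no peer heard -> claim mastership
    return "BACKUP"

# Both nodes send adverts, but neither receives any:
print(vrrp_state(None), vrrp_state(None))
```

With ha_vrrp_advert_int = 2 as configured above, that matches the logs: each node enters BACKUP and transitions to MASTER about six seconds later.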

And I'm still seeing the following log messages in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T08:54:07.812Z|00101|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00102|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00103|netdev_linux|WARN|ha-d75e6f07-5e: removing policing failed: No such device

Furthermore, ovs-ofctl show br-int indicates the ha-XXX device is down:

[root@network1 agent]# ovs-ofctl show br-int

OFPT_FEATURES_REPLY (xid=0x2): dpid:0000d67d91611247
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 29(int-br-ens2f0): addr:e6:97:65:12:a8:b2
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 30(int-br-ex): addr:56:04:61:d4:01:3a
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 32(ha-365b05db-b1): addr:56:04:61:d4:01:3a
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-int): addr:d6:7d:91:61:12:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

I think I have come very close to the final answer, but I still need help from others. Any comments and hints are welcome.

UPDATE 3

OK. After enabling veth support in /etc/neutron/l3_agent.ini and restarting the l3 and openvswitch agents, the log messages shown previously (removing policing failed: No such device) have disappeared completely. One notable difference from the previous setup is that the OVS port name "ha-xxxx" is no longer used. Instead, I see the following logs:

Dec 29 19:45:42 network2 kernel: device tapf0afab77-ea entered promiscuous mode

Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_UP): ha-f0afab77-ea: link is not ready
Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ha-f0afab77-ea: link becomes ready
Dec 29 19:45:43 network2 Keepalived[9432]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 19:45:43 network2 Keepalived[9433]: Starting VRRP child process, pid=9434
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink reflector
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink command channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering gratuitous ARP shared channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Opening file '/var/lib/neutron/ha_confs/ed04d4e6-5f00-425d-b856-0cec3ab69ae8/keepalived.conf'.
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Configuration is using : 65206 Bytes
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Using LinkWatch kernel netlink reflector...
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 19:45:44 network2 avahi-daemon[788]: Registering new address record for fe80::e8e5:6dff:fea5:e912 on tapf0afab77-ea.*.
Dec 29 19:45:44 network2 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port tapf0afab77-ea tag=3
Dec 29 19:45:45 network2 ntpd[899]: Listen normally on 10 tapf0afab77-ea fe80::e8e5:6dff:fea5:e912 UDP 123
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 19:45:52 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering MASTER STATE

This means, I think, that a tap device is created and the virtual link ha-xxx is created on top of it. (Of course, I'm not sure this is the correct behavior.) But still, communication between the two HA routers is not established.
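That tap/ha- pair is consistent with the veth option: for reference, the change boils down to one line in l3_agent.ini (ovs_use_veth is the actual option name; the inline config text below is only an illustrative fragment being read back):

```python
import configparser
import io

# Hypothetical excerpt of /etc/neutron/l3_agent.ini after the change.
L3_AGENT_INI = io.StringIO("""
[DEFAULT]
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
ovs_use_veth = True
""")

cfg = configparser.ConfigParser()
cfg.read_file(L3_AGENT_INI)
# With ovs_use_veth = True the interface driver plugs a veth pair:
# the tapXXXX end sits on the bridge, the ha-XXXX end lives inside
# the router namespace, instead of a single OVS internal port.
print(cfg.get("DEFAULT", "ovs_use_veth"))
```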

Still open for comments and hints.


neutron (Juno) l3 HA test failed

Hello. I'm testing neutron (Juno) L3 HA feature with following configurations, with one controller node and two network nodes. All the nodes are Centos 7 minimal installation.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

With these configurations, on creating a router with HA enabled, following messages are shown on the controller.

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, in the /var/log/neutron/server.log, you can find error messages indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed

In the network nodes, we could also found following logs (/var/log/messages) identically.

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That means, the HA port named ha-7ec5cb76-94 is blocked (by 4095 plan tag). I tracked the reason, and found out why. (/var/log/openvswitch/ovs-vswitchd.log)

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

Simply put, the newly created ports are not found, and that is why they are blocked. The ports on br-int and br-ens2f0 cannot be located, and they do not show up in the output of ovs-ofctl show br-int or ovs-ofctl show br-ens2f0. Because the ports are not found, the treat_vif_port() method in plugins/openvswitch/agent/ovs_neutron_agent.py automatically blocks each port by calling self.port_dead().
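
The gist of that behavior can be sketched as follows. This is a simplified illustration of the Juno-era agent logic, not the actual neutron source; the function name mirrors treat_vif_port(), but the structure here is my own:

```python
# Simplified sketch of why an unbound port ends up with tag 4095.
# Illustrative only -- not the real ovs_neutron_agent.py code.

DEAD_VLAN_TAG = 4095  # reserved tag the agent uses to isolate dead ports


def treat_vif_port(port_name, vif_type, admin_state_up, set_tag):
    """Decide the OVS tag for a port based on its binding result."""
    if vif_type == "binding_failed" or not admin_state_up:
        # Port could not be bound to a network segment: quarantine it.
        set_tag(port_name, DEAD_VLAN_TAG)
        return "dead"
    return "alive"


tags = {}
state = treat_vif_port("ha-7ec5cb76-94", "binding_failed", True,
                       lambda p, t: tags.__setitem__(p, t))
print(state, tags)  # the HA port gets the dead tag 4095
```

This matches the ovs-vsctl line in the log above: the agent itself issues `set Port ha-7ec5cb76-94 tag=4095` once binding fails.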

My question is why the ports are not found or shown, and why their admin state is not automatically brought up.

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged ovs_neutron_plugin.ini into plugin.ini and restarted neutron-openvswitch-agent and neutron-l3-agent. The problem of existing OVS ports not being found was resolved, but another problem showed up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA port is assigned VLAN tag '1'. As our network does not allow VLAN numbers outside the range 400-1000, this value is definitely wrong. Since the HA interfaces on the two network nodes cannot communicate, both routers become MASTER.

I thought the HA VLAN number should come from the configured range 'default:400:1000', but it does not. I don't know where the '1' came from. The neutron database on the controller node shows that the HA network segment's network_type is vlan and its physical_network is default.

Any further hints or comments?

UPDATE 2

After re-setting the controller node and the two network nodes, and after struggling with the source code, I found that the br-int and br-ens2f0 bridges have the correct flow tables:

[root@network2 agent]# ovs-ofctl dump-flows br-ens2f0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1344.498s, table=0, n_packets=0, n_bytes=0, idle_age=1344, priority=1 actions=NORMAL
 cookie=0x0, duration=651.521s, table=0, n_packets=330, n_bytes=16832, idle_age=0, priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
 cookie=0x0, duration=1343.975s, table=0, n_packets=14, n_bytes=1164, idle_age=651, priority=2,in_port=9 actions=drop
[root@network2 agent]# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1363.640s, table=0, n_packets=671, n_bytes=34602, idle_age=1, priority=1 actions=NORMAL
 cookie=0x0, duration=670.231s, table=0, n_packets=0, n_bytes=0, idle_age=670, priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
 cookie=0x0, duration=1362.813s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=18 actions=drop
 cookie=0x0, duration=1362.032s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=19 actions=drop
 cookie=0x0, duration=1363.576s, table=23, n_packets=0, n_bytes=0, idle_age=1363, priority=0 actions=drop

As the flow tables show, traffic from br-ens2f0 with VLAN 401 is rewritten to VLAN 2, and traffic from br-int with VLAN 2 is rewritten to 401. In other words, local VLAN tag 2 (which I first thought was a wrongly assigned value) is translated to VLAN 401 before being injected into the physical network, and vice versa.
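
The tag rewriting done by the two mod_vlan_vid flows can be modeled as a simple mapping (the numbers are taken from the flow dump above; this is just an illustration of what the flows do, not agent code):

```python
# Sketch of the local-VLAN <-> provider-VLAN translation implemented by
# the two mod_vlan_vid flows. Local tag 2 is an agent-internal value on
# br-int; 401 is the provider VLAN on the physical network.

LOCAL_TO_PROVIDER = {2: 401}   # br-ens2f0 flow: traffic leaving the node
PROVIDER_TO_LOCAL = {401: 2}   # br-int flow: traffic entering the node


def egress(vlan):
    """Rewrite applied as a frame exits via the physical bridge."""
    return LOCAL_TO_PROVIDER.get(vlan, vlan)


def ingress(vlan):
    """Rewrite applied as a frame enters the integration bridge."""
    return PROVIDER_TO_LOCAL.get(vlan, vlan)


print(egress(2))     # 401: local tag rewritten before hitting the wire
print(ingress(401))  # 2: provider tag rewritten back on the way in
```

So local tag '2' (or '1' earlier) is never seen on the wire; only the provider VLAN 401 is, which is why the local value by itself is not the bug.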

However, as the n_packets values indicate, this only works in one direction: the HA routers never receive each other's keepalive messages, so both become MASTER at the same time.
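
This split-brain symptom follows directly from how VRRP elects a master: each instance starts in BACKUP and promotes itself when it hears no advertisement of equal or higher priority. A toy model (not keepalived itself; real VRRP also involves timers and an IP-address tiebreak):

```python
# Toy model of VRRP master election. An instance that receives no peer
# advertisements promotes itself to MASTER; with the HA network broken
# in both directions, both routers do exactly that.

def vrrp_state(received_priorities, my_priority):
    """BACKUP if any peer advertises priority >= ours, else MASTER."""
    if any(p >= my_priority for p in received_priorities):
        return "BACKUP"
    return "MASTER"


# Broken HA network: neither router hears the other -> split brain.
print(vrrp_state([], 50), vrrp_state([], 50))      # MASTER MASTER
# Healthy network: the lower-priority instance stays BACKUP.
print(vrrp_state([50], 40), vrrp_state([40], 50))  # BACKUP MASTER
```
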

I am also still seeing the following log messages in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T08:54:07.812Z|00101|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00102|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00103|netdev_linux|WARN|ha-d75e6f07-5e: removing policing failed: No such device

Further, ovs-ofctl show br-int indicates that the ha-XXX device is down:

[root@network1 agent]# ovs-ofctl show br-int

OFPT_FEATURES_REPLY (xid=0x2): dpid:0000d67d91611247
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 29(int-br-ens2f0): addr:e6:97:65:12:a8:b2
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 30(int-br-ex): addr:56:04:61:d4:01:3a
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 32(ha-365b05db-b1): addr:56:04:61:d4:01:3a
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-int): addr:d6:7d:91:61:12:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

I think I have come very close to the final answer, but I still need help from others. Any comments and hints are welcome.

UPDATE 3

OK. After enabling veth support in /etc/neutron/l3_agent.ini and restarting the L3 and openvswitch agents, the log messages shown previously (removing policing failed: No such device) disappeared entirely. One notable difference from the previous setup is that the OVS port name "ha-xxxx" is no longer used. Instead, I see the following logs:
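
For reference, the setting I toggled is the veth option of the OVS interface driver (assuming the standard option name in Juno; check your installed version's l3_agent.ini sample):

```
[DEFAULT]
# use veth pairs instead of OVS internal ports for agent interfaces
ovs_use_veth = True
```
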

Dec 29 19:45:42 network2 kernel: device tapf0afab77-ea entered promiscuous mode

Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_UP): ha-f0afab77-ea: link is not ready
Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ha-f0afab77-ea: link becomes ready
Dec 29 19:45:43 network2 Keepalived[9432]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 19:45:43 network2 Keepalived[9433]: Starting VRRP child process, pid=9434
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink reflector
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink command channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering gratuitous ARP shared channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Opening file '/var/lib/neutron/ha_confs/ed04d4e6-5f00-425d-b856-0cec3ab69ae8/keepalived.conf'.
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Configuration is using : 65206 Bytes
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Using LinkWatch kernel netlink reflector...
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 19:45:44 network2 avahi-daemon[788]: Registering new address record for fe80::e8e5:6dff:fea5:e912 on tapf0afab77-ea.*.
Dec 29 19:45:44 network2 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port tapf0afab77-ea tag=3
Dec 29 19:45:45 network2 ntpd[899]: Listen normally on 10 tapf0afab77-ea fe80::e8e5:6dff:fea5:e912 UDP 123
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 19:45:52 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering MASTER STATE

I think this means a tap device is created and the virtual link ha-xxx sits on top of it (though I'm not sure this understanding is correct). Still, communication between the two HA routers is not established.

Still open for comments and hints.

UPDATE 4

All right. I finally solved the problem. Communication between the two keepalived processes was blocked because of ens2f0 and br-ens2f0. Originally, it looked like this:

[root@network2 agent]# ovs-ofctl show br-ens2f0
OFPT_FEATURES_REPLY (xid=0x2): dpid:000090e2ba1f1ec4
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 1(ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     current:    COPPER AUTO_NEG
     advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     speed: 0 Mbps now, 1000 Mbps max
 13(phy-br-ens2f0): addr:3e:77:de:63:a0:95
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

As you can see, the br-ens2f0 and ens2f0 interfaces are DOWN. So I did...

[root@network2 agent]# ip link set br-ens2f0 up
[root@network2 agent]# ip link set ens2f0 up

Then... finally... the second keepalived instance turned into BACKUP mode.

Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Received higher prio advert
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Group(VG_1) Syncing instances to BACKUP state

What a long story. Anyway, problem solved.

Thank you fellow OpenStack guys. :-)

neutron (Juno) l3 HA test failed

Hello. I'm testing neutron (Juno) L3 HA feature with following configurations, with one controller node and two network nodes. All the nodes are Centos 7 minimal installation.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

With these configurations, on creating a router with HA enabled, following messages are shown on the controller.

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, in the /var/log/neutron/server.log, you can find error messages indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed

In the network nodes, we could also found following logs (/var/log/messages) identically.

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That means, the HA port named ha-7ec5cb76-94 is blocked (by 4095 plan tag). I tracked the reason, and found out why. (/var/log/openvswitch/ovs-vswitchd.log)

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

Simply, the ports created are not found, and it was the reason that the ports are blocked. Ports on br-int and br-ens2f0 are not found, and those ports are not shown by ovs-ofctl show br-int or ova-ofctl show br-ens2f0. As those ports are not found, the plugins/openvswitch/agent/ovs-neutron-agent:treat_vif_port() method automatically block the port by calling self.port_dead().

My question is why the ports are not found, and not shown. And why there admin state is not automatically up.

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged the vs_neutron_plugin.ini into plugin.ini and restarted the neutron-openvswith-agent and neutron-l3-agent. Then, the problem that existing ovs ports are not found is finally resolved. However, another problem shows up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA network is assigned a vlan number '1'. As our network does not allow VLAN numbers outside the range 400~1000, this value is definitely wrong. As the two HA network in the two neutron network nodes cannot communicate, both becomes a master.

I thought the HA vlan number should come from the default vlan range 'default:400:1000', but it's not. I don't know there the '1' came from. Looking into neutron database in controller node, the database shows that HA network segment's network_type is vlan, and physical_network is default.

Any further hints or comments?

UPDATE 2

After re-setting the controller node and two network nodes, and after struggling with source codes, I found that the br-int and br-ens2f0 bridges has correct flow tables:

[root@network2 agent]# ovs-ofctl dump-flows br-ens2f0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1344.498s, table=0, n_packets=0, n_bytes=0, idle_age=1344, priority=1 actions=NORMAL
 cookie=0x0, duration=651.521s, table=0, n_packets=330, n_bytes=16832, idle_age=0, priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
 cookie=0x0, duration=1343.975s, table=0, n_packets=14, n_bytes=1164, idle_age=651, priority=2,in_port=9 actions=drop
[root@network2 agent]# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1363.640s, table=0, n_packets=671, n_bytes=34602, idle_age=1, priority=1 actions=NORMAL
 cookie=0x0, duration=670.231s, table=0, n_packets=0, n_bytes=0, idle_age=670, priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
 cookie=0x0, duration=1362.813s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=18 actions=drop
 cookie=0x0, duration=1362.032s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=19 actions=drop
 cookie=0x0, duration=1363.576s, table=23, n_packets=0, n_bytes=0, idle_age=1363, priority=0 actions=drop

As given by the flow tables, the traffic from br-ens2f0 with VLAN 401 is converted to VLAN 2, and traffic from br-int with VLAN 2 is converted to 401. That means, local vlan port 2 (which I first thought is the wrong value assigned) is converted to VLAN 401 before being injected to physical network, and vice versa.

However, as indicated by the n_packets values, that only works for one direction, and each HA router does not receive any keepalive messages from each other, and that makes each HA router MASTER at the same time.

And still, I'm seeing following log messages at /var/log/openvswitch/ovs-vswitchd.log.

2014-12-29T08:54:07.812Z|00101|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00102|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00103|netdev_linux|WARN|ha-d75e6f07-5e: removing policing failed: No such device

Further, the ovs-ofctl show br-int indicates the ha-XXX devices is down:

[root@network1 agent]# ovs-ofctl show br-int

OFPT_FEATURES_REPLY (xid=0x2): dpid:0000d67d91611247
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 29(int-br-ens2f0): addr:e6:97:65:12:a8:b2
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 30(int-br-ex): addr:56:04:61:d4:01:3a
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 32(ha-365b05db-b1): addr:56:04:61:d4:01:3a
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-int): addr:d6:7d:91:61:12:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

I think I have come very close to the final answer, but still needs help from others. Any comments and hints are welcomed.

UPDATE 3

OK. By enabling veth support in /etc/neutron/l3_agent.ini and restarting the l3 and openvswitch agent, the logs what I've shown in previously (removing policing failed:No such device) has totally been amortized. One notable difference with previous setting is that the ovs port name "ha-xxxx" is no longer used. Instead, I can see following logs:

Dec 29 19:45:42 network2 kernel: device tapf0afab77-ea entered promiscuous mode

Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_UP): ha-f0afab77-ea: link is not ready
Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ha-f0afab77-ea: link becomes ready
Dec 29 19:45:43 network2 Keepalived[9432]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 19:45:43 network2 Keepalived[9433]: Starting VRRP child process, pid=9434
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink reflector
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink command channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering gratuitous ARP shared channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Opening file '/var/lib/neutron/ha_confs/ed04d4e6-5f00-425d-b856-0cec3ab69ae8/keepalived.conf'.
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Configuration is using : 65206 Bytes
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Using LinkWatch kernel netlink reflector...
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 19:45:44 network2 avahi-daemon[788]: Registering new address record for fe80::e8e5:6dff:fea5:e912 on tapf0afab77-ea.*.
Dec 29 19:45:44 network2 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port tapf0afab77-ea tag=3
Dec 29 19:45:45 network2 ntpd[899]: Listen normally on 10 tapf0afab77-ea fe80::e8e5:6dff:fea5:e912 UDP 123
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 19:45:52 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering MASTER STATE

It means, I think, that a tap device is created and virtual link ha-xxx is created on top of it. (Of course, I'm not sure this is correct understanding.) But still, the communication between the two HA routers are not engaged.

Still open for comments and hints.

UPDATE 4

All right. Finally, I solved the problem. The communication between two keepalived processes were blocked because of ens2f0, and br-ens2f0. Originally, it was like below:

[root@network2 agent]# ovs-ofctl show br-ens2f0
OFPT_FEATURES_REPLY (xid=0x2): dpid:000090e2ba1f1ec4
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 1(ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     current:    COPPER AUTO_NEG
     advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     speed: 0 Mbps now, 1000 Mbps max
 13(phy-br-ens2f0): addr:3e:77:de:63:a0:95
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

As you can see, the br-ens2f0 and ens2f0 interfaces are DOWN. So I did...

[root@network2 agent]# ip link set br-ens2f0 up
[root@network2 agent]# ip link set ens2f0 up

up

Then... finally.... the second keepalived turns into BACKUP mode.

Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Received higher prio advert
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Group(VG_1) Syncing instances to BACKUP state

What a long story. Anyway, problem solved.

Thank you fellow OpenStack guys. :-)

neutron (Juno) l3 HA test failed

Hello. I'm testing neutron (Juno) L3 HA feature with following configurations, with one controller node and two network nodes. All the nodes are Centos 7 minimal installation.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

With these configurations, creating a router with HA enabled produces the following output on the controller:

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, /var/log/neutron/server.log contains error messages indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
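
The binding failures can be pulled out of server.log with a short filter. A sketch, assuming the exact ML2 warning format shown above; the helper name and truncated req-ids in the sample are illustrative, not part of neutron:

```python
import re

# Matches the ML2 warning "Failed to bind port <uuid> on host <name>".
BIND_FAIL = re.compile(r"Failed to bind port (?P<port>\S+) on host (?P<host>\S+)")

def binding_failures(log_text):
    """Return (port_id, host) tuples for every binding failure in the text."""
    return [(m.group("port"), m.group("host"))
            for m in BIND_FAIL.finditer(log_text)]

sample = (
    "2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers "
    "[req-...] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1\n"
    "2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers "
    "[req-...] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2\n"
)
print(binding_failures(sample))
```

Running this over the log above shows both HA ports failing to bind, one per network node, which is the first symptom to chase.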

On the network nodes, the following identical logs also appear in /var/log/messages:

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That means the HA port named ha-7ec5cb76-94 is blocked (by the 4095 vlan tag). I tracked down the reason in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

Simply put, the created ports are not found, and that is why they are blocked. The ports on br-int and br-ens2f0 are not found, and they are not shown by ovs-ofctl show br-int or ovs-ofctl show br-ens2f0. As those ports are not found, the plugins/openvswitch/agent/ovs-neutron-agent:treat_vif_port() method automatically blocks each port by calling self.port_dead().
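
The blocking behaviour can be checked from the outside: the Juno OVS agent parks an unbound port on a "dead" VLAN (tag 4095) so no flow ever matches its traffic. A sketch that scans `ovs-vsctl list Port` output for such ports; the parser and sample are illustrative, not agent code:

```python
# Tag the OVS agent assigns to ports it has given up on (port_dead()).
DEAD_VLAN_TAG = 4095

def dead_ports(ovs_vsctl_output):
    """Return names of ports whose OVS tag is the dead VLAN (4095)."""
    dead, name = [], None
    for line in ovs_vsctl_output.splitlines():
        if line.startswith("name"):
            name = line.split(":", 1)[1].strip().strip('"')
        elif line.startswith("tag") and name is not None:
            if line.split(":", 1)[1].strip() == str(DEAD_VLAN_TAG):
                dead.append(name)
    return dead

sample = """\
name                : "ha-7ec5cb76-94"
tag                 : 4095

name                : "int-br-ens2f0"
tag                 : []
"""
print(dead_ports(sample))
```

Feeding it real `ovs-vsctl --columns=name,tag list Port` output on a network node quickly tells you which ports the agent has written off.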

My question is why the ports are not found and not shown, and why their admin state is not automatically up.

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged the ovs_neutron_plugin.ini into plugin.ini and restarted the neutron-openvswitch-agent and neutron-l3-agent. The problem that existing ovs ports were not found is finally resolved. However, another problem shows up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA network is assigned VLAN number '1'. As our network does not allow VLAN numbers outside the range 400~1000, this value is definitely wrong. Because the two HA networks on the two neutron network nodes cannot communicate, both routers become MASTER.

I thought the HA vlan number should come from the default vlan range 'default:400:1000', but it does not. I don't know where the '1' came from. Looking into the neutron database on the controller node, the HA network segment's network_type is vlan and its physical_network is default.

Any further hints or comments?

UPDATE 2

After re-setting the controller node and the two network nodes, and after struggling with the source code, I found that the br-int and br-ens2f0 bridges have correct flow tables:

[root@network2 agent]# ovs-ofctl dump-flows br-ens2f0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1344.498s, table=0, n_packets=0, n_bytes=0, idle_age=1344, priority=1 actions=NORMAL
 cookie=0x0, duration=651.521s, table=0, n_packets=330, n_bytes=16832, idle_age=0, priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
 cookie=0x0, duration=1343.975s, table=0, n_packets=14, n_bytes=1164, idle_age=651, priority=2,in_port=9 actions=drop
[root@network2 agent]# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1363.640s, table=0, n_packets=671, n_bytes=34602, idle_age=1, priority=1 actions=NORMAL
 cookie=0x0, duration=670.231s, table=0, n_packets=0, n_bytes=0, idle_age=670, priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
 cookie=0x0, duration=1362.813s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=18 actions=drop
 cookie=0x0, duration=1362.032s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=19 actions=drop
 cookie=0x0, duration=1363.576s, table=23, n_packets=0, n_bytes=0, idle_age=1363, priority=0 actions=drop

As the flow tables show, traffic from br-ens2f0 with VLAN 401 is rewritten to VLAN 2, and traffic from br-int with VLAN 2 is rewritten to 401. That means local vlan 2 (which I first thought was a wrongly assigned value) is translated to VLAN 401 before being injected into the physical network, and vice versa.
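
The local-to-physical VLAN translation can be read mechanically out of the dump-flows text. A sketch matched against the flows above; the regex assumes the NXST_FLOW format shown:

```python
import re

# Extract dl_vlan -> mod_vlan_vid rewrites from `ovs-ofctl dump-flows` text.
FLOW = re.compile(r"dl_vlan=(\d+) actions=mod_vlan_vid:(\d+)")

def vlan_map(dump_flows_text):
    """Map each matched dl_vlan to the VLAN it is rewritten to."""
    return {int(src): int(dst) for src, dst in FLOW.findall(dump_flows_text)}

br_phys = "priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL"
br_int = "priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL"

print(vlan_map(br_phys))  # outbound: local VLAN 2 -> physical VLAN 401
print(vlan_map(br_int))   # inbound: physical VLAN 401 -> local VLAN 2
```

The two maps being inverses of each other is exactly what a healthy OVS agent setup should produce, which confirms the local tag 2 was never the problem.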

However, as the n_packets values indicate, this only works in one direction; neither HA router receives keepalive messages from the other, so both become MASTER at the same time.

And I am still seeing the following log messages in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T08:54:07.812Z|00101|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00102|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00103|netdev_linux|WARN|ha-d75e6f07-5e: removing policing failed: No such device

Further, ovs-ofctl show br-int indicates that the ha-XXX device is down:

[root@network1 agent]# ovs-ofctl show br-int

OFPT_FEATURES_REPLY (xid=0x2): dpid:0000d67d91611247
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 29(int-br-ens2f0): addr:e6:97:65:12:a8:b2
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 30(int-br-ex): addr:56:04:61:d4:01:3a
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 32(ha-365b05db-b1): addr:56:04:61:d4:01:3a
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-int): addr:d6:7d:91:61:12:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

I think I have come very close to the final answer, but I still need help from others. Any comments and hints are welcome.

UPDATE 3

OK. By enabling veth support in /etc/neutron/l3_agent.ini (ovs_use_veth = True) and restarting the l3 and openvswitch agents, the log messages I showed previously (removing policing failed: No such device) have completely disappeared. One notable difference from the previous setting is that the ovs port name "ha-xxxx" is no longer used. Instead, I see the following logs:

Dec 29 19:45:42 network2 kernel: device tapf0afab77-ea entered promiscuous mode

Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_UP): ha-f0afab77-ea: link is not ready
Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ha-f0afab77-ea: link becomes ready
Dec 29 19:45:43 network2 Keepalived[9432]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 19:45:43 network2 Keepalived[9433]: Starting VRRP child process, pid=9434
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink reflector
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink command channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering gratuitous ARP shared channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Opening file '/var/lib/neutron/ha_confs/ed04d4e6-5f00-425d-b856-0cec3ab69ae8/keepalived.conf'.
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Configuration is using : 65206 Bytes
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Using LinkWatch kernel netlink reflector...
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 19:45:44 network2 avahi-daemon[788]: Registering new address record for fe80::e8e5:6dff:fea5:e912 on tapf0afab77-ea.*.
Dec 29 19:45:44 network2 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port tapf0afab77-ea tag=3
Dec 29 19:45:45 network2 ntpd[899]: Listen normally on 10 tapf0afab77-ea fe80::e8e5:6dff:fea5:e912 UDP 123
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 19:45:52 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering MASTER STATE

I think this means a tap device is created and the virtual link ha-xxx is created on top of it. (Of course, I'm not sure this understanding is correct.) But still, communication between the two HA routers is not established.

Still open for comments and hints.

UPDATE 4

All right. Finally, I solved the problem. Communication between the two keepalived processes was blocked because ens2f0 and br-ens2f0 were down. Originally, it looked like this:

[root@network2 agent]# ovs-ofctl show br-ens2f0
OFPT_FEATURES_REPLY (xid=0x2): dpid:000090e2ba1f1ec4
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 1(ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     current:    COPPER AUTO_NEG
     advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     speed: 0 Mbps now, 1000 Mbps max
 13(phy-br-ens2f0): addr:3e:77:de:63:a0:95
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

As you can see, the br-ens2f0 and ens2f0 interfaces are DOWN. So I did...

[root@network2 agent]# ip link set br-ens2f0 up
[root@network2 agent]# ip link set ens2f0 up

Then... finally... the second keepalived turned into BACKUP mode.

Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Received higher prio advert
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Group(VG_1) Syncing instances to BACKUP state
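
In hindsight, the manual check above can be scripted so the next deployment catches this early. A sketch (not part of any OpenStack tool) that scans `ovs-ofctl show` output for ports stuck in PORT_DOWN/LINK_DOWN and prints the ip link commands to bring them up:

```python
import re

def down_ports(ofctl_show_text):
    """Return interface names whose config or state line says DOWN in
    `ovs-ofctl show` output."""
    down, current = [], None
    for line in ofctl_show_text.splitlines():
        m = re.match(r"\s*\S+\((?P<name>[^)]+)\):", line)
        if m:
            current = m.group("name")
        elif current and ("PORT_DOWN" in line or "LINK_DOWN" in line):
            if current not in down:
                down.append(current)
    return down

sample = """\
 1(ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
 13(phy-br-ens2f0): addr:3e:77:de:63:a0:95
     config:     0
     state:      0
 LOCAL(br-ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
"""
for name in down_ports(sample):
    print("ip link set %s up" % name)
```

Run against the br-ens2f0 output above, it flags exactly ens2f0 and br-ens2f0, the two interfaces I had to bring up by hand.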

What a long story. Anyway, problem solved.

Thank you fellow OpenStack guys. :-)

LESSONS LEARNED:

1: Most problems come from configuration mistakes

2: Watch out for bridge settings

neutron (Juno) l3 HA test failed

Hello. I'm testing neutron (Juno) L3 HA feature with following configurations, with one controller node and two network nodes. All the nodes are Centos 7 minimal installation.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

With these configurations, on creating a router with HA enabled, following messages are shown on the controller.

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, in the /var/log/neutron/server.log, you can find error messages indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed

In the network nodes, we could also found following logs (/var/log/messages) identically.

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That means, the HA port named ha-7ec5cb76-94 is blocked (by 4095 plan tag). I tracked the reason, and found out why. (/var/log/openvswitch/ovs-vswitchd.log)

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

Simply, the ports created are not found, and it was the reason that the ports are blocked. Ports on br-int and br-ens2f0 are not found, and those ports are not shown by ovs-ofctl show br-int or ova-ofctl show br-ens2f0. As those ports are not found, the plugins/openvswitch/agent/ovs-neutron-agent:treat_vif_port() method automatically block the port by calling self.port_dead().

My question is why the ports are not found, and not shown. And why there admin state is not automatically up.

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged the vs_neutron_plugin.ini into plugin.ini and restarted the neutron-openvswith-agent and neutron-l3-agent. Then, the problem that existing ovs ports are not found is finally resolved. However, another problem shows up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA network is assigned a vlan number '1'. As our network does not allow VLAN numbers outside the range 400~1000, this value is definitely wrong. As the two HA network in the two neutron network nodes cannot communicate, both becomes a master.

I thought the HA vlan number should come from the default vlan range 'default:400:1000', but it's not. I don't know there the '1' came from. Looking into neutron database in controller node, the database shows that HA network segment's network_type is vlan, and physical_network is default.

Any further hints or comments?

UPDATE 2

After re-setting the controller node and two network nodes, and after struggling with source codes, I found that the br-int and br-ens2f0 bridges has correct flow tables:

[root@network2 agent]# ovs-ofctl dump-flows br-ens2f0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1344.498s, table=0, n_packets=0, n_bytes=0, idle_age=1344, priority=1 actions=NORMAL
 cookie=0x0, duration=651.521s, table=0, n_packets=330, n_bytes=16832, idle_age=0, priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
 cookie=0x0, duration=1343.975s, table=0, n_packets=14, n_bytes=1164, idle_age=651, priority=2,in_port=9 actions=drop
[root@network2 agent]# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1363.640s, table=0, n_packets=671, n_bytes=34602, idle_age=1, priority=1 actions=NORMAL
 cookie=0x0, duration=670.231s, table=0, n_packets=0, n_bytes=0, idle_age=670, priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
 cookie=0x0, duration=1362.813s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=18 actions=drop
 cookie=0x0, duration=1362.032s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=19 actions=drop
 cookie=0x0, duration=1363.576s, table=23, n_packets=0, n_bytes=0, idle_age=1363, priority=0 actions=drop

As given by the flow tables, the traffic from br-ens2f0 with VLAN 401 is converted to VLAN 2, and traffic from br-int with VLAN 2 is converted to 401. That means, local vlan port 2 (which I first thought is the wrong value assigned) is converted to VLAN 401 before being injected to physical network, and vice versa.

However, as indicated by the n_packets values, that only works for one direction, and each HA router does not receive any keepalive messages from each other, and that makes each HA router MASTER at the same time.

And still, I'm seeing following log messages at /var/log/openvswitch/ovs-vswitchd.log.

2014-12-29T08:54:07.812Z|00101|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00102|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00103|netdev_linux|WARN|ha-d75e6f07-5e: removing policing failed: No such device

Further, the ovs-ofctl show br-int indicates the ha-XXX devices is down:

[root@network1 agent]# ovs-ofctl show br-int

OFPT_FEATURES_REPLY (xid=0x2): dpid:0000d67d91611247
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 29(int-br-ens2f0): addr:e6:97:65:12:a8:b2
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 30(int-br-ex): addr:56:04:61:d4:01:3a
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 32(ha-365b05db-b1): addr:56:04:61:d4:01:3a
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-int): addr:d6:7d:91:61:12:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

I think I have come very close to the final answer, but still needs help from others. Any comments and hints are welcomed.

UPDATE 3

OK. By enabling veth support in /etc/neutron/l3_agent.ini and restarting the l3 and openvswitch agent, the logs what I've shown in previously (removing policing failed:No such device) has totally been amortized. One notable difference with previous setting is that the ovs port name "ha-xxxx" is no longer used. Instead, I can see following logs:

Dec 29 19:45:42 network2 kernel: device tapf0afab77-ea entered promiscuous mode

Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_UP): ha-f0afab77-ea: link is not ready
Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ha-f0afab77-ea: link becomes ready
Dec 29 19:45:43 network2 Keepalived[9432]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 19:45:43 network2 Keepalived[9433]: Starting VRRP child process, pid=9434
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink reflector
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink command channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering gratuitous ARP shared channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Opening file '/var/lib/neutron/ha_confs/ed04d4e6-5f00-425d-b856-0cec3ab69ae8/keepalived.conf'.
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Configuration is using : 65206 Bytes
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Using LinkWatch kernel netlink reflector...
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 19:45:44 network2 avahi-daemon[788]: Registering new address record for fe80::e8e5:6dff:fea5:e912 on tapf0afab77-ea.*.
Dec 29 19:45:44 network2 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port tapf0afab77-ea tag=3
Dec 29 19:45:45 network2 ntpd[899]: Listen normally on 10 tapf0afab77-ea fe80::e8e5:6dff:fea5:e912 UDP 123
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 19:45:52 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering MASTER STATE

This means, I think, that a tap device is created and the virtual link ha-xxx is set up on top of it. (Of course, I'm not sure this understanding is correct.) Still, the two HA routers do not communicate with each other.
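For reference, the veth change amounts to a single option in the L3 agent config (a sketch; ovs_use_veth is the option I set, and the service names are from my CentOS 7 nodes):

```
# /etc/neutron/l3_agent.ini
[DEFAULT]
ovs_use_veth = True
```

followed by "systemctl restart neutron-l3-agent neutron-openvswitch-agent" on each network node.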

Still open for comments and hints.

UPDATE 4

All right. I finally solved the problem. The communication between the two keepalived processes was blocked because ens2f0 and br-ens2f0 were down. Originally, the bridge looked like this:

[root@network2 agent]# ovs-ofctl show br-ens2f0
OFPT_FEATURES_REPLY (xid=0x2): dpid:000090e2ba1f1ec4
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 1(ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     current:    COPPER AUTO_NEG
     advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     speed: 0 Mbps now, 1000 Mbps max
 13(phy-br-ens2f0): addr:3e:77:de:63:a0:95
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

As you can see, both br-ens2f0 and ens2f0 are DOWN. So I brought them up:

[root@network2 agent]# ip link set br-ens2f0 up
[root@network2 agent]# ip link set ens2f0 up

Then, finally, the second keepalived instance transitioned to BACKUP mode:

Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Received higher prio advert
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Group(VG_1) Syncing instances to BACKUP state
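To keep these interfaces up across reboots on CentOS 7, ifcfg files along these lines should work (a sketch using my device name; adjust to your setup):

```
# /etc/sysconfig/network-scripts/ifcfg-ens2f0
DEVICE=ens2f0
ONBOOT=yes
BOOTPROTO=none
```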

What a long story. Anyway, problem solved.

Thank you, fellow OpenStack guys. :-)

LESSONS LEARNED:

1. Most problems come from configuration mistakes.

2. Watch out for bridge settings.

UPDATE 5

I thought I had reached the end of this problem, but I had not. The following command shut down all the keepalived processes:

neutron router-gateway-set demo-router ext-net

with the following log:

Dec 29 20:34:17 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 -- --if-exists del-port br-int tapa761b6ad-9e
Dec 29 20:34:17 network1 kernel: device tapa761b6ad-9e left promiscuous mode
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing address record for fe80::f0cd:1dff:fe59:648c on tapa761b6ad-9e.
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing workstation service for tapa761b6ad-9e.
Dec 29 20:34:18 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 -- --if-exists del-port br-int tap6030d7fc-01
Dec 29 20:34:18 network1 kernel: device tap6030d7fc-01 left promiscuous mode
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing address record for fe80::38ea:f0ff:fef9:aa39 on tap6030d7fc-01.
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing workstation service for tap6030d7fc-01.
Dec 29 20:34:18 network1 Keepalived[15220]: Stopping Keepalived v1.2.10 (06/10,2014)
Dec 29 20:34:18 network1 Keepalived_vrrp[15221]: VRRP_Instance(VR_1) sending 0 priority
Dec 29 20:34:18 network1 Keepalived_vrrp[15221]: Netlink: error: No such device, type=(21), seq=1419852677, pid=0
Dec 29 20:34:19 network1 ntpd[954]: Deleting interface #18 tap6030d7fc-01, fe80::38ea:f0ff:fef9:aa39#123, interface stats: received=0, sent=0, dropped=0, active_time=181 secs
Dec 29 20:34:19 network1 ntpd[954]: Deleting interface #17 tapa761b6ad-9e, fe80::f0cd:1dff:fe59:648c#123, interface stats: received=0, sent=0, dropped=0, active_time=181 secs

Any comments?
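For anyone hitting something similar, these are the commands I keep running to see where the router lives and whether its keepalived processes survived (a sketch; demo-router is my router name):

```
# on the controller: which L3 agents host the HA router
neutron l3-agent-list-hosting-router demo-router

# on each network node: router namespaces and keepalived processes
ip netns | grep qrouter
ps aux | grep [k]eepalived
```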

 cookie=0x0, duration=1344.498s, table=0, n_packets=0, n_bytes=0, idle_age=1344, priority=1 actions=NORMAL
 cookie=0x0, duration=651.521s, table=0, n_packets=330, n_bytes=16832, idle_age=0, priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
 cookie=0x0, duration=1343.975s, table=0, n_packets=14, n_bytes=1164, idle_age=651, priority=2,in_port=9 actions=drop
[root@network2 agent]# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1363.640s, table=0, n_packets=671, n_bytes=34602, idle_age=1, priority=1 actions=NORMAL
 cookie=0x0, duration=670.231s, table=0, n_packets=0, n_bytes=0, idle_age=670, priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
 cookie=0x0, duration=1362.813s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=18 actions=drop
 cookie=0x0, duration=1362.032s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=19 actions=drop
 cookie=0x0, duration=1363.576s, table=23, n_packets=0, n_bytes=0, idle_age=1363, priority=0 actions=drop

As given by the flow tables, the traffic from br-ens2f0 with VLAN 401 is converted to VLAN 2, and traffic from br-int with VLAN 2 is converted to 401. That means, local vlan port 2 (which I first thought is the wrong value assigned) is converted to VLAN 401 before being injected to physical network, and vice versa.

However, as indicated by the n_packets values, that only works for one direction, and each HA router does not receive any keepalive messages from each other, and that makes each HA router MASTER at the same time.

And still, I'm seeing following log messages at /var/log/openvswitch/ovs-vswitchd.log.

2014-12-29T08:54:07.812Z|00101|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00102|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00103|netdev_linux|WARN|ha-d75e6f07-5e: removing policing failed: No such device

Further, the ovs-ofctl show br-int indicates the ha-XXX devices is down:

[root@network1 agent]# ovs-ofctl show br-int

OFPT_FEATURES_REPLY (xid=0x2): dpid:0000d67d91611247
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 29(int-br-ens2f0): addr:e6:97:65:12:a8:b2
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 30(int-br-ex): addr:56:04:61:d4:01:3a
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 32(ha-365b05db-b1): addr:56:04:61:d4:01:3a
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-int): addr:d6:7d:91:61:12:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

I think I have come very close to the final answer, but still needs help from others. Any comments and hints are welcomed.

UPDATE 3

OK. By enabling veth support in /etc/neutron/l3_agent.ini and restarting the l3 and openvswitch agent, the logs what I've shown in previously (removing policing failed:No such device) has totally been amortized. One notable difference with previous setting is that the ovs port name "ha-xxxx" is no longer used. Instead, I can see following logs:

Dec 29 19:45:42 network2 kernel: device tapf0afab77-ea entered promiscuous mode

Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_UP): ha-f0afab77-ea: link is not ready
Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ha-f0afab77-ea: link becomes ready
Dec 29 19:45:43 network2 Keepalived[9432]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 19:45:43 network2 Keepalived[9433]: Starting VRRP child process, pid=9434
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink reflector
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink command channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering gratuitous ARP shared channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Opening file '/var/lib/neutron/ha_confs/ed04d4e6-5f00-425d-b856-0cec3ab69ae8/keepalived.conf'.
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Configuration is using : 65206 Bytes
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Using LinkWatch kernel netlink reflector...
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 19:45:44 network2 avahi-daemon[788]: Registering new address record for fe80::e8e5:6dff:fea5:e912 on tapf0afab77-ea.*.
Dec 29 19:45:44 network2 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port tapf0afab77-ea tag=3
Dec 29 19:45:45 network2 ntpd[899]: Listen normally on 10 tapf0afab77-ea fe80::e8e5:6dff:fea5:e912 UDP 123
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 19:45:52 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering MASTER STATE

It means, I think, that a tap device is created and virtual link ha-xxx is created on top of it. (Of course, I'm not sure this is correct understanding.) But still, the communication between the two HA routers are not engaged.

Still open for comments and hints.

UPDATE 4

All right. Finally, I solved the problem. The communication between two keepalived processes were blocked because of ens2f0, and br-ens2f0. Originally, it was like below:

[root@network2 agent]# ovs-ofctl show br-ens2f0
OFPT_FEATURES_REPLY (xid=0x2): dpid:000090e2ba1f1ec4
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 1(ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     current:    COPPER AUTO_NEG
     advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     speed: 0 Mbps now, 1000 Mbps max
 13(phy-br-ens2f0): addr:3e:77:de:63:a0:95
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

As you can see, the br-ens2f0 and ens2f0 interfaces are DOWN. So I did...

[root@network2 agent]# ip link set br-ens2f0 up
[root@network2 agent]# ip link set ens2f0 up

Then... finally.... the second keepalived turns into BACKUP mode.

Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Received higher prio advert
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Group(VG_1) Syncing instances to BACKUP state

UPDATE 5

I thought I have come to an end of this problem, but it was not. following command shut all keepalived processes down:

neutron router-gateway-set demo-router ext-net

with following log:

Dec 29 20:34:17 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 -- --if-exists del-port br-int tapa761b6ad-9e
Dec 29 20:34:17 network1 kernel: device tapa761b6ad-9e left promiscuous mode
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing address record for fe80::f0cd:1dff:fe59:648c on tapa761b6ad-9e.
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing workstation service for tapa761b6ad-9e.
Dec 29 20:34:18 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 -- --if-exists del-port br-int tap6030d7fc-01
Dec 29 20:34:18 network1 kernel: device tap6030d7fc-01 left promiscuous mode
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing address record for fe80::38ea:f0ff:fef9:aa39 on tap6030d7fc-01.
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing workstation service for tap6030d7fc-01.
Dec 29 20:34:18 network1 Keepalived[15220]: Stopping Keepalived v1.2.10 (06/10,2014)
Dec 29 20:34:18 network1 Keepalived_vrrp[15221]: VRRP_Instance(VR_1) sending 0 priority
Dec 29 20:34:18 network1 Keepalived_vrrp[15221]: Netlink: error: No such device, type=(21), seq=1419852677, pid=0
Dec 29 20:34:19 network1 ntpd[954]: Deleting interface #18 tap6030d7fc-01, fe80::38ea:f0ff:fef9:aa39#123, interface stats: received=0, sent=0, dropped=0, active_time=181 secs
Dec 29 20:34:19 network1 ntpd[954]: Deleting interface #17 tapa761b6ad-9e, fe80::f0cd:1dff:fe59:648c#123, interface stats: received=0, sent=0, dropped=0, active_time=181 secs

At this point, I thought that the veth config can be a source of this evil, and removed the configuration from all files. And found out that veth config was totally unnecessary. (Thus, you can ignore UPDATE 3). However, the command 'neutron router-interface-set demo-router ext-net' creates exactly the same problem.

Any comments?

neutron (Juno) l3 HA test failed

Hello. I'm testing the neutron (Juno) L3 HA feature with the following configuration, using one controller node and two network nodes. All nodes are CentOS 7 minimal installations.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

With this configuration, creating a router with HA enabled produces the following output on the controller.

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, /var/log/neutron/server.log contains error messages indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
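A quick way to pull out which ports failed to bind, and on which host, is to grep the server log. This is a small sketch of my own (the sample file below stands in for the real /var/log/neutron/server.log):

```shell
# Extract "Failed to bind port <uuid> on host <host>" entries from a
# server.log excerpt; the heredoc is a stand-in sample for illustration.
cat > /tmp/server.log.sample <<'EOF'
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-x None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-y None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
EOF
# Print each failed port UUID together with the host it failed on.
awk '/Failed to bind port/ {
  for (i = 1; i <= NF; i++) if ($i == "port") print $(i+1), "on", $NF
}' /tmp/server.log.sample
```

Running this against the real log on the controller shows at a glance whether both network nodes are affected, which points to a shared configuration problem rather than a single broken node.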

On both network nodes, the following identical logs appear in /var/log/messages.

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

This means the HA port named ha-7ec5cb76-94 is blocked (by the dead VLAN tag 4095). I tracked down the reason in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

In short, the created ports are not found, and that is why they are blocked. The ports on br-int and br-ens2f0 cannot be found, and they do not appear in the output of ovs-ofctl show br-int or ovs-ofctl show br-ens2f0. Because the ports are not found, the treat_vif_port() method in plugins/openvswitch/agent/ovs_neutron_agent.py automatically blocks each port by calling self.port_dead().
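The tag 4095 seen in the ovs-vsctl log above is not arbitrary: it mirrors the agent's DEAD_VLAN_TAG constant, used to isolate ports whose binding failed. A tiny illustration of that rule (classify_tag is a hypothetical helper of my own, not part of neutron):

```shell
# 4095 mirrors the DEAD_VLAN_TAG constant applied by port_dead() in the
# OVS agent; any port carrying it is effectively cut off from the bridge.
# classify_tag is an illustrative helper, not a real neutron function.
classify_tag() {
  if [ "$1" -eq 4095 ]; then
    echo "dead (binding failed, traffic dropped)"
  else
    echo "live, local vlan $1"
  fi
}
classify_tag 4095   # the ha- port in the logs above
classify_tag 2      # a normally bound port
```

So whenever an ha- port shows up with tag=4095, the root cause to chase is the binding failure, not keepalived itself.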

My question is why the ports are not found or shown, and why their admin state is not automatically brought up.

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged ovs_neutron_plugin.ini into plugin.ini and restarted neutron-openvswitch-agent and neutron-l3-agent. That resolved the problem of existing OVS ports not being found. However, another problem showed up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA port is assigned VLAN tag '1'. Since our network does not allow VLAN numbers outside the range 400-1000, this value is definitely wrong. Because the two HA router instances on the two network nodes cannot communicate, both become MASTER.

I thought the HA VLAN number should come from the default VLAN range 'default:400:1000', but it does not, and I don't know where the '1' came from. Looking into the neutron database on the controller node, the HA network segment's network_type is vlan and its physical_network is default.

Any further hints or comments?

UPDATE 2

After resetting the controller node and the two network nodes, and after digging through the source code, I found that the br-int and br-ens2f0 bridges have correct flow tables:

[root@network2 agent]# ovs-ofctl dump-flows br-ens2f0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1344.498s, table=0, n_packets=0, n_bytes=0, idle_age=1344, priority=1 actions=NORMAL
 cookie=0x0, duration=651.521s, table=0, n_packets=330, n_bytes=16832, idle_age=0, priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
 cookie=0x0, duration=1343.975s, table=0, n_packets=14, n_bytes=1164, idle_age=651, priority=2,in_port=9 actions=drop
[root@network2 agent]# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1363.640s, table=0, n_packets=671, n_bytes=34602, idle_age=1, priority=1 actions=NORMAL
 cookie=0x0, duration=670.231s, table=0, n_packets=0, n_bytes=0, idle_age=670, priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
 cookie=0x0, duration=1362.813s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=18 actions=drop
 cookie=0x0, duration=1362.032s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=19 actions=drop
 cookie=0x0, duration=1363.576s, table=23, n_packets=0, n_bytes=0, idle_age=1363, priority=0 actions=drop

According to the flow tables, traffic arriving on br-ens2f0 with VLAN 401 is rewritten to VLAN 2, and traffic leaving br-int with VLAN 2 is rewritten to 401. In other words, local VLAN 2 (which I first thought was a wrongly assigned value) is translated to VLAN 401 before being injected into the physical network, and vice versa.
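The two mod_vlan_vid flows in the dump implement a simple bidirectional mapping between the node-local tag and the provider VLAN. A shell sketch of that mapping, using the 2/401 pair from the dump (function names are illustrative, not part of neutron):

```shell
# Sketch of the two flow rules above: the physical bridge rewrites the
# local tag to the provider tag on egress, br-int does the reverse on
# ingress; anything else from the patch port hits the priority=2 drop rule.
to_provider_vlan() { if [ "$1" -eq 2 ];   then echo 401;  else echo drop; fi; }
to_local_vlan()    { if [ "$1" -eq 401 ]; then echo 2;    else echo drop; fi; }
to_provider_vlan 2     # egress via br-ens2f0
to_local_vlan 401      # ingress via br-int
```

Each agent picks its local tag independently, so the two network nodes may use different local tags for the same HA network; only the provider VLAN (401) has to match on the wire.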

However, as the n_packets values indicate, this works in only one direction: the HA routers never receive each other's VRRP advertisements, so both become MASTER at the same time.

I am still seeing the following log messages in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T08:54:07.812Z|00101|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00102|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00103|netdev_linux|WARN|ha-d75e6f07-5e: removing policing failed: No such device

Furthermore, ovs-ofctl show br-int indicates that the ha-XXX device is down:

[root@network1 agent]# ovs-ofctl show br-int

OFPT_FEATURES_REPLY (xid=0x2): dpid:0000d67d91611247
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 29(int-br-ens2f0): addr:e6:97:65:12:a8:b2
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 30(int-br-ex): addr:56:04:61:d4:01:3a
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 32(ha-365b05db-b1): addr:56:04:61:d4:01:3a
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-int): addr:d6:7d:91:61:12:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

I think I have come very close to the final answer, but I still need help from others. Any comments and hints are welcome.

UPDATE 3

OK. After enabling veth support in /etc/neutron/l3_agent.ini and restarting the L3 and openvswitch agents, the log messages shown previously (removing policing failed: No such device) have disappeared entirely. One notable difference from the previous setup is that the OVS port name "ha-xxxx" is no longer used. Instead, I see the following logs:

Dec 29 19:45:42 network2 kernel: device tapf0afab77-ea entered promiscuous mode

Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_UP): ha-f0afab77-ea: link is not ready
Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ha-f0afab77-ea: link becomes ready
Dec 29 19:45:43 network2 Keepalived[9432]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 19:45:43 network2 Keepalived[9433]: Starting VRRP child process, pid=9434
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink reflector
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink command channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering gratuitous ARP shared channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Opening file '/var/lib/neutron/ha_confs/ed04d4e6-5f00-425d-b856-0cec3ab69ae8/keepalived.conf'.
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Configuration is using : 65206 Bytes
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Using LinkWatch kernel netlink reflector...
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 19:45:44 network2 avahi-daemon[788]: Registering new address record for fe80::e8e5:6dff:fea5:e912 on tapf0afab77-ea.*.
Dec 29 19:45:44 network2 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port tapf0afab77-ea tag=3
Dec 29 19:45:45 network2 ntpd[899]: Listen normally on 10 tapf0afab77-ea fe80::e8e5:6dff:fea5:e912 UDP 123
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 19:45:52 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering MASTER STATE

My understanding (though I am not sure it is correct) is that a tap device is created and the virtual link ha-xxx is created on top of it. But the two HA routers still do not communicate.

Still open for comments and hints.

UPDATE 4

All right. Finally, I solved the problem. Communication between the two keepalived processes was blocked because ens2f0 and br-ens2f0 were down. Originally, it looked like this:

[root@network2 agent]# ovs-ofctl show br-ens2f0
OFPT_FEATURES_REPLY (xid=0x2): dpid:000090e2ba1f1ec4
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 1(ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     current:    COPPER AUTO_NEG
     advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     speed: 0 Mbps now, 1000 Mbps max
 13(phy-br-ens2f0): addr:3e:77:de:63:a0:95
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

As you can see, the br-ens2f0 and ens2f0 interfaces are DOWN. So I ran:

[root@network2 agent]# ip link set br-ens2f0 up
[root@network2 agent]# ip link set ens2f0 up

Then, finally, the second keepalived instance went into BACKUP state:

Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Received higher prio advert
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Group(VG_1) Syncing instances to BACKUP state

UPDATE 5

I thought I had reached the end of this problem, but I had not. The following command shut down all keepalived processes:

neutron router-gateway-set demo-router ext-net

with the following log:

Dec 29 20:34:17 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 -- --if-exists del-port br-int tapa761b6ad-9e
Dec 29 20:34:17 network1 kernel: device tapa761b6ad-9e left promiscuous mode
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing address record for fe80::f0cd:1dff:fe59:648c on tapa761b6ad-9e.
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing workstation service for tapa761b6ad-9e.
Dec 29 20:34:18 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 -- --if-exists del-port br-int tap6030d7fc-01
Dec 29 20:34:18 network1 kernel: device tap6030d7fc-01 left promiscuous mode
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing address record for fe80::38ea:f0ff:fef9:aa39 on tap6030d7fc-01.
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing workstation service for tap6030d7fc-01.
Dec 29 20:34:18 network1 Keepalived[15220]: Stopping Keepalived v1.2.10 (06/10,2014)
Dec 29 20:34:18 network1 Keepalived_vrrp[15221]: VRRP_Instance(VR_1) sending 0 priority
Dec 29 20:34:18 network1 Keepalived_vrrp[15221]: Netlink: error: No such device, type=(21), seq=1419852677, pid=0
Dec 29 20:34:19 network1 ntpd[954]: Deleting interface #18 tap6030d7fc-01, fe80::38ea:f0ff:fef9:aa39#123, interface stats: received=0, sent=0, dropped=0, active_time=181 secs
Dec 29 20:34:19 network1 ntpd[954]: Deleting interface #17 tapa761b6ad-9e, fe80::f0cd:1dff:fe59:648c#123, interface stats: received=0, sent=0, dropped=0, active_time=181 secs

At this point, I suspected the veth configuration might be the source of the trouble and removed it from all files, only to find that the veth setting was entirely unnecessary (so you can ignore UPDATE 3). However, the command 'neutron router-interface-set demo-router ext-net' triggers exactly the same problem.

Any comments?

UPDATE 6 - NOW WORKING

The strange symptom I mentioned in UPDATE 5 was caused by a wrong gateway_external_network_id value. (The value became stale when I completely reset the neutron database.)
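A stale gateway_external_network_id is easy to detect by comparing the configured UUID with the current ext-net UUID. A sketch of my own (the heredoc is a stand-in sample for the real /etc/neutron/l3_agent.ini):

```shell
# Pull the configured gateway_external_network_id out of l3_agent.ini so it
# can be compared with the live network id. Sample file for illustration.
cat > /tmp/l3_agent.ini.sample <<'EOF'
[DEFAULT]
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
EOF
awk -F' *= *' '/^gateway_external_network_id/ { print $2 }' /tmp/l3_agent.ini.sample
```

Compare the printed value against the id reported by 'neutron net-show ext-net'; if they differ after a database reset, update the ini file and restart neutron-l3-agent.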

After fixing it and rebooting all the network nodes, I tested again, and it finally works. One remaining annoyance is that interfaces such as br-ex, enp2s0f1 (attached to br-ex), br-ens2f0, and ens2f0 (attached to br-ens2f0) are not up by default after a reboot; I have to bring them up manually with 'ip link set xxxx up'. I first assumed the neutron agents would bring them up, but that guess was wrong. Am I missing something?
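One way to keep these interfaces up across reboots on CentOS 7 is to give each one an ifcfg file with ONBOOT=yes. This is only a sketch under assumptions: the OVSBridge stanza relies on the openvswitch ifcfg hooks shipped with the RDO packages, and addresses/options beyond ONBOOT are omitted.

```ini
# /etc/sysconfig/network-scripts/ifcfg-ens2f0 (sketch)
DEVICE=ens2f0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-br-ens2f0 (sketch; TYPE=OVSBridge
# assumes the openvswitch network-scripts extensions are installed)
DEVICE=br-ens2f0
DEVICETYPE=ovs
TYPE=OVSBridge
ONBOOT=yes
BOOTPROTO=none
```

The same pattern would apply to br-ex and enp2s0f1; neutron agents manage flows and ports on the bridges, but not the link state of the underlying NICs.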

Anyway, thank you all for the great support.

neutron (Juno) L3 HA test failed

Hello. I'm testing the neutron (Juno) L3 HA feature with the following configurations, using one controller node and two network nodes. All nodes are CentOS 7 minimal installations.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = flat,vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = flat,vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2
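For reference, the ha_vrrp_* values above end up in the keepalived.conf file that the L3 agent writes for each HA router under ha_confs_path. A rough sketch of what such a generated file looks like (interface name taken from the logs below; VIP and sync-group details omitted, exact layout varies by release):

```
vrrp_instance VR_1 {
    state BACKUP
    interface ha-7ec5cb76-94      # this router's HA port (name varies)
    virtual_router_id 1
    priority 50
    advert_int 2                  # from ha_vrrp_advert_int
    authentication {
        auth_type PASS            # from ha_vrrp_auth_type
        auth_pass ABC             # from ha_vrrp_auth_password
    }
}
```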

With these configurations, when I create a router with HA enabled, the following output is shown on the controller:

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, in /var/log/neutron/server.log, you can find error messages indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
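The 'Failed to bind port' warnings mean the ML2 vlan type driver could not match the HA network's segment against anything the L2 agent on that host reported, presumably because the [ovs] section (with bridge_mappings) lived in a file the agent did not read. A simplified, self-contained sketch of that check (not neutron's actual code; values are assumptions for this setup):

```shell
# Sketch of the ML2 binding decision: the L2 agent must report a bridge
# mapping whose physical_network matches the segment's physnet.
segment_physnet="default"          # physnet of the HA network segment
agent_mappings=""                  # what the agent reported (none here)
if printf '%s\n' "$agent_mappings" | tr ',' '\n' | grep -q "^${segment_physnet}:"; then
  result="bound"
else
  result="binding_failed"          # matches the vif_type in the logs above
fi
echo "$result"
```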

On the network nodes, the following identical log entries also appeared in /var/log/messages.

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That means the HA port named ha-7ec5cb76-94 is blocked (tagged with the dead VLAN tag 4095). I tracked down the reason in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

Simply put, the newly created ports cannot be found, and that is why they are blocked. The ports on br-int and br-ens2f0 are not found, and they are not listed by 'ovs-ofctl show br-int' or 'ovs-ofctl show br-ens2f0'. Because the ports cannot be found, the treat_vif_port() method in plugins/openvswitch/agent/ovs_neutron_agent.py automatically blocks each port by calling self.port_dead().
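The tag=4095 seen earlier is consistent with this: the OVS agent reserves VLAN 4095 as a "dead" tag (DEAD_VLAN_TAG in ovs_neutron_agent.py) for ports it cannot wire up. A minimal sketch of that decision, with hypothetical values:

```shell
# Sketch: a port the agent cannot wire up is parked on the reserved dead VLAN.
DEAD_VLAN_TAG=4095                 # reserved tag used by the OVS agent
vif_type="binding_failed"          # as reported in server.log above
if [ "$vif_type" = "binding_failed" ]; then
  port_tag=$DEAD_VLAN_TAG          # port_dead(): traffic is dropped
else
  port_tag=2                       # hypothetical local VLAN otherwise
fi
echo "tag=$port_tag"
```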

My question is why the ports cannot be found or shown, and why their admin state is not brought up automatically.

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged ovs_neutron_plugin.ini into plugin.ini and restarted neutron-openvswitch-agent and neutron-l3-agent. The problem of existing OVS ports not being found was finally resolved. However, another problem showed up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA port is assigned VLAN tag '1'. As our physical network does not allow VLAN numbers outside the range 400-1000, this value is definitely wrong. Since the two HA ports on the two network nodes cannot communicate, both routers become MASTER.

I thought the HA VLAN number should come from the configured range 'default:400:1000', but it does not. I don't know where the '1' came from. Looking into the neutron database on the controller node, the HA network segment's network_type is vlan and its physical_network is default.

Any further hints or comments?
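(For reference, the merge described above means plugin.ini on the network nodes now also carries the [ovs] section; roughly:)

```ini
[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
```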

UPDATE 2

After resetting the controller node and the two network nodes, and after struggling with the source code, I found that the br-int and br-ens2f0 bridges have the correct flow tables:

[root@network2 agent]# ovs-ofctl dump-flows br-ens2f0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1344.498s, table=0, n_packets=0, n_bytes=0, idle_age=1344, priority=1 actions=NORMAL
 cookie=0x0, duration=651.521s, table=0, n_packets=330, n_bytes=16832, idle_age=0, priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
 cookie=0x0, duration=1343.975s, table=0, n_packets=14, n_bytes=1164, idle_age=651, priority=2,in_port=9 actions=drop
[root@network2 agent]# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1363.640s, table=0, n_packets=671, n_bytes=34602, idle_age=1, priority=1 actions=NORMAL
 cookie=0x0, duration=670.231s, table=0, n_packets=0, n_bytes=0, idle_age=670, priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
 cookie=0x0, duration=1362.813s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=18 actions=drop
 cookie=0x0, duration=1362.032s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=19 actions=drop
 cookie=0x0, duration=1363.576s, table=23, n_packets=0, n_bytes=0, idle_age=1363, priority=0 actions=drop

As the flow tables show, traffic arriving from br-ens2f0 with VLAN 401 is rewritten to VLAN 2, and traffic leaving br-int with VLAN 2 is rewritten to VLAN 401. In other words, local VLAN 2 (which I first thought was a wrongly assigned value) is translated to VLAN 401 before being injected into the physical network, and vice versa.
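The local/physical translation can be read mechanically out of the two mod_vlan_vid flows quoted above; a self-contained snippet operating on those sample lines:

```shell
# Extract the VLAN translations from the two mod_vlan_vid flows above.
flows='priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL'
mapping=$(printf '%s\n' "$flows" \
  | sed -n 's/.*dl_vlan=\([0-9]*\) actions=mod_vlan_vid:\([0-9]*\).*/\1 -> \2/p')
echo "$mapping"
```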

However, as the n_packets values indicate, this only works in one direction; each HA router never receives the keepalived advertisements from the other, so both become MASTER at the same time.

And I am still seeing the following log messages in /var/log/openvswitch/ovs-vswitchd.log:

2014-12-29T08:54:07.812Z|00101|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00102|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00103|netdev_linux|WARN|ha-d75e6f07-5e: removing policing failed: No such device

Furthermore, 'ovs-ofctl show br-int' indicates that the ha-XXX device is down:

[root@network1 agent]# ovs-ofctl show br-int

OFPT_FEATURES_REPLY (xid=0x2): dpid:0000d67d91611247
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 29(int-br-ens2f0): addr:e6:97:65:12:a8:b2
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 30(int-br-ex): addr:56:04:61:d4:01:3a
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 32(ha-365b05db-b1): addr:56:04:61:d4:01:3a
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-int): addr:d6:7d:91:61:12:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
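The PORT_DOWN entries can be picked out of such output mechanically; a self-contained snippet over the sample lines quoted above:

```shell
# List ports whose OpenFlow admin config is PORT_DOWN (sample data above).
show_output=' 32(ha-365b05db-b1): addr:56:04:61:d4:01:3a
     config:     PORT_DOWN
     state:      LINK_DOWN
 30(int-br-ex): addr:56:04:61:d4:01:3a
     config:     0
     state:      0'
down_ports=$(printf '%s\n' "$show_output" \
  | awk '/\(/ {port=$0} /config:.*PORT_DOWN/ {print port}' \
  | sed 's/.*(\(.*\)):.*/\1/')
echo "$down_ports"
```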

I think I have come very close to the final answer, but I still need help from others. Any comments and hints are welcome.

UPDATE 3

OK. After enabling veth support in /etc/neutron/l3_agent.ini and restarting the L3 and openvswitch agents, the log messages I showed previously ('removing policing failed: No such device') have disappeared completely. One notable difference from the previous setup is that the OVS port name 'ha-xxxx' is no longer used. Instead, I can see the following logs:

Dec 29 19:45:42 network2 kernel: device tapf0afab77-ea entered promiscuous mode

Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_UP): ha-f0afab77-ea: link is not ready
Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ha-f0afab77-ea: link becomes ready
Dec 29 19:45:43 network2 Keepalived[9432]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 19:45:43 network2 Keepalived[9433]: Starting VRRP child process, pid=9434
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink reflector
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink command channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering gratuitous ARP shared channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Opening file '/var/lib/neutron/ha_confs/ed04d4e6-5f00-425d-b856-0cec3ab69ae8/keepalived.conf'.
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Configuration is using : 65206 Bytes
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Using LinkWatch kernel netlink reflector...
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 19:45:44 network2 avahi-daemon[788]: Registering new address record for fe80::e8e5:6dff:fea5:e912 on tapf0afab77-ea.*.
Dec 29 19:45:44 network2 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port tapf0afab77-ea tag=3
Dec 29 19:45:45 network2 ntpd[899]: Listen normally on 10 tapf0afab77-ea fe80::e8e5:6dff:fea5:e912 UDP 123
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 19:45:52 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering MASTER STATE

It means, I think, that a tap device is created and the virtual link ha-xxx is created on top of it. (Of course, I'm not sure this understanding is correct.) But still, communication between the two HA routers is not established.
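One sanity check supporting that reading: the two device names in the logs share the same truncated port-id suffix, which is what you would expect for the two ends of one veth pair (a sketch using the names from the logs above):

```shell
# The bridge-side and namespace-side names carry the same port-id suffix.
tap_dev="tapf0afab77-ea"   # plugged into br-int
ha_dev="ha-f0afab77-ea"    # inside the router namespace
if [ "${tap_dev#tap}" = "${ha_dev#ha-}" ]; then
  match="same HA port"
else
  match="different"
fi
echo "$match"
```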

Still open for comments and hints.

UPDATE 4

All right. Finally, I solved the problem. Communication between the two keepalived processes was blocked because of ens2f0 and br-ens2f0. Originally, it looked like this:

[root@network2 agent]# ovs-ofctl show br-ens2f0
OFPT_FEATURES_REPLY (xid=0x2): dpid:000090e2ba1f1ec4
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 1(ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     current:    COPPER AUTO_NEG
     advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     speed: 0 Mbps now, 1000 Mbps max
 13(phy-br-ens2f0): addr:3e:77:de:63:a0:95
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

As you can see, the br-ens2f0 and ens2f0 interfaces were DOWN. So I ran:

[root@network2 agent]# ip link set br-ens2f0 up
[root@network2 agent]# ip link set ens2f0 up

Then, finally, the second keepalived instance entered BACKUP mode.

Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Received higher prio advert
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Group(VG_1) Syncing instances to BACKUP state

UPDATE 5

I thought I had reached the end of this problem, but I had not. The following command shut down all the keepalived processes:

neutron router-gateway-set demo-router ext-net

with the following log:

Dec 29 20:34:17 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 -- --if-exists del-port br-int tapa761b6ad-9e
Dec 29 20:34:17 network1 kernel: device tapa761b6ad-9e left promiscuous mode
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing address record for fe80::f0cd:1dff:fe59:648c on tapa761b6ad-9e.
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing workstation service for tapa761b6ad-9e.
Dec 29 20:34:18 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 -- --if-exists del-port br-int tap6030d7fc-01
Dec 29 20:34:18 network1 kernel: device tap6030d7fc-01 left promiscuous mode
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing address record for fe80::38ea:f0ff:fef9:aa39 on tap6030d7fc-01.
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing workstation service for tap6030d7fc-01.
Dec 29 20:34:18 network1 Keepalived[15220]: Stopping Keepalived v1.2.10 (06/10,2014)
Dec 29 20:34:18 network1 Keepalived_vrrp[15221]: VRRP_Instance(VR_1) sending 0 priority
Dec 29 20:34:18 network1 Keepalived_vrrp[15221]: Netlink: error: No such device, type=(21), seq=1419852677, pid=0
Dec 29 20:34:19 network1 ntpd[954]: Deleting interface #18 tap6030d7fc-01, fe80::38ea:f0ff:fef9:aa39#123, interface stats: received=0, sent=0, dropped=0, active_time=181 secs
Dec 29 20:34:19 network1 ntpd[954]: Deleting interface #17 tapa761b6ad-9e, fe80::f0cd:1dff:fe59:648c#123, interface stats: received=0, sent=0, dropped=0, active_time=181 secs

At this point, I suspected that the veth configuration could be the source of this evil, so I removed it from all the files, and found out that the veth configuration was totally unnecessary. (Thus, you can ignore UPDATE 3.) However, the command 'neutron router-gateway-set demo-router ext-net' triggers exactly the same problem.

Any comments?

UPDATE 6 - NOW WORKING

The strange symptom I mentioned in UPDATE 5 was caused by a stale gateway_external_network_id value. (The value became outdated when I completely reset the neutron database.)

After fixing it and rebooting all the network nodes, I tested again, and it finally works. One remaining annoyance in my setup is that interfaces such as br-ex, enp2s0f1 (bound to br-ex), br-ens2f0, and ens2f0 (bound to br-ens2f0) are not up by default after a reboot; I have to bring them up manually with 'ip link set xxxx up'. I first thought the neutron agents would bring them up automatically, but that guess was wrong. Am I missing something?
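Presumably the neutron agents never touch the ifcfg files, so on CentOS 7 the devices have to be marked ONBOOT=yes via the OVS-aware network scripts shipped with the openvswitch package. Something like the following might do it (file names are the standard locations; the values are assumptions for this setup and untested here):

```
# /etc/sysconfig/network-scripts/ifcfg-ens2f0 (sketch)
DEVICE=ens2f0
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSPort
OVS_BRIDGE=br-ens2f0

# /etc/sysconfig/network-scripts/ifcfg-br-ens2f0 (sketch)
DEVICE=br-ens2f0
ONBOOT=yes
DEVICETYPE=ovs
TYPE=OVSBridge
```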

Anyway, thank you all for the great support.

neutron (Juno) l3 HA test failed

Hello. I'm testing neutron (Juno) L3 HA feature with following configurations, with one controller node and two network nodes. All the nodes are Centos 7 minimal installation.

controller node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = keystone
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
notify_nova_on_port_status_changes = True
notify_nova_on_port_data_changes = True
nova_url = http://controller:8774/v2
nova_region_name = regionOne
nova_admin_username = nova
nova_admin_tenant_id = 3c5abd3469af433db8fe2047c6d62033
nova_admin_password = NOVA_PASS
rabbit_host=localhost
rabbit_userid=guest
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri=http://controller:5000/v2.0
identity_uri=http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
connection = mysql://neutron:NEUTRON_PASS@controller/neutron
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = flat,vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

network node

/etc/neutron/neutron.conf

[DEFAULT]
verbose = True
core_plugin = ml2
service_plugins = router
auth_strategy = neutron
l3_ha = True
max_l3_agents_per_router = 3
min_l3_agents_per_router = 2
rabbit_host=controller
rabbit_password=RABBIT_PASS
rpc_backend=rabbit
[matchmaker_redis]
[matchmaker_ring]
[quotas]
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
[keystone_authtoken]
auth_uri = http://controller:5000/v2.0
identity_uri = http://controller:35357
admin_tenant_name = service
admin_user = neutron
admin_password = NEUTRON_PASS
[database]
[service_providers]

/etc/neutron/plugin.ini

[ml2]
type_drivers = flat,vlan
tenant_network_types = vlan
mechanism_drivers = openvswitch
[ml2_type_flat]
[ml2_type_vlan]
network_vlan_ranges = default:400:1000
[ml2_type_gre]
[ml2_type_vxlan]
[securitygroup]
enable_security_group = True

/etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

[ovs]
tenant_network_type = vlan
network_vlan_ranges = default:400:1000
enable_tunneling = False
bridge_mappings = default:br-ens2f0
default:br-ens2f0,external:br-ex
[agent]
[securitygroup]
firewall_driver = neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver
enable_security_group = True

/etc/neutron/l3_agent.ini

[DEFAULT]
verbose = True
interface_driver = neutron.agent.linux.interface.OVSInterfaceDriver
use_namespaces = True
gateway_external_network_id = ea0937e7-9cc3-4f4e-ba84-b29d1b718a84
external_network_bridge = br-ex
ha_confs_path = $state_path/ha_confs
ha_vrrp_auth_type = PASS
ha_vrrp_auth_password = ABC
ha_vrrp_advert_int = 2

With these configurations, on creating a router with HA enabled, following messages are shown on the controller.

+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | f3ace776-91d6-4528-b603-9011db11f470 |
| name                  | demo-router                          |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | b2e4e2e598614b5dbd878ae976728630     |
+-----------------------+--------------------------------------+

However, in the /var/log/neutron/server.log, you can find error messages indicating port binding failures:

2014-12-29 11:57:17.702 26747 INFO neutron.db.l3_hamode_db [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] Number of available agents lower than max_l3_agents_per_router. L3 agents available: 2
2014-12-29 11:57:17.950 26747 INFO neutron.wsgi [req-7b7135d6-e3a8-4841-baca-0b786d0f1b78 None] 10.24.148.21 - - [29/Dec/2014 11:57:17] "POST /v2.0/routers.json HTTP/1.1" 201 448 0.320873
2014-12-29 11:57:18.181 26747 WARNING neutron.plugins.ml2.managers [req-8c31ef8e-2452-42af-b590-0015b45a325e None] Failed to bind port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on host network1
2014-12-29 11:57:18.206 26747 WARNING neutron.plugins.ml2.plugin [req-8c31ef8e-2452-42af-b590-0015b45a325e None] In _notify_port_updated(), no bound segment for port 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:18.208 26747 WARNING neutron.plugins.ml2.managers [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] Failed to bind port bc42bc0d-d1f7-4440-8704-757a47cee268 on host network2
2014-12-29 11:57:18.227 26747 WARNING neutron.plugins.ml2.plugin [req-ae7d921b-8f89-466c-9023-73d8edb720ca None] In _notify_port_updated(), no bound segment for port bc42bc0d-d1f7-4440-8704-757a47cee268 on network 46725d33-cd6d-418b-9574-d45cf7e6e340
2014-12-29 11:57:20.756 26747 WARNING neutron.plugins.ml2.rpc [req-3b378bc3-7378-4d61-b4dd-832afbe0f941 None] Device 7ec5cb76-94c6-4e2e-a687-606d8fb34ce7 requested by agent ovs-agent-network1 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed
2014-12-29 11:57:21.574 26747 WARNING neutron.plugins.ml2.rpc [req-924e277b-93a4-4a26-9234-c3d7f2b849d1 None] Device bc42bc0d-d1f7-4440-8704-757a47cee268 requested by agent ovs-agent-network2 on network 46725d33-cd6d-418b-9574-d45cf7e6e340 not bound, vif_type: binding_failed

In the network nodes, we could also found following logs (/var/log/messages) identically.

Dec 29 11:57:19 network1 kernel: device ha-7ec5cb76-94 entered promiscuous mode
Dec 29 11:57:19 network1 systemd-sysctl: Overwriting earlier assignment of net/ipv4/conf/default/rp_filter in file '/etc/sysctl.d/99-sysctl.conf'.
Dec 29 11:57:19 network1 avahi-daemon[789]: Withdrawing workstation service for ha-7ec5cb76-94.
Dec 29 11:57:20 network1 Keepalived[10986]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 11:57:20 network1 Keepalived[10987]: Starting VRRP child process, pid=10988
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink reflector
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering Kernel netlink command channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Registering gratuitous ARP shared channel
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Opening file '/var/lib/neutron/ha_confs/f3ace776-91d6-4528-b603-9011db11f470/keepalived.conf'.
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Configuration is using : 65206 Bytes
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: Using LinkWatch kernel netlink reflector...
Dec 29 11:57:20 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 11:57:21 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-7ec5cb76-94 tag=4095
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 11:57:26 network1 Keepalived_vrrp[10988]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 11:57:28 network1 Keepalived_vrrp[10988]: VRRP_Instance(VR_1) Entering MASTER STATE

That means, the HA port named ha-7ec5cb76-94 is blocked (by 4095 plan tag). I tracked the reason, and found out why. (/var/log/openvswitch/ovs-vswitchd.log)

2014-12-29T02:48:45.489Z|00160|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:48:45.491Z|00161|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.299Z|00162|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:19.302Z|00163|bridge|INFO|bridge br-int: added interface ha-7ec5cb76-94 on port 6
2014-12-29T02:57:19.304Z|00164|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:19.457Z|00165|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.012Z|00166|bridge|WARN|could not open network device phy-br-ens2f0 (No such device)
2014-12-29T02:57:21.015Z|00167|bridge|WARN|could not open network device int-br-ens2f0 (No such device)
2014-12-29T02:57:21.017Z|00168|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-7ec5cb76-94 device failed: No such device
2014-12-29T02:57:21.017Z|00169|netdev_linux|WARN|ha-7ec5cb76-94: removing policing failed: No such device
2014-12-29T02:57:31.142Z|00170|ofproto|INFO|br-int: 1 flow_mods 10 s ago (1 adds)

Simply, the ports created are not found, and it was the reason that the ports are blocked. Ports on br-int and br-ens2f0 are not found, and those ports are not shown by ovs-ofctl show br-int or ova-ofctl show br-ens2f0. As those ports are not found, the plugins/openvswitch/agent/ovs-neutron-agent:treat_vif_port() method automatically block the port by calling self.port_dead().

My question is why the ports are not found, and not shown. And why there admin state is not automatically up.

Could you give me some hints, or any kind of suggestions?

UPDATE

Following the comment by @rahulrajvn, I merged the vs_neutron_plugin.ini into plugin.ini and restarted the neutron-openvswith-agent and neutron-l3-agent. Then, the problem that existing ovs ports are not found is finally resolved. However, another problem shows up:

Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Opening file '/var/lib/neutron/ha_confs/e394b625-e420-4500-b50d-3e65c95401b6/keepalived.conf'.
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Configuration is using : 65206 Bytes
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: Using LinkWatch kernel netlink reflector...
Dec 29 15:05:53 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 15:05:55 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port ha-66dbcd3c-59 tag=1
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 15:06:00 network1 Keepalived_vrrp[18327]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 15:06:02 network1 Keepalived_vrrp[18327]: VRRP_Instance(VR_1) Entering MASTER STATE

That is, the HA network is assigned a vlan number '1'. As our network does not allow VLAN numbers outside the range 400~1000, this value is definitely wrong. As the two HA network in the two neutron network nodes cannot communicate, both becomes a master.

I thought the HA network's VLAN ID should come from the default VLAN range 'default:400:1000', but it does not. I don't know where the '1' came from. Looking into the neutron database on the controller node, the HA network segment's network_type is vlan and its physical_network is default.

Any further hints or comments?

UPDATE 2

After resetting the controller node and the two network nodes, and after digging through the source code, I found that the br-int and br-ens2f0 bridges have the correct flow tables:

[root@network2 agent]# ovs-ofctl dump-flows br-ens2f0
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1344.498s, table=0, n_packets=0, n_bytes=0, idle_age=1344, priority=1 actions=NORMAL
 cookie=0x0, duration=651.521s, table=0, n_packets=330, n_bytes=16832, idle_age=0, priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
 cookie=0x0, duration=1343.975s, table=0, n_packets=14, n_bytes=1164, idle_age=651, priority=2,in_port=9 actions=drop
[root@network2 agent]# ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=1363.640s, table=0, n_packets=671, n_bytes=34602, idle_age=1, priority=1 actions=NORMAL
 cookie=0x0, duration=670.231s, table=0, n_packets=0, n_bytes=0, idle_age=670, priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
 cookie=0x0, duration=1362.813s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=18 actions=drop
 cookie=0x0, duration=1362.032s, table=0, n_packets=0, n_bytes=0, idle_age=1362, priority=2,in_port=19 actions=drop
 cookie=0x0, duration=1363.576s, table=23, n_packets=0, n_bytes=0, idle_age=1363, priority=0 actions=drop

As the flow tables show, traffic from br-ens2f0 with VLAN 401 is rewritten to VLAN 2, and traffic from br-int with VLAN 2 is rewritten to VLAN 401. In other words, local VLAN tag 2 (which I first thought was a wrongly assigned value) is translated to VLAN 401 before being injected into the physical network, and vice versa.
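The local-VLAN-to-physical-VLAN mapping can be recovered directly from the flow dump. A minimal sketch, using a sample line copied from the ovs-ofctl dump-flows br-ens2f0 output above (on a live node you would pipe the real command output instead of the scratch file):

```shell
# One sample flow line from "ovs-ofctl dump-flows br-ens2f0".
cat > /tmp/br-ens2f0-flows.txt <<'EOF'
 cookie=0x0, duration=651.521s, table=0, n_packets=330, n_bytes=16832, idle_age=0, priority=4,in_port=9,dl_vlan=2 actions=mod_vlan_vid:401,NORMAL
EOF

# Extract the dl_vlan match and the mod_vlan_vid rewrite from each flow.
sed -n 's/.*dl_vlan=\([0-9]*\) actions=mod_vlan_vid:\([0-9]*\).*/local VLAN \1 -> physical VLAN \2/p' /tmp/br-ens2f0-flows.txt
```

This prints "local VLAN 2 -> physical VLAN 401", which is why the tag=2 on the OVS port is not actually a problem: it is only node-local and never appears on the wire.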

However, as the n_packets values indicate, this only works in one direction: each HA router never receives the VRRP keepalive messages from its peer, so both HA routers become MASTER at the same time.
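Flows that have never matched anything can be spotted by their n_packets=0 counters. A small sketch against sample lines copied from the ovs-ofctl dump-flows br-int output above (the scratch-file path is my own choice):

```shell
# Two sample flow lines from "ovs-ofctl dump-flows br-int": one carrying
# traffic, and the dl_vlan=401 -> mod_vlan_vid:2 flow that never matched.
cat > /tmp/br-int-flows.txt <<'EOF'
 cookie=0x0, duration=1363.640s, table=0, n_packets=671, n_bytes=34602, idle_age=1, priority=1 actions=NORMAL
 cookie=0x0, duration=670.231s, table=0, n_packets=0, n_bytes=0, idle_age=670, priority=3,in_port=18,dl_vlan=401 actions=mod_vlan_vid:2,NORMAL
EOF

# Print only the dead flows; the trailing comma avoids matching e.g. n_packets=670.
grep 'n_packets=0,' /tmp/br-int-flows.txt
```

Here the only dead flow is exactly the physical-to-local translation, i.e. the missing return direction of the VRRP traffic.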

And still, I'm seeing following log messages at /var/log/openvswitch/ovs-vswitchd.log.

2014-12-29T08:54:07.812Z|00101|netdev_linux|INFO|ioctl(SIOCGIFHWADDR) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00102|netdev_linux|WARN|ioctl(SIOCGIFINDEX) on ha-d75e6f07-5e device failed: No such device
2014-12-29T08:54:08.776Z|00103|netdev_linux|WARN|ha-d75e6f07-5e: removing policing failed: No such device

Further, ovs-ofctl show br-int indicates that the ha-XXX device is down:

[root@network1 agent]# ovs-ofctl show br-int

OFPT_FEATURES_REPLY (xid=0x2): dpid:0000d67d91611247
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 29(int-br-ens2f0): addr:e6:97:65:12:a8:b2
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 30(int-br-ex): addr:56:04:61:d4:01:3a
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 32(ha-365b05db-b1): addr:56:04:61:d4:01:3a
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-int): addr:d6:7d:91:61:12:47
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
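The down ports can also be picked out of the ovs-ofctl show output mechanically. A sketch, using sample lines copied from the output above (on a live node you would pipe the real ovs-ofctl show br-int instead of the scratch file):

```shell
# Condensed sample from "ovs-ofctl show br-int": a healthy port and a down one.
cat > /tmp/br-int-show.txt <<'EOF'
 29(int-br-ens2f0): addr:e6:97:65:12:a8:b2
     config:     0
 32(ha-365b05db-b1): addr:56:04:61:d4:01:3a
     config:     PORT_DOWN
EOF

# Remember the port name from each "N(name):" header line, and print it
# whenever the following config line reports PORT_DOWN.
awk '/\(/ { port=$1; sub(/^[0-9A-Z]*\(/, "", port); sub(/\):.*/, "", port) } /PORT_DOWN/ { print port }' /tmp/br-int-show.txt
```

This prints only ha-365b05db-b1, matching what the full dump shows: only the HA port is administratively down.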

I think I have come very close to the final answer, but I still need help from others. Any comments and hints are welcome.

UPDATE 3

OK. By enabling veth support (ovs_use_veth = True) in /etc/neutron/l3_agent.ini and restarting the L3 and openvswitch agents, the log messages I showed previously (removing policing failed: No such device) have completely disappeared. One notable difference from the previous setup is that the OVS port name "ha-xxxx" is no longer used. Instead, I see the following logs:

Dec 29 19:45:42 network2 kernel: device tapf0afab77-ea entered promiscuous mode

Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_UP): ha-f0afab77-ea: link is not ready
Dec 29 19:45:43 network2 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ha-f0afab77-ea: link becomes ready
Dec 29 19:45:43 network2 Keepalived[9432]: Starting Keepalived v1.2.10 (06/10,2014)
Dec 29 19:45:43 network2 Keepalived[9433]: Starting VRRP child process, pid=9434
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink reflector
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering Kernel netlink command channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Registering gratuitous ARP shared channel
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Opening file '/var/lib/neutron/ha_confs/ed04d4e6-5f00-425d-b856-0cec3ab69ae8/keepalived.conf'.
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Configuration is using : 65206 Bytes
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: Using LinkWatch kernel netlink reflector...
Dec 29 19:45:43 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 19:45:44 network2 avahi-daemon[788]: Registering new address record for fe80::e8e5:6dff:fea5:e912 on tapf0afab77-ea.*.
Dec 29 19:45:44 network2 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 set Port tapf0afab77-ea tag=3
Dec 29 19:45:45 network2 ntpd[899]: Listen normally on 10 tapf0afab77-ea fe80::e8e5:6dff:fea5:e912 UDP 123
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Transition to MASTER STATE
Dec 29 19:45:50 network2 Keepalived_vrrp[9434]: VRRP_Group(VG_1) Syncing instances to MASTER state
Dec 29 19:45:52 network2 Keepalived_vrrp[9434]: VRRP_Instance(VR_1) Entering MASTER STATE

This means, I think, that a tap device is created and the virtual link ha-xxx is created on top of it. (Of course, I'm not sure this understanding is correct.) But the communication between the two HA routers is still not established.

Still open for comments and hints.

UPDATE 4

All right. Finally, I solved the problem. The communication between the two keepalived processes was blocked by ens2f0 and br-ens2f0. Originally, it looked like this:

[root@network2 agent]# ovs-ofctl show br-ens2f0
OFPT_FEATURES_REPLY (xid=0x2): dpid:000090e2ba1f1ec4
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
 1(ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     current:    COPPER AUTO_NEG
     advertised: 10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     supported:  10MB-HD 10MB-FD 100MB-HD 100MB-FD 1GB-FD COPPER AUTO_NEG AUTO_PAUSE
     speed: 0 Mbps now, 1000 Mbps max
 13(phy-br-ens2f0): addr:3e:77:de:63:a0:95
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-ens2f0): addr:90:e2:ba:1f:1e:c4
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

As you can see, the br-ens2f0 and ens2f0 interfaces are DOWN. So I did...

[root@network2 agent]# ip link set br-ens2f0 up
[root@network2 agent]# ip link set ens2f0 up

Then... finally... the second keepalived instance turns into BACKUP mode.

Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Received higher prio advert
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Instance(VR_1) Entering BACKUP STATE
Dec 29 20:10:10 network2 Keepalived_vrrp[12803]: VRRP_Group(VG_1) Syncing instances to BACKUP state

UPDATE 5

I thought I had reached the end of this problem, but I had not. The following command shut down all keepalived processes:

neutron router-gateway-set demo-router ext-net

with following log:

Dec 29 20:34:17 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 -- --if-exists del-port br-int tapa761b6ad-9e
Dec 29 20:34:17 network1 kernel: device tapa761b6ad-9e left promiscuous mode
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing address record for fe80::f0cd:1dff:fe59:648c on tapa761b6ad-9e.
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing workstation service for tapa761b6ad-9e.
Dec 29 20:34:18 network1 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /bin/ovs-vsctl --timeout=10 -- --if-exists del-port br-int tap6030d7fc-01
Dec 29 20:34:18 network1 kernel: device tap6030d7fc-01 left promiscuous mode
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing address record for fe80::38ea:f0ff:fef9:aa39 on tap6030d7fc-01.
Dec 29 20:34:18 network1 avahi-daemon[789]: Withdrawing workstation service for tap6030d7fc-01.
Dec 29 20:34:18 network1 Keepalived[15220]: Stopping Keepalived v1.2.10 (06/10,2014)
Dec 29 20:34:18 network1 Keepalived_vrrp[15221]: VRRP_Instance(VR_1) sending 0 priority
Dec 29 20:34:18 network1 Keepalived_vrrp[15221]: Netlink: error: No such device, type=(21), seq=1419852677, pid=0
Dec 29 20:34:19 network1 ntpd[954]: Deleting interface #18 tap6030d7fc-01, fe80::38ea:f0ff:fef9:aa39#123, interface stats: received=0, sent=0, dropped=0, active_time=181 secs
Dec 29 20:34:19 network1 ntpd[954]: Deleting interface #17 tapa761b6ad-9e, fe80::f0cd:1dff:fe59:648c#123, interface stats: received=0, sent=0, dropped=0, active_time=181 secs

At this point, I suspected that the veth config could be the source of this evil and removed it from all files, and found out that the veth config was entirely unnecessary. (Thus, you can ignore UPDATE 3.) However, rerunning 'neutron router-gateway-set demo-router ext-net' creates exactly the same problem.

Any comments?

UPDATE 6 - NOW WORKING

The strange symptom I mentioned in UPDATE 5 was caused by a stale gateway_external_network_id value. (This value became outdated when I completely reset the neutron database.)
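For reference, this is the option in /etc/neutron/l3_agent.ini that was stale in my case. If it is set, it must match the current UUID of the external network; leaving it empty (the default) lets the agent serve any external network:

```ini
[DEFAULT]
# If set, must be the current ext-net UUID; a stale UUID here made the
# L3 agent tear down the HA router ports, killing keepalived as in UPDATE 5.
# gateway_external_network_id =
```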

After fixing it and rebooting all the network nodes, I tested again, and it finally works. One unhappy thing about my setup is that interfaces such as br-ex, enp2s0f1 (attached to br-ex), br-ens2f0, and ens2f0 (attached to br-ens2f0) are not up by default after a reboot; I have to bring them up manually with 'ip link set xxxx up'. I first thought the neutron agents would bring them up, but that guess was wrong. Am I missing something?
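One way to make the NICs and bridges come up at boot on RHEL/CentOS-style systems (which these nodes appear to be, given the /etc/sysconfig paths) is ONBOOT=yes in the ifcfg files, using the initscripts integration shipped with the openvswitch package. The exact file names and values below are a sketch based on my interfaces, not something neutron generates:

```ini
# /etc/sysconfig/network-scripts/ifcfg-ens2f0 (physical NIC, attached to br-ens2f0)
DEVICE=ens2f0
TYPE=OVSPort
DEVICETYPE=ovs
OVS_BRIDGE=br-ens2f0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-br-ens2f0 (the OVS bridge itself)
DEVICE=br-ens2f0
TYPE=OVSBridge
DEVICETYPE=ovs
ONBOOT=yes
```

The same pattern would apply to enp2s0f1 and br-ex.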

Anyway, thank you guys for all the great support.