
Openshift, vip-manager, and DHCP



Good Morning,

I’m curious whether anyone is successfully running OpenShift in an environment where they manage their own DHCP clients and scopes. Our infrastructure recently had an issue and we are struggling to find a root cause. In our environment we run two vip-manager pods, which manage two IP addresses.

One of our suspicions is that keepalived doesn’t play nicely with DHCP. For example, if the DHCP client dies or renews its IP address, the vip-manager pod notices the event: it logs that both the VIP it manages and the IP assigned to the node have been removed. However, keepalived continues to send VRRP advertisements as if it were still MASTER for that IP.

This puts us in a bad spot: the BACKUP keepalived never takes over the IP address, so the address is no longer assigned to anything. Here’s example log output from a pod on which I forced this failure:

10.0.0.1 == address assigned to node via DHCP

10.0.0.2 == address assigned to vip_manager_VIP_1

10.0.0.3 == address assigned to vip_manager_VIP_2

10.1.4.1 == lbr0/tun0

  - Loading ip_vs module ...
  - Checking if ip_vs module is available ...
ip_vs                 140944  0
  - Module ip_vs is loaded.
  - Generating and writing config to /etc/keepalived/keepalived.conf
  - Starting failover services ...
Starting Healthcheck child process, pid=136
Initializing ipvs 2.6
Starting VRRP child process, pid=137
Netlink reflector reports IP 10.0.0.1 added
Netlink reflector reports IP 10.0.0.1 added
Netlink reflector reports IP 10.1.4.1 added
Netlink reflector reports IP 10.1.4.1 added
Netlink reflector reports IP 10.1.4.1 added
Netlink reflector reports IP 10.1.4.1 added
Registering Kernel netlink reflector
Registering Kernel netlink reflector
Registering Kernel netlink command channel
Registering Kernel netlink command channel
Registering gratuitous ARP shared channel
Opening file '/etc/keepalived/keepalived.conf'.
Opening file '/etc/keepalived/keepalived.conf'.
Configuration is using : 8733 Bytes
Truncating auth_pass to 8 characters
Truncating auth_pass to 8 characters
Configuration is using : 73522 Bytes
Using LinkWatch kernel netlink reflector...
VRRP_Instance(vip_manager_VIP_1) Entering BACKUP STATE
VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(9,10)]
VRRP_Instance(vip_manager_VIP_2) Transition to MASTER STATE
VRRP_Instance(vip_manager_VIP_2) Entering FAULT STATE
VRRP_Script(chk_vip_manager) succeeded
VRRP_Instance(vip_manager_VIP_2) prio is higher than received advert
VRRP_Instance(vip_manager_VIP_2) Transition to MASTER STATE
VRRP_Instance(vip_manager_VIP_2) Received lower prio advert, forcing new election
VRRP_Instance(vip_manager_VIP_2) Entering MASTER STATE
VRRP_Instance(vip_manager_VIP_2) setting protocol VIPs.
Netlink reflector reports IP 10.0.0.3 added
VRRP_Instance(vip_manager_VIP_2) Sending gratuitous ARPs on eno16780032 for 10.0.0.3
VRRP_Instance(vip_manager_VIP_2) Sending gratuitous ARPs on eno16780032 for 10.0.0.3

...<dhclient renews the ip address>...

Netlink reflector reports IP 10.0.0.1 removed
Netlink reflector reports IP 10.0.0.1 removed
Netlink reflector reports IP 10.0.0.3 removed
Netlink reflector reports IP 10.0.0.3 removed
Netlink reflector reports IP 10.0.0.1 added
Netlink reflector reports IP 10.0.0.1 added

The other vip-manager pod is still receiving VRRP advertisements for 10.0.0.3 and therefore never takes over this IP address, so effectively half of the traffic (depending on DNS round-robin) is lost at this point.
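To make the failure easier to spot in pod logs, here is a minimal Python sketch that scans keepalived output of the form shown above and reports any VIP that was removed while its VRRP instance never left MASTER. The function name find_orphaned_vips is hypothetical, and the regular expressions only assume the message wording seen in this log; other keepalived versions may word these lines differently.

```python
import re

# Patterns based only on the log lines quoted in this post.
ADDED = re.compile(r"Netlink reflector reports IP (\S+) added")
REMOVED = re.compile(r"Netlink reflector reports IP (\S+) removed")
MASTER = re.compile(r"VRRP_Instance\((\S+?)\) Entering MASTER STATE")
LEFT_MASTER = re.compile(r"VRRP_Instance\((\S+?)\) Entering (?:BACKUP|FAULT) STATE")
GARP = re.compile(r"VRRP_Instance\((\S+?)\) Sending gratuitous ARPs on \S+ for (\S+)")


def find_orphaned_vips(lines):
    """Return {instance: vip} for VIPs removed while the instance stayed MASTER."""
    master = set()   # instances whose last logged state is MASTER
    vip_of = {}      # instance -> VIP, learned from gratuitous ARP lines
    present = set()  # IPs the netlink reflector last reported as added
    for line in lines:
        if m := MASTER.search(line):
            master.add(m.group(1))
        elif m := LEFT_MASTER.search(line):
            master.discard(m.group(1))
        elif m := GARP.search(line):
            vip_of[m.group(1)] = m.group(2)
        elif m := ADDED.search(line):
            present.add(m.group(1))
        elif m := REMOVED.search(line):
            present.discard(m.group(1))
    return {inst: vip for inst, vip in vip_of.items()
            if inst in master and vip not in present}
```

Fed the log above, this would flag vip_manager_VIP_2 with 10.0.0.3, since the address is removed after the DHCP renewal and no state change follows.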

Our recovery options at this point are to restart the network, which stops the VRRP packets long enough to cause a failover, or to restart the affected pod.
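Before choosing either recovery step, it may help to confirm that the VIP really is missing from the interface. The sketch below is only an illustration: it parses the one-line output of `ip -o -4 addr show dev <interface>` and lists which expected VIPs are absent. The helper names are hypothetical, and it assumes addresses appear in the usual `inet A.B.C.D/len` form. Which VIPs a node should hold has to come from the keepalived state, since a BACKUP node legitimately lacks the address.

```python
import re
import subprocess

# Matches "inet 10.0.0.1/24" in `ip -o -4 addr show` output.
INET = re.compile(r"\binet (\d+\.\d+\.\d+\.\d+)/\d+")


def ipv4_addresses(ip_output):
    """Return the set of IPv4 addresses found in `ip -o -4 addr show` output."""
    return set(INET.findall(ip_output))


def missing_vips(ip_output, expected_vips):
    """Return the expected VIPs that are not configured, sorted as strings."""
    return sorted(set(expected_vips) - ipv4_addresses(ip_output))


def read_interface(dev):
    """Run `ip` for one device and return its output (needs iproute2 on the node)."""
    result = subprocess.run(
        ["ip", "-o", "-4", "addr", "show", "dev", dev],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

On the node from the log above, something like `missing_vips(read_interface("eno16780032"), ["10.0.0.3"])` would be the call to make while keepalived still reports MASTER for vip_manager_VIP_2.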

The version of keepalived provided by RHEL is 10 minor revisions behind, and I’m curious whether there would be a benefit to getting this package updated. Pending advice from anyone, my next troubleshooting step is to build my own version of vip-manager with an upgraded keepalived to see whether the issue persists.



-- 
John Skarbek

