
Re: Openshift HA environment - keepalived high number of close syscalls



Thanks, Ram!
I noticed this in the keepalived 1.2.20 changelog:
http://www.keepalived.org/changelog.html

"Optimise closure of fds before invoking scripts.
Every time before a script was invoked, closeall() was called,
which would spin through 1024 file descriptors closing them, even
though the vast majority were not open, resulting in 1024 system
calls. To avoid that, open all sockets and file descriptors
(except fd 0/1/2) with the CLOEXEC flag set, so that the fds will
be closed by the kernel when the script is exec'd."
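A minimal C sketch of the two approaches that changelog entry contrasts, assuming Linux/glibc. The old style loops `close()` over every descriptor slot before exec; the new style sets close-on-exec when each descriptor is created, so the kernel closes it during `execve()`. The helper names (`close_fds_upto`, `open_cloexec`, `socket_cloexec`, `has_cloexec`) are hypothetical and are not keepalived's actual code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Old approach (hypothetical helper, not keepalived's code): before exec,
 * call close() on every descriptor slot from 3 up to `limit`, open or not.
 * Returns the number of close() syscalls issued. */
static int close_fds_upto(int limit)
{
    assert(limit >= 3);
    int calls = 0;
    for (int fd = 3; fd < limit; fd++) {   /* keep fds 0/1/2 */
        close(fd);                          /* fails with EBADF for unused slots */
        calls++;
    }
    return calls;
}

/* New approach: mark descriptors close-on-exec when they are created, so the
 * kernel closes them itself during execve() and no loop is needed. */
static int open_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

static int socket_cloexec(void)
{
    return socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if not, -1 if fd is not open. */
static int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

Calling `close_fds_upto(1024)` always issues 1021 `close()` calls, even when only a handful of descriptors are open, which is the per-script cost the changelog describes. Once every descriptor carries the CLOEXEC flag from creation, that loop can be dropped entirely.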

----- Original Message -----
From: Ram Ranganathan <rrangana redhat com>
To: Chuck Sochin <csochin westwardone net>
Cc: users <users lists openshift redhat com>
Sent: Wed, 06 Apr 2016 05:04:50 -0000 (UTC)
Subject: Re: Openshift HA environment - keepalived high number of close syscalls
Haven't seen this until you mentioned it. I can see the close calls in my local env. They seem to happen in a new process, after a clone() syscall, a couple of seconds apart. So they are likely part of the script that does the health check:
     script "</dev/tcp/${ip}/${watch_port}"
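That health-check line is a bash redirection: opening `/dev/tcp/${ip}/${watch_port}` makes bash attempt a TCP connection, and the command exits 0 only if the connect succeeds. The clone() before each run is consistent with keepalived starting the script in a child process. Below is a minimal C sketch of what the one-line check does, assuming IPv4 and Linux; `tcp_check` is a hypothetical name, not part of keepalived or OpenShift.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper mirroring "</dev/tcp/${ip}/${watch_port}":
 * returns 0 (healthy) if a TCP connection to ip:port can be established,
 * non-zero otherwise, like the exit status of the shell redirection. */
static int tcp_check(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return 2;                       /* not a dotted-quad IPv4 address */

    /* SOCK_CLOEXEC so this fd would not leak into any later exec'd child */
    int fd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
    if (fd < 0)
        return 2;

    int rc = (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0) ? 0 : 1;
    close(fd);                          /* the shell closes it when it exits */
    return rc;
}
```

Like the shell check, this sketch has no connect timeout, so probing an address that silently drops packets (rather than refusing the connection) can block until the kernel gives up.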
But I don't see a slowdown on the CPU side on my instance (it's been running at about 1% for the last 30-odd minutes), so I suspect the CPU load might have to do with the agent/sysdig in your case.
Filing a bug would be good. I spent some time on it just now but couldn't figure out what's causing it or whether it's a "feature".
 
Thanks,
Ram//

On Tue, Apr 5, 2016 at 2:04 PM, Chuck Sochin <csochin westwardone net> wrote:
Using OSEv3.1.1

I'm looking to set up sysdig in our native HA OpenShift environment, but I'm having issues getting the agent to run on our infra nodes hosting keepalived and HAProxy; the agent runs without issue on all the other nodes in our env.

After the agent has been running for an hour or two, the node hangs and our hypervisor reports 100% CPU utilization. A power reset is the only way to bring the node back. The problem may be keepalived making an extremely large number (around 17 million per minute) of close syscalls, and those close operations appear to be on any available fd. Is this expected behavior for keepalived running in an OSEv3.1.1 HA environment?

Thanks!




_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


--
Ram//
main(O,s){s=--O;10<putchar(3^O?97-(15&7183>>4*s)*(O++?-1:1):10)&&\
main(++O,s++);}
