ulimits or other limits?


I'm running two physical nodes with 768 GB of memory each, on OKD 3.11 with CentOS 7 (systemd 219, kernel 3.10.0-957.27.2.el7.x86_64). Tuned has raised all limits to high values, for example threads-max 6175921 and pid_max 458752.
I don't see any related kernel messages in the journal.

Output of "ulimit -Ha":
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3087960
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 999999
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 3087960
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

The problem is that when I run 30+ pods, some of them fail with the following error and restart, seemingly at random: failed to execute /bin/bash: Resource temporarily unavailable.

I'm puzzled and don't know where to start debugging.
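The only starting point I can think of is this: "Resource temporarily unavailable" is the EAGAIN message, which fork() returns when a process or thread limit is hit, and the ulimit output above only describes the shell I ran it in, which may not match the limits of the process that actually fails. Below is a minimal sketch (not a verified diagnosis) that reads the effective limits of a given PID from /proc/<pid>/limits and prints them next to the node-wide task count and the pid_max and threads-max values. The function names and the idea of passing the PID as the first argument are my own assumptions, and the script does not look at any cgroup or systemd limits.

```python
import re
import sys
from pathlib import Path

# A limits line looks like: "Max processes   3087960   3087960   processes".
# The name may contain single spaces, so split on the first run of 2+ spaces.
_LIMIT_LINE = re.compile(r"^(.+?)\s{2,}(\S+)\s+(\S+)")


def parse_limits(text):
    """Parse the text of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines():
        match = _LIMIT_LINE.match(line)
        # Skip lines that do not match and the "Limit  Soft Limit ..." header.
        if match is None or match.group(1) == "Limit":
            continue
        limits[match.group(1)] = (match.group(2), match.group(3))
    return limits


def total_tasks(loadavg_text):
    """Return the number of existing tasks (processes and threads).

    The fourth field of /proc/loadavg is "runnable/total".
    """
    return int(loadavg_text.split()[3].split("/")[1])


if __name__ == "__main__":
    # Hypothetical usage: pass the PID of a process inside an affected pod;
    # without an argument the script inspects itself.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    limits = parse_limits(Path(f"/proc/{pid}/limits").read_text())
    print("Max processes (soft, hard):", limits.get("Max processes"))
    print("tasks on node:", total_tasks(Path("/proc/loadavg").read_text()))
    for name in ("pid_max", "threads-max"):
        value = Path(f"/proc/sys/kernel/{name}").read_text().strip()
        print(f"{name}: {value}")
```

If the soft "Max processes" value for a container process turns out to be much lower than the 3087960 shown above, or the task count on the node is close to pid_max, that would at least narrow down which limit to look at next.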


