
Re: Catching kill due to oom



I have opened the issue here: https://github.com/openshift/origin/issues/15032

On Tue, 4 Jul 2017 at 04:41 Ben Parees <bparees redhat com> wrote:
In this case the container being killed is probably the assemble container, which unfortunately is not even part of the build pod.  It's a container that the build pod's container launches manually via direct access to the docker socket.  It is subject to the same cgroup constraints as the build pod, so a frequent issue is running something like Maven as part of your assemble script and having it try to use more memory than the cgroup allows, because Maven sees the entire host memory as available.  Our s2i images now try to configure Maven more appropriately to avoid that.
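
For anyone hitting the Maven case, here is a minimal sketch of the general idea in an assemble script: read the cgroup memory limit and derive a JVM heap cap from it. The cgroup v1 path and the "half the limit" heuristic are illustrative only, not the exact logic the s2i images use.

  # Illustrative fragment of an assemble script (bash), not the s2i images' actual logic.
  LIMIT_FILE=/sys/fs/cgroup/memory/memory.limit_in_bytes
  if [ -r "$LIMIT_FILE" ]; then
    limit=$(cat "$LIMIT_FILE")
    # Values near 2^63 mean "no limit"; only cap the heap when a real limit is set.
    if [ "$limit" -lt 9223372036854771712 ]; then
      # Heuristic: give the JVM roughly half the cgroup limit, in megabytes.
      heap_mb=$(( limit / 1024 / 1024 / 2 ))
      export MAVEN_OPTS="-Xmx${heap_mb}m ${MAVEN_OPTS}"
    fi
  fi
  mvn package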

However, if the container is being OOM killed by the system itself (as opposed to the process inside the container hitting an OOM and failing), I'm not sure what options we have to report that back on the build.  Perhaps there is a way for us to retrieve that information from the terminated container (as k8s apparently does for the pod-managed containers).  Can you open an issue against origin so we can track it there?
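
If the terminated container still exists, the Docker daemon itself does record the OOM kill, so one possible check from anything with access to the docker socket would be something like the following, where <container-id> is a placeholder for the assemble container's ID:

  docker inspect -f 'oom-killed={{.State.OOMKilled}} exit-code={{.State.ExitCode}}' <container-id>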



On Mon, Jul 3, 2017 at 11:20 AM, Seth Jennings <sjenning redhat com> wrote:
Hey Andrew,

It is true that we don't generate a pod-level event when a container in
the pod is OOM killed.  There is, however, a container status in the pod
status that indicates the OOM, with state.terminated.reason set to
OOMKilled:

status:
...
  containerStatuses:
  - containerID: docker://f2389dccd11a6575aeccbc12d360bc02eb0d2cf67c0f8d439fda57637e916628
...
    state:
      terminated:
        containerID: docker://f2389dccd11a6575aeccbc12d360bc02eb0d2cf67c0f8d439fda57637e916628
        exitCode: 1
        finishedAt: 2017-07-03T15:08:40Z
        reason: OOMKilled
        startedAt: 2017-07-03T15:08:40Z

Since builds have restartPolicy: Never, the status isn't overwritten by
a restart, and you can see it on the Pods tab in the Status column in
the web console.
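
For anyone who prefers the CLI to the console, the same field can be pulled out with a jsonpath query. The pod name is a placeholder, and index 0 assumes the build pod's first container is the one of interest:

  oc get pod <build-pod-name> \
    -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}'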

Thanks,
Seth





On Sun, Jul 2, 2017 at 7:23 PM, Andrew Lau <andrew andrewklau com> wrote:
> Hi,
>
> I'm often seeing issues where builds are getting killed due to OOM. I'm
> hoping to get some ideas on ways we could catch the OOM so that some
> sort of useful message can be displayed.
>
> Based on what I am seeing, a SIGKILL is being sent to the container, so it's
> not possible to catch anything like a SIGTERM from within the container to
> at least display an error message in the logs. Users are often left confused
> wondering why their build suddenly died.
>
> It's also not currently possible to configure the memory limit for the
> buildconfig in the web console.
>
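
On the last point quoted above: even though the web console doesn't expose it, the memory limit can still be set on the BuildConfig from the CLI. A sketch, where the BuildConfig name and the 1Gi value are placeholders:

  oc patch bc <buildconfig-name> --type=merge \
    -p '{"spec":{"resources":{"limits":{"memory":"1Gi"}}}}'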

_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users



--
Ben Parees | OpenShift

