
Re: istio pods



Samuel,

    See below the output from "oc describe node <node id>" for one of my worker nodes. In particular, I'm interested in the "Capacity" and "Allocatable" sections. The Capacity section says the node has 2 CPUs. When I first noticed this, I had defined the workers as c5.xlarge machines, which have 4 vCPUs, so I thought that maybe OpenShift was reserving 2 CPUs for itself. But I then rebuilt the cluster with c5.2xlarge machines, and the output you see below is from that. It still shows 2 CPUs, so it seems like OpenShift isn't recognizing the additional hardware. How do I fix this?

Name:               ip-10-0-156-206.us-west-1.compute.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m4.large
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-west-1
                    failure-domain.beta.kubernetes.io/zone=us-west-1b
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-156-206
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
Annotations:        machine.openshift.io/machine: openshift-machine-api/two-4z45k-worker-us-west-1b-t4dln
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-f4169460716c78be83ccb2609dd91fc3
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-f4169460716c78be83ccb2609dd91fc3
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 10 Oct 2019 07:30:18 -0400
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:30:18 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:30:18 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:30:18 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:31:19 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.0.156.206
  Hostname:     ip-10-0-156-206.us-west-1.compute.internal
  InternalDNS:  ip-10-0-156-206.us-west-1.compute.internal
Capacity:
 attachable-volumes-aws-ebs:  39
 cpu:                         2
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      8162888Ki
 pods:                        250
Allocatable:
 attachable-volumes-aws-ebs:  39
 cpu:                         1500m
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      7548488Ki
 pods:                        250
System Info:
 Machine ID:                              23efe37e2b244bd788bb8575cd340bfd
 System UUID:                             ec25c230-d02c-99cf-0540-bad276c8cc73
 Boot ID:                                 1f3a064f-a24e-4bdf-b6b0-c7fd3019757e
 Kernel Version:                          4.18.0-80.11.2.el8_0.x86_64
 OS Image:                                Red Hat Enterprise Linux CoreOS 42.80.20191001.0 (Ootpa)
 Operating System:                        linux
 Architecture:                            amd64
 Container Runtime Version:               cri-o://1.14.10-0.21.dev.rhaos4.2.git0d4a906.el8
 Kubelet Version:                         v1.14.6+d3a139f63
 Kube-Proxy Version:                      v1.14.6+d3a139f63
ProviderID:                               aws:///us-west-1b/i-07e495331f3f25ac0
Non-terminated Pods:                      (18 in total)
  Namespace                               Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                               ----                                        ------------  ----------  ---------------  -------------  ---
  openshift-cluster-node-tuning-operator  tuned-9qnjd                                 10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         16m
  openshift-dns                           dns-default-skgfs                           110m (7%)     0 (0%)      70Mi (0%)        512Mi (6%)     16m
  openshift-image-registry                image-registry-584f455476-9q7b8             100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         16m
  openshift-image-registry                node-ca-74h7h                               10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         15m
  openshift-ingress                       router-default-85b6848bdf-679xf             100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         16m
  openshift-machine-config-operator       machine-config-daemon-6ht2m                 20m (1%)      0 (0%)      50Mi (0%)        0 (0%)         15m
  openshift-marketplace                   certified-operators-5cb88dd798-lcfvc        10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
  openshift-marketplace                   community-operators-7f8987f496-8rgzq        10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
  openshift-marketplace                   redhat-operators-cd495bc4f-fcm5t            10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
  openshift-monitoring                    alertmanager-main-1                         100m (6%)     100m (6%)   225Mi (3%)       25Mi (0%)      12m
  openshift-monitoring                    kube-state-metrics-6b66989cb7-nlqbm         30m (2%)      0 (0%)      120Mi (1%)       0 (0%)         17m
  openshift-monitoring                    node-exporter-bhrhz                         10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         16m
  openshift-monitoring                    openshift-state-metrics-6bf647b484-sfgjs    120m (8%)     0 (0%)      190Mi (2%)       0 (0%)         17m
  openshift-monitoring                    prometheus-adapter-66d6b69459-bcq5p         10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         13m
  openshift-monitoring                    prometheus-k8s-1                            430m (28%)    200m (13%)  1134Mi (15%)     50Mi (0%)      14m
  openshift-multus                        multus-jlmwx                                10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         16m
  openshift-sdn                           ovs-hdml2                                   200m (13%)    0 (0%)      400Mi (5%)       0 (0%)         16m
  openshift-sdn                           sdn-vwh22                                   100m (6%)     0 (0%)      200Mi (2%)       0 (0%)         16m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         1390m (92%)   300m (20%)
  memory                      3451Mi (46%)  587Mi (7%)
  ephemeral-storage           0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:                       <none>
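
    One thing I notice in the labels above: the node is labeled beta.kubernetes.io/instance-type=m4.large, so perhaps the worker MachineSet is still requesting m4.large instances (2 vCPUs) rather than c5.2xlarge? Also, with 1500m CPU allocatable and 1390m (92%) already requested, there's only about 110m left for new pods. Would checking and editing the MachineSet along these lines be the right fix (just a sketch; the MachineSet name is a placeholder)?

$ oc get machinesets -n openshift-machine-api
$ oc get machineset <worker-machineset-name> -n openshift-machine-api -o yaml | grep -i instanceType
$ oc edit machineset <worker-machineset-name> -n openshift-machine-api
$ oc scale machineset <worker-machineset-name> -n openshift-machine-api --replicas=0
$ oc scale machineset <worker-machineset-name> -n openshift-machine-api --replicas=2

    (Scaling down and back up because, as far as I understand, existing Machines aren't replaced in place when the MachineSet template changes.)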



On Wed, Oct 9, 2019 at 8:30 AM Samuel Martín Moro <faust64 gmail com> wrote:
You have two master nodes? You'd rather go with 1 or 3. With 2 masters, your etcd quorum is still 2, so if you lose a master, the API would become unavailable.

Now ... c5-xlarge should be fine.
Not sure why your console doesn't show everything (I'm not familiar with that dashboard yet). As a wild guess, it's probably some delay collecting metrics; you should see something eventually.

If you use "oc describe node <node-name>", you should see the reservations (requests & limits) for that node.
Might be able to figure out what's eating up your resources.
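For example, to see just the per-node totals (only a rough sketch of the grep; the node name is a placeholder):

$ oc describe node <node-name> | grep -A 8 "Allocated resources"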

Depending on what openshift components you're deploying, you may already be using quite a lot.
Especially if you don't have infra nodes, and did deploy EFK and/or hawkular/cassandra. Prometheus could use some resources as well.
Meanwhile, Istio itself can ship with more or fewer components, ...

If using EFK: you may be able to lower resource requests/limits for Elasticsearch (see the sketch below)
If using Hawkular: same remark regarding Cassandra
Hard to say, without seeing it. But you can probably free up some resources here and there.
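
For EFK, if it was deployed through the cluster-logging operator, the Elasticsearch requests usually live in the ClusterLogging custom resource; roughly, and with field names that may vary by version:

$ oc -n openshift-logging edit clusterlogging instance

then lower spec.logStore.elasticsearch.resources.requests for cpu/memory.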


Good luck,

Regards.

On Wed, Oct 9, 2019 at 1:47 PM Just Marvin <marvin the cynical robot gmail com> wrote:
Samuel,

    So it is CPU. But I destroyed the cluster, gave it machines with twice as much memory, retried, and got the same problem:

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  16s (x4 over 3m12s)  default-scheduler  0/4 nodes are available: 2 Insufficient cpu, 2 node(s) had taints that the pod didn't tolerate.


    I'm guessing that the two nodes with taints are the two master nodes, but the other two are c5.xlarge machines. Here is a possibly relevant observation, though: as soon as I log into the cluster, I see this on my main dashboard.
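
    A quick way to confirm which two nodes carry the taints would be something like this (just a sketch):

$ oc get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints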

[image.png - screenshot of the cluster dashboard]

    Is there perhaps a problem with the CPU resource monitoring that is causing this?

Regards,
Marvin

On Sun, Oct 6, 2019 at 3:52 PM Samuel Martín Moro <faust64 gmail com> wrote:
You can use "oc describe pod <pod-name>" to figure out what's going on with your pod.
Could be that you're out of CPU/memory.
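For example, with one of the Pending pods from your list, plus the cluster events sorted by time (just a sketch):

$ oc describe pod istio-policy-56476c984b-c7t8j
$ oc get events --sort-by=.lastTimestamp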

Regards.

On Sun, Oct 6, 2019 at 9:27 PM Just Marvin <marvin the cynical robot gmail com> wrote:
Hi,

[zaphod oc3027208274 ocp4.2-aws]$ oc get pods
NAME                              READY   STATUS    RESTARTS   AGE
istio-citadel-7cb44f4bb-tccql     1/1     Running   0          9m35s
istio-galley-75599dbc67-b4mgx     1/1     Running   0          8m41s
istio-policy-56476c984b-c7t8j     0/2     Pending   0          8m23s
istio-telemetry-d5bbd7d7b-v8kjq   0/2     Pending   0          8m24s
jaeger-5d9dfdfb67-mv8mp           2/2     Running   0          8m45s
prometheus-685bdbdc45-hmb9f       2/2     Running   0          9m17s
[zaphod oc3027208274 ocp4.2-aws]$ 


    The pods in the Pending state don't seem to be moving forward, and the operator logs aren't showing anything informative about why. Is this normal? If there is a problem, how would I figure out the cause?

Regards,
Marvin
_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


--
Samuel Martín Moro
{EPITECH.} 2011

"Nobody wants to say how this works.
 Maybe nobody knows ..."
                      Xorg.conf(5)


--
Samuel Martín Moro
{EPITECH.} 2011

"Nobody wants to say how this works.
 Maybe nobody knows ..."
                      Xorg.conf(5)
