
Re: Enabling Cluster Metrics



----- Original Message -----
> From: "Alejandro Nieto Boza" <ale90nb gmail com>
> To: "Clayton Coleman" <ccoleman redhat com>, mwringe redhat com
> Cc: "users" <users lists openshift redhat com>
> Sent: Thursday, February 11, 2016 3:02:57 AM
> Subject: Re: Enabling Cluster Metrics
> 
> I'm running without persistent storage.
> 
> Once the pods have been "turned on" for more than 20 minutes, they change to
> state "Running" and work. Could this be due to insufficient memory?

20 minutes seems like a long time for this. The containers run a full web server and a database, both of which need a fair amount of memory.

> I've watched their state for a day and the pods are working. I will try
> bigger scenarios when I can, and I will post if the error appears again.
> 
> 
> Now, I've got this problem (also I've got it previously):
> 
> I've launched a pod specifying requests and limits for cpu and memory, but
> when I look at the pod overview pages, the value in the metrics graphs is 0
> (for every pod, not only this one).

So before you launch a pod with limits you get graphs, but once you deploy a pod with limits you no longer see graphs for any pod?

I have seen an issue where pods with limits show zero values, but that was because the pods with limits were configured to run on a separate node whose certificates were not configured properly.

> 
> 
> Heapster logs:
> 
> W0210 18:11:48.637940       1 reflector.go:224] /tmp/gopath/src/k8s.io/heapster/sources/pods.go:173: watch of *api.Pod ended with: 401: The event in requested index is outdated and cleared (the requested history has been cleared [2322574/2322059]) [2323573]
> 
> 
> # curl -X GET https://hawkular-metrics.example.com/hawkular/metrics/status -k
> 
> {"MetricsService":"STARTED","Implementation-Version":"0.12.0.Final"....

Ok, good, this means that the Hawkular Metrics and Cassandra containers are running properly.
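
If you want to script that check rather than read the curl output by eye, here is a minimal sketch in Python. It assumes the status body is JSON with a "MetricsService" key, as in the output quoted above; the function name and the hand-completed sample body are made up for illustration and are not part of Hawkular.

```python
import json


def metrics_service_started(status_body):
    """Return True if a /hawkular/metrics/status body reports STARTED."""
    try:
        status = json.loads(status_body)
    except ValueError:
        # The body was not JSON at all (for example an HTML error page).
        return False
    return isinstance(status, dict) and status.get("MetricsService") == "STARTED"


# Sample body shaped like the truncated output quoted above (assumption:
# the real response carries more fields than these two).
sample = '{"MetricsService":"STARTED","Implementation-Version":"0.12.0.Final"}'
print(metrics_service_started(sample))
```

Any other value for "MetricsService", or a body that is not JSON, is treated as "not started" here, which is a deliberately conservative choice.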

> 
> Here curl doesn't get a JSON object:
> 
> # curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
>        -H "Hawkular-tenant: test" \
>        -X GET https://hawkular-metrics.example.com/hawkular/metrics/metrics -k | python -m json.tool
> 
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100    68  100    68    0     0    110      0 --:--:-- --:--:-- --:--:--   110
> 
> No JSON object could be decoded

Are you sure the 'test' project exists and it has pods running within it?

If you are running the metrics components in openshift-infra, can you please set the Hawkular-tenant value to 'openshift-infra'? If the metrics components are running there, something should show up at that URL.
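
To make the tenant header explicit, here is a rough Python sketch of the same request. It only builds the request and parses a body; it does not send anything, and it does not reproduce curl's -k. The function names and the "XXXX" token are placeholders for illustration, and the URL is the example hostname from your command.

```python
import json
import urllib.request


def build_metrics_request(base_url, token, tenant):
    """Build (but do not send) a GET for one tenant's metric definitions."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/hawkular/metrics/metrics",
        headers={
            "Authorization": "Bearer " + token,
            "Hawkular-tenant": tenant,  # the project (tenant) to query
        },
        method="GET",
    )


def parse_metrics_body(raw):
    """Return the decoded JSON, or None when the body is empty or not JSON."""
    text = raw.decode("utf-8", errors="replace").strip()
    if not text:
        return None
    try:
        return json.loads(text)
    except ValueError:
        return None


req = build_metrics_request(
    "https://hawkular-metrics.example.com", "XXXX", "openshift-infra"
)
print(req.full_url)
print(req.get_header("Hawkular-tenant"))
```

Returning None for a non-JSON body, instead of raising, covers the case python -m json.tool is complaining about, so the caller can print the raw body and see what the server actually sent.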


> 
> It seems the problem isn't due to certificate issues; the node IP
> appears correctly in the certificates.
> 
> 
> 
> 
> On Wed, Feb 10, 2016, 17:56, Clayton Coleman <ccoleman redhat com> wrote:
> 
> > I don't know what unconfigured table means (beyond maybe your tables need
> > to be recreated because you have an old version) but I bet Matt does.
> >
> > On Feb 10, 2016, at 10:50 AM, Alejandro Nieto Boza <ale90nb gmail com>
> > wrote:
> >
> > Thanks, the Openshift DNS wasn't running correctly. Now the error doesn't
> > appear but...
> >
> > Now I have another error (one that has already appeared in other
> > scenarios).
> >
> > This is the state of my metrics pods:
> >
> > # oc get pods
> > NAME                         READY     STATUS      RESTARTS   AGE
> > hawkular-cassandra-1-j09f6   1/1       Running     0          10m
> > hawkular-metrics-xpa33       0/1       Error       1          10m
> > heapster-42vyz               0/1       Error       2          10m
> > metrics-deployer-e5e3v       0/1       Completed   0          12m
> >
> >
> > # oc get pods
> > NAME                         READY     STATUS             RESTARTS   AGE
> > hawkular-cassandra-1-j09f6   1/1       Running            0          12m
> > hawkular-metrics-xpa33       0/1       Completed          2          12m
> > heapster-42vyz               0/1       CrashLoopBackOff   4          12m
> > metrics-deployer-e5e3v       0/1       Completed          0          15m
> >
> >
> > The pod hawkular-metrics changes its state between Completed and Error (?)
> >
> >
> > These are some logs of hawkular-metrics pod:
> >
> >  # oc logs hawkular-metrics-xpa33
> > 15:22:08,104 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-1)
> > MSC000001: Failed to start service
> > jboss.deployment.unit."hawkular-metrics-api-jaxrs.war":
> > org.jboss.msc.service.StartException in service
> > jboss.deployment.unit."hawkular-metrics-api-jaxrs.war": Failed to start
> > service
> >         at
> > org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1904)
> >         at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >         at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >         at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.lang.IllegalStateException: Container is down
> > ...............
> >
> > 15:22:08,211 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-1)
> > MSC000001: Failed to start service
> > jboss.serverManagement.controller.management.http:
> > org.jboss.msc.service.StartException in service
> > jboss.serverManagement.controller.management.http: Failed to start service
> >         at
> > org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1904)
> > ...............
> >
> >
> > 15:29:35,416 FATAL
> > [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle]
> > (metricsservice-lifecycle-thread) HAWKMETRICS200006: An error occurred
> > trying to connect to the Cassandra cluster:
> > com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured
> > table retentions_idx
> >         at
> > com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
> > .................
> >
> >
> >
> > And obviously heapster cannot connect to hawkular-metrics:
> >
> > # oc logs heapster-42vyz
> > Could not connect to https://hawkular-metrics:443/hawkular/metrics/status.
> > Curl exit code: 7. Status Code 000
> > 'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible
> > [HTTP status code: 000. Curl exit code 7]. Retrying.
> >
> >
> > hawkular-cassandra logs don't show errors.
> >
> >
> >
> > 2016-02-10 14:50 GMT+01:00 Clayton Coleman <ccoleman redhat com>:
> >
> >> Can you try from one of your nodes to reach the nameserver directly and
> >> via the proxy?
> >>
> >>     dig @<your master ip> kubernetes.default.svc.cluster.local
> >>     dig @172.30.0.1 kubernetes.default.svc.cluster.local
> >>
> >>
> >>
> >> On Feb 10, 2016, at 8:40 AM, Alejandro Nieto Boza <ale90nb gmail com>
> >> wrote:
> >>
> >> It's like you said.
> >>
> >> Test logs:
> >> # oc logs test
> >>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
> >>                                  Dload  Upload   Total   Spent    Left  Speed
> >>   0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> >> curl: (6) Could not resolve host: kubernetes; Unknown error
> >>
> >>
> >>
> >>
> >> Test2 logs:
> >> # oc logs test2
> >> nameserver "172.30.0.1"
> >> nameserver "another-ip"
> >>
> >>
> >>
> >>
> >> # oc get svc/kubernetes -n default
> >> NAME         CLUSTER_IP     EXTERNAL_IP   PORT(S)                 SELECTOR   AGE
> >> kubernetes   "172.30.0.1"   <none>        443/TCP,53/UDP,53/TCP   <none>     92d
> >> search test.svc.cluster.local svc.cluster.local cluster.local test.es
> >> options ndots:5
> >>
> >>
> >>
> >>
> >>
> >>
> >> 2016-02-10 14:01 GMT+01:00 Clayton Coleman <ccoleman redhat com>:
> >>
> >>> That seems to indicate that inside the deployment container DNS is not
> >>> working.  Can you do the following to check:
> >>>
> >>>     oc run --image centos:7 test --generator=run-pod/v1 --restart=Never \
> >>>         -- curl https://kubernetes
> >>>     oc logs test
> >>>
> >>> And then
> >>>
> >>>     oc run --image centos:7 test2 --generator=run-pod/v1 --restart=Never \
> >>>         -- cat /etc/resolv.conf
> >>>     oc logs test2
> >>>
> >>> The latter should have a nameserver pointing to the master by its
> >>> service IP - the command:
> >>>
> >>>     oc get svc/kubernetes -n default
> >>>
> >>> Should show that same IP
> >>>
> >>> On Feb 10, 2016, at 7:39 AM, Alejandro Nieto Boza <ale90nb gmail com>
> >>> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I've been following these steps to deploy metrics:
> >>>
> >>> https://docs.openshift.org/latest/install_config/cluster_metrics.html
> >>>
> >>> When I run the following command:
> >>>
> >>>
> >>> oc process -f metrics.yaml -v \
> >>> HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com,USE_PERSISTENT_STORAGE=false \
> >>> | oc create -f -
> >>>
> >>>
> >>> I get the following error:
> >>>
> >>> Creating the Cassandra Certificate Secrets configuration json file
> >>> +++ base64
> >>> ++++ echo hawkular-cassandra
> >>> +++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra.truststore
> >>> +++ base64
> >>> ++++ echo RjR--747mUzmTS-
> >>> +++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra.pem
> >>> ++ echo
> >>> ++ echo 'Creating the Cassandra Certificate Secrets configuration json
> >>> file'
> >>> ++ cat
> >>> +++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra.cert
> >>> +++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra-ca.cert
> >>> Creating Hawkular Metrics & Cassandra Secrets
> >>> ++ echo 'Creating Hawkular Metrics & Cassandra Secrets'
> >>> ++ oc create -f /etc/deploy/_output/hawkular-metrics-secrets.json
> >>> unable to connect to a server to handle "secrets": Get
> >>> https://kubernetes.default.svc:443/api: dial tcp: lookup
> >>> kubernetes.default.svc: no such host
> >>>
> >>>
> >>>
> >>>
> >>> # oc get pods
> >>> NAME                     READY     STATUS    RESTARTS   AGE
> >>> metrics-deployer-7gcpd   0/1       Error     0          39m
> >>>
> >>>
> >>> How can I know if my kubernetes master URL is
> >>> https://kubernetes.default.svc:443 or is another URL?
> >>>
> >>> My Openshift installation isn't an update.
> >>>
> >>>
> >>>
> >>
> >
> 

