
Re: Metrics - Could not connect to Cassandra cluster



This happened to us.  The likely cause is that your metrics replication controllers are set to pull the latest versions of the images.  (I think this is the default.  Bad!)  The current latest images need a different configuration, so your existing configuration no longer works.  The mismatch has probably existed for a long time, but you didn't notice until some component of the system restarted and triggered a new image pull.

We fixed this by changing the images specified in the replication controllers.  For example, in rc/hawkular-metrics, we changed

    image: openshift/origin-metrics-hawkular-metrics:latest

to

    image: openshift/origin-metrics-hawkular-metrics:v1.2.1
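
If you would rather script the tag change than edit each replication controller by hand, something like the following might work. It is only a sketch: the function names `pin_image_tag` and `pin_rc` are made up for illustration, it assumes your current `oc` project is the one that holds the metrics components, and it assumes the image lines look like the one above (`openshift/origin-metrics-<component>:latest`).

```shell
# Rewrite ":latest" to a fixed tag on origin-metrics image lines read from stdin.
# $1: the tag to pin (for example v1.2.1).
pin_image_tag() {
  sed -E "s#(image:[[:space:]]*openshift/origin-metrics-[a-z-]+):latest#\1:$1#"
}

# Fetch one replication controller, pin its image tag, and write it back.
# $1: replication controller name (check `oc get rc` for the real names),
# $2: the tag to pin.
pin_rc() {
  oc get rc "$1" -o yaml | pin_image_tag "$2" | oc replace -f -
}

# Example (not run automatically):
#   pin_rc hawkular-metrics v1.2.1
```

Changing the replication controller does not touch pods that are already running, so the new tag only takes effect once the old pod is deleted and the controller creates a replacement.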

While I was debugging, I restarted hawkular-cassandra, so it got upgraded, too.  (I don't know whether it had already been upgraded before that; if yours hasn't been, then you can avoid losing data.)  As a result, I had to set the :v1.2.1 tag on all three components (hawkular-cassandra, hawkular-metrics, and heapster) and also delete all data (both the data directory and the commitlog directory) on the hawkular-cassandra PV.  Because hawkular-cassandra was failing, I was unable to use `oc rsh` to get in, so I had to find the mountpoint on the node where the hawkular-cassandra pod was running and delete the files from the host side.
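
For the host-side cleanup, a rough sketch of what could be run as root on the node follows. It is not exactly what I ran. The helper names `find_pv_mounts` and `wipe_cassandra_dirs` are invented for illustration, the PV name is whatever `oc get pv` reports for your cluster, and I am assuming the two directories sit directly under the mountpoint as `data` and `commitlog`. `oc get pods -o wide` shows which node the pod is on. The wipe only prints what it would remove unless you pass `--yes`.

```shell
# Print the mountpoint of every entry in a mounts table that mentions a PV name.
# $1: PV name, $2: mounts file (defaults to /proc/mounts).
find_pv_mounts() {
  awk -v pv="$1" 'index($0, pv) { print $2 }' "${2:-/proc/mounts}"
}

# Empty the data and commitlog directories under a mountpoint.
# $1: mountpoint, $2: pass --yes to really delete; otherwise only report.
wipe_cassandra_dirs() {
  mnt="$1"
  for d in data commitlog; do
    # Skip directories that are not present (layout is an assumption).
    [ -d "$mnt/$d" ] || continue
    if [ "${2:-}" = "--yes" ]; then
      find "$mnt/$d" -mindepth 1 -delete
    else
      echo "would delete contents of $mnt/$d"
    fi
  done
}

# Example (not run automatically; my-metrics-pv is a placeholder):
#   find_pv_mounts my-metrics-pv
#   wipe_cassandra_dirs /path/printed/above --yes
```

This throws away all stored metrics, so only do it if hawkular-cassandra has already been upgraded and you are rolling it back.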

On Sat, Oct 22, 2016 at 2:32 PM, Miloslav Vlach <miloslav vlach rohlik cz> wrote:
Hi,

I don't know why one server has a problem connecting to the Cassandra database.

Hawkular writes:

19:27:15,354 WARN [org.hawkular.alerts.engine.impl.CassCluster] (ServerService Thread Pool -- 75) Could not connect to Cassandra cluster - assuming is not up yet. Cause: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.TransportException: [/127.0.0.1] Cannot connect))


But the endpoint is not 127.0.0.1:9042

On the other server, outside the cluster:

19:26:54,909 WARN [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200003: Could not connect to Cassandra cluster - assuming its not up yet: All host(s) tried for query failed (tried: hawkular-cassandra/172.30.155.228:9042 (com.datastax.driver.core.exceptions.TransportException: [hawkular-cassandra/172.30.155.228] Cannot connect))

but after a few seconds it connects to Cassandra.

Does anybody know where the problem is?

Installation was performed via Ansible. Everything worked before the restart.

Thanks Mila

--
Alex Wauck // DevOps Engineer

E X O S I T E 
www.exosite.com 

Making Machines More Human.
