
Re: weird issue with etcd



Hello,

Yes, I have an external load balancer in front of my masters for HA, as the documentation says.

I don't have a load balancer between my masters and the etcd servers. That is not necessary, right? The masters will try every available etcd endpoint if one of them is down, right?
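For reference, a minimal sketch of the majority rule that etcd's Raft consensus relies on. The function names are illustrative, not part of any etcd API. It suggests why a two-member cluster may stop serving requests when one member is down, even if the clients do fail over to the surviving endpoint:

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} member(s): quorum={quorum(n)}, "
              f"tolerates {failures_tolerated(n)} failure(s)")
```

With 2 members the quorum is 2, so the surviving member cannot win an election on its own vote. A 3-member cluster would tolerate the loss of one member.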

I don't know why, but none of my masters were able to connect to the second etcd instance, even though telnet from their shells worked, so it was not a network or firewall issue.
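The telnet check can be approximated with a small script like the one below. This is only a sketch: `can_connect` and the `ETCD_ENDPOINTS` list are hypothetical names, and the loopback address is a placeholder for the real etcd hosts. Note that a successful TCP connect only shows that the port accepts connections; it does not show that the member has a leader or can answer requests.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return False


# Placeholder endpoints (assumption): replace with the real etcd addresses.
ETCD_ENDPOINTS = [("127.0.0.1", 2379)]

if __name__ == "__main__":
    for host, port in ETCD_ENDPOINTS:
        state = "reachable" if can_connect(host, port) else "unreachable"
        print(f"{host}:{port} is {state}")
```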


Best regards.

> On 13 Jun 2016, at 17:53, Clayton Coleman <ccoleman redhat com> wrote:
> 
> I have not seen that particular issue.  Do you have a load balancer in
> between your masters and etcd?
> 
> On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura <jsaura hiberus com> wrote:
>> Hello,
>> 
>> I have an Origin 3.1 installation that has been working well so far.
>> 
>> Today one of my etcd nodes (1 of 2) crashed and I started having problems.
>> 
>> I noticed that one of my master nodes was not able to connect to the second etcd server, and that this etcd server was not able to promote itself to leader:
>> 
>> 
>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is starting a new election at term 10048
>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 became candidate at term 10049
>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 received vote from 12c8a31c8fcae0d4 at term 10049
>> jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 [logterm: 8, index: 4600461] sent vote request to bf80ee3a26e8772c at term 10049
>> jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected response error (etcdserver: request timed out)
>> 
>> My masters logged that they were not able to connect to etcd:
>> 
>> er.go:218] unexpected ListAndWatch error: pkg/storage/cacher.go:161: Failed to list *extensions.Job: error #0: dial tcp X.X.X.X:2379: connection refused
>> 
>> So I tried a simple test: telnet from the masters to the etcd node's port.
>> 
>> [root openshift-master01 log]# telnet X.X.X.X 2379
>> Trying X.X.X.X...
>> Connected to X.X.X.X.
>> Escape character is '^]'
>> 
>> So I was able to connect from the masters.
>> 
>> I was not able to recover my oc masters until the first etcd node had rebooted, so it seems my etcd “cluster” does not work without the first node.
>> 
>> Any clue?
>> 
>> Thanks
>> 
>> 