
Re: weird issue with etcd



regarding the certs, i used ansible to install origin so i guess ansible should have done it right …


On 21 Jun 2016, at 15:29, Julio Saura <jsaura hiberus com> wrote:

hello

yes, they are synced with an internal NTP server ..

gonna try etcdctl, thanks!


On 21 Jun 2016, at 15:20, Jason DeTiberus <jdetiber redhat com> wrote:

On Tue, Jun 21, 2016 at 7:28 AM, Julio Saura <jsaura hiberus com> wrote:
yes

working

[root openshift-master01 ~]# telnet XXXXX 2380
Trying XXXX...
Connected to XXXX.
Escape character is '^]'.
^CConnection closed by foreign host.


Have you verified that time is synced between the hosts? I'd also check
the peer certs between the hosts... Can you connect to the hosts using
etcdctl? There should be a status command that will give you more
information.
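
The telnet test above only shows that the TCP port is open; it says nothing about whether the certificates are accepted. As a rough illustration of the difference, here is a minimal Python sketch, assuming you have the CA, client certificate and key files at hand. The function names and all arguments are made up for this example; they are not part of etcd or OpenShift.

```python
import socket
import ssl


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a plain TCP connection succeeds (what telnet shows)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def tls_handshake_ok(host, port, ca_file, cert_file, key_file, timeout=3.0):
    """Return True if a TLS handshake with a client certificate completes.

    The file paths are supplied by the caller (hypothetical here); a missing
    or unreadable file raises before any connection is attempted.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:  # ssl.SSLError is a subclass of OSError
        return False
```

If tcp_reachable() returns True but tls_handshake_ok() returns False, the problem is more likely in the certificates or hostnames than in the network. A successful handshake is still only a hint; it does not prove that etcd will accept requests from that client.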



On 21 Jun 2016, at 13:21, Jason DeTiberus <jdetiber redhat com> wrote:

Did you verify connectivity over the peering port as well (2380)?

On Jun 21, 2016 7:17 AM, "Julio Saura" <jsaura hiberus com> wrote:

hello

same problem

jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
F0621 13:11:03.155246   59618 auth.go:141] error #0: dial tcp XXXX:2379:
connection refused ( the one i rebooted )
jun 21 13:11:03 openshift-master01 atomic-openshift-master-api[59618]:
error #1: client: etcd member https://YYYY:2379 has no leader

i rebooted one etcd server and my master is not able to use the other one

still able to connect from both masters using telnet to the etcd port ..

any clue? this is weird.


On 14 Jun 2016, at 9:28, Julio Saura <jsaura hiberus com> wrote:

hello

yes, it is correct .. it was the first thing i checked ..

first master

etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
    - https://openshift-balancer01:2379
    - https://openshift-balancer02:2379


second master

etcdClientInfo:
  ca: master.etcd-ca.crt
  certFile: master.etcd-client.crt
  keyFile: master.etcd-client.key
  urls:
    - https://openshift-balancer01:2379
    - https://openshift-balancer02:2379

dns names resolve in both masters

Best regards and thanks!


On 13 Jun 2016, at 18:45, Scott Dodson <sdodson redhat com> wrote:

Can you verify that the connection information in the etcdClientInfo
section of /etc/origin/master/master-config.yaml is correct?

On Mon, Jun 13, 2016 at 11:56 AM, Julio Saura <jsaura hiberus com>
wrote:
hello

yes.. i have an external balancer in front of my masters for HA, as the
docs say.

i don't have any balancer in front of my etcd servers for the masters'
connections, it's not necessary, right? the masters will try all available
etcd servers if one is down, right?

i don't know why, but none of my masters were able to connect to the
second etcd instance, even though telnet from their shell worked .. so it
was not a network or firewall issue..


best regards.

On 13 Jun 2016, at 17:53, Clayton Coleman <ccoleman redhat com> wrote:
I have not seen that particular issue.  Do you have a load balancer in
between your masters and etcd?

On Fri, Jun 10, 2016 at 5:55 AM, Julio Saura <jsaura hiberus com>
wrote:
hello

i have an origin 3.1 installation working cool so far

today one of my etcd nodes ( 1 of 2 ) crashed and i started having
problems..

i noticed on one of my master nodes that it was not able to connect
to the second etcd server, and that this etcd server was not able to
promote itself to leader..


jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 is starting a new election at term 10048
jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 became candidate at term 10049
jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 received vote from 12c8a31c8fcae0d4 at term 10049
jun 10 11:09:55 openshift-balancer02 etcd[47218]: 12c8a31c8fcae0d4 [logterm: 8, index: 4600461] sent vote request to bf80ee3a26e8772c at term 10049
jun 10 11:09:56 openshift-balancer02 etcd[47218]: got unexpected response error (etcdserver: request timed out)

my masters logged that they were not able to connect to the etcd

er.go:218] unexpected ListAndWatch error: pkg/storage/cacher.go:161:
Failed to list *extensions.Job: error #0: dial tcp X.X.X.X:2379: connection
refused

so i tried a simple test, just telnet from masters to the etcd node
port ..

[root openshift-master01 log]# telnet X.X.X.X 2379
Trying X.X.X.X...
Connected to X.X.X.X.
Escape character is '^]'.

so i was able to connect from masters.

i was not able to recover my oc masters until the first etcd node
rebooted .. so it seems my etcd “cluster” is not working without the first
node ..

any clue?

thanks
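
A possible explanation, not confirmed anywhere in this thread: etcd elects a leader by majority vote among the configured members, so a two-member cluster needs both members up. That would fit the election log above, where the surviving member votes for itself and never hears back from the other one. The arithmetic can be sketched in a few lines of Python; the function names are made up for this illustration.

```python
def quorum(members):
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def failures_tolerated(members):
    """How many members can be down while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 2, 3, 5):
    print(n, quorum(n), failures_tolerated(n))
# prints:
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

With 2 members the quorum is 2, so losing either one leaves the cluster without a leader. Under this reading, moving to 3 members would be the smallest change that tolerates a single failure.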


_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users







-- 
Jason DeTiberus


