
Re: Certificate Problem?



Yeah, I have been trying to nail down the problem. Try turning off the nodes that aren't on the master server (I just happen to have a node on my master machine). When I do that, lots of my issues go away. I need to go back and test the router / registry deploy again. Still trying to figure out what the issue is. BTW, my DNS points to master.

I did a pretty fastidious install from the installation guide and was very careful about meeting the prerequisites.

Cheers
Justin


On 25/08/2015, at 9:36 am, Jason Brooks <jbrooks redhat com> wrote:



----- Original Message -----
From: "Clayton Coleman" <ccoleman redhat com>
To: "Justin Wood" <justin wood sixtree co nz>
Cc: "users" <users lists openshift redhat com>
Sent: Saturday, August 22, 2015 3:24:03 PM
Subject: Re: Certificate Problem?

Hrm - if oc get pods is the thing that is causing the registry to be
unable to connect to the master, that implies something deeper (and
possibly more fundamental) about the network connections in play, or a
bad proxy or other component of the system.  I haven't ever seen a
failure like that that wasn't related to networking (it's very
unlikely there is anything in the master or client code triggering
this).  Do you have a firewall or other system daemon monitoring the
server?  Are packets being dropped anywhere?  Are your iptables rules
being managed by something else?
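
One way to look for intermittent drops is to time repeated TCP connections to the master API port from the affected node or pod. The following is only a minimal sketch using the Python standard library; the helper name check_tcp and the 5 second default timeout are illustrative choices, not part of OpenShift.

```python
import socket
import time


def check_tcp(host, port, timeout=5.0):
    """Try one TCP connection; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers timeouts, refused connections and name resolution failures.
        return False, time.monotonic() - start, str(exc)
```

Calling check_tcp("master.example.com", 8443) in a loop (substitute your own master host name) and printing the elapsed time should show whether connections fail outright or only become slow under load.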

On Sat, Aug 22, 2015 at 4:47 PM, Justin Wood <justin wood sixtree co nz>
wrote:
Ok I played around with this a lot.   In my original situation I had
mistakenly only assigned one core of my 8 core 2.2 GHz i7 and the registry
stayed pending and the system generally misbehaved.  With two or four cores
assigned the registry gets created pretty quickly unless you are impatient
and keep hitting ‘oc get pods’ to see how things are going.   If you do
that then it fails with "ExitCode:255".

I'm getting errors like this w/ vagrant-based hosts & openshift-ansible,
and giving significantly more resources to the VMs, and avoiding
"oc get pods" all together isn't preventing the ExitCode:255 for me.

Jason



e.g.

[root master ~]# oadm registry
--config=/etc/openshift/master/admin.kubeconfig
--credentials=/etc/openshift/master/openshift-registry.kubeconfig
--images='registry.access.redhat.com/openshift3/ose-${component}:${version}'
deploymentconfigs/docker-registry
services/docker-registry
[root master ~]# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED
STATUS              PORTS               NAMES
[root master ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-deploy   0/1       Running   0          8s
[root master ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-deploy   1/1       Running   0          16s
[root master ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-deploy   1/1       Running   0          18s
[root master ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-deploy   1/1       Running   0          23s
[root master ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-deploy   1/1       Running   0          29s
[root master ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-deploy   1/1       Running   0          31s
[root master ~]# oc get pods
NAME                       READY     STATUS    RESTARTS   AGE
docker-registry-1-deploy   1/1       Running   0          33s
[root master ~]# oc get pods
NAME                       READY     STATUS         RESTARTS   AGE
docker-registry-1-deploy   0/1       ExitCode:255   0          42s
[root master ~]# oc build-logs docker-registry-1-deploy
Error from server: build "docker-registry-1-deploy" not found
[root master ~]# oc get pods
NAME                       READY     STATUS         RESTARTS   AGE
docker-registry-1-deploy   0/1       ExitCode:255   0          1m
[root master ~]# oc logs docker-registry-1-deploy
F0822 16:01:16.406786       1 deployer.go:64] couldn't get deployment
default/docker-registry-1: Get
https://master.sixtree.com:8443/api/v1/namespaces/default/replicationcontrollers/docker-registry-1:
dial tcp: i/o timeout
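
When comparing failures across nodes it can help to pull the exact endpoint out of the deployer log, since that is the host and port the pod could not reach. A small sketch, assuming the log excerpt contains a single https URL followed by a colon as in the output above; the function name failing_endpoint is made up for illustration.

```python
import re
from urllib.parse import urlsplit


def failing_endpoint(log_text):
    """Return (host, port, path) of the first https URL in a log excerpt, or None."""
    match = re.search(r"https://\S+", log_text)
    if match is None:
        return None
    # The deployer log ends the URL with ':' before the error text; drop it.
    url = urlsplit(match.group(0).rstrip(":"))
    return url.hostname, url.port or 443, url.path
```

The returned host and port are what the deploy pod tried to dial, so they are the values worth testing from inside a pod on the same node.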









On 21/08/2015, at 11:04 am, Justin Wood <justin wood sixtree co nz> wrote:

That was a clever idea!   I exec'd into the registry deploy pod and hit
that URL but never got a response before it was killed.  Something I was
doing perhaps kept it alive longer and hey presto my registry was
created.  I then exec'd into the registry and it worked but took a long
time to return:

[root master ~]# oc rsh docker-registry-1-8aycp
<.com:8443/api/v1/namespaces/default/replicationcontrollers/docker-registry-1
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "User \"system:anonymous\" cannot get replicationcontrollers
in project \"default\"",
"reason": "Forbidden",
"details": {
  "name": "docker-registry-1",
  "kind": "replicationcontrollers"
},
"code": 403

I checked my VMs and they are a bit underspecced.  I’ll give them another
core and some more RAM and let everyone know how that goes.

Thanks for your help!
Justin

On 21/08/2015, at 10:29 am, Clayton Coleman <ccoleman redhat com> wrote:

Can you create a pod, exec into it, and then try pinging the master
(to verify the pods can reach back to the master)?

On Thu, Aug 20, 2015 at 6:17 PM, Justin Wood <justin wood sixtree co nz>
wrote:
Yes.  I also get a successful answer from the URL that’s timing out:

[root node1 ~]# curl -k
https://master.example.com:8443/api/v1/namespaces/default/replicationcontrollers/docker-registry-1
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {},
"status": "Failure",
"message": "User \"system:anonymous\" cannot get replicationcontrollers
in project \"default\"",
"reason": "Forbidden",
"details": {
 "name": "docker-registry-1",
 "kind": "replicationcontrollers"
},
"code": 403

I’m looking for a way to bump the logging level up.

Justin

On 21/08/2015, at 10:04 am, Clayton Coleman <ccoleman redhat com>
wrote:

Does master.example.com resolve from your node?  Is the IP address the
same as your master instance?
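
Both questions can be checked with the system resolver, which is what most processes on the node will use. This is a rough sketch with the Python standard library; the function names are illustrative, and the expected address is something you supply yourself (for example from "ip addr" on the master).

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def matches_expected(hostname, expected_ip):
    """True if hostname resolves and expected_ip is among its addresses."""
    try:
        return expected_ip in resolve_ipv4(hostname)
    except socket.gaierror:
        # The name did not resolve at all.
        return False
```

Running matches_expected on the node and again inside a pod is worthwhile, because containers may use a different resolver configuration than the host.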

On Thu, Aug 20, 2015 at 5:48 PM, Justin Wood
<justin wood example co nz> wrote:
Ok here’s what I get.

[root master ~]# oc logs docker-registry-1-deploy
F0820 17:35:02.953324       1 deployer.go:64] couldn't get deployment
default/docker-registry-1: Get
https://master.example.com:8443/api/v1/namespaces/default/replicationcontrollers/docker-registry-1:
dial tcp: i/o timeout

[root master ~]# oc get pods
NAME                       READY     STATUS         RESTARTS   AGE
docker-registry-1-deploy   0/1       ExitCode:255   0          3m


Aug 21 09:14:18 master.example.com openshift-master[1466]: 2015/08/21
09:14:18 etcdserver: saved snapshot at index 20002
Aug 21 09:34:31 master.example.com openshift-master[1466]: I0821
09:34:31.317638 1466 controller.go:72] Ignoring change for
DeploymentConfig default/docker-registry:1; no existing Deployment
found
Aug 21 09:34:31 master.example.com openshift-master[1466]: I0821
09:34:31.702437 1466 factory.go:214] About to try and schedule pod
docker-registry-1-deploy
Aug 21 09:34:31 master.example.com openshift-master[1466]: I0821
09:34:31.703204 1466 factory.go:312] Attempting to bind
docker-registry-1-deploy to node1.example.com
Aug 21 09:34:33 master.example.com openshift-master[1466]: I0821
09:34:33.492440    1466 controller.go:85] Ignoring DeploymentConfig
change for default/docker-registry:1 (latestVersion=1); same as
Deployment default/docker-registry-1

I took the firewall on node1 down, just for good measure, and tried
again, but got the same result.

Justin

On 21/08/2015, at 9:31 am, Clayton Coleman <ccoleman redhat com>
wrote:

Hrm, the TLS error may be a red herring.  Pull the logs for the deploy
pod: oc logs docker-registry-1-deploy

On Thu, Aug 20, 2015 at 5:29 PM, Justin Wood
<justin wood example co nz> wrote:
Thanks Clayton.  This is what I have

...
serviceAccountConfig:
  managedNames:
  - default
  - builder
  - deployer
  masterCA: ca.crt
  privateKeyFile: serviceaccounts.private.key
  publicKeyFiles:
  - serviceaccounts.public.key
servingInfo:
  bindAddress: 0.0.0.0:8443
  certFile: master.server.crt
  clientCA: ca.crt
  keyFile: master.server.key
  maxRequestsInFlight: 500
  requestTimeoutSeconds: 3600


and I was running the command as system:admin

[root master ~]# oc whoami
system:admin


Cheers
Justin

On 21/08/2015, at 8:40 am, Clayton Coleman <ccoleman redhat com>
wrote:

Hrm, check that you have "masterCA" set under the
serviceAccountConfig field in your master-config.yaml
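
If you would rather script that check than eyeball the file, a line-based scan is enough for this one key. The sketch below assumes the file on disk is normally indented YAML with serviceAccountConfig at the top level; it is not a real YAML parser (a library such as PyYAML would be more robust), and the function name is hypothetical.

```python
def service_account_setting(config_text, key="masterCA"):
    """Return the value of key inside the serviceAccountConfig block, or None."""
    in_block = False
    for raw in config_text.splitlines():
        # Skip blank lines and comments.
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue
        indented = raw[0] in " \t"
        if not indented and not raw.startswith("-"):
            # A new top-level key starts or ends the block of interest.
            in_block = raw.split(":", 1)[0].strip() == "serviceAccountConfig"
            continue
        if in_block:
            name, sep, value = raw.strip().partition(":")
            if sep and name == key:
                return value.strip() or None
    return None
```

Pass it the contents of master-config.yaml; a return value of None means the key was not found under serviceAccountConfig.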

On Thu, Aug 20, 2015 at 4:05 PM, Justin Wood
<justin wood example co nz> wrote:
Hi All

I just did a fresh install of OpenShift using this guide

https://docs.openshift.com/enterprise/3.0/admin_guide/install/advanced_install.html

and everything comes up as it should but when I try to deploy a
registry it fails

The logs indicate that I need to address some certificate issue.
Where do I add trusted certs, or configure it to just use plain
http?

Here are the logs

Aug 20 19:26:18 master.example.com openshift-master[1466]: [676ns]
[676ns] About to list directory
Aug 20 19:26:18 master.example.com openshift-master[1466]:
[819.978876ms] [819.9782ms] List extracted
Aug 20 19:26:18 master.example.com openshift-master[1466]:
[819.989248ms] [10.372µs] List filtered
Aug 20 19:26:18 master.example.com openshift-master[1466]:
[819.989814ms] [566ns] END
Aug 20 19:26:18 master.example.com openshift-master[1466]: I0820
19:26:18.298101    1466 trace.go:57] Trace "List *api.PodList"
(started 2015-08-20 19:26:17.394538848 +1200 NZST):
Aug 20 19:26:18 master.example.com openshift-master[1466]: [490ns]
[490ns] About to list directory
Aug 20 19:26:18 master.example.com openshift-master[1466]:
[903.534372ms] [903.533882ms] List extracted
Aug 20 19:26:18 master.example.com openshift-master[1466]:
[903.537414ms] [3.042µs] List filtered
Aug 20 19:26:18 master.example.com openshift-master[1466]:
[903.537779ms] [365ns] END
Aug 20 19:26:19 master.example.com openshift-master[1466]: I0820
19:26:19.363015    1466 common.go:66] Self IP: 172.16.63.129.
Aug 20 19:29:50 master.example.com openshift-master[1466]: I0820
19:29:50.900598 1466 controller.go:72] Ignoring change for
DeploymentConfig default/docker-registry:1; no existing Deployment
found
Aug 20 19:29:51 master.example.com openshift-master[1466]: I0820
19:29:51.014624    1466 factory.go:214] About to try and schedule
pod docker-registry-1-deploy
Aug 20 19:29:51 master.example.com openshift-master[1466]: I0820
19:29:51.014842    1466 factory.go:312] Attempting to bind
docker-registry-1-deploy to node1.example.com
Aug 20 19:30:21 master.example.com openshift-master[1466]: I0820
19:30:21.843904 1466 controller.go:85] Ignoring DeploymentConfig
change for default/docker-registry:1 (latestVersion=1); same as
Deployment default/docker-registry-1
Aug 20 19:32:22 master.example.com openshift-master[1466]: I0820
19:32:22.844859 1466 controller.go:85] Ignoring DeploymentConfig
change for default/docker-registry:1 (latestVersion=1); same as
Deployment default/docker-registry-1

Aug 20 19:33:35 master.example.com openshift-master[1466]:
2015/08/20 19:33:35 http: TLS handshake error from
172.16.63.129:56385: remote error: unknown certificate authority

Aug 20 19:34:23 master.example.com openshift-master[1466]: I0820
19:34:23.951961 1466 controller.go:85] Ignoring DeploymentConfig
change for default/docker-registry:1 (latestVersion=1); same as
Deployment default/docker-registry-1
Aug 20 19:36:24 master.example.com openshift-master[1466]: I0820
19:36:24.873571 1466 controller.go:85] Ignoring DeploymentConfig
change for default/docker-registry:1 (latestVersion=1); same as
Deployment default/docker-registry-1
Aug 20 19:37:03 master.example.com openshift-master[1466]: I0820
19:37:03.750158    1466 replication_controller.go:370] Replication
Controller has been deleted default/docker-registry-1
Aug 20 19:37:21 master.example.com openshift-master[1466]: I0820
19:37:21.932608    1466 controller.go:72] Ignoring change for
DeploymentConfig default/docker-registry:1; no existing Deployment
found

Cheers
Justin

_______________________________________________
users mailing list
users lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users



--
Clayton Coleman | Lead Engineer, OpenShift