
Re: Cannot Add Nodes to District



On Thu, Jul 31, 2014 at 9:11 AM, Brenton Leanhardt <bleanhar redhat com> wrote:
+++ Charles Simpson [30/07/14 11:45 -0400]:

I had the same problem in the last couple of days, but hadn't posted
anything because I couldn't figure out _why_ I was having the problem.

I had the same symptoms: I couldn't add nodes to a district. I _could_
`oo-mco ping`, but _could not_ make any mcollective rpc call. For example,
`oo-mco rpc rpcutil inventory` would fail. I finally got it to work by
changing `direct_addressing = 0` to `direct_addressing = 1` in
`/opt/rh/ruby193/root/etc/mcollective/server.cfg` and restarting
ruby193-mcollective on the nodes.
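
The change described above can be scripted. This is only a minimal sketch: the helper name `set_direct_addressing` is hypothetical, and it assumes the setting appears at most once in the file and that the file ends with a newline.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenShift or MCollective):
# set direct_addressing in an MCollective server.cfg.
# Usage: set_direct_addressing <path-to-server.cfg> <0|1>
set_direct_addressing() {
    cfg=$1
    value=$2
    tmp=$(mktemp) || return 1
    if grep -q '^[[:space:]]*direct_addressing[[:space:]]*=' "$cfg"; then
        # Rewrite the existing line, leaving every other setting untouched.
        sed "s/^[[:space:]]*direct_addressing[[:space:]]*=.*/direct_addressing = $value/" "$cfg" > "$tmp"
    else
        # Setting absent: copy the file and append the setting.
        cat "$cfg" > "$tmp"
        printf 'direct_addressing = %s\n' "$value" >> "$tmp"
    fi
    # Write back through the original path so its owner and mode are kept.
    cat "$tmp" > "$cfg" && rm -f "$tmp"
}

# On a node one would then run (not executed here; assumes a SysV-style
# `service` command is how the daemon is managed on that host):
#   set_direct_addressing /opt/rh/ruby193/root/etc/mcollective/server.cfg 1
#   service ruby193-mcollective restart
```

Keeping the edit in a function that takes the file path makes it easy to try against a copy of server.cfg before touching a live node.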

That parameter used to be set to 1 in OpenShift Enterprise 1 [oe1], but was
changed to 0 in Enterprise 2 [oe2]. The mcollective documentation says that
it _should_ be set to 1 [mc].

To be clear, it says it should usually be turned on; however, it's not
the default. :)

For a bit of background on the subject, both OpenShift Online and
Enterprise saw serious problems with direct addressing mode in
combination with ActiveMQ.  The problem manifested itself as nodes
that would not respond to requests until the ActiveMQ server was
restarted.

You can read more on that issue here:
https://tickets.puppetlabs.com/browse/MCO-104

Thanks for the reasoning behind the change; having to restart ActiveMQ is bad.
 

To my knowledge no OpenShift code relies on direct addressing.  Would
you mind pasting the output of oo-accept-broker and oo-accept-node on
your systems?
 
Hostnames are replaced, but outputs (with direct_addressing = 0) follow.

    [root broker ~]# oo-accept-broker
    PASS


    [root node ~]# oo-accept-node
    PASS


    [root broker ~]# oo-mco ping
    node.domain             time=128.65 ms


    ---- ping statistics ----
    1 replies max: 128.65 min: 128.65 avg: 128.65

   
    [root broker ~]# oo-mco rpc rpcutil inventory
    Discovering hosts using the mc method for 2 second(s) .... 1

     | [ >                                                           ] 0 / 1




    Finished processing 0 / 1 hosts in 12004.76 ms


    No response from:

       node.domain
 
   
    [root broker ~]# oo-mco rpc openshift get_all_gears
    Discovering hosts using the mc method for 2 second(s) .... 1

     | [ >                                                           ] 0 / 1^C




    Finished processing 0 / 1 hosts in 86240.18 ms


    No response from:

      node.domain


After setting direct_addressing = 1 on the node and restarting ruby193-mcollective:

    [root node ~]# oo-accept-node
    PASS


    [root broker ~]# oo-mco ping
    node.domain             time=130.01 ms


    ---- ping statistics ----
    1 replies max: 130.01 min: 130.01 avg: 130.01


    [root broker ~]# oo-mco rpc openshift get_all_gears
    Discovering hosts using the mc method for 2 second(s) .... 1

     * [ ==========================================================> ] 1 / 1


    node.domain
       Exit Code: 0
          Output: {}



    Finished processing 1 / 1 hosts in 43.02 ms
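
The failing and working runs differ in the "Finished processing N / M hosts" summary line, so that line can serve as a quick health check. A rough sketch follows; the function name `all_hosts_replied` is made up for illustration, and it assumes the summary is printed on stdout in the format shown in the transcripts above.

```shell
#!/bin/sh
# Hypothetical helper: read `oo-mco rpc ...` output on stdin and succeed
# only if the summary line reports that every discovered host replied.
all_hosts_replied() {
    awk '
        /Finished processing/ {
            # Summary looks like: "Finished processing 0 / 1 hosts in 12004.76 ms"
            for (i = 2; i < NF; i++) {
                if ($i == "/") { replied = $(i - 1); total = $(i + 1); seen = 1 }
            }
        }
        # Fail when no summary was seen, no hosts were found, or some were silent.
        END { exit !(seen && total > 0 && replied == total) }
    '
}
```

For example, `oo-mco rpc rpcutil inventory | all_hosts_replied || echo "some nodes did not answer"` would flag the first transcript and pass the second.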



I would be interested in knowing the versions of both the
ruby193-mcollective-common and ruby193-rubygem-stomp RPMs on your
systems, as well as whether any gems are installed outside of RPM.

ruby193-mcollective-common-2.4.1-2.el6oso.noarch
ruby193-rubygem-stomp-1.2.14-1.el6oso.noarch

The systems were provisioned using Puppet; no gems are installed outside of RPM.
 

In a recent thread on the users list a similar issue was reportedly
solved by a configuration fix:

http://lists.openshift.redhat.com/openshift-archives/users/2014-July/msg00035.html

server.cfg on my nodes has:

  plugin.yaml = /opt/rh/ruby193/root/etc/mcollective/facts.yaml

so I don't think it is that problem. Also, I don't see how mcollective would have started working with direct_addressing = 1 if the facts.yaml file was wrong.
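
One way to rule the facts file in or out is to confirm that the path configured in server.cfg actually exists and is readable. This is only a sketch: `facts_file_ok` is a hypothetical name, it assumes `plugin.yaml` holds a single path, and it checks nothing about the file's contents.

```shell
#!/bin/sh
# Hypothetical helper: print the plugin.yaml path from a server.cfg and
# succeed only if that file exists and is readable.
facts_file_ok() {
    cfg=$1
    # Take the value after "=" on the first plugin.yaml line, trimmed.
    facts=$(sed -n 's/^[[:space:]]*plugin\.yaml[[:space:]]*=[[:space:]]*//p' "$cfg" \
        | sed 's/[[:space:]]*$//' | head -n 1)
    if [ -z "$facts" ]; then
        echo "plugin.yaml is not set in $cfg" >&2
        return 1
    fi
    printf '%s\n' "$facts"
    [ -r "$facts" ]
}
```

Run as `facts_file_ok /opt/rh/ruby193/root/etc/mcollective/server.cfg` on a node; a non-zero exit status means the setting is missing or points at an unreadable file.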



--Brenton




[oe1]:
https://access.redhat.com/documentation/en-US/OpenShift_Enterprise/1/html/Deployment_Guide/Installing_and_Configuring_MCollective_on_Node_Hosts.html
[oe2]:
https://access.redhat.com/documentation/en-US/OpenShift_Enterprise/2/html/Deployment_Guide/Installing_and_Configuring_MCollective_on_Node_Hosts.html
[mc]:
http://docs.puppetlabs.com/mcollective/configure/server.html#directaddressing


On Wed, Jul 30, 2014 at 11:37 AM, Kevin Conaway <kevin conaway gmail com>
wrote:

I'm wrapping up a Broker+Node install following the comprehensive deployment
guide.  I'm working through the Post Install section right now and am trying
to add my node host to a district I created.

 Running

oo-admin-ctl-district -c add-node -n ps-test -a

Returns


{"_id"=>"53d90698ae257ddfb1000001",
 "uuid"=>"53d90698ae257ddfb1000001",
 "available_uids"=>"<6000 uids hidden>",
 "name"=>"ps-test",
 "platform"=>"linux",
 "gear_size"=>"small",
 "available_capacity"=>6000,
 "max_uid"=>6999,
 "max_capacity"=>6000,
 "active_servers_size"=>0,
 "updated_at"=>2014-07-30 14:52:08 UTC,
 "created_at"=>2014-07-30 14:52:08 UTC}

ERROR OUTPUT:
No available nodes for profile 'small'

I see the following on the node host mcollective log:

D, [2014-07-30T08:32:34.668572 #4322] DEBUG -- : runnerstats.rb:49:in
`received' Incrementing total stat
D, [2014-07-30T08:32:34.668724 #4322] DEBUG -- : pluginmanager.rb:83:in
`[]' Returning cached plugin security_plugin with class
MCollective::Security::Psk
D, [2014-07-30T08:32:34.668877 #4322] DEBUG -- : runnerstats.rb:38:in
`validated' Incrementing validated stat
D, [2014-07-30T08:32:34.668977 #4322] DEBUG -- : pluginmanager.rb:83:in
`[]' Returning cached plugin security_plugin with class
MCollective::Security::Psk
D, [2014-07-30T08:32:34.669118 #4322] DEBUG -- : pluginmanager.rb:83:in
`[]' Returning cached plugin security_plugin with class
MCollective::Security::Psk
D, [2014-07-30T08:32:34.669245 #4322] DEBUG -- : base.rb:117:in `block (2
levels) in validate_filter?' Passing based on agent openshift
D, [2014-07-30T08:32:34.669372 #4322] DEBUG -- : base.rb:153:in
`validate_filter?' Message passed the filter checks
D, [2014-07-30T08:32:34.669466 #4322] DEBUG -- : runnerstats.rb:26:in
`passed' Incrementing passed stat
D, [2014-07-30T08:32:34.669550 #4322] DEBUG -- : runner.rb:94:in
`agentmsg' Handling message for agent 'discovery' on collective
'mcollective'
D, [2014-07-30T08:32:34.669632 #4322] DEBUG -- : agents.rb:119:in
`dispatch' Dispatching a message to agent discovery
D, [2014-07-30T08:32:34.669759 #4322] DEBUG -- : activemq.rb:329:in
`receive' Waiting for a message from ActiveMQ
D, [2014-07-30T08:32:34.669864 #4322] DEBUG -- : pluginmanager.rb:83:in
`[]' Returning cached plugin discovery_agent with class
MCollective::Agent::Discovery
D, [2014-07-30T08:32:34.670109 #4322] DEBUG -- : pluginmanager.rb:83:in
`[]' Returning cached plugin security_plugin with class
MCollective::Security::Psk
D, [2014-07-30T08:32:34.670216 #4322] DEBUG -- : pluginmanager.rb:83:in
`[]' Returning cached plugin security_plugin with class
MCollective::Security::Psk
D, [2014-07-30T08:32:34.670390 #4322] DEBUG -- : base.rb:168:in
`create_reply' Encoded a message for request
68f5055f83cf51b5a632538121309737
D, [2014-07-30T08:32:34.670511 #4322] DEBUG -- : pluginmanager.rb:83:in
`[]' Returning cached plugin connector_plugin with class
MCollective::Connector::Activemq
D, [2014-07-30T08:32:34.670625 #4322] DEBUG -- : activemq.rb:362:in
`publish' Sending a broadcast message to ActiveMQ target
'/queue/mcollective.reply.ps-openshift-broker.eng.jiveland.com_30659' with
headers '{"timestamp"=>"1406734354000", "expires"=>"1406734424000"}'
D, [2014-07-30T08:32:34.670914 #4322] DEBUG -- : runnerstats.rb:56:in
`block in sent' Incrementing replies stat


If I try to add the node manually via the hostname (or IP), I get the
following error:

/usr/sbin/oo-admin-ctl-district:215:in `block in <main>': undefined method `casecmp' for nil:NilClass (NoMethodError)
    from /usr/sbin/oo-admin-ctl-district:178:in `block in collate_errors'
    from /usr/sbin/oo-admin-ctl-district:176:in `each'
    from /usr/sbin/oo-admin-ctl-district:176:in `collate_errors'
    from /usr/sbin/oo-admin-ctl-district:213:in `<main>'



_______________________________________________
dev mailing list
dev lists openshift redhat com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


