
Re: OS 1.0.7 eating memory like candies after halloween



On Wed, Nov 4, 2015 at 9:26 AM, Clayton Coleman <ccoleman redhat com> wrote:
> If etcd is down, these are likely low-level etcd client errors: TCP
> connection refused, or another low-level error from the Go HTTP/TCP
> libraries.  It's possible one of those busts the handler loops.  I
> would hope all of our controllers have rate limiters on retry.

I forgot to mention: the controller that generates the status updates
uses a rate limiter for retries.
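
For anyone reading along, the retry pattern in question looks roughly
like the sketch below. It assumes the rate-limited workqueue from
k8s.io/client-go/util/workqueue; the updateStatus function and the key
are illustrative stand-ins, not the actual controller code.

    package main

    import (
        "fmt"

        "k8s.io/client-go/util/workqueue"
    )

    // updateStatus is a hypothetical stand-in for the real status writer.
    func updateStatus(key string) error {
        return fmt.Errorf("etcd unavailable") // simulate a low-level failure
    }

    func main() {
        // Per-key exponential backoff plus an overall rate limit, so a
        // persistently failing update cannot hot loop against the server.
        queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
        queue.Add("default/gemnasium-nginx-rails")

        for {
            item, shutdown := queue.Get()
            if shutdown {
                return
            }
            key := item.(string)
            if err := updateStatus(key); err != nil && queue.NumRequeues(key) < 5 {
                // Requeue with backoff instead of retrying immediately.
                queue.AddRateLimited(key)
            } else {
                // Success (or retries exhausted): reset the backoff counter.
                queue.Forget(key)
            }
            queue.Done(key)
        }
    }

With no more items the loop blocks in Get, which is the point: failed
updates get spread out by the backoff instead of spinning.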

> On Wed, Nov 4, 2015 at 8:06 AM, Dan Mace <dmace redhat com> wrote:
>> On Tue, Nov 3, 2015 at 9:17 PM, Clayton Coleman <ccoleman redhat com> wrote:
>>> You can recursively delete the entire events structure safely - it
>>> *should* be /kubernetes.io/events - if that's where the events are,
>>> then do a recursive delete of /kubernetes.io/events (be very careful,
>>> this is rm -rf).
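
For reference, with the etcd v2 etcdctl of this era the delete would
look roughly like this (verify the prefix on your own cluster first;
the second command is the "rm -rf" step):

    # confirm the events really live under this prefix
    etcdctl ls --recursive /kubernetes.io/events | head

    # recursive delete of the whole events subtree
    etcdctl rm --recursive /kubernetes.io/events
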
>>>
>>> Derek is working on an issue right now that would prevent these from
>>> overwhelming the server.  I *think* the deployment config problem is
>>> covered by an issue about not changing the status if there is nothing
>>> to write.  Dan, can you verify that your change would potentially
>>> prevent deployments from hot looping like this (do we have a rate
>>> limiter on the image change trigger retry queue)?
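
The "don't write when nothing changed" guard mentioned above would
look something like this sketch; the types and names here are
trimmed-down, hypothetical stand-ins, not the actual fix:

    package main

    import (
        "fmt"
        "reflect"
    )

    // Hypothetical, trimmed-down stand-ins for the real API types.
    type Status struct {
        Message string
    }

    type DeploymentConfig struct {
        Name   string
        Status Status
    }

    // writeStatus stands in for the real etcd-backed update call.
    func writeStatus(dc *DeploymentConfig) error {
        fmt.Println("writing status for", dc.Name)
        return nil
    }

    // maybeUpdateStatus only writes when the status actually changed, so
    // a steady-state error message cannot keep the update path hot looping.
    func maybeUpdateStatus(old, updated *DeploymentConfig) error {
        if reflect.DeepEqual(old.Status, updated.Status) {
            return nil // nothing to write
        }
        return writeStatus(updated)
    }

    func main() {
        before := &DeploymentConfig{Name: "example", Status: Status{Message: "blocked"}}
        after := &DeploymentConfig{Name: "example", Status: Status{Message: "blocked"}}
        // Identical status: no write is issued, so no churn in etcd.
        if err := maybeUpdateStatus(before, after); err != nil {
            fmt.Println("update failed:", err)
        }
    }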
>>
>>
>> Hm. I'm not entirely clear yet why there are events in this case. The
>> message is just updated on the deploymentConfig, not part of an event.
>> Are these events failures to update the deploymentConfig?
>>
>> Philippe, could you please paste a complete example of one of these
>> stacked events so I can see all the fields? The one you listed seems
>> like a summary/truncated overview.
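
If it helps, a complete raw entry can usually be pulled straight out
of etcd; with v2 etcdctl, something like the commands below (the key
path is a placeholder, use one from the ls output):

    # list a few of the stacked event keys
    etcdctl ls --recursive /kubernetes.io/events | head -n 5

    # dump one complete entry, with all fields
    etcdctl get /kubernetes.io/events/<namespace>/<event-name>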
>>
>>> On Tue, Nov 3, 2015 at 9:11 PM, Philippe Lafoucrière
>>> <philippe lafoucriere tech-angels com> wrote:
>>>> Ok, the issue is indeed in our etcd.
>>>> I managed to list the keys, and I can see a LOT of "Deployment config
>>>> \\\"gemnasium-nginx-rails\\\" blocked by multiple errors:\\n\\n\\t* \\n\\t*
>>>> \\n\\t*"
>>>> The GlusterFS issue we had in the other thread has generated a lot of
>>>> failure events, and apparently OpenShift can't handle them. Is there a way
>>>> to clean these events for testing?
>>>>
>>>> Thanks

