
Re: logging-es errors: shards failed



The logging-ops instance will contain only the logs from /var/log/messages* and the "default", "openshift", and "openshift-infra" namespaces.
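
If you want to confirm what the ops cluster actually holds, you can list
its indices directly.  A minimal sketch, assuming the admin client certs
are mounted at /etc/elasticsearch/secret (the default location in the
origin-aggregated-logging images; your paths and pod labels may differ):

    oc get pods -n logging -l component=es-ops
    oc exec <logging-es-ops-pod> -n logging -- curl -s \
        --cacert /etc/elasticsearch/secret/admin-ca \
        --cert /etc/elasticsearch/secret/admin-cert \
        --key /etc/elasticsearch/secret/admin-key \
        "https://localhost:9200/_cat/indices?v"

You should see the operations indices there, but nothing for the "logging"
namespace itself, which is why the search below comes up empty.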

On Fri, Jul 15, 2016 at 3:28 PM, Alex Wauck <alexwauck@exosite.com> wrote:
I also tried to fetch the logs from our logging-ops ES instance.  That also met with failure.  Searching for "kubernetes_namespace_name: logging" there led to "No results found".

On Fri, Jul 15, 2016 at 2:48 PM, Peter Portante <pportant@redhat.com> wrote:
Well, we don't feed ES's own logs back into ES itself.  I think doing so
could create a feedback loop that takes the whole thing down.
-peter

On Fri, Jul 15, 2016 at 3:39 PM, Luke Meyer <lmeyer@redhat.com> wrote:
> They surely do. Although it would probably be easiest here to just get them
> from `oc logs` against the ES pod, especially if we can't trust ES storage.
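>
> For example (a sketch; the pod name is a placeholder):
>
>     oc get pods -n logging -l component=es
>     oc logs logging-es-<hash> -n logging | grep -iE 'exception|rejected|error'
>
> That reads the container's stdout/stderr directly, so it works even when
> ES's own storage is suspect.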
>
> On Fri, Jul 15, 2016 at 3:26 PM, Peter Portante <pportant@redhat.com> wrote:
>>
>> Eric, Luke,
>>
>> Do the logs from the ES instance itself flow into that ES instance?
>>
>> -peter
>>
>> On Fri, Jul 15, 2016 at 12:14 PM, Alex Wauck <alexwauck@exosite.com>
>> wrote:
>> > I'm not sure that I can.  I clicked the "Archive" link for the
>> > logging-es pod and then changed the query in Kibana to
>> > "kubernetes_container_name: logging-es-cycd8veb &&
>> > kubernetes_namespace_name: logging".  I got no results, instead
>> > getting this error:
>> >
>> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12
>> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
>> > capacity 1000) on
>> > org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699]
>> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14
>> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
>> > capacity 1000) on
>> > org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb]
>> > Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15
>> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
>> > capacity 1000) on
>> > org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
>> > Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29
>> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
>> > capacity 1000) on
>> > org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9]
>> > Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30
>> > Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
>> > capacity 1000) on
>> > org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477]
>> >
>> > When I initially clicked the "Archive" link, I saw a lot of messages
>> > with
>> > the kubernetes_container_name "logging-fluentd", which is not what I
>> > expected to see.
>> >
>> >
>> > On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante <pportant@redhat.com>
>> > wrote:
>> >>
>> >> Can you go back further in the logs to the point where the errors
>> >> started?
>> >>
>> >> I am thinking about possible Java heap issues, or ES restarting for
>> >> some reason.
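>> >>
>> >> Two quick ways to check for either (a sketch; the pod name is a
>> >> placeholder):
>> >>
>> >>     # a non-zero RESTARTS count means ES has been dying and coming back
>> >>     oc get pods -n logging -l component=es
>> >>
>> >>     # heap trouble usually surfaces as OutOfMemoryError or long GC pauses
>> >>     oc logs logging-es-<hash> -n logging | grep -iE 'OutOfMemoryError|gc|heap'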
>> >>
>> >> -peter
>> >>
>> >> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček <lvlcek@redhat.com>
>> >> wrote:
>> >> > Also looking at this.
>> >> > Alex, is it possible to investigate whether you were having some
>> >> > kind of network connection issue in the ES cluster (I mean between
>> >> > individual cluster nodes)?
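>> >> >
>> >> > One way to check, reusing the admin-cert curl pattern from elsewhere
>> >> > in this thread (cert paths are the assumed defaults):
>> >> >
>> >> >     oc exec <logging-es-pod> -n logging -- curl -s \
>> >> >         --cacert /etc/elasticsearch/secret/admin-ca \
>> >> >         --cert /etc/elasticsearch/secret/admin-cert \
>> >> >         --key /etc/elasticsearch/secret/admin-key \
>> >> >         "https://localhost:9200/_cluster/health?pretty"
>> >> >
>> >> > A "number_of_nodes" lower than what you deployed, or a red/yellow
>> >> > status, would point at nodes flapping or shards going unassigned.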
>> >> >
>> >> > Regards,
>> >> > Lukáš
>> >> >
>> >> >> On 15 Jul 2016, at 17:08, Peter Portante <pportant@redhat.com>
>> >> >> wrote:
>> >> >>
>> >> >> Just catching up on the thread, will get back to you all in a few
>> >> >> ...
>> >> >>
>> >> >> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz
>> >> >> <ewolinet@redhat.com>
>> >> >> wrote:
>> >> >>> Adding Lukas and Peter
>> >> >>>
>> >> >>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer <lmeyer@redhat.com>
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> I believe the "queue capacity" there is the number of parallel
>> >> >>>> searches that can be queued while the existing search workers
>> >> >>>> operate.  It sounds like it has plenty of capacity there, and a
>> >> >>>> different reason for rejecting the query.  I would guess the
>> >> >>>> requested data is missing, given it couldn't fetch shards it
>> >> >>>> expected to.
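>> >> >>>>
>> >> >>>> (One way to tell the two cases apart; a sketch reusing the
>> >> >>>> admin-cert curl from elsewhere in the thread, with the 1.x _cat
>> >> >>>> column names:
>> >> >>>>
>> >> >>>>     oc exec <logging-es-pod> -n logging -- curl -s \
>> >> >>>>         --cacert /etc/elasticsearch/secret/admin-ca \
>> >> >>>>         --cert /etc/elasticsearch/secret/admin-cert \
>> >> >>>>         --key /etc/elasticsearch/secret/admin-key \
>> >> >>>>         "https://localhost:9200/_cat/thread_pool?v&h=host,search.active,search.queue,search.rejected"
>> >> >>>>
>> >> >>>> If search.rejected keeps climbing, the queue really is
>> >> >>>> overflowing; if it stays flat, missing shards are the more likely
>> >> >>>> culprit.)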
>> >> >>>>
>> >> >>>> The number of shards is a multiple (for redundancy) of the number
>> >> >>>> of indices, and there is an index created per project per day.  So
>> >> >>>> even for a small cluster this doesn't sound out of line.
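>> >> >>>>
>> >> >>>> As a rough sanity check: if each index carries Elasticsearch's
>> >> >>>> default of 5 primary shards, 2020 shards works out to about 404
>> >> >>>> indices, i.e. roughly 400 project-days of logs, which even a
>> >> >>>> small cluster accumulates quickly.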
>> >> >>>>
>> >> >>>> Can you give a little more information about your logging
>> >> >>>> deployment?  Have you deployed multiple ES nodes for redundancy,
>> >> >>>> and what are you using for storage?  Could you attach full ES
>> >> >>>> logs?  How many OpenShift nodes and projects do you have?  Any
>> >> >>>> history of events that might have resulted in lost data?
>> >> >>>>
>> >> >>>> On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck
>> >> >>>> <alexwauck@exosite.com>
>> >> >>>> wrote:
>> >> >>>>>
>> >> >>>>> When doing searches in Kibana, I get error messages similar to
>> >> >>>>> "Courier Fetch: 919 of 2020 shards failed".  Deeper inspection
>> >> >>>>> reveals errors like this: "EsRejectedExecutionException[rejected
>> >> >>>>> execution (queue capacity 1000) on
>> >> >>>>> org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
>> >> >>>>>
>> >> >>>>> A bit of investigation led me to conclude that our Elasticsearch
>> >> >>>>> server was not sufficiently powerful, so I spun up a new one with
>> >> >>>>> four times the CPU and RAM of the original, but the queue
>> >> >>>>> capacity is still only 1000.  Also, 2020 seems like a ridiculous
>> >> >>>>> number of shards.  Any idea what's going on here?
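>> >> >>>>>
>> >> >>>>> (For what it's worth, the 1000 is not derived from machine size;
>> >> >>>>> it is the default length of the search thread pool queue.  On
>> >> >>>>> the ES 1.x line it can be raised in elasticsearch.yml, e.g.
>> >> >>>>>
>> >> >>>>>     threadpool.search.queue_size: 2000
>> >> >>>>>
>> >> >>>>> though raising it mostly papers over load pressure rather than
>> >> >>>>> fixing it.)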
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>>
>> >> >>>>> Alex Wauck // DevOps Engineer
>> >> >>>>>
>> >> >>>>> E X O S I T E
>> >> >>>>> www.exosite.com
>> >> >>>>>
>> >> >>>>> Making Machines More Human.
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> _______________________________________________
>> >> >>>>> users mailing list
>> >> >>>>> users@lists.openshift.redhat.com
>> >> >>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>> >> >>>>>
>> >> >>>>
>> >> >>>>
>> >> >>>> _______________________________________________
>> >> >>>> users mailing list
>> >> >>>> users@lists.openshift.redhat.com
>> >> >>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>> >> >
>> >
>> >
>> >
>> >
>> > --
>> >
>> > Alex Wauck // DevOps Engineer
>> >
>> > E X O S I T E
>> > www.exosite.com
>> >
>> > Making Machines More Human.
>
>



--
Alex Wauck // DevOps Engineer

E X O S I T E 
www.exosite.com 

Making Machines More Human.

