
Re: logging-es errors: shards failed



They surely do. Although it would probably be easiest here to just get them from `oc logs` against the ES pod, especially if we can't trust ES storage.
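
For anyone who would rather script that than scroll through the output, here is a minimal sketch. It assumes a logged-in `oc` client on the PATH; the pod name in the usage comment is a placeholder, and both function names are made up for illustration. The `logging` namespace is the one from the Kibana query quoted below.

```python
import subprocess


def fetch_pod_logs(pod: str, namespace: str) -> str:
    """Return the pod's log output by running `oc logs POD -n NAMESPACE`."""
    result = subprocess.run(
        ["oc", "logs", pod, "-n", namespace],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if `oc` exits non-zero
    )
    return result.stdout


def rejected_lines(log_text: str) -> list[str]:
    """Keep only the log lines that mention a rejected execution."""
    return [
        line
        for line in log_text.splitlines()
        if "EsRejectedExecutionException" in line
    ]


# Usage (the pod name is a placeholder, substitute the real ES pod):
# for line in rejected_lines(fetch_pod_logs("logging-es-example-pod", "logging")):
#     print(line)
```

Reading from `oc logs` this way does not depend on ES storage at all, which is the point of going to the pod directly.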

On Fri, Jul 15, 2016 at 3:26 PM, Peter Portante <pportant redhat com> wrote:
Eric, Luke,

Do the logs from the ES instance itself flow into that ES instance?

-peter

On Fri, Jul 15, 2016 at 12:14 PM, Alex Wauck <alexwauck exosite com> wrote:
> I'm not sure that I can.  I clicked the "Archive" link for the logging-es
> pod and then changed the query in Kibana to "kubernetes_container_name:
> logging-es-cycd8veb && kubernetes_namespace_name: logging".  I got no
> results, instead getting this error:
>
> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.12
> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> capacity 1000) on
> org.elasticsearch.search.action.SearchServiceTransportAction$23@6b1f2699]
> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.14
> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> capacity 1000) on
> org.elasticsearch.search.action.SearchServiceTransportAction$23@66b9a5fb]
> Index: unrelated-project.92c37428-11f6-11e6-9c83-020b5091df01.2016.07.15
> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> capacity 1000) on
> org.elasticsearch.search.action.SearchServiceTransportAction$23@512820e]
> Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.29
> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> capacity 1000) on
> org.elasticsearch.search.action.SearchServiceTransportAction$23@3dce96b9]
> Index: unrelated-project.f38ac6ff-3e42-11e6-ab71-020b5091df01.2016.06.30
> Shard: 2 Reason: EsRejectedExecutionException[rejected execution (queue
> capacity 1000) on
> org.elasticsearch.search.action.SearchServiceTransportAction$23@2f774477]
>
> When I initially clicked the "Archive" link, I saw a lot of messages with
> the kubernetes_container_name "logging-fluentd", which is not what I
> expected to see.
>
>
> On Fri, Jul 15, 2016 at 10:44 AM, Peter Portante <pportant redhat com>
> wrote:
>>
>> Can you go back further in the logs to the point where the errors started?
>>
>> I am thinking about possible Java heap issues, or possibly ES
>> restarting for some reason.
>>
>> -peter
>>
>> On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček <lvlcek redhat com> wrote:
>> > Also looking at this.
>> > Alex, is it possible to investigate if you were having some kind of
>> > network connection issues in the ES cluster (I mean between individual
>> > cluster nodes)?
>> >
>> > Regards,
>> > Lukáš
>> >
>> >
>> >
>> >
>> >> On 15 Jul 2016, at 17:08, Peter Portante <pportant redhat com> wrote:
>> >>
>> >> Just catching up on the thread, will get back to you all in a few ...
>> >>
>> >> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz <ewolinet redhat com>
>> >> wrote:
>> >>> Adding Lukas and Peter
>> >>>
>> >>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer <lmeyer redhat com> wrote:
>> >>>>
>> >>>> I believe the "queue capacity" there is the number of parallel
>> >>>> searches that can be queued while the existing search workers
>> >>>> operate. It sounds like it has plenty of capacity there and it has a
>> >>>> different reason for rejecting the query. I would guess the data
>> >>>> requested is missing given it couldn't fetch shards it expected to.
>> >>>>
>> >>>> The number of shards is a multiple (for redundancy) of the number of
>> >>>> indices, and there is an index created per project per day. So even
>> >>>> for a small cluster this doesn't sound out of line.
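
The arithmetic in the paragraph above can be sketched roughly as follows. The project count, retention period and shards-per-index figure are made-up values picked so that the result matches the 2020 shards in the Kibana message; they are not numbers reported for this cluster. The last helper assumes that every shard touched by a search needs one slot in the search queue, which is a simplification worth checking against the Elasticsearch thread pool documentation before drawing conclusions from it.

```python
# Back-of-the-envelope shard arithmetic for a per-project, per-day index layout.
# All inputs used in the example are hypothetical, for illustration only.


def index_count(projects: int, days_retained: int) -> int:
    """One index per project per day."""
    return projects * days_retained


def shard_count(indices: int, shards_per_index: int) -> int:
    """Total shards is a fixed multiple of the number of indices."""
    return indices * shards_per_index


def fits_in_queue(shards: int, queue_capacity: int = 1000) -> bool:
    """Assumption: a search over N shards needs N queue slots at once."""
    return shards <= queue_capacity


indices = index_count(projects=101, days_retained=4)  # hypothetical inputs
shards = shard_count(indices, shards_per_index=5)     # hypothetical multiple
print(indices, shards, fits_in_queue(shards))
```

Under those assumptions a single search across every index asks for more slots than a queue of 1000 can hold, so comparing the real index and shard counts on the cluster with the queue size may be a useful first check.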
>> >>>>
>> >>>> Can you give a little more information about your logging deployment?
>> >>>> Have you deployed multiple ES nodes for redundancy, and what are you
>> >>>> using for storage? Could you attach full ES logs? How many OpenShift
>> >>>> nodes and projects do you have? Any history of events that might have
>> >>>> resulted in lost data?
>> >>>>
>> >>>> On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck <alexwauck exosite com>
>> >>>> wrote:
>> >>>>>
>> >>>>> When doing searches in Kibana, I get error messages similar to
>> >>>>> "Courier Fetch: 919 of 2020 shards failed".  Deeper inspection
>> >>>>> reveals errors like this: "EsRejectedExecutionException[rejected
>> >>>>> execution (queue capacity 1000) on
>> >>>>> org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
>> >>>>>
>> >>>>> A bit of investigation led me to conclude that our Elasticsearch
>> >>>>> server was not sufficiently powerful, so I spun up a new one with
>> >>>>> four times the CPU and RAM of the original one, but the queue
>> >>>>> capacity is still only 1000. Also, 2020 seems like a really
>> >>>>> ridiculous number of shards.  Any idea what's going on here?
>> >>>>>
>> >>>>> --
>> >>>>>
>> >>>>> Alex Wauck // DevOps Engineer
>> >>>>>
>> >>>>> E X O S I T E
>> >>>>> www.exosite.com
>> >>>>>
>> >>>>> Making Machines More Human.
>> >>>>>
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> users mailing list
>> >>>>> users lists openshift redhat com
>> >>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>> >>>>>
>> >>>>
>> >>>>
>> >
>
>
>
>
> --
>
> Alex Wauck // DevOps Engineer
>
> E X O S I T E
> www.exosite.com
>
> Making Machines More Human.

