
Re: logging-es errors: shards failed



Can you go back further in the logs to the point where the errors started?

I am thinking about possible Java heap issues, or possibly ES
restarting for some reason.

-peter

On Fri, Jul 15, 2016 at 11:37 AM, Lukáš Vlček <lvlcek redhat com> wrote:
> Also looking at this.
> Alex, is it possible to investigate if you were having some kind of network connection issues in the ES cluster (I mean between individual cluster nodes)?
>
> Regards,
> Lukáš
>
>
>
>
>> On 15 Jul 2016, at 17:08, Peter Portante <pportant redhat com> wrote:
>>
>> Just catching up on the thread, will get back to you all in a few ...
>>
>> On Fri, Jul 15, 2016 at 10:08 AM, Eric Wolinetz <ewolinet redhat com> wrote:
>>> Adding Lukas and Peter
>>>
>>> On Fri, Jul 15, 2016 at 8:07 AM, Luke Meyer <lmeyer redhat com> wrote:
>>>>
>>>> I believe the "queue capacity" there is the number of parallel searches
>>>> that can be queued while the existing search workers operate. It sounds like
>>>> it has plenty of capacity there and it has a different reason for rejecting
>>>> the query. I would guess the data requested is missing given it couldn't
>>>> fetch shards it expected to.
>>>>
>>>> The number of shards is a multiple (for redundancy) of the number of
>>>> indices, and there is an index created per project per day. So even for a
>>>> small cluster this doesn't sound out of line.
>>>>
>>>> Can you give a little more information about your logging deployment? Have
>>>> you deployed multiple ES nodes for redundancy, and what are you using for
>>>> storage? Could you attach full ES logs? How many OpenShift nodes and
>>>> projects do you have? Any history of events that might have resulted in lost
>>>> data?
>>>>
>>>> On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck <alexwauck exosite com> wrote:
>>>>>
>>>>> When doing searches in Kibana, I get error messages similar to "Courier
>>>>> Fetch: 919 of 2020 shards failed".  Deeper inspection reveals errors like
>>>>> this: "EsRejectedExecutionException[rejected execution (queue capacity 1000)
>>>>> on
>>>>> org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".
>>>>>
>>>>> A bit of investigation led me to conclude that our Elasticsearch server
>>>>> was not sufficiently powerful, so I spun up a new one with four times the
>>>>> CPU and RAM of the original one, but the queue capacity is still only 1000.
>>>>> Also, 2020 seems like a really ridiculous number of shards.  Any idea what's
>>>>> going on here?
>>>>>
>>>>> --
>>>>>
>>>>> Alex Wauck // DevOps Engineer
>>>>>
>>>>> E X O S I T E
>>>>> www.exosite.com
>>>>>
>>>>> Making Machines More Human.
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users lists openshift redhat com
>>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>>>>
>>>>
>
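
Luke's point above, that the shard count follows from one index per project per day multiplied by a per-index shard factor, can be sanity-checked with a little arithmetic. The sketch below is only illustrative: the function name and the project and day counts are hypothetical, and the factor of 5 is an assumption (it was the stock Elasticsearch default for primary shards per index in releases before 7.0; the OpenShift logging deployment may configure something else).

```python
def estimate_shards(projects: int, days: int, shards_per_index: int) -> int:
    """Estimate the total shard count for a logging cluster.

    Assumes one index is created per project per day (as described in
    the thread) and that every index has the same number of shards.
    """
    if projects < 0 or days < 0 or shards_per_index <= 0:
        raise ValueError("counts must be non-negative and shards_per_index positive")
    indices = projects * days          # one index per project per day
    return indices * shards_per_index  # each index contributes its shards


# Hypothetical numbers, not taken from Alex's cluster:
# 101 projects retained for 4 days at an assumed 5 shards per index.
total = estimate_shards(projects=101, days=4, shards_per_index=5)
print(total)
```

With those made-up inputs the estimate lands on the 2020 figure from the Kibana error, which shows how a modest number of projects and a few days of retention can add up to thousands of shards.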

