Re: logging-es errors: shards failed

I believe the "queue capacity" there is the number of searches that can be queued while the existing search workers are busy. It sounds like the queue has plenty of capacity and the query is being rejected for a different reason. My guess is that the requested data is missing, given that it couldn't fetch shards it expected to find.

The number of shards is a multiple (for redundancy) of the number of indices, and an index is created per project per day. So even for a small cluster, 2020 shards doesn't sound out of line.
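
As a rough sketch of that arithmetic (the function name, parameters, and numbers below are hypothetical, not taken from your deployment or from the logging code), the shard count grows like this:

```python
def estimate_shard_count(projects, retention_days,
                         primaries_per_index=1, replicas_per_primary=0):
    """Estimate total shards, assuming one index per project per day.

    All parameters are illustrative assumptions; check your own index
    settings for the real primary and replica counts.
    """
    indices = projects * retention_days
    # Each index has its primaries, plus one copy of each per replica.
    shards_per_index = primaries_per_index * (1 + replicas_per_primary)
    return indices * shards_per_index


if __name__ == "__main__":
    # Hypothetical cluster: 20 projects, 14 days of logs kept,
    # 5 primary shards per index, 1 replica of each.
    print(estimate_shard_count(20, 14, 5, 1))  # prints 2800
```

So a couple of dozen projects and two weeks of logs can already reach a few thousand shards, depending on the per-index settings.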

Can you give a little more information about your logging deployment? Have you deployed multiple ES nodes for redundancy, and what are you using for storage? Could you attach full ES logs? How many OpenShift nodes and projects do you have? Any history of events that might have resulted in lost data?

On Thu, Jul 14, 2016 at 4:06 PM, Alex Wauck <alexwauck exosite com> wrote:
When doing searches in Kibana, I get error messages similar to "Courier Fetch: 919 of 2020 shards failed". Deeper inspection reveals errors like this: "EsRejectedExecutionException[rejected execution (queue capacity 1000) on org.elasticsearch.search.action.SearchServiceTransportAction$23@14522b8e]".

A bit of investigation led me to conclude that our Elasticsearch server was not sufficiently powerful, so I spun up a new one with four times the CPU and RAM of the original, but the queue capacity is still only 1000. Also, 2020 seems like a really ridiculous number of shards. Any idea what's going on here?

Alex Wauck

E X O S I T E 

Making Machines More Human.

