
Re: Can OpenShift keep track of absolutely all service activity in a high-availability (many replicas) scenario?

On Wed, Oct 12, 2016 at 8:42 AM, Ricardo Aguirre Reyes | BEEVA MX <ricardo aguirre contractor beeva com> wrote:

I have a question regarding OpenShift's ability to keep track of absolutely all service activity in a high-availability (many replicas) scenario.

We are building a microservice that will communicate with a mainframe over TCP (sockets).
We will run several pods as replicas in order to achieve high availability.
We know that, configured this way, each transaction can be logged by the answering pod.
We can then store the log messages in Elasticsearch, and theoretically we can retrieve messages even for dead pods (is this true?); we can aggregate them based on application labels.

Using multiple pods, there will never be messages dropped on the floor, because at least one pod will be up to answer.

Yes, things can crash, and you can have situations where a pod is healthy (according to its health checks), accepts a request for processing, and subsequently fails.

It may be useful to think about the failure modes in the path between your microservices and your mainframe service:

1. Top of rack switch failure
2. Cable failure
3. Power failure
4. Router pod failure
5. OpenShift node failure (application pod failure)

There are a lot of things to consider when building reliable, high-performance distributed systems. This checklist is helpful: https://monkey.org/~marius/checklist.pdf

Keep in mind that TCP has checksum and retry mechanisms (handling line noise, dropped packets, and transient network blips), but it does not re-open a broken connection or retry requests automatically. Your service will need to handle this itself. And since there is no such thing as an exactly-once system, your requests should be idempotent.
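To make the retry-plus-idempotency point concrete, here is a minimal self-contained sketch (the `FlakyServer` class simulates your mainframe endpoint; all names are illustrative, not part of any OpenShift or mainframe API). The client generates one idempotency key per logical request and reuses it across retries, so a retry after an ambiguous failure cannot double-apply the work:

```python
import uuid

class FlakyServer:
    """Simulated mainframe endpoint: it fails the first attempt and
    deduplicates requests by idempotency key."""
    def __init__(self):
        self.seen = {}          # idempotency key -> cached result
        self.calls = 0

    def handle(self, key, payload):
        self.calls += 1
        if key in self.seen:    # duplicate delivery: return cached result
            return self.seen[key]
        if self.calls == 1:     # simulate a broken connection on attempt 1
            raise ConnectionError("connection reset")
        result = f"processed:{payload}"
        self.seen[key] = result
        return result

def send_with_retry(server, payload, attempts=3):
    """Retry the same request with a stable idempotency key."""
    key = str(uuid.uuid4())     # generated once per logical request
    last_err = None
    for _ in range(attempts):
        try:
            return server.handle(key, payload)
        except ConnectionError as e:
            last_err = e        # reconnect and retry on a broken connection
    raise last_err

server = FlakyServer()
print(send_with_retry(server, "tx-42"))  # → processed:tx-42
```

The key design point is that the idempotency key is created before the first attempt and never regenerated, so the server can safely discard duplicates even when the client cannot tell whether its previous attempt was applied.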
But we do not know what happen at example if a message was already assigned to the pod1 and then if it goes done before receiveing the reply from the Destination.

If you open a connection and initiate a request, but your process crashes before the response is received, then the destination server will send a reply but your operating system won't know how to handle it, since nothing is holding the socket open anymore. This manifests as a TCP RST (Reset) being sent to the mainframe.

This wouldn't account for the case where a pod receives a reply but crashes before completing its processing - for that, you need something more sophisticated.

Will the OpenShift high-availability mechanism resend the last message to another available pod, since "it knows" that the first one is down?

The problem is that my service cannot lose any messages, and it must record every activity.

I think many people have had success using Apache Kafka, which is a distributed message queue (more precisely a replicated commit log). It persists messages for some defined interval, allowing your application to replay messages in order to ensure that nothing gets dropped.
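The replay idea can be shown with a toy in-memory log (real Kafka adds brokers, partitions, replication, and consumer groups; the class and variable names here are purely illustrative). Messages keep their offset in an append-only log, so a replacement consumer can re-read from the last committed position after a crash:

```python
class CommitLog:
    """Append-only log: each message keeps its offset, so a consumer
    that crashed can re-read from its last committed position."""
    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)
        return len(self.messages) - 1       # offset of this message

    def read_from(self, offset):
        return self.messages[offset:]

log = CommitLog()
for m in ("open", "send", "reply"):
    log.append(m)

# The consumer committed offset 1, then crashed; a replacement pod
# replays everything from that offset, so nothing is dropped.
committed_offset = 1
print(log.read_from(committed_offset))      # → ['send', 'reply']
```

The trade-off is at-least-once delivery: replay can hand the same message to a consumer twice, which is exactly why the processing side needs to be idempotent.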

Jonathan Yu, P.Eng. / Software Engineer, OpenShift by Red Hat / Twitter (@jawnsy) is the quickest way to my heart

“A master in the art of living draws no sharp distinction between his work and his play; his labor and his leisure; his mind and his body; his education and his recreation. He hardly knows which is which. He simply pursues his vision of excellence through whatever he is doing, and leaves others to determine whether he is working or playing. To himself, he always appears to be doing both.” — L. P. Jacks, Education through Recreation (1932), p. 1
