
Re: Single OpenShift cluster/instance across datacenter?



> On Dec 16, 2015, at 2:39 PM, Srinivas Naga Kotaru (skotaru) <skotaru cisco com> wrote:
>
> Clayton
>
> Thanks for the clarification. I need some clarity on the first point.
>
> Say, for example, we have 2 DCs that are 50 miles apart, with a round trip of less than a second.


Unfortunately for splitting etcd across DCs you need <10ms latency to
have a chance of preserving the efficiency of the cluster.
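
As a rough sanity check on distance alone, the propagation floor for a round trip can be estimated from the fibre length. This is a minimal sketch, assuming light travels through fibre at roughly 200,000 km/s (about two thirds of its speed in vacuum) and that the fibre path is no longer than the straight-line distance. `min_rtt_ms` is a hypothetical helper, and real round trips will be higher because of routing, switching, and queueing.

```python
MILES_TO_KM = 1.609344
# Assumption: ~200,000 km/s in fibre, i.e. 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def min_rtt_ms(distance_miles):
    """Lower bound on round-trip time from propagation delay alone."""
    one_way_km = distance_miles * MILES_TO_KM
    return 2 * one_way_km / FIBRE_KM_PER_MS


# 50 miles is the distance mentioned in the question above.
print(f"{min_rtt_ms(50):.2f} ms")
```

For 50 miles this floor is under 1 ms, so distance by itself does not rule out the <10ms target. What matters is the latency you actually measure between the etcd members.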


> We want to host only prod apps in these 2 DCs for HA purposes.
>
> What would you recommend?
>
> A) Single cluster - span nodes across both DCs and use node selectors to place pods equally. Have dedicated masters and etcd, split equally between the data centers.

If you only have 2 DCs, you have to divide up the 3 or 5 etcds into
those two failure domains, but whichever side has the majority will be
the only DC that can keep accepting changes after a split, i.e.:

DC 1: 1 etcd
DC 2: 2 etcd

If the connection between them is severed, only DC2 can accept changes
- DC1 will not be able to make changes.  If DC2 is lost, you'll need
to bring up 2 more etcd in DC1 before it can make changes.  If DC1 is
lost, you'll continue to be able to make changes in DC2.
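
The majority rule behind the example above can be sketched in a few lines. This is only an illustration of the quorum arithmetic, not etcd code: `quorum` and `writable_sides` are hypothetical helper names, and the model assumes a clean partition in which each DC can see only its own members.

```python
def quorum(members):
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def writable_sides(layout):
    """Given {dc_name: etcd_member_count}, report which DC could still
    commit writes if every DC were cut off from the others."""
    needed = quorum(sum(layout.values()))
    return {dc: count >= needed for dc, count in layout.items()}


# The 1 + 2 split from the example above.
layout = {"DC1": 1, "DC2": 2}
print("quorum:", quorum(sum(layout.values())))
print(writable_sides(layout))
```

With 3 members the quorum is 2, so only the side holding 2 members stays writable. A 5-member cluster split 2 + 3 behaves the same way, with a quorum of 3.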

> B) Single cluster - span nodes across both data centers but limit the management layer (masters and etcd) to a single data center, to meet latency requirements and to avoid split brain if one DC is disconnected from the other.

This is only better than A if latency is >10ms

> C) Each DC has its own cluster installation - if a client wants HA, they create pods in both clusters and use global routing to tie them to the OpenShift-exposed route URLs. Technically, 2 OpenShift route URLs are mapped to a single global virtual.
>
> In the last case, the client has to deal with 2 API endpoints (oc, console, API) for any operation (code deploy, life cycle management, administration).

Yes.  Option D might be:

Use option A or B in DC1 and then run a single DR cluster in DC2 that
has only a subset of apps that cannot tolerate any downtime.

Option E is a variant of A in which the third etcd member usually sits
outside the two datacenters, so any one datacenter can be lost.
That's harder for most people.
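
To see why a third location helps, one can check each layout against the loss of a single site. A minimal sketch under the same majority rule etcd relies on; `survives_loss` and the site name `site3` are hypothetical, chosen only for illustration.

```python
def quorum(members):
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def survives_loss(layout, lost_site):
    """True if the members outside `lost_site` still form a majority
    of the original cluster."""
    total = sum(layout.values())
    remaining = total - layout[lost_site]
    return remaining >= quorum(total)


layouts = {
    "two DCs (option A)": {"DC1": 1, "DC2": 2},
    "third member elsewhere (option E)": {"DC1": 1, "DC2": 1, "site3": 1},
}
for name, layout in layouts.items():
    result = {site: survives_loss(layout, site) for site in layout}
    print(name, result)
```

In the two-DC layout the cluster keeps quorum only if the smaller side is the one lost. With one member per site, losing any single site still leaves 2 of 3 members.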


>
> --
> Srinivas Kotaru
>
>
>
>
>
>
>> On 12/16/15, 10:16 AM, "Clayton Coleman" <ccoleman redhat com> wrote:
>>
>> On Tue, Dec 15, 2015 at 2:23 PM, Srinivas Naga Kotaru (skotaru)
>> <skotaru cisco com> wrote:
>>> Thanks Clayton. That helps, but I still have a few questions and need your expert comment.
>>>
>>> 1. What is the impact of etcd being unavailable if connectivity between the DCs is broken for a few minutes, or hours in the worst case? Will it impact run time or only provisioning time?
>>
>> It will only affect receiving changes to the cluster topology - the
>> cluster will stay in the same state.  Of note - if a node is restarted
>> and can't contact the master it won't alter the running state of the
>> system, which may mean that apps don't come back on the nodes after a
>> reboot until the node can reach the master (there are various efforts
>> going on to address this, but it is a limitation today).
>>
>>>
>>> 2. How do clients deal with code deployment in a multi-cluster approach? Will they have to handle each app individually?
>>
>> You would - depending on where you have your images, you can build the
>> app once, then roll the same image out to all clusters (push if
>> necessary, or pull otherwise).
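
The "build once, push to every cluster" variant could look roughly like the sketch below. It only prints the `docker tag` and `docker push` commands rather than running them. The registry hostnames and the image name are assumptions made up for this example, and `rollout_commands` is a hypothetical helper.

```python
def rollout_commands(image, registries):
    """Return the tag/push commands needed to copy one locally built
    image into each cluster's registry."""
    commands = []
    for registry in registries:
        target = f"{registry}/{image}"
        commands.append(["docker", "tag", image, target])
        commands.append(["docker", "push", target])
    return commands


# Hypothetical per-cluster registries; substitute your own.
registries = [
    "registry.cluster-east1.example.com",
    "registry.cluster-east2.example.com",
]
for cmd in rollout_commands("myproject/app:1.0", registries):
    print(" ".join(cmd))
```

If the clusters can all reach one shared registry, the pull variant needs no per-cluster push: each cluster simply references the same image.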
>>
>>>
>>> 3. Since each app has a few exposed route URLs in a multi-cluster setup, is putting a global load balancer in front and forwarding to the OpenShift route URLs the right approach?
>>
>> Yes, that's the most common option (each shard of the app is in a
>> different cluster and has different wildcards,
>> app.cluster-east1.mycompany.com, app.cluster-east2.mycompany.com,
>> app.cluster-east3.mycompany.com).  You can also have the same
>> wildcards (app.cluster.mycompany.com) and simply treat the routers on
>> each cluster as identical backends.
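
Treating the routers as identical backends amounts to something like the toy round-robin picker below. This is a sketch of the idea only, not how any particular global load balancer is implemented: `GlobalBalancer` is a hypothetical class, health state is set by hand instead of by real health checks, and the hostnames are the example wildcards from above.

```python
from itertools import cycle


class GlobalBalancer:
    """Round-robin over per-cluster router hostnames, skipping any
    backend that has been marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        if not self.healthy:
            raise RuntimeError("no healthy backends")
        # At most one full lap is needed to find a healthy backend.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")


lb = GlobalBalancer([
    "app.cluster-east1.mycompany.com",
    "app.cluster-east2.mycompany.com",
    "app.cluster-east3.mycompany.com",
])
lb.mark_down("app.cluster-east2.mycompany.com")
print([lb.pick() for _ in range(4)])
```

When one cluster's router is marked down, traffic keeps flowing to the remaining clusters, which is the point of running the app in more than one of them.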
>>
>>
>>> --
>>> Srinivas Kotaru
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>> On 12/15/15, 10:44 AM, "Clayton Coleman" <ccoleman redhat com> wrote:
>>>>
>>>> On Tue, Dec 15, 2015 at 1:24 PM, Srinivas Naga Kotaru (skotaru)
>>>> <skotaru cisco com> wrote:
>>>>> We are running into an issue similar to the one below.
>>>>>
>>>>> http://stackoverflow.com/questions/34194602/single-kubernetes-openshift-cluster-instance-across-datacenters
>>>>>
>>>>> We are pondering whether to go with a single cluster spanning 3-4 data
>>>>> centers or with one cluster dedicated to each local DC. Each has its own
>>>>> pros and cons, but the multiple-clusters approach creates more
>>>>> management/operational overhead for both platform and client teams.
>>>>>
>>>>> Why do we dislike the multiple-clusters approach?
>>>>>
>>>>> - Each cluster has its own API endpoint. Clients have to use different API
>>>>>   endpoints when working with each DC's pods or doing life cycle
>>>>>   management. They might not like that.
>>>>> - When provisioning apps, pods need to be created on each cluster and tied
>>>>>   to another global routing layer.
>>>>> - If an application has a few pods running in each cluster, how do they
>>>>>   communicate unless additional service groups are created? Will other apps
>>>>>   talking to this app have to deal with multiple service and routing
>>>>>   groups? Communication within the app's pods and between applications
>>>>>   becomes complex.
>>>>> - To mask the multiple clusters and API endpoints, we have to build an
>>>>>   uber-style common orchestration, routing and client interface so that
>>>>>   clients can ignore the backend topology and use a common interface for
>>>>>   their day-to-day job.
>>>>>
>>>>> I heard the masters also can't span data centers, particularly etcd,
>>>>> which needs to be co-located in the same location due to its latency
>>>>> requirements. Is it still a problem if the data centers are connected
>>>>> with decent network infra?
>>>>
>>>> Etcd definitely needs low latency between instances.  If you can
>>>> deliver <5ms ping between datacenters, that's not an issue.  If you're
>>>> higher than that, failover and write performance will suffer (in a
>>>> cluster, each write has to be acknowledged by a majority of members).  That
>>>> doesn't mean that you can't run an HA setup inside a single data
>>>> center and have nodes in other data centers - you just need to assess
>>>> the impact of losing network between datacenters (what the chance of
>>>> failure is) and what the outcome is in the event of failure.
>>>>
>>>> I've suggested doing both - run a cluster in each data center for
>>>> things that absolutely must be able to survive multiple datacenter
>>>> losses, and have those apps be deployed to each separate cluster.
>>>> Then run a single cluster (led by one data center) for other use cases
>>>> (general dev use, staging, preprod, etc).

