
Re: questions on haproxy and front proxy layer

Ram, can you elaborate on the min/max settings for the number of gears for an app? rhc app create does not seem to take such options, and I didn't find any config file that sets them.
And I just wanted to confirm that setting min=X,max=X would keep the gear count fixed at X in all situations. I am trying to avoid syncing from haproxy to the router clones after the initial setup.


From: Ram Ranganathan <ramr redhat com>
To: meghdoot bhattacharya <meghdoot_b yahoo com>
Cc: Mike McGrath <mmcgrath redhat com>; "dev lists openshift redhat com" <dev lists openshift redhat com>
Sent: Thursday, January 10, 2013 1:52 PM
Subject: Re: questions on haproxy and front proxy layer

Comments/Answers inline - tagged RR. 



On Jan 10, 2013, at 12:32 AM, meghdoot bhattacharya wrote:

Ram appreciate your response.

Is there any downtime for haproxy when the config changes? I've never used it, but it looks like it has -st/-sf to handle config reloads gracefully without dropping traffic.

RR:   This is how we do it today -- config change followed by a reload with -sf, both in the port proxy running on the node [outside the gear] and in
       haproxy (running as a router for scalable apps).  But this works well for connections that don't persist over time - not all
       traffic is equal.  It's fine for http/https traffic, with some caveats for persistent connections/keepalives, but let's set those aside for a bit.

       In the port proxy case, if the proxied port is for a mysql service, then killing the connection has a much greater impact, so there are some issues there.
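For illustration, the hitless-reload pattern described above looks something like this. The paths here are made up, not OpenShift's actual node layout; -sf hands the listening sockets to the new haproxy process and tells the old PID(s) to drain existing connections and exit:

```shell
# Sketch of haproxy's graceful reload with -sf (illustrative paths).
CFG=/etc/haproxy/haproxy.cfg
PIDFILE=$(mktemp)
echo 12345 > "$PIDFILE"          # stand-in for the running haproxy's pid file

# The echo shows the command a reload script would run;
# drop the echo to actually execute it.
echo haproxy -f "$CFG" -sf "$(cat "$PIDFILE")"
rm -f "$PIDFILE"
```

The new process binds the sockets first, so new connections are accepted throughout; only long-lived connections on the old process are at risk, which is the caveat mentioned above.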

So, back to the LB setup. Our current setup has a hardware LB balancing across hundreds of nodes, and each node has an apache that acts as a reverse proxy to several downstream app servers. So, if we ignore SSL termination and the question of reducing proxy layers for a moment, what is the suggested way to handle such a setup with OpenShift to solve for scaling with whatever is available now?
Even if I get the external balancer to load balance between the nodes as usual and have the apache on each of them point to the haproxy [as opposed to only one apache node], it is still a single haproxy taking all the load, and we cannot take chances with our customers. Ideally, if we could have haproxy gear clones, we could split the traffic between multiple haproxies that point to the same set of nodes. So, what choices do I have? Also, I don't think running N different PaaS systems with the same application is going to fly with our operations. Opinions from you guys would help a lot.
RR:  Unfortunately there's no straightforward solution - some ideas below.
       Using multiple scaled apps would be the simplest solution, and you could even use one of them (or a couple if you need) to work as a federator to
       push code to all the others when an application is deployed to it.  We're really talking app clones here, w/ a single DNS entry pointing to the clones.
       That would also take care of redundancy and remove any single failure or choke points, since those seem to be your major concerns.

       You don't need N different PaaS systems -- it's just one PaaS with multiple copies of the app.

       If that does not work for you, then this would really need a custom home-grown solution on your end:
          1.  As Mike mentioned, just take the list of all the serving gears and add them to your load balancer. Of course, auto-scaling is something you'd need to handle
               yourself, and you'd need to keep the routing entries in sync.
          2.  There are other variants of the above approach - using routing clones.  Example: create a bunch of tiny applications (could be scaled apps or
               DIY apps running haproxy - these are your routing clones) and then have a custom script, maybe puppet/chef, push the configuration from your
               app's haproxy gear to these haproxy clones.  Register all these clones in your LB or use DNS.
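A sketch of what option 2's config-push script could look like. The host names and paths below are entirely made up for illustration - a puppet/chef run could do the same thing:

```shell
# Hypothetical clone-sync: push the master haproxy gear's config to each
# routing clone, then gracefully reload haproxy there.
MASTER_CFG=haproxy/conf/haproxy.cfg
CLONES="clone1.example.com clone2.example.com"

for host in $CLONES; do
  # echo shows the commands a real script would issue; drop the echo to run them
  echo scp "$MASTER_CFG" "$host:haproxy/conf/haproxy.cfg"
  echo ssh "$host" "haproxy -f haproxy/conf/haproxy.cfg -sf \$(cat haproxy/run/haproxy.pid)"
done
```

Each clone stays a drop-in copy of the master's routing table, so the LB (or DNS) can spread traffic across all of them.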

And on a separate note, since we know the load profile of our apps: if we want 100 gears created at launch for the scaled app, do we use manual scaling? And in that case, do we call add-gear 100 times on the single node [the one with haproxy and the apache named virtual host], or do we have to go to the individual nodes and run add-gear?

RR:  There's a min/max setting for scalable apps - so you set this to 100,100 [that means you basically start w/ 100 and stay there].
       Depending on which version of the software (Origin/Enterprise), you could use the web UI or might need the REST API to set that, and then
       may need to send a request to make the broker satisfy the app dependencies (the simplest way may be to add/remove a cartridge, or
       I guess you can try a scale event - add/remove gear).
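As a rough sketch, setting the min/max via the REST API could look like the request below. The endpoint and field names are from memory of the v2 REST API and may differ by release - check /broker/rest/api on your broker for the exact resource; host and credentials are placeholders:

```shell
# Hypothetical broker REST call to pin a scalable app at 100 gears.
BROKER=https://broker.example.com
CART_URL="$BROKER/broker/rest/domains/mydomain/applications/myapp/cartridges/haproxy-1.4"

# echo shows the request; drop the echo to actually send it
echo curl -k -X PUT -u admin:password "$CART_URL" -d scales_from=100 -d scales_to=100
```

With scales_from equal to scales_to, the auto-scaler has no room to move, which is the "fixed at X in all situations" behavior asked about above.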


From: Ram Ranganathan <ramr redhat com>
To: meghdoot bhattacharya <meghdoot_b yahoo com>
Cc: Mike McGrath <mmcgrath redhat com>; "dev lists openshift redhat com" <dev lists openshift redhat com>
Sent: Wednesday, January 9, 2013 5:25 PM
Subject: Re: questions on haproxy and front proxy layer

Hi Meghdoot, 
    Comments/answers inline tagged RR. 
    Also on a related note, we are in the very early stages of drafting a proposal (PEP) for the routing and scaling pieces, which would
    include our potential next steps, and we would welcome any comments/feedback once it's out there.



On Jan 9, 2013, at 1:00 PM, meghdoot bhattacharya wrote:

Thx Mike for your insights.

I want to get a better understanding of a couple of your comments.

On question 3, I think you are indicating that the external load balancer routes to the gears directly. So, are you saying the config information from haproxy is exported to the external load balancer dynamically? And when new gears get created as part of scaling, or get moved around in districts, I am guessing the haproxy configs change automatically. So, in those scenarios, if I am bypassing apache and the haproxy load balancer, I still need to monitor and update the external balancer dynamically, correct?
RR:  If you do go down the route of reading the list of haproxy-configured "servers"/gears to route to, then you will have to monitor any config changes and
       modify your external load balancer appropriately.  And yes, when gears get added/removed (as part of scaling) or moved, the haproxy config is
       automagically updated.

In the above setup I am assuming the haproxy port proxy is still needed for the external-to-internal mapping, but can you tell me whether it matters if the SSL termination [from an external F5 LB, say] happens within the app server in a gear? Does the port proxy care whether it's http or https?

RR:  The port proxy (also runs haproxy, but I'd rather call it the port proxy to avoid confusion) is just a plain ole tcp proxy - it basically
       routes to your application's internal ip/port (from the "externally" exposed port).
       And the port proxy doesn't really care whether it's http[s] traffic - a "passthrough filter"!!
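In haproxy terms, a passthrough of that kind looks roughly like the fragment below. The names, addresses, and ports are made up for illustration, not the node's actual generated config:

```
# Plain TCP proxy: mode tcp means bytes are not inspected,
# so http vs https (or mysql, etc.) makes no difference.
listen gear-proxy
    bind 10.0.0.1:35531              # the "externally" exposed proxy port
    mode tcp                         # passthrough; no HTTP parsing
    server gear1 127.0.250.129:8080  # the gear's internal ip:port
```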

       Just to clarify things: in the existing infrastructure/code today, we don't do the SSL termination inside the app server (on the gear).
       Rather, we do our SSL termination at the front-end proxy layer (apache w/ virtual hosts), which in turn proxies the request to the gears.
       So today, that is the haproxy server running in the gear for a scaled application, and in the case of non-scaled apps it's just the app server running
       in that gear.

Also, don't you think apache mod_proxy does not scale like haproxy, and that it hurts to have it sitting in front of haproxy? I think with SSL termination the event MPM cannot be used, and the worker MPM still falls short of the single event loop of haproxy/nginx, I think. How is apache configured for OpenShift? Prefork or worker MPM?

Its great to know that you guys are thinking on this and would definitely love to hear the solution. Any rough timelines?

RR: It's prefork. And yes, that is one of the issues we have today with scale, amongst others (websockets not being the least).
     Now, as re: websockets, we do currently have an experimental event-loop-driven proxy server (Node.js based) - ports 8000/8443. And that does
     definitely scale better - but it's a solution we are field-testing right now.
      But vis-a-vis using haproxy in front of apache, that's not a viable solution for a variety of reasons, as Mike mentioned - security + resource purposes +
      dynamic reloading of routes/downtime - we don't want an app that's scaling to affect traffic/routing to other apps/gears on that node.
      That's the rationale for "containing" the haproxy router within the scope of the gear.

     And as mentioned earlier, we are at a really early stage of drafting a proposal for the routing/scaling bits.

Our policies will further complicate the setup. We don't allow non-SSL connections in general between nodes. So, the haproxy LB contacting a gear on a separate node directly over http is a challenge. External load balancer to gear is a plus in this situation, where the app server itself can do the SSL termination.

RR:  Hmm, that might be an issue - for a couple of reasons. One, as mentioned above, is that we do the SSL termination at the front-end proxy
        layer (Apache) and not at the app server. That could, however, be solved by running the backend content server (app server) w/ SSL
        termination and proxying https.
        The bigger issue, however, is that inter-node communication is not restricted to just http[s]; it could well be another service
         which is not really secure on the wire. For example, a gear running a service like mongo/mysql/postgres/memcached etc. that needs to be accessed
         remotely by the application server.

      rmillner mentioned IPSec as a possible solution you can use on this front - which is really neat as that should work generically.  HTH.
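For the https case above, re-encrypting at the front-end proxy could look something like this Apache fragment. It's an illustrative sketch, not the vhost config OpenShift generates; the server name and backend address are made up:

```
# Apache front end terminating SSL and re-encrypting to a backend
# app server that does its own SSL termination.
<VirtualHost *:443>
    ServerName myapp-mydomain.example.com
    SSLEngine on
    SSLProxyEngine on                        # allow https:// ProxyPass targets
    ProxyPass        / https://127.0.250.129:8443/
    ProxyPassReverse / https://127.0.250.129:8443/
</VirtualHost>
```

This keeps the hop between nodes encrypted, though as noted it only covers http[s] traffic, not arbitrary services like mysql or memcached - hence the IPSec suggestion.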


From: Mike McGrath <mmcgrath redhat com>
To: meghdoot bhattacharya <meghdoot_b yahoo com>
Cc: "dev lists openshift redhat com" <dev lists openshift redhat com>
Sent: Wednesday, January 9, 2013 10:03 AM
Subject: Re: questions on haproxy and front proxy layer

On Tue, 8 Jan 2013, meghdoot bhattacharya wrote:

> Hi,
>      I had a few follow-up questions based on my observations running OpenShift on Fedora 17 with the broker/node running on the same host [as per Krishna's blog post].
> If you folks can clarify and comment as you see fit, that would be great. I am looking at deploying it on premise.
> 1. Creating multiple scaled apps results in a dedicated haproxy load balancer per scaled app. However, both apps had apache mod_proxy act as the front-end proxy against their named virtual hosts? In this
> fashion the effect of the dedicated haproxy is greatly reduced, given that we are sharing the traffic in the front proxy layer? Or in a multi-node scenario, would there be only one haproxy load balancer per node?

Every application would get its own dedicated haproxy setup for security
and resource purposes.

> 2. The haproxy load balancer uses the haproxy port proxy to reach the apps running in different gears directly. So, in that case, what is the benefit of running apache on those nodes with a named virtual
> host for the gear, other than to bypass haproxy for debugging purposes? I messed with the named virtual host name and both haproxy and the haproxy stats page worked fine, proving apache was not in play. The intent
> is not to run that apache in production unless that node may also be hosting a haproxy load balancer itself. Hmm...

To ensure changes to my haproxy balancer do not impact other applications
on restart, via resource constraints, etc.

> 3. On premise, let's say there is just one domain, www.yyy.com, and we really may not need DNS support. And, as in our case, we have multi-level hardware load balancers already in place. In that setup,
> I would ideally want to kill at least one proxy layer, maybe the apache proxy layer? Can you guys suggest how that setup would look? Instead of one haproxy load balancer in a gear, can there be multiple
> clone haproxy gears, with the F5 load balancing to the haproxies, which then route to the nodes? These are second-level F5s that we can kill in the future, but how can we work initially while keeping them?

You can always get a list of the slave gears from the haproxy gears and
bypass two proxy layers by contacting the gears directly.
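Scraping that gear list out of the haproxy config could be as simple as the following. The sample config here is illustrative, not an actual generated one; on a real haproxy gear you would read haproxy/conf/haproxy.cfg instead:

```shell
# Sketch: extract backend gear endpoints from a scaled app's haproxy config
# so an external balancer can be pointed at the gears directly.
CFG=$(mktemp)
cat > "$CFG" <<'EOF'
listen express 127.0.0.1:8080
    server gear-1 127.0.251.1:8080 check
    server gear-2 127.0.251.2:8080 check
EOF

# each serving gear appears as a "server <name> <addr:port> ..." line
awk '$1 == "server" { print $3 }' "$CFG"
rm -f "$CFG"
```

This prints one addr:port per gear, which could then feed the external LB's pool - with the caveat Mike and Ram mention that you must re-run it (or watch the file) to stay in sync as gears are added, removed, or moved.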

> 4. Following up on the last question: if we keep all three proxy layers - F5, front proxy [or middle in this case], and haproxy - does the F5, say, load balance between multiple apaches (or even nginx), which then
> point to haproxy? In that setup the front proxy might also have to use the external IP/port of haproxy and go through the haproxy port proxy to hit the haproxy web balancer gear.
> I guess I am struggling to figure out the best setup in questions 3 and 4. We can modify OpenShift as necessary, but getting some comments will definitely help.

You're welcome to figure out what works for you in this scenario.  Don't
feel like you have to constrain yourself to just what OpenShift provides -
it's a tool and you can use it in the way that's best for you.

Having said that, the front end balancer / HA haproxy layer is something
that's been requested by several users and parts would be useful in
OpenShift Online.  We're trying to take feedback we've gotten and come up
with a solution that works best for everyone (or possibly provide
different options for people that have different needs).

> 5. The latest haproxy version now supports SSL termination. Does that change anything in OpenShift? So we wouldn't need an apache front proxy to SSL terminate, say...

It doesn't today but we're tracking what will be useful here.


> Thx,
> --Meghdoot

dev mailing list
dev lists openshift redhat com
